* [PATCH V2 net 2/4] net: hns3: refactor MAC autoneg and speed configuration
From: Jijie Shao @ 2026-06-24 14:13 UTC
To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
netdev, linux-kernel, shaojijie
In-Reply-To: <20260624141319.271439-1-shaojijie@huawei.com>
From: Shuaisong Yang <yangshuaisong@h-partners.com>
Extract the MAC autoneg and speed/duplex/lane configuration logic out
of hclge_mac_init() and encapsulate it into a new dedicated helper
function hclge_set_autoneg_speed_dup().
In the init path (hclge_init_ae_dev), this helper is now called after
hclge_update_port_info() so that firmware-reported autoneg values are
already populated before applying the link configuration.
Introduce a separate req_lane_num field in struct hclge_mac to isolate
the user-requested lane count from mac.lane_num, which firmware may
overwrite via hclge_get_sfp_info() with stale values from a prior link
lifecycle (e.g., lane_num=4 from 100G). During probe, req_lane_num is
initialized to 0, which instructs firmware to auto-select the correct
lane count for the current speed, rather than reusing the firmware-
reported mac.lane_num that may be inconsistent with the target speed.
This prevents probe failures from mismatched (speed, lane_num) pairs.
In the reset path (hclge_reset_ae_dev), it runs immediately after
hclge_mac_init(), using the previously cached req_* values to restore
the link without re-querying firmware.
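The separation between the two lane fields can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct lane_mac, fw_report_sfp_info() and lane_num_for_config() are hypothetical stand-ins, not driver symbols. The point is that a firmware refresh of the runtime lane_num never reaches the value used for the forced-mode configuration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct hclge_mac. */
struct lane_mac {
	uint8_t  lane_num;      /* runtime value, firmware may overwrite it */
	uint8_t  req_lane_num;  /* cached request; 0 lets firmware choose */
	uint32_t req_speed;     /* cached requested speed in Mb/s */
};

/* Model of a firmware SFP query refreshing the runtime state.
 * Only lane_num is touched; req_lane_num keeps the cached request.
 */
static void fw_report_sfp_info(struct lane_mac *mac, uint8_t fw_lane_num)
{
	mac->lane_num = fw_lane_num;
}

/* Lane count that the forced-mode configuration would pass on.
 * It reads the cached request only, so a stale runtime lane_num
 * (for example 4 left over from a 100G link) cannot leak in.
 */
static uint8_t lane_num_for_config(const struct lane_mac *mac)
{
	return mac->req_lane_num;
}
```

With req_lane_num left at 0 during probe, a stale firmware report of 4 lanes changes lane_num but the configuration path still sees 0.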
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
Changes in V2:
- Squashed the former patch 5 ("fix init failure caused by lane_num
contamination") into this patch. The req_lane_num separation is
introduced here to avoid a bisect-time regression where an
intermediate commit could fail probe with an inconsistent
(speed, lane_num) pair.
- Rewrote the commit message to accurately describe the init/reset
path asymmetry and the req_lane_num rationale.
---
.../hisilicon/hns3/hns3pf/hclge_main.c | 55 ++++++++++++++-----
.../hisilicon/hns3/hns3pf/hclge_main.h | 1 +
2 files changed, 42 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 9fe6bc02d71e..fb12ba77228c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1504,6 +1504,11 @@ static int hclge_configure(struct hclge_dev *hdev)
hdev->hw.mac.req_autoneg = AUTONEG_ENABLE;
hdev->hw.mac.req_duplex = DUPLEX_FULL;
+ /* When lane_num is 0, the firmware will automatically
+ * select the appropriate lane_num based on the speed.
+ */
+ hdev->hw.mac.req_lane_num = 0;
+
hclge_parse_link_mode(hdev, cfg.speed_ability);
hdev->hw.mac.max_speed = hclge_get_max_speed(cfg.speed_ability);
@@ -2579,6 +2584,7 @@ static int hclge_cfg_mac_speed_dup_h(struct hnae3_handle *handle, int speed,
if (ret)
return ret;
+ hdev->hw.mac.req_lane_num = lane_num;
hdev->hw.mac.req_speed = (u32)speed;
hdev->hw.mac.req_duplex = duplex;
@@ -2884,20 +2890,6 @@ static int hclge_mac_init(struct hclge_dev *hdev)
if (!test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
hdev->hw.mac.duplex = HCLGE_MAC_FULL;
- if (hdev->hw.mac.support_autoneg) {
- ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg);
- if (ret)
- return ret;
- }
-
- if (!hdev->hw.mac.autoneg) {
- ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
- hdev->hw.mac.req_duplex,
- hdev->hw.mac.lane_num);
- if (ret)
- return ret;
- }
-
mac->link = 0;
if (mac->user_fec_mode & BIT(HNAE3_FEC_USER_DEF)) {
@@ -9316,6 +9308,27 @@ static int hclge_set_wol(struct hnae3_handle *handle,
return ret;
}
+static int hclge_set_autoneg_speed_dup(struct hclge_dev *hdev)
+{
+ int ret;
+
+ if (hdev->hw.mac.support_autoneg) {
+ ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg);
+ if (ret)
+ return ret;
+ }
+
+ if (!hdev->hw.mac.autoneg) {
+ ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
+ hdev->hw.mac.req_duplex,
+ hdev->hw.mac.req_lane_num);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
{
struct pci_dev *pdev = ae_dev->pdev;
@@ -9477,6 +9490,13 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
if (ret)
goto err_ptp_uninit;
+ ret = hclge_set_autoneg_speed_dup(hdev);
+ if (ret) {
+ dev_err(&pdev->dev,
+ "failed to set autoneg speed duplex, ret = %d\n", ret);
+ goto err_ptp_uninit;
+ }
+
INIT_KFIFO(hdev->mac_tnl_log);
hclge_dcb_ops_set(hdev);
@@ -9807,6 +9827,13 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev)
return ret;
}
+ ret = hclge_set_autoneg_speed_dup(hdev);
+ if (ret) {
+ dev_err(&pdev->dev,
+ "failed to set autoneg speed duplex, ret = %d\n", ret);
+ return ret;
+ }
+
ret = hclge_tp_port_init(hdev);
if (ret) {
dev_err(&pdev->dev, "failed to init tp port, ret = %d\n",
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 87adeb64e6ea..7419481422c3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -287,6 +287,7 @@ struct hclge_mac {
u8 support_autoneg;
u8 speed_type; /* 0: sfp speed, 1: active speed */
u8 lane_num;
+ u8 req_lane_num;
u32 speed;
u32 req_speed;
u32 max_speed;
--
2.33.0
* [PATCH V2 net 0/4] net: hns3: fix configuration deadlocks and refactor link setup
From: Jijie Shao @ 2026-06-24 14:13 UTC
To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
netdev, linux-kernel, shaojijie
This patch series addresses a sequence of link configuration deadlocks
and parameter contamination issues in the hns3 network driver, which
typically occur during hardware resets or driver initialization under
specific user-configured scenarios.
The bugs stem from asynchronous discrepancies between the MAC state
machine and cached user requests during sudden hardware resets, leading
to invalid parameter combinations or frozen registers.
Changes in V2:
- Squashed the former patch 5 ("fix init failure caused by lane_num
contamination") into patch 2, introducing the req_lane_num separation
directly where the helper is created. This avoids a bisect-time
regression where an intermediate commit could fail probe with an
inconsistent (speed, lane_num) pair.
- Added a NULL phydev guard in patch 1 (hclge_set_phy_link_ksettings)
to prevent a kernel panic when firmware reports PHY_INEXISTENT on a
copper port. The previous netdev->phydev check was lost during the
ethtool refactor.
- In patch 1, for copper ports where neither IMP firmware nor a kernel
PHY is available (e.g. PHY_INEXISTENT), hclge_set_phy_link_ksettings()
now returns -ENODEV, and hns3_set_link_ksettings() catches this error
to proceed to the existing MAC-level path (check_ksettings_param
-> cfg_mac_speed_dup_h), preserving compatibility with PHY-less copper
deployments.
- Preserved the 1000BASE-T forced-mode rejection in the kernel PHY
path inside the new hclge_set_phy_link_ksettings() wrapper, closing
a gap identified in community review.
- Fixed a link-loss regression in patch 4 where fiber ports in forced
mode would be configured with the static default_speed instead of the
firmware-probed SFP speed, by synchronizing req_speed from mac.speed
when req_autoneg is overridden to AUTONEG_DISABLE.
- Rewrote the commit message of patch 2 to accurately describe the
init/reset path asymmetry and the req_lane_num rationale.
The series is organized as follows:
- Patch 1 refactors the ethtool link settings entry path to unify copper
port handling (both native kernel PHY_LIB and firmware-controlled PHY)
and ensures req_xxx configurations are uniformly saved across all modes.
For PHY_INEXISTENT copper ports, -ENODEV is returned to allow fallthrough
to MAC-level configuration.
- Patch 2 refactors the MAC initialization by extracting the autoneg and
speed configuration logic out of hclge_mac_init() into a dedicated
helper function, and introduces req_lane_num to isolate the user-
requested lane count from firmware-overwritten mac.lane_num.
- Patch 3 fixes a permanent link-down deadlock after a reset by ensuring
that the driver caches and uses the user's intended autoneg/speed
settings (req_***) rather than unsynchronized runtime states or
SPEED_UNKNOWN tokens.
- Patch 4 fixes a link loss issue on optical ports during initialization
by differentiating autoneg default values between copper and fiber
media types, and synchronizing req_speed with the firmware-probed
SFP speed when forced mode is detected.
Shuaisong Yang (4):
net: hns3: unify copper port ksettings configuration path
net: hns3: refactor MAC autoneg and speed configuration
net: hns3: fix permanent link down deadlock after reset
net: hns3: differentiate autoneg default values between copper and
fiber
.../ethernet/hisilicon/hns3/hns3_ethtool.c | 31 +++--
.../hisilicon/hns3/hns3pf/hclge_main.c | 108 ++++++++++++++----
.../hisilicon/hns3/hns3pf/hclge_main.h | 1 +
3 files changed, 102 insertions(+), 38 deletions(-)
base-commit: d87363b0edfc7504ff2b144fe4cdd8154f90f42e
--
2.33.0
* [PATCH V2 net 3/4] net: hns3: fix permanent link down deadlock after reset
From: Jijie Shao @ 2026-06-24 14:13 UTC
To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
netdev, linux-kernel, shaojijie
In-Reply-To: <20260624141319.271439-1-shaojijie@huawei.com>
From: Shuaisong Yang <yangshuaisong@h-partners.com>
Fix a critical race condition deadlock where the network interface
remains permanently Link Down after a hardware reset under specific
ethtool sequences.
This issue exclusively manifests in firmware-controlled PHY topologies
where the driver relies on the IMP firmware to arbitrate link parameters.
Standard devices driven by the kernel's native PHY_LIB are unaffected.
The deadlock occurs via the following path:
1. User disables autoneg and forces an unmatched speed, forcing link
down: `ethtool -s ethx autoneg off speed 10 duplex full`
2. User re-enables autoneg: `ethtool -s ethx autoneg on`. The netdev
stack passes cmd->base.speed as SPEED_UNKNOWN (0xffffffff).
3. Driver saves req_autoneg=1, but before the interface can link up,
a hardware reset is triggered.
4. During reset recovery, MAC init reads the un-synchronized runtime
state mac.autoneg (which is still 0/OFF), misinterprets it as
forced mode, and pushes the cached SPEED_UNKNOWN into the hardware
registers, causing the MAC firmware state machine to freeze.
Meanwhile, PHY init reads req_autoneg=1 and enables PHY autoneg.
Since the MAC is frozen with 0xffffffff and PHY is running autoneg,
they mismatch permanently.
Fix this by:
1. Intercepting SPEED_UNKNOWN/DUPLEX_UNKNOWN in
hclge_set_phy_link_ksettings() and hclge_cfg_mac_speed_dup_h() to
prevent them from corrupting the driver's cached valid configuration.
2. Saving req_autoneg in hclge_set_autoneg().
3. Aligning the state judgment in hclge_set_autoneg_speed_dup() to use
req_autoneg instead of the un-synchronized runtime mac.autoneg,
ensuring both MAC and PHY consistently enter the autoneg branch to
eliminate configuration discrepancies during reset recovery.
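The caching rule in step 1 can be sketched as a small userspace C model. The macro values are the ones from include/uapi/linux/ethtool.h; struct req_cache and cache_user_request() are hypothetical names used only for this illustration, not driver symbols.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/ethtool.h. */
#define SPEED_UNKNOWN	(-1)
#define DUPLEX_FULL	0x01
#define DUPLEX_UNKNOWN	0xff

/* Hypothetical stand-in for the req_* fields of struct hclge_mac. */
struct req_cache {
	uint8_t  req_autoneg;
	uint8_t  req_duplex;
	uint32_t req_speed;
};

/* Cache a user request. The autoneg setting is always stored, while
 * an unknown speed or duplex leaves the previously cached value alone.
 */
static void cache_user_request(struct req_cache *req, uint8_t autoneg,
			       uint32_t speed, uint8_t duplex)
{
	req->req_autoneg = autoneg;
	if (speed != (uint32_t)SPEED_UNKNOWN)
		req->req_speed = speed;
	if (duplex != DUPLEX_UNKNOWN)
		req->req_duplex = duplex;
}
```

In this model, replaying the sequence from the description (autoneg off at speed 10, then autoneg on with an unknown speed) leaves req_speed at 10 rather than 0xffffffff.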
Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
.../hisilicon/hns3/hns3pf/hclge_main.c | 22 +++++++++++++------
1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index fb12ba77228c..d176100d3e4c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2585,8 +2585,10 @@ static int hclge_cfg_mac_speed_dup_h(struct hnae3_handle *handle, int speed,
return ret;
hdev->hw.mac.req_lane_num = lane_num;
- hdev->hw.mac.req_speed = (u32)speed;
- hdev->hw.mac.req_duplex = duplex;
+ if (speed != SPEED_UNKNOWN)
+ hdev->hw.mac.req_speed = (u32)speed;
+ if (duplex != DUPLEX_UNKNOWN)
+ hdev->hw.mac.req_duplex = duplex;
return 0;
}
@@ -2617,6 +2619,7 @@ static int hclge_set_autoneg(struct hnae3_handle *handle, bool enable)
{
struct hclge_vport *vport = hclge_get_vport(handle);
struct hclge_dev *hdev = vport->back;
+ int ret;
if (!hdev->hw.mac.support_autoneg) {
if (enable) {
@@ -2628,7 +2631,10 @@ static int hclge_set_autoneg(struct hnae3_handle *handle, bool enable)
}
}
- return hclge_set_autoneg_en(hdev, enable);
+ ret = hclge_set_autoneg_en(hdev, enable);
+ if (!ret)
+ hdev->hw.mac.req_autoneg = enable;
+ return ret;
}
static int hclge_get_autoneg(struct hnae3_handle *handle)
@@ -3343,8 +3349,10 @@ hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
return ret;
hdev->hw.mac.req_autoneg = cmd->base.autoneg;
- hdev->hw.mac.req_speed = cmd->base.speed;
- hdev->hw.mac.req_duplex = cmd->base.duplex;
+ if (cmd->base.speed != SPEED_UNKNOWN)
+ hdev->hw.mac.req_speed = cmd->base.speed;
+ if (cmd->base.duplex != DUPLEX_UNKNOWN)
+ hdev->hw.mac.req_duplex = cmd->base.duplex;
return 0;
}
@@ -9313,12 +9321,12 @@ static int hclge_set_autoneg_speed_dup(struct hclge_dev *hdev)
int ret;
if (hdev->hw.mac.support_autoneg) {
- ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg);
+ ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.req_autoneg);
if (ret)
return ret;
}
- if (!hdev->hw.mac.autoneg) {
+ if (!hdev->hw.mac.req_autoneg) {
ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
hdev->hw.mac.req_duplex,
hdev->hw.mac.req_lane_num);
--
2.33.0
* [PATCH V2 net 4/4] net: hns3: differentiate autoneg default values between copper and fiber
From: Jijie Shao @ 2026-06-24 14:13 UTC
To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
netdev, linux-kernel, shaojijie
In-Reply-To: <20260624141319.271439-1-shaojijie@huawei.com>
From: Shuaisong Yang <yangshuaisong@h-partners.com>
Fix a link loss issue during driver initialization on optical ports
connected to forced-mode (non-autoneg) remote switches.
Previously, during driver probe or initialization, hclge_configure()
blindly hardcoded hdev->hw.mac.req_autoneg to AUTONEG_ENABLE for all
media types. While this is necessary for copper (BASE-T) ports to
establish a link, many high-speed optical (fiber) ports in data
centers are connected to switches running in forced mode (fixed speed,
autoneg disabled). Forcing autoneg on these optical ports during
initialization causes a permanent link failure since the remote end
refuses to respond to autoneg pulses.
Fix this by implementing media-type differentiated initialization in
hclge_init_ae_dev(). Copper ports continue to default to
AUTONEG_ENABLE, while optical ports strictly inherit the preset
autoneg status pre-configured by the firmware (hdev->hw.mac.autoneg),
preserving native compatibility with forced-mode network environments.
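As a rough userspace C model of the media-type split described above: copper keeps whatever request was already cached, while any other media type inherits the firmware-reported autoneg state and, in forced mode, a known firmware-reported speed. The enum, struct probe_mac and seed_probe_request() are hypothetical names for illustration only; the macro values match include/uapi/linux/ethtool.h.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/ethtool.h. */
#define SPEED_UNKNOWN	(-1)
#define AUTONEG_DISABLE	0x00
#define AUTONEG_ENABLE	0x01

/* Hypothetical media types and MAC state for this sketch. */
enum probe_media { MEDIA_COPPER, MEDIA_FIBER };

struct probe_mac {
	uint8_t  autoneg;      /* reported by firmware */
	uint32_t speed;        /* reported by firmware, Mb/s */
	uint8_t  req_autoneg;  /* cached request used for configuration */
	uint32_t req_speed;    /* cached request used for configuration */
};

/* Seed the cached request at probe time according to the media type. */
static void seed_probe_request(struct probe_mac *mac, enum probe_media media)
{
	if (media == MEDIA_COPPER)
		return;		/* copper keeps its existing default */

	mac->req_autoneg = mac->autoneg;
	if (mac->autoneg == AUTONEG_DISABLE &&
	    mac->speed != (uint32_t)SPEED_UNKNOWN)
		mac->req_speed = mac->speed;
}
```

A fiber port that firmware reports as forced 25000 Mb/s ends up with req_autoneg off and req_speed 25000; an unknown firmware speed leaves req_speed untouched.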
Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
Changes in V2:
- Fix a link-loss regression on fiber ports in forced mode where the
helper would configure hardware with the static default_speed instead
of the firmware-probed SFP speed, by synchronizing req_speed from
mac.speed when req_autoneg is overridden to AUTONEG_DISABLE.
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index d176100d3e4c..fc8587c80813 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -9498,6 +9498,13 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
if (ret)
goto err_ptp_uninit;
+ if (hdev->hw.mac.media_type != HNAE3_MEDIA_TYPE_COPPER) {
+ hdev->hw.mac.req_autoneg = hdev->hw.mac.autoneg;
+ if (hdev->hw.mac.autoneg == AUTONEG_DISABLE &&
+ hdev->hw.mac.speed != SPEED_UNKNOWN)
+ hdev->hw.mac.req_speed = hdev->hw.mac.speed;
+ }
+
ret = hclge_set_autoneg_speed_dup(hdev);
if (ret) {
dev_err(&pdev->dev,
--
2.33.0
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: David Laight @ 2026-06-24 14:23 UTC
To: Christian König
Cc: Kaitao Cheng, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt, David Howells,
Simona Vetter, Randy Dunlap, Luca Ceresoli, Philipp Stanner,
linux-block, linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel,
io-uring, audit, bpf, netdev, dri-devel, linux-perf-users,
linux-trace-kernel, kexec, live-patching, linux-modules,
linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
damon, llvm, Kaitao Cheng
In-Reply-To: <cf8467c7-b98f-44a5-9cf9-60b43b5da711@amd.com>
On Wed, 24 Jun 2026 15:23:47 +0200
Christian König <christian.koenig@amd.com> wrote:
> On 6/24/26 15:14, Kaitao Cheng wrote:
> >
> >
> > On 2026/6/22 16:42, David Laight wrote:
> >> On Mon, 22 Jun 2026 12:05:31 +0800
> >> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> >>
> >>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
> >>>
> >>> The list_for_each*_safe() helpers are used when the loop body may
> >>> remove the current entry. Their API exposes the temporary cursor at
> >>> every call site, even though most users only need it for the iterator
> >>> implementation and never reference it in the loop body.
> >>>
> >>> Add *_mutable() variants for list and hlist iteration. The new helpers
> >>> support both forms: callers may keep passing an explicit temporary cursor
> >>> when they need to inspect or reset it, or omit it and let the helper use
> >>> a unique internal cursor.
> >>
> >> I'm not really sure 'mutable' means anything either.
> >> It is possible to make it valid for the loop body (or even other threads)
> >> to delete arbitrary list items - but that needs significant extra overheads.
> >>
> >> It might be worth doing something that doesn't need the extra variable,
> >> but there is little point doing all the churn just to rename things.
> >>
> >>>
> >>> This makes call sites that only mutate the list through the current entry
> >>> less noisy, while keeping the existing *_safe() helpers available for
> >>> compatibility.
> >>>
> >>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> >>> ---
> >>> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
> >>> 1 file changed, 231 insertions(+), 38 deletions(-)
> >>>
> >>> diff --git a/include/linux/list.h b/include/linux/list.h
> >>> index 09d979976b3b..1081def7cea9 100644
> >>> --- a/include/linux/list.h
> >>> +++ b/include/linux/list.h
> >>> @@ -7,6 +7,7 @@
> >>> #include <linux/stddef.h>
> >>> #include <linux/poison.h>
> >>> #include <linux/const.h>
> >>> +#include <linux/args.h>
> >>>
> >>> #include <asm/barrier.h>
> >>>
> >>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
> >>> #define list_for_each_prev(pos, head) \
> >>> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
> >>>
> >>> -/**
> >>> - * list_for_each_safe - iterate over a list safe against removal of list entry
> >>> - * @pos: the &struct list_head to use as a loop cursor.
> >>> - * @n: another &struct list_head to use as temporary storage
> >>> - * @head: the head for your list.
> >>> +/*
> >>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
> >>> */
> >>> #define list_for_each_safe(pos, n, head) \
> >>> for (pos = (head)->next, n = pos->next; \
> >>> !list_is_head(pos, (head)); \
> >>> pos = n, n = pos->next)
> >>>
> >>> +#define __list_for_each_mutable_internal(pos, tmp, head) \
> >>> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
> >>
> >> Use auto
> >>
> >>> + !list_is_head(pos, (head)); \
> >>> + pos = tmp, tmp = pos->next)
> >>> +
> >>> +#define __list_for_each_mutable1(pos, head) \
> >>> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
> >>> +
> >>> +#define __list_for_each_mutable2(pos, next, head) \
> >>> + list_for_each_safe(pos, next, head)
> >>> +
> >>> /**
> >>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
> >>> + * list_for_each_mutable - iterate over a list safe against entry removal
> >>> * @pos: the &struct list_head to use as a loop cursor.
> >>> - * @n: another &struct list_head to use as temporary storage
> >>> - * @head: the head for your list.
> >>> + * @...: either (head) or (next, head)
> >>> + *
> >>> + * next: another &struct list_head to use as optional temporary storage.
> >>> + * The temporary cursor is internal unless explicitly supplied by
> >>> + * the caller.
> >>> + * head: the head for your list.
> >>> + */
> >>> +#define list_for_each_mutable(pos, ...) \
> >>> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
> >>> + (pos, __VA_ARGS__)
> >>
> >> The variable argument count logic really just slows down compilation.
> >> Maybe there aren't enough copies of this code to make that significant.
> >> But just because you can do it doesn't mean it is a good idea.
> >> I'm also not sure it really adds anything to the readability.
> >>
> >> And, if you are going to make the middle argument optional there is
> >> no need to change the macro name.
> >
> > Christian König and Jani Nikula also disagree with the variadic-argument
> > implementation approach. If we abandon that method, it means we will
> > inevitably need to add some new macros. If mutable is not a good name,
> > suggestions for better alternatives would be welcome; coming up with a
> > suitable name is indeed rather tricky.
>
> I don't think you need to add a new macro for the specific use case that people want to modify the next element of the iteration.
>
> If I remember your numbers correctly that is really a corner case, and continuing to use the existing *_safe() macros for that sounds perfectly fine to me.
IIRC currently you have a choice of either:
    define                  Item that can't be deleted
    list_for_each()         The current item.
    list_for_each_safe()    The next item.
There is also likely to be code that updates the variables to allow
for other scenarios.
Note that if you increase a reference count and release a lock, then
list_for_each() is likely safer than list_for_each_safe() :-)
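To illustrate the distinction, here is a minimal userspace sketch, not the kernel's list.h: struct node and its helpers are hypothetical. The loop stores the next entry before the body runs, the same shape as list_for_each_safe(), so the body may unlink the current entry but must leave the next one on the list.

```c
#include <stddef.h>

/* Hypothetical minimal circular doubly-linked list. */
struct node {
	struct node *next, *prev;
	int val;
};

static void node_init(struct node *head)
{
	head->next = head->prev = head;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;	/* poison: a plain loop would now crash */
}

/* Unlink every even-valued entry and return how many were removed.
 * 'tmp' is loaded before the body runs, so deleting 'pos' is fine.
 */
static int remove_even(struct node *head)
{
	struct node *pos, *tmp;
	int removed = 0;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		if (pos->val % 2 == 0) {
			node_del(pos);
			removed++;
		}
	}
	return removed;
}
```

If the body deleted tmp instead of pos, the next iteration would follow a poisoned pointer, which is the "next item" restriction in the table above.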
list.h has 9 variants of the 'safe' loop.
The bloat of another 9 is getting excessive.
It has to be said that this is one of my least favourite types of list...
David
>
> Regards,
> Christian.
* Re: [BUG] KFENCE: use-after-free read in udp_tunnel_nic_device_sync_work
From: Sam Sun @ 2026-06-24 14:46 UTC
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
linux-kernel, syzkaller
In-Reply-To: <CANn89iKhPmbJW_6DA1_okSGsr_e_Jz47qns-nFcZpnQZ-nAUOA@mail.gmail.com>
On Wed, Jun 24, 2026 at 10:10 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Jun 24, 2026 at 6:59 AM Eric Dumazet <edumazet@google.com> wrote:
>
> > Oh well.
> >
> > u8 need_sync:1;
> > u8 need_replay:1;
> > u8 work_pending:1;
> >
> > These bitfields are not safe, obviously :/
> >
> > Time to convert them to atomic bit operations.
>
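For context on the bitfield remark above: adjacent one-bit fields share a storage unit, so a store to one field is a read-modify-write of the whole unit and can overwrite a concurrent update to a neighbouring field. The sketch below is a userspace analogue using C11 atomics rather than the kernel's set_bit()/clear_bit()/test_bit(); the flag names and helpers are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, one flag per bit of a single word. */
enum {
	FLAG_NEED_SYNC    = 0,
	FLAG_NEED_REPLAY  = 1,
	FLAG_WORK_PENDING = 2,
};

/* Each helper is one atomic read-modify-write on the shared word,
 * so concurrent updates to different bits cannot lose each other.
 */
static void flag_set(atomic_ulong *flags, unsigned int nr)
{
	atomic_fetch_or(flags, 1UL << nr);
}

static void flag_clear(atomic_ulong *flags, unsigned int nr)
{
	atomic_fetch_and(flags, ~(1UL << nr));
}

static bool flag_test(atomic_ulong *flags, unsigned int nr)
{
	return (atomic_load(flags) >> nr) & 1UL;
}
```

This only addresses lost updates between flags; it says nothing about the lifetime problem reported in this thread.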
> Can you try:
>
> diff --git a/net/ipv4/udp_tunnel_nic.c b/net/ipv4/udp_tunnel_nic.c
> index 9944ed923ddfd10f9adf6ad788c0740daeaf2adb..939d6f656bb71814718bc3bf84be665adad27e4b 100644
> --- a/net/ipv4/udp_tunnel_nic.c
> +++ b/net/ipv4/udp_tunnel_nic.c
> @@ -30,9 +30,7 @@ struct udp_tunnel_nic_table_entry {
> * @work: async work for talking to hardware from process context
> * @dev: netdev pointer
> * @lock: protects all fields
> - * @need_sync: at least one port start changed
> - * @need_replay: space was freed, we need a replay of all ports
> - * @work_pending: @work is currently scheduled
> + * @flags: sync, replay, pending flags
> * @n_tables: number of tables under @entries
> * @missed: bitmap of tables which overflown
> * @entries: table of tables of ports currently offloaded
> @@ -44,9 +42,10 @@ struct udp_tunnel_nic {
>
> struct mutex lock;
>
> - u8 need_sync:1;
> - u8 need_replay:1;
> - u8 work_pending:1;
> + unsigned long flags;
> +#define UDP_TUNNEL_NIC_NEED_SYNC 0
> +#define UDP_TUNNEL_NIC_NEED_REPLAY 1
> +#define UDP_TUNNEL_NIC_WORK_PENDING 2
>
> unsigned int n_tables;
> unsigned long missed;
> @@ -116,7 +115,7 @@ udp_tunnel_nic_entry_queue(struct udp_tunnel_nic *utn,
> unsigned int flag)
> {
> entry->flags |= flag;
> - utn->need_sync = 1;
> + set_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
> }
>
> static void
> @@ -283,7 +282,7 @@ udp_tunnel_nic_device_sync_by_table(struct net_device *dev,
> static void
> __udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
> {
> - if (!utn->need_sync)
> + if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
> return;
>
> if (dev->udp_tunnel_nic_info->sync_table)
> @@ -291,21 +290,24 @@ __udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
> else
> udp_tunnel_nic_device_sync_by_port(dev, utn);
>
> - utn->need_sync = 0;
> + clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
> /* Can't replay directly here, in case we come from the tunnel driver's
> * notification - trying to replay may deadlock inside tunnel driver.
> */
> - utn->need_replay = udp_tunnel_nic_should_replay(dev, utn);
> + if (udp_tunnel_nic_should_replay(dev, utn))
> + set_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
> + else
> + clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
> }
>
> static void
> udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
> {
> - if (!utn->need_sync)
> + if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
> return;
>
> + set_bit(UDP_TUNNEL_NIC_WORK_PENDING, &utn->flags);
> queue_work(udp_tunnel_nic_workqueue, &utn->work);
> - utn->work_pending = 1;
> }
>
> static bool
> @@ -348,7 +350,7 @@ udp_tunnel_nic_has_collision(struct net_device *dev, struct udp_tunnel_nic *utn,
> if (!udp_tunnel_nic_entry_is_free(entry) &&
> entry->port == ti->port &&
> entry->type != ti->type) {
> - __set_bit(i, &utn->missed);
> + set_bit(i, &utn->missed);
> return true;
> }
> }
> @@ -483,7 +485,7 @@ udp_tunnel_nic_add_new(struct net_device *dev, struct udp_tunnel_nic *utn,
> * are no devices currently which have multiple tables accepting
> * the same tunnel type, and false positives are okay.
> */
> - __set_bit(i, &utn->missed);
> + set_bit(i, &utn->missed);
> }
>
> return false;
> @@ -552,7 +554,7 @@ static void __udp_tunnel_nic_reset_ntf(struct net_device *dev)
>
> mutex_lock(&utn->lock);
>
> - utn->need_sync = false;
> + clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
> for (i = 0; i < utn->n_tables; i++)
> for (j = 0; j < info->tables[i].n_entries; j++) {
> struct udp_tunnel_nic_table_entry *entry;
> @@ -696,8 +698,8 @@ udp_tunnel_nic_flush(struct net_device *dev, struct udp_tunnel_nic *utn)
> for (i = 0; i < utn->n_tables; i++)
> memset(utn->entries[i], 0, array_size(info->tables[i].n_entries,
> sizeof(**utn->entries)));
> - WARN_ON(utn->need_sync);
> - utn->need_replay = 0;
> + WARN_ON(test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags));
> + clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
> }
>
> static void
> @@ -713,8 +715,8 @@ udp_tunnel_nic_replay(struct net_device *dev, struct udp_tunnel_nic *utn)
> for (i = 0; i < utn->n_tables; i++)
> for (j = 0; j < info->tables[i].n_entries; j++)
> udp_tunnel_nic_entry_freeze_used(&utn->entries[i][j]);
> - utn->missed = 0;
> - utn->need_replay = 0;
> + bitmap_zero(&utn->missed, UDP_TUNNEL_NIC_MAX_TABLES);
> + clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
>
> if (!info->shared) {
> udp_tunnel_get_rx_info(dev);
> @@ -736,10 +738,10 @@ static void udp_tunnel_nic_device_sync_work(struct work_struct *work)
> rtnl_lock();
> mutex_lock(&utn->lock);
>
> - utn->work_pending = 0;
> + clear_bit(UDP_TUNNEL_NIC_WORK_PENDING, &utn->flags);
> __udp_tunnel_nic_device_sync(utn->dev, utn);
>
> - if (utn->need_replay)
> + if (test_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags))
> udp_tunnel_nic_replay(utn->dev, utn);
>
> mutex_unlock(&utn->lock);
> @@ -866,6 +868,11 @@ udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
>
> udp_tunnel_nic_lock(dev);
>
> + if (test_bit(UDP_TUNNEL_NIC_WORK_PENDING, &utn->flags)) {
> + udp_tunnel_nic_unlock(dev);
> + return;
> + }
> +
> /* For a shared table remove this dev from the list of sharing devices
> * and if there are other devices just detach.
> */
> @@ -901,12 +908,6 @@ udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
> udp_tunnel_nic_flush(dev, utn);
> udp_tunnel_nic_unlock(dev);
>
> - /* Wait for the work to be done using the state, netdev core will
> - * retry unregister until we give up our reference on this device.
> - */
> - if (utn->work_pending)
> - return;
> -
> udp_tunnel_nic_free(utn);
> release_dev:
> dev->udp_tunnel_nic = NULL;
I tested this version as well, but it still does not stop the C reproducer
on my side.
This time the VM panicked after about 50 seconds on a debugobjects warning:
[ 50.420529][ T9744] ------------[ cut here ]------------
[ 50.421258][ T9744] ODEBUG: free active (active state 0) object: ff110001114b5a00 object type: work_struct hint: udp_tunnel_nic_device_sync_work+0x0/0x940
[ 50.424052][ T9744] WARNING: lib/debugobjects.c:629 at debug_print_object+0x1a0/0x2e0, CPU#0: repro/9744
[ 50.425766][ T9744] Modules linked in:
[ 50.426279][ T9744] CPU: 0 UID: 0 PID: 9744 Comm: repro Not tainted 7.1.0-11240-g840ef6c78e6a-dirty #33 PREEMPT(full)
[ 50.428614][ T9744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 50.429661][ T9744] RIP: 0010:debug_print_object+0x1a5/0x2e0
[ 50.430338][ T9744] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8a 00 00 00 48 8b 14 ed 60 33 1e 8c 48 83 fd 05 77 47 48 8d 3d e0 0c 1e 0c 41 56 4c 89 e6 <67> 48 0f b9 3a 58 83 05 42 30 14 0c 01 48 83 c4 20 5b 5d 41 5c 41
[ 50.432538][ T9744] RSP: 0018:ffa0000012c8ee60 EFLAGS: 00010293
[ 50.433045][ T9744] RAX: dffffc0000000000 RBX: ffa0000012c8ef40 RCX: 0000000000000000
[ 50.433710][ T9744] RDX: ffffffff8c1e32a0 RSI: ffffffff8c1e2e80 RDI: ffffffff90e31820
[ 50.434390][ T9744] RBP: 0000000000000003 R08: ff110001114b5a00 R09: ffffffff8bae17e0
[ 50.435050][ T9744] R10: ffffffff90d907d7 R11: 0000000000000000 R12: ffffffff8c1e2e80
[ 50.435719][ T9744] R13: ffffffff8bae1820 R14: ffffffff8a0f69e0 R15: ffa0000012c8ef58
[ 50.436393][ T9744] FS: 00007f8430d5b640(0000) GS:ff1100018394a000(0000) knlGS:0000000000000000
[ 50.437149][ T9744] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 50.437749][ T9744] CR2: 00007f422d7b8000 CR3: 000000010b850000 CR4: 0000000000753ef0
[ 50.438404][ T9744] PKRU: 55555554
[ 50.438716][ T9744] Call Trace:
[ 50.439004][ T9744] <TASK>
[ 50.439277][ T9744] ? __pfx_udp_tunnel_nic_device_sync_work+0x10/0x10
[ 50.439856][ T9744] ? _raw_spin_unlock_irqrestore+0x58/0x70
[ 50.440942][ T9744] debug_check_no_obj_freed+0x3ec/0x520
[ 50.441419][ T9744] ? __udp_tunnel_nic_lock+0x47/0x60
[ 50.441878][ T9744] ? __pfx_debug_check_no_obj_freed+0x10/0x10
[ 50.442403][ T9744] ? kasan_quarantine_put+0x10d/0x230
[ 50.442875][ T9744] ? lockdep_hardirqs_on+0x7c/0x110
[ 50.443325][ T9744] kfree+0x2a0/0x6d0
[ 50.443663][ T9744] ? udp_tunnel_nic_netdevice_event+0xc14/0x1e40
[ 50.444214][ T9744] udp_tunnel_nic_netdevice_event+0xc14/0x1e40
[ 50.444730][ T9744] notifier_call_chain+0xbd/0x430
[ 50.445164][ T9744] ? __pfx_udp_tunnel_nic_netdevice_event+0x10/0x10
[ 50.445729][ T9744] call_netdevice_notifiers_info+0xbe/0x110
[ 50.446236][ T9744] unregister_netdevice_many_notify+0xbab/0x2130
[ 50.446781][ T9744] ? __pfx_unregister_netdevice_many_notify+0x10/0x10
[ 50.447907][ T9744] ? __pfx___mutex_lock+0x10/0x10
[ 50.448351][ T9744] unregister_netdevice_queue+0x305/0x3c0
[ 50.448842][ T9744] ? __pfx_unregister_netdevice_queue+0x10/0x10
[ 50.449369][ T9744] nsim_destroy+0x231/0x980
[ 50.449773][ T9744] __nsim_dev_port_del+0x197/0x2c0
[ 50.450215][ T9744] nsim_dev_reload_destroy+0x105/0x490
[ 50.450677][ T9744] nsim_dev_reload_down+0x67/0xd0
[ 50.451143][ T9744] devlink_reload+0x197/0x7b0
[ 50.451564][ T9744] ? __pfx_devlink_reload+0x10/0x10
[ 50.452020][ T9744] ? security_capable+0x210/0x250
[ 50.452466][ T9744] ? ns_capable+0xe2/0x120
[ 50.452858][ T9744] devlink_nl_reload_doit+0x541/0x1160
[ 50.453323][ T9744] ? __pfx_devlink_nl_reload_doit+0x10/0x10
[ 50.453828][ T9744] ? genl_family_rcv_msg_attrs_parse.constprop.0+0x1e5/0x2f0
[ 50.454458][ T9744] genl_family_rcv_msg_doit+0x1ff/0x2f0
[ 50.454930][ T9744] ? __pfx_genl_family_rcv_msg_doit+0x10/0x10
[ 50.455442][ T9744] ? bpf_lsm_capable+0x9/0x10
[ 50.455845][ T9744] ? security_capable+0x210/0x250
[ 50.456297][ T9744] genl_rcv_msg+0x532/0x7e0
[ 50.456683][ T9744] ? __pfx_genl_rcv_msg+0x10/0x10
[ 50.457115][ T9744] ? __pfx_devlink_nl_pre_doit_dev_lock+0x10/0x10
[ 50.457690][ T9744] ? __pfx_devlink_nl_reload_doit+0x10/0x10
[ 50.458218][ T9744] ? __pfx_devlink_nl_post_doit_dev_lock+0x10/0x10
[ 50.458776][ T9744] ? __lock_acquire+0x476/0x2420
[ 50.459208][ T9744] netlink_rcv_skb+0x147/0x430
[ 50.459633][ T9744] ? __pfx_genl_rcv_msg+0x10/0x10
[ 50.460062][ T9744] ? __pfx_netlink_rcv_skb+0x10/0x10
[ 50.460520][ T9744] ? netlink_deliver_tap+0x1ae/0xd10
[ 50.460976][ T9744] genl_rcv+0x28/0x40
[ 50.461330][ T9744] netlink_unicast+0x58d/0x850
[ 50.461739][ T9744] ? __pfx_netlink_unicast+0x10/0x10
[ 50.462198][ T9744] netlink_sendmsg+0x88d/0xd90
[ 50.462610][ T9744] ? __pfx_netlink_sendmsg+0x10/0x10
[ 50.463092][ T9744] ? __pfx_netlink_sendmsg+0x10/0x10
[ 50.463557][ T9744] ____sys_sendmsg+0xa27/0xb90
[ 50.463982][ T9744] ? __pfx_____sys_sendmsg+0x10/0x10
[ 50.464438][ T9744] ? __pfx_copy_msghdr_from_user+0x10/0x10
[ 50.464938][ T9744] ? find_held_lock+0x2b/0x80
[ 50.465347][ T9744] ? futex_wake+0x4f7/0x5e0
[ 50.465735][ T9744] ___sys_sendmsg+0x11c/0x1b0
[ 50.466129][ T9744] ? __pfx____sys_sendmsg+0x10/0x10
[ 50.466586][ T9744] ? __pfx_futex_wake+0x10/0x10
[ 50.467002][ T9744] ? __fget_files+0x1f1/0x3b0
[ 50.467440][ T9744] ? __fget_files+0x1fb/0x3b0
[ 50.467836][ T9744] ? __lock_acquire+0x450/0x2420
[ 50.468298][ T9744] __sys_sendmsg+0x142/0x1f0
[ 50.468696][ T9744] ? __pfx___sys_sendmsg+0x10/0x10
[ 50.469133][ T9744] ? __cpu_to_node+0x8a/0x130
[ 50.469538][ T9744] do_syscall_64+0x11f/0x860
[ 50.469953][ T9744] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 50.470454][ T9744] RIP: 0033:0x451a4d
[ 50.470783][ T9744] Code: c3 e8 a7 23 00 00 0f 1f 80 00 00 00 00 f3
0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b
4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8
64 89 01 48
[ 50.472393][ T9744] RSP: 002b:00007f8430d5b198 EFLAGS: 00000246
ORIG_RAX: 000000000000002e
[ 50.473102][ T9744] RAX: ffffffffffffffda RBX: 00000000004e9440 RCX:
0000000000451a4d
[ 50.473761][ T9744] RDX: 0000000000000000 RSI: 0000200000000800 RDI:
0000000000000003
[ 50.474419][ T9744] RBP: 00000000004b66b4 R08: 000000000000006d R09:
0000000000000000
[ 50.475085][ T9744] R10: 0000000000000001 R11: 0000000000000246 R12:
0000200000000280
[ 50.475738][ T9744] R13: 0000200000000190 R14: 0000200000000180 R15:
00000000004e9448
[ 50.476418][ T9744] </TASK>
[ 50.476683][ T9744] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 50.477297][ T9744] CPU: 0 UID: 0 PID: 9744 Comm: repro Not tainted
7.1.0-11240-g840ef6c78e6a-dirty #33 PREEMPT(full)
[ 50.478201][ T9744] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.15.0-1 04/01/2014
[ 50.478952][ T9744] Call Trace:
[ 50.479240][ T9744] <TASK>
[ 50.479492][ T9744] dump_stack_lvl+0x3d/0x1b0
[ 50.479895][ T9744] vpanic+0x7f2/0xa70
[ 50.480241][ T9744] ? __pfx_vpanic+0x10/0x10
[ 50.480621][ T9744] ? is_bpf_text_address+0x96/0x1a0
[ 50.481070][ T9744] ? debug_print_object+0x1a0/0x2e0
[ 50.481512][ T9744] panic+0xc2/0xd0
[ 50.481844][ T9744] ? __pfx_panic+0x10/0x10
[ 50.482230][ T9744] ? check_panic_on_warn+0x1f/0xc0
[ 50.482673][ T9744] check_panic_on_warn+0xb1/0xc0
[ 50.483102][ T9744] __warn+0x108/0x3f0
[ 50.483460][ T9744] __report_bug+0x42c/0x510
[ 50.483854][ T9744] ? debug_print_object+0x1a0/0x2e0
[ 50.484297][ T9744] ? __pfx___report_bug+0x10/0x10
[ 50.484721][ T9744] ? __kernel_text_address+0xd/0x40
[ 50.485166][ T9744] ? unwind_get_return_address+0x59/0xa0
[ 50.485664][ T9744] report_bug_entry+0xe1/0x280
[ 50.486068][ T9744] ? debug_print_object+0x1a5/0x2e0
[ 50.486508][ T9744] handle_bug+0x428/0x4e0
[ 50.486889][ T9744] exc_invalid_op+0x35/0x80
[ 50.487281][ T9744] asm_exc_invalid_op+0x1a/0x20
[ 50.487693][ T9744] RIP: 0010:debug_print_object+0x1a5/0x2e0
[ 50.488194][ T9744] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8a 00
00 00 48 8b 14 ed 60 33 1e 8c 48 83 fd 05 77 47 48 8d 3d e0 0c 1e 0c
41 56 4c 89 e6 <67> 48 0f b9 3a 58 83 05 42 30 14 0c 01 48 83 c4 20 5b
5d 41 5c 41
[ 50.489814][ T9744] RSP: 0018:ffa0000012c8ee60 EFLAGS: 00010293
[ 50.490362][ T9744] RAX: dffffc0000000000 RBX: ffa0000012c8ef40 RCX:
0000000000000000
[ 50.491033][ T9744] RDX: ffffffff8c1e32a0 RSI: ffffffff8c1e2e80 RDI:
ffffffff90e31820
[ 50.491698][ T9744] RBP: 0000000000000003 R08: ff110001114b5a00 R09:
ffffffff8bae17e0
[ 50.492384][ T9744] R10: ffffffff90d907d7 R11: 0000000000000000 R12:
ffffffff8c1e2e80
[ 50.493043][ T9744] R13: ffffffff8bae1820 R14: ffffffff8a0f69e0 R15:
ffa0000012c8ef58
[ 50.493718][ T9744] ? __pfx_udp_tunnel_nic_device_sync_work+0x10/0x10
[ 50.494288][ T9744] ? __pfx_udp_tunnel_nic_device_sync_work+0x10/0x10
[ 50.494845][ T9744] ? _raw_spin_unlock_irqrestore+0x58/0x70
[ 50.495347][ T9744] debug_check_no_obj_freed+0x3ec/0x520
[ 50.495825][ T9744] ? __udp_tunnel_nic_lock+0x47/0x60
[ 50.496276][ T9744] ? __pfx_debug_check_no_obj_freed+0x10/0x10
[ 50.496794][ T9744] ? kasan_quarantine_put+0x10d/0x230
[ 50.497266][ T9744] ? lockdep_hardirqs_on+0x7c/0x110
[ 50.497706][ T9744] kfree+0x2a0/0x6d0
[ 50.498038][ T9744] ? udp_tunnel_nic_netdevice_event+0xc14/0x1e40
[ 50.498577][ T9744] udp_tunnel_nic_netdevice_event+0xc14/0x1e40
[ 50.499107][ T9744] notifier_call_chain+0xbd/0x430
[ 50.499544][ T9744] ? __pfx_udp_tunnel_nic_netdevice_event+0x10/0x10
[ 50.500096][ T9744] call_netdevice_notifiers_info+0xbe/0x110
[ 50.500625][ T9744] unregister_netdevice_many_notify+0xbab/0x2130
[ 50.501169][ T9744] ? __pfx_unregister_netdevice_many_notify+0x10/0x10
[ 50.501737][ T9744] ? __pfx___mutex_lock+0x10/0x10
[ 50.502174][ T9744] unregister_netdevice_queue+0x305/0x3c0
[ 50.502660][ T9744] ? __pfx_unregister_netdevice_queue+0x10/0x10
[ 50.503186][ T9744] nsim_destroy+0x231/0x980
[ 50.503564][ T9744] __nsim_dev_port_del+0x197/0x2c0
[ 50.503966][ T9744] nsim_dev_reload_destroy+0x105/0x490
[ 50.504406][ T9744] nsim_dev_reload_down+0x67/0xd0
[ 50.504808][ T9744] devlink_reload+0x197/0x7b0
[ 50.505220][ T9744] ? __pfx_devlink_reload+0x10/0x10
[ 50.505679][ T9744] ? security_capable+0x210/0x250
[ 50.506113][ T9744] ? ns_capable+0xe2/0x120
[ 50.506490][ T9744] devlink_nl_reload_doit+0x541/0x1160
[ 50.506962][ T9744] ? __pfx_devlink_nl_reload_doit+0x10/0x10
[ 50.507471][ T9744] ? genl_family_rcv_msg_attrs_parse.constprop.0+0x1e5/0x2f0
[ 50.508094][ T9744] genl_family_rcv_msg_doit+0x1ff/0x2f0
[ 50.508577][ T9744] ? __pfx_genl_family_rcv_msg_doit+0x10/0x10
[ 50.509098][ T9744] ? bpf_lsm_capable+0x9/0x10
[ 50.509495][ T9744] ? security_capable+0x210/0x250
[ 50.509924][ T9744] genl_rcv_msg+0x532/0x7e0
[ 50.510327][ T9744] ? __pfx_genl_rcv_msg+0x10/0x10
[ 50.510753][ T9744] ? __pfx_devlink_nl_pre_doit_dev_lock+0x10/0x10
[ 50.511291][ T9744] ? __pfx_devlink_nl_reload_doit+0x10/0x10
[ 50.511795][ T9744] ? __pfx_devlink_nl_post_doit_dev_lock+0x10/0x10
[ 50.512365][ T9744] ? __lock_acquire+0x476/0x2420
[ 50.512785][ T9744] netlink_rcv_skb+0x147/0x430
[ 50.513197][ T9744] ? __pfx_genl_rcv_msg+0x10/0x10
[ 50.513627][ T9744] ? __pfx_netlink_rcv_skb+0x10/0x10
[ 50.514090][ T9744] ? netlink_deliver_tap+0x1ae/0xd10
[ 50.514542][ T9744] genl_rcv+0x28/0x40
[ 50.514879][ T9744] netlink_unicast+0x58d/0x850
[ 50.515302][ T9744] ? __pfx_netlink_unicast+0x10/0x10
[ 50.515760][ T9744] netlink_sendmsg+0x88d/0xd90
[ 50.516181][ T9744] ? __pfx_netlink_sendmsg+0x10/0x10
[ 50.516640][ T9744] ? __pfx_netlink_sendmsg+0x10/0x10
[ 50.517109][ T9744] ____sys_sendmsg+0xa27/0xb90
[ 50.517520][ T9744] ? __pfx_____sys_sendmsg+0x10/0x10
[ 50.517965][ T9744] ? __pfx_copy_msghdr_from_user+0x10/0x10
[ 50.518457][ T9744] ? find_held_lock+0x2b/0x80
[ 50.518871][ T9744] ? futex_wake+0x4f7/0x5e0
[ 50.519271][ T9744] ___sys_sendmsg+0x11c/0x1b0
[ 50.519677][ T9744] ? __pfx____sys_sendmsg+0x10/0x10
[ 50.520117][ T9744] ? __pfx_futex_wake+0x10/0x10
[ 50.520542][ T9744] ? __fget_files+0x1f1/0x3b0
[ 50.520945][ T9744] ? __fget_files+0x1fb/0x3b0
[ 50.521347][ T9744] ? __lock_acquire+0x450/0x2420
[ 50.521770][ T9744] __sys_sendmsg+0x142/0x1f0
[ 50.522176][ T9744] ? __pfx___sys_sendmsg+0x10/0x10
[ 50.522617][ T9744] ? __cpu_to_node+0x8a/0x130
[ 50.523021][ T9744] do_syscall_64+0x11f/0x860
[ 50.523417][ T9744] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 50.523926][ T9744] RIP: 0033:0x451a4d
[ 50.524264][ T9744] Code: c3 e8 a7 23 00 00 0f 1f 80 00 00 00 00 f3
0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b
4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8
64 89 01 48
[ 50.525868][ T9744] RSP: 002b:00007f8430d5b198 EFLAGS: 00000246
ORIG_RAX: 000000000000002e
[ 50.526579][ T9744] RAX: ffffffffffffffda RBX: 00000000004e9440 RCX:
0000000000451a4d
[ 50.527243][ T9744] RDX: 0000000000000000 RSI: 0000200000000800 RDI:
0000000000000003
[ 50.527914][ T9744] RBP: 00000000004b66b4 R08: 000000000000006d R09:
0000000000000000
[ 50.528614][ T9744] R10: 0000000000000001 R11: 0000000000000246 R12:
0000200000000280
[ 50.529305][ T9744] R13: 0000200000000190 R14: 0000200000000180 R15:
00000000004e9448
[ 50.529982][ T9744] </TASK>
[ 50.530453][ T9744] Kernel Offset: disabled
Resolved from the patched vmlinux:
udp_tunnel_nic_netdevice_event+0xc14/0x1e40:
udp_tunnel_nic_unregister at net/ipv4/udp_tunnel_nic.c:913
udp_tunnel_nic_netdevice_event at net/ipv4/udp_tunnel_nic.c:943
So we are still freeing struct udp_tunnel_nic while its embedded work_struct
is active. debugobjects catches this at kfree() before the active work gets a
chance to run later and dereference the freed utn.
My read is that the conversion from bitfields to atomic bitops removes the
plain bitfield data race, but UDP_TUNNEL_NIC_WORK_PENDING is still only one
boolean state. It can represent "some work is pending", but it cannot
distinguish between:
  - idle
  - queued
  - running
  - running and queued again
In particular, the workqueue core clears WORK_STRUCT_PENDING before invoking
the worker. At that point the same work item can be queued again by
udp_tunnel_nic_device_sync(). If an already running instance later executes:
clear_bit(UDP_TUNNEL_NIC_WORK_PENDING, &utn->flags);
it can still clear the bit that was set for the requeued instance. Then
udp_tunnel_nic_unregister() may observe UDP_TUNNEL_NIC_WORK_PENDING clear and
free utn, even though debugobjects still sees utn->work as active.
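The interleaving above can be replayed in a small userspace model. This is only a sketch under stated assumptions: `Model`, `device_sync`, `worker_start`, `worker_finish` and `replay_race` are hypothetical names, the two booleans merely stand in for UDP_TUNNEL_NIC_WORK_PENDING and the workqueue core's WORK_STRUCT_PENDING, and the steps run sequentially instead of on real threads.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Userspace model of one work item tracked by a single driver-side flag.
/// All names are hypothetical; nothing here is kernel API.
struct Model {
    /// Stands in for UDP_TUNNEL_NIC_WORK_PENDING in utn->flags.
    driver_pending: AtomicBool,
    /// Stands in for the workqueue core's WORK_STRUCT_PENDING bit.
    wq_queued: AtomicBool,
}

impl Model {
    fn new() -> Self {
        Model {
            driver_pending: AtomicBool::new(false),
            wq_queued: AtomicBool::new(false),
        }
    }

    /// Models udp_tunnel_nic_device_sync(): mark pending and queue the work.
    fn device_sync(&self) {
        self.driver_pending.store(true, Ordering::SeqCst);
        self.wq_queued.store(true, Ordering::SeqCst);
    }

    /// Models the workqueue core clearing its pending bit before the worker runs.
    fn worker_start(&self) {
        self.wq_queued.store(false, Ordering::SeqCst);
    }

    /// Models the running worker executing clear_bit() on the driver flag.
    fn worker_finish(&self) {
        self.driver_pending.store(false, Ordering::SeqCst);
    }
}

/// Replays the interleaving and returns
/// (unregister would see the flag clear, a work instance is still queued).
fn replay_race() -> (bool, bool) {
    let m = Model::new();
    m.device_sync();   // idle -> queued
    m.worker_start();  // queued -> running
    m.device_sync();   // running -> running and queued again
    m.worker_finish(); // first instance clears the shared flag
    let flag_clear = !m.driver_pending.load(Ordering::SeqCst);
    let still_queued = m.wq_queued.load(Ordering::SeqCst);
    (flag_clear, still_queued)
}
```

In this model `replay_race()` ends with the driver flag clear while the second instance is still queued, which is the state in which udp_tunnel_nic_unregister() would go on to free utn.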
Thanks,
Yue
* [PATCH v6 01/10] rust: module: move module types into `module.rs`
From: Alvin Sun @ 2026-06-24 15:00 UTC
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Move `Module`, `InPlaceModule`, `ModuleMetadata` and `ThisModule` from
`lib.rs` into a new `rust/kernel/module.rs`. Re-export them from `lib.rs`
to avoid tree-wide changes.
Switch six bus driver registrations from `module.0` to the public
`ThisModule::as_ptr()` accessor, since the field is no longer visible
outside the new `module` submodule.
No functional change.
Assisted-by: opencode:glm-5.2
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/kernel/auxiliary.rs | 2 +-
rust/kernel/i2c.rs | 2 +-
rust/kernel/lib.rs | 75 +++++-------------------------------------------
rust/kernel/module.rs | 71 +++++++++++++++++++++++++++++++++++++++++++++
rust/kernel/net/phy.rs | 6 +++-
rust/kernel/pci.rs | 2 +-
rust/kernel/platform.rs | 2 +-
rust/kernel/usb.rs | 2 +-
8 files changed, 88 insertions(+), 74 deletions(-)
diff --git a/rust/kernel/auxiliary.rs b/rust/kernel/auxiliary.rs
index 93c0db1f66555..4a02f83240be3 100644
--- a/rust/kernel/auxiliary.rs
+++ b/rust/kernel/auxiliary.rs
@@ -63,7 +63,7 @@ unsafe fn register(
// SAFETY: `adrv` is guaranteed to be a valid `DriverType`.
to_result(unsafe {
- bindings::__auxiliary_driver_register(adrv.get(), module.0, name.as_char_ptr())
+ bindings::__auxiliary_driver_register(adrv.get(), module.as_ptr(), name.as_char_ptr())
})
}
diff --git a/rust/kernel/i2c.rs b/rust/kernel/i2c.rs
index 7b908f0c5a58d..24eff08f47123 100644
--- a/rust/kernel/i2c.rs
+++ b/rust/kernel/i2c.rs
@@ -142,7 +142,7 @@ unsafe fn register(
}
// SAFETY: `idrv` is guaranteed to be a valid `DriverType`.
- to_result(unsafe { bindings::i2c_register_driver(module.0, idrv.get()) })
+ to_result(unsafe { bindings::i2c_register_driver(module.as_ptr(), idrv.get()) })
}
unsafe fn unregister(idrv: &Opaque<Self::DriverType>) {
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index b72b2fbe046d6..040ae85056509 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -93,6 +93,7 @@
pub mod maple_tree;
pub mod miscdevice;
pub mod mm;
+pub mod module;
pub mod module_param;
#[cfg(CONFIG_NET)]
pub mod net;
@@ -139,79 +140,17 @@
#[doc(hidden)]
pub use bindings;
pub use macros;
+pub use module::{
+ InPlaceModule,
+ Module,
+ ModuleMetadata,
+ ThisModule, //
+};
pub use uapi;
/// Prefix to appear before log messages printed from within the `kernel` crate.
const __LOG_PREFIX: &[u8] = b"rust_kernel\0";
-/// The top level entrypoint to implementing a kernel module.
-///
-/// For any teardown or cleanup operations, your type may implement [`Drop`].
-pub trait Module: Sized + Sync + Send {
- /// Called at module initialization time.
- ///
- /// Use this method to perform whatever setup or registration your module
- /// should do.
- ///
- /// Equivalent to the `module_init` macro in the C API.
- fn init(module: &'static ThisModule) -> error::Result<Self>;
-}
-
-/// A module that is pinned and initialised in-place.
-pub trait InPlaceModule: Sync + Send {
- /// Creates an initialiser for the module.
- ///
- /// It is called when the module is loaded.
- fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, error::Error>;
-}
-
-impl<T: Module> InPlaceModule for T {
- fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, error::Error> {
- let initer = move |slot: *mut Self| {
- let m = <Self as Module>::init(module)?;
-
- // SAFETY: `slot` is valid for write per the contract with `pin_init_from_closure`.
- unsafe { slot.write(m) };
- Ok(())
- };
-
- // SAFETY: On success, `initer` always fully initialises an instance of `Self`.
- unsafe { pin_init::pin_init_from_closure(initer) }
- }
-}
-
-/// Metadata attached to a [`Module`] or [`InPlaceModule`].
-pub trait ModuleMetadata {
- /// The name of the module as specified in the `module!` macro.
- const NAME: &'static crate::str::CStr;
-}
-
-/// Equivalent to `THIS_MODULE` in the C API.
-///
-/// C header: [`include/linux/init.h`](srctree/include/linux/init.h)
-pub struct ThisModule(*mut bindings::module);
-
-// SAFETY: `THIS_MODULE` may be used from all threads within a module.
-unsafe impl Sync for ThisModule {}
-
-impl ThisModule {
- /// Creates a [`ThisModule`] given the `THIS_MODULE` pointer.
- ///
- /// # Safety
- ///
- /// The pointer must be equal to the right `THIS_MODULE`.
- pub const unsafe fn from_ptr(ptr: *mut bindings::module) -> ThisModule {
- ThisModule(ptr)
- }
-
- /// Access the raw pointer for this module.
- ///
- /// It is up to the user to use it correctly.
- pub const fn as_ptr(&self) -> *mut bindings::module {
- self.0
- }
-}
-
#[cfg(not(testlib))]
#[panic_handler]
fn panic(info: &core::panic::PanicInfo<'_>) -> ! {
diff --git a/rust/kernel/module.rs b/rust/kernel/module.rs
new file mode 100644
index 0000000000000..be242a82e86d2
--- /dev/null
+++ b/rust/kernel/module.rs
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Module-related types and helpers.
+
+/// The entrypoint to implementing a kernel module.
+///
+/// For any teardown or cleanup operations, your type may implement [`Drop`].
+pub trait Module: Sized + Sync + Send {
+ /// Called at module initialization time.
+ ///
+ /// Use this method to perform whatever setup or registration your module
+ /// should do.
+ ///
+ /// Equivalent to the `module_init` macro in the C API.
+ fn init(module: &'static ThisModule) -> crate::error::Result<Self>;
+}
+
+/// A module that is pinned and initialised in-place.
+pub trait InPlaceModule: Sync + Send {
+ /// Creates an initialiser for the module.
+ ///
+ /// It is called when the module is loaded.
+ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, crate::error::Error>;
+}
+
+impl<T: Module> InPlaceModule for T {
+ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, crate::error::Error> {
+ let initer = move |slot: *mut Self| {
+ let m = <Self as Module>::init(module)?;
+
+ // SAFETY: `slot` is valid for write per the contract with `pin_init_from_closure`.
+ unsafe { slot.write(m) };
+ Ok(())
+ };
+
+ // SAFETY: On success, `initer` always fully initialises an instance of `Self`.
+ unsafe { pin_init::pin_init_from_closure(initer) }
+ }
+}
+
+/// Metadata attached to a [`Module`] or [`InPlaceModule`].
+pub trait ModuleMetadata {
+ /// The name of the module as specified in the `module!` macro.
+ const NAME: &'static crate::str::CStr;
+}
+
+/// Equivalent to `THIS_MODULE` in the C API.
+///
+/// C header: [`include/linux/init.h`](srctree/include/linux/init.h)
+pub struct ThisModule(*mut crate::bindings::module);
+
+// SAFETY: `THIS_MODULE` may be used from all threads within a module.
+unsafe impl Sync for ThisModule {}
+
+impl ThisModule {
+ /// Creates a [`ThisModule`] given the `THIS_MODULE` pointer.
+ ///
+ /// # Safety
+ ///
+ /// The pointer must be equal to the right `THIS_MODULE`.
+ pub const unsafe fn from_ptr(ptr: *mut crate::bindings::module) -> ThisModule {
+ ThisModule(ptr)
+ }
+
+ /// Access the raw pointer for this module.
+ ///
+ /// It is up to the user to use it correctly.
+ pub const fn as_ptr(&self) -> *mut crate::bindings::module {
+ self.0
+ }
+}
diff --git a/rust/kernel/net/phy.rs b/rust/kernel/net/phy.rs
index 3ca99db5cccf2..8b7036b8fe480 100644
--- a/rust/kernel/net/phy.rs
+++ b/rust/kernel/net/phy.rs
@@ -659,7 +659,11 @@ pub fn register(
// the `drivers` slice are initialized properly. `drivers` will not be moved.
// So it's just an FFI call.
to_result(unsafe {
- bindings::phy_drivers_register(drivers[0].0.get(), drivers.len().try_into()?, module.0)
+ bindings::phy_drivers_register(
+ drivers[0].0.get(),
+ drivers.len().try_into()?,
+ module.as_ptr(),
+ )
})?;
// INVARIANT: The `drivers` slice is successfully registered to the kernel via `phy_drivers_register`.
Ok(Registration { drivers })
diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
index af74ddff6114d..916ed2cb6b70b 100644
--- a/rust/kernel/pci.rs
+++ b/rust/kernel/pci.rs
@@ -86,7 +86,7 @@ unsafe fn register(
// SAFETY: `pdrv` is guaranteed to be a valid `DriverType`.
to_result(unsafe {
- bindings::__pci_register_driver(pdrv.get(), module.0, name.as_char_ptr())
+ bindings::__pci_register_driver(pdrv.get(), module.as_ptr(), name.as_char_ptr())
})
}
diff --git a/rust/kernel/platform.rs b/rust/kernel/platform.rs
index 8917d4ee499fb..9fdbafd53bc21 100644
--- a/rust/kernel/platform.rs
+++ b/rust/kernel/platform.rs
@@ -82,7 +82,7 @@ unsafe fn register(
}
// SAFETY: `pdrv` is guaranteed to be a valid `DriverType`.
- to_result(unsafe { bindings::__platform_driver_register(pdrv.get(), module.0) })
+ to_result(unsafe { bindings::__platform_driver_register(pdrv.get(), module.as_ptr()) })
}
unsafe fn unregister(pdrv: &Opaque<Self::DriverType>) {
diff --git a/rust/kernel/usb.rs b/rust/kernel/usb.rs
index 9c17a672cd275..213db32727c17 100644
--- a/rust/kernel/usb.rs
+++ b/rust/kernel/usb.rs
@@ -63,7 +63,7 @@ unsafe fn register(
// SAFETY: `udrv` is guaranteed to be a valid `DriverType`.
to_result(unsafe {
- bindings::usb_register_driver(udrv.get(), module.0, name.as_char_ptr())
+ bindings::usb_register_driver(udrv.get(), module.as_ptr(), name.as_char_ptr())
})
}
--
2.43.0
* [PATCH v6 03/10] rust: doctest: add LocalModule fallback for #[vtable] ThisModule
From: Alvin Sun @ 2026-06-24 15:00 UTC
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Add a `LocalModule` struct with a null-pointer `ModuleMetadata` impl
in the doctest harness, so that `crate::LocalModule` (auto-inserted
by `#[vtable]`) resolves correctly when there is no `module!` macro.
Assisted-by: opencode:glm-5.2
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
scripts/rustdoc_test_gen.rs | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index ee76e96b41eea..198af4e446c8c 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -239,6 +239,22 @@ macro_rules! assert_eq {{
const __LOG_PREFIX: &[u8] = b"rust_doctests_kernel\0";
+/// Dummy module type for doctest context.
+struct LocalModule;
+
+use kernel::{{
+ str::CStr,
+ ModuleMetadata,
+ ThisModule, //
+}};
+use core::ptr::null_mut;
+
+impl ModuleMetadata for LocalModule {{
+ const NAME: &'static CStr = c"rust_doctests_kernel";
+ // SAFETY: `try_module_get`/`module_put` handle null module pointers gracefully.
+ const THIS_MODULE: ThisModule = unsafe {{ ThisModule::from_ptr(null_mut()) }};
+}}
+
{rust_tests}
"#
)
--
2.43.0
* [PATCH v6 02/10] rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
From: Alvin Sun @ 2026-06-24 15:00 UTC
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Now that `const_refs_to_static` is stable as of the MSRV bump, a
`ThisModule` pointer can be used in const contexts.
Add a `THIS_MODULE` const to the `ModuleMetadata` trait so that modules
can provide their `ThisModule` pointer in const contexts such as static
`file_operations`.
Add a `this_module()` helper to retrieve the `THIS_MODULE` pointer of a
given module type, and update `__init` to use it instead of the
`THIS_MODULE` static generated by the `module!` macro.
The `static THIS_MODULE` generated by the `module!` macro is retained
for backwards compatibility with existing users; it is removed in a later
patch once all references have been migrated.
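As an illustration of the pattern (not the kernel code itself), the sketch below rebuilds the trait const and the helper in standalone Rust. `RawModule`, `FAKE_THIS_MODULE`, `Loadable`, `BuiltIn`, `Ops` and `ops_for` are hypothetical stand-ins, and the sketch assumes a toolchain where `const_refs_to_static` is stable (Rust 1.83 or later).

```rust
use core::ptr;

/// Hypothetical stand-in for `bindings::module`; the field only gives it a size.
pub struct RawModule {
    _refcnt: u32,
}

/// Simplified copy of the `ThisModule` wrapper.
pub struct ThisModule(*mut RawModule);

impl ThisModule {
    /// # Safety
    ///
    /// The pointer must be the right module pointer, or null for built-in code.
    pub const unsafe fn from_ptr(ptr: *mut RawModule) -> ThisModule {
        ThisModule(ptr)
    }

    /// Returns the raw module pointer.
    pub const fn as_ptr(&self) -> *mut RawModule {
        self.0
    }
}

/// Reduced version of the trait: only the new const is modelled.
pub trait ModuleMetadata {
    const THIS_MODULE: ThisModule;
}

/// Same shape as the helper added by this patch: the borrow of the
/// associated const is promoted to a `'static` reference.
pub const fn this_module<M: ModuleMetadata>() -> &'static ThisModule {
    &M::THIS_MODULE
}

/// Hypothetical stand-in for the `__this_module` symbol of a loadable module.
static FAKE_THIS_MODULE: RawModule = RawModule { _refcnt: 0 };

pub struct Loadable;

impl ModuleMetadata for Loadable {
    // Taking the address of a static inside a const relies on `const_refs_to_static`.
    // SAFETY: In this model the static is the module object for `Loadable`.
    const THIS_MODULE: ThisModule = unsafe {
        ThisModule::from_ptr(&FAKE_THIS_MODULE as *const RawModule as *mut RawModule)
    };
}

pub struct BuiltIn;

impl ModuleMetadata for BuiltIn {
    // SAFETY: Built-in code is modelled with a null module pointer.
    const THIS_MODULE: ThisModule = unsafe { ThisModule::from_ptr(ptr::null_mut()) };
}

/// Hypothetical ops table with an `owner` field, filled in at compile time.
pub struct Ops {
    pub owner: *mut RawModule,
}

/// Builds an ops table for module type `M` in a const context.
pub const fn ops_for<M: ModuleMetadata>() -> Ops {
    Ops { owner: this_module::<M>().as_ptr() }
}
```

Because `ops_for` is a `const fn`, a caller could write `const OPS: Ops = ops_for::<Loadable>();`, which is the kind of use the static `file_operations` case calls for.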
Assisted-by: opencode:glm-5.2
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/kernel/module.rs | 9 +++++++++
rust/macros/module.rs | 18 +++++++++++++++++-
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/module.rs b/rust/kernel/module.rs
index be242a82e86d2..d713705984477 100644
--- a/rust/kernel/module.rs
+++ b/rust/kernel/module.rs
@@ -42,6 +42,15 @@ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, crate::erro
pub trait ModuleMetadata {
/// The name of the module as specified in the `module!` macro.
const NAME: &'static crate::str::CStr;
+
+ /// The module's `THIS_MODULE` pointer.
+ const THIS_MODULE: ThisModule;
+}
+
+/// Returns a reference to the `THIS_MODULE` of the given module type.
+#[inline]
+pub const fn this_module<M: ModuleMetadata>() -> &'static ThisModule {
+ &M::THIS_MODULE
}
/// Equivalent to `THIS_MODULE` in the C API.
diff --git a/rust/macros/module.rs b/rust/macros/module.rs
index 06c18e2075083..aa9a618d5d19e 100644
--- a/rust/macros/module.rs
+++ b/rust/macros/module.rs
@@ -519,6 +519,22 @@ pub(crate) fn module(info: ModuleInfo) -> Result<TokenStream> {
impl ::kernel::ModuleMetadata for #type_ {
const NAME: &'static ::kernel::str::CStr = #name_cstr;
+
+ #[cfg(MODULE)]
+ const THIS_MODULE: ::kernel::ThisModule = {
+ extern "C" {
+ static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
+ }
+
+ // SAFETY: `__this_module` is constructed by the kernel at load time
+ // and lives until the module is unloaded.
+ unsafe { ::kernel::ThisModule::from_ptr(__this_module.get()) }
+ };
+
+ #[cfg(not(MODULE))]
+ const THIS_MODULE: ::kernel::ThisModule = unsafe {
+ ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
+ };
}
// Double nested modules, since then nobody can access the public items inside.
@@ -616,7 +632,7 @@ pub extern "C" fn #ident_exit() {
/// This function must only be called once.
unsafe fn __init() -> ::kernel::ffi::c_int {
let initer = <super::super::LocalModule as ::kernel::InPlaceModule>::init(
- &super::super::THIS_MODULE
+ ::kernel::module::this_module::<super::super::LocalModule>()
);
// SAFETY: No data race, since `__MOD` can only be accessed by this module
// and there only `__init` and `__exit` access it. These functions are only
--
2.43.0
* [PATCH v6 00/10] Fix missing fops.owner in Rust DRM/misc abstractions
From: Alvin Sun @ 2026-06-24 14:59 UTC
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
During tyr debugfs development, a kernel NULL pointer dereference was
encountered after `rmmod tyr` while gnome-shell still held /dev/card1 open:
```
[158827.868132] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[158827.868918] Mem abort info:
[158827.869177] ESR = 0x0000000086000004
[158827.869519] EC = 0x21: IABT (current EL), IL = 32 bits
[158827.870000] SET = 0, FnV = 0
[158827.870281] EA = 0, S1PTW = 0
[158827.870571] FSC = 0x04: level 0 translation fault
[158827.871043] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108dec000
[158827.871623] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[158827.872242] Internal error: Oops: 0000000086000004 [#1] SMP
[158827.872246] Modules linked in: tyr sunrpc snd_soc_simple_card rk805_pwrkey snd_soc_simple_card_utils rtw88_8822bu display_connector rtw88_usb rtw88_8822b snd_soc_rockchip_i2s_tdm snd_soc_hdmi_codec
rtw88_core]
[158827.872337] CPU: 4 UID: 1000 PID: 11276 Comm: gnome-s:disk$0 Tainted: G N 7.1.0-rc1+ #331 PREEMPT
[158827.880534] Tainted: [N]=TEST
[158827.880535] Hardware name: FriendlyElec NanoPi R6C/NanoPi R6C, BIOS v1.1 04/09/2025
[158827.880538] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[158827.880542] pc : 0x0
[158827.880547] lr : _RNvNtCs257m05FHVbX_3tyr2vm8pt_unmap+0x8c/0x12c [tyr]
[158827.880578] sp : ffff800083c236b0
[158827.880579] x29: ffff800083c236d0 x28: ffff00013f8a0000 x27: 0000000000000000
[158827.880585] x26: 000000000000007c x25: ffff000108e6ed80 x24: 0000000000401000
[158827.880590] x23: 0000000000000000 x22: 0000000040000000 x21: 0000000000001000
[158827.880595] x20: ffff00010f778138 x19: 0000000000400000 x18: 00000000ffffffff
[158827.880600] x17: 000000040044ffff x16: 045000f2b5503510 x15: 0720072007200720
[158827.880606] x14: 0720072007200720 x13: 0000000000401000 x12: 0000000000400000
[158827.880611] x11: ffff800083c239d0 x10: ffff000141e4fd88 x9 : 0000000000000000
[158827.880615] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000400000
[158827.880620] x5 : ffff00013f8a0000 x4 : 0000000000000000 x3 : 0000000000000001
[158827.880625] x2 : 0000000000001000 x1 : 0000000000400000 x0 : ffff00010f778138
[158827.880630] Call trace:
[158827.880632] 0x0 (P)
[158827.880635] _RNvXs6_NtCs257m05FHVbX_3tyr2vmNtB5_9GpuVmDataNtNtNtCsgmSOfgXi5CZ_6kernel3drm5gpuvm11DriverGpuVm13sm_step_unmap+0x3c/0x120 [tyr]
[158827.891166] _RNvMs4_NtNtNtCsgmSOfgXi5CZ_6kernel3drm5gpuvm6sm_opsINtB7_5GpuVmNtNtCs257m05FHVbX_3tyr2vm9GpuVmDataE13sm_step_unmapB13_+0x18/0x34 [tyr]
[158827.891187] op_unmap_cb+0x78/0xb0
[158827.891196] __drm_gpuvm_sm_unmap+0x18c/0x1b4
[158827.891204] drm_gpuvm_sm_unmap+0x38/0x4c
[158827.891209] _RNvMs5_NtCs257m05FHVbX_3tyr2vmNtB5_2Vm7exec_op+0x1cc/0x254 [tyr]
[158827.894085] _RNvMs5_NtCs257m05FHVbX_3tyr2vmNtB5_2Vm11unmap_range+0x124/0x188 [tyr]
[158827.894105] _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeNtNtCs257m05FHVbX_3tyr3gem8KernelBoEBK_+0x44/0xd8 [tyr]
[158827.894125] _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeINtNtNtCsgmSOfgXi5CZ_6kernel5alloc4kvec3VecNtNtCs257m05FHVbX_3tyr2fw7SectionNtNtBL_9allocator7KmallocEEB1r_+0x3c/0x100 [tyr]
[158827.894147] _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeINtNtNtCsgmSOfgXi5CZ_6kernel4sync3arc3ArcNtNtCs257m05FHVbX_3tyr2fw8FirmwareEEB1p_+0x94/0x190 [tyr]
[158827.894167] _RNvMs4_NtNtCsgmSOfgXi5CZ_6kernel3drm6deviceINtB5_6DeviceNtNtCs257m05FHVbX_3tyr6driver12TyrDrmDriverE7releaseBW_+0x30/0x98 [tyr]
[158827.899550] drm_dev_put.part.0+0x88/0xc0
[158827.899557] drm_minor_release+0x18/0x28
[158827.899562] drm_release+0x144/0x170
[158827.899567] __fput+0xe4/0x30c
[158827.899573] ____fput+0x14/0x20
[158827.899579] task_work_run+0x7c/0xe8
[158827.899586] do_exit+0x2a8/0xac4
[158827.899590] do_group_exit+0x34/0x90
[158827.899594] get_signal+0xaac/0xabc
[158827.899599] arch_do_signal_or_restart+0x90/0x3e8
[158827.899606] exit_to_user_mode_loop+0x140/0x1d0
[158827.899613] el0_svc+0x2f4/0x2f8
[158827.899620] el0t_64_sync_handler+0xa0/0xe4
[158827.899627] el0t_64_sync+0x198/0x19c
[158827.899632] ---[ end trace 0000000000000000 ]---
```
The root cause: `fops.owner` was `NULL` in Rust DRM drivers, so the kernel
never blocked module unloading while file descriptors were open. This leads to
a use-after-free when `drm_release` (or any other fop) runs after the module's
code has been freed.
The series moves `THIS_MODULE` into the `ModuleMetadata` as a const, threads it
through `#[vtable]` to set `fops.owner` in DRM/miscdevice, and updates configfs
and rnull to use `this_module::<LocalModule>()`.
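As a rough userspace model of why a NULL owner matters (the `Module` and
`FileOps` types and the `open`/`release`/`can_unload` helpers below are
hypothetical stand-ins, not kernel APIs), an open file can only pin the
module when `owner` points at it:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for `struct module`: only tracks a reference count.
struct Module {
    refcnt: AtomicUsize,
}

/// Hypothetical stand-in for `struct file_operations`, reduced to `owner`.
struct FileOps {
    owner: Option<&'static Module>,
}

/// Models opening a file: pin the owning module, if one is set.
fn open(fops: &FileOps) {
    if let Some(m) = fops.owner {
        m.refcnt.fetch_add(1, Ordering::SeqCst);
    }
}

/// Models closing a file: drop the pin taken in `open`.
fn release(fops: &FileOps) {
    if let Some(m) = fops.owner {
        m.refcnt.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Models the unload check: a module with outstanding pins must stay loaded.
fn can_unload(m: &Module) -> bool {
    m.refcnt.load(Ordering::SeqCst) == 0
}

static DRIVER: Module = Module { refcnt: AtomicUsize::new(0) };

fn main() {
    // With no owner, the open file pins nothing and unload is allowed.
    let unowned = FileOps { owner: None };
    open(&unowned);
    println!("owner = NULL: can unload = {}", can_unload(&DRIVER));

    // With an owner, the module stays pinned until the file is released.
    let owned = FileOps { owner: Some(&DRIVER) };
    open(&owned);
    println!("owner set:    can unload = {}", can_unload(&DRIVER));
    release(&owned);
    println!("after close:  can unload = {}", can_unload(&DRIVER));
}
```

In the unowned case nothing stops the unload, which is the window in which
the trace above can happen.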
Assisted-by: opencode:glm-5.2
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
Changes in v6:
- Update MAINTAINERS to cover the new `rust/kernel/module.rs`.
- Link to v5: https://lore.kernel.org/r/20260624-fix-fops-owner-v5-0-aa1cba242f05@linux.dev
Changes in v5:
- Add `#[inline]` to the `this_module()` helper.
- Fix configfs doc comment to reference `crate::LocalModule` instead of
bare `LocalModule`.
- Link to v4: https://lore.kernel.org/r/20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev
Changes in v4:
- Move module-related types into a new `rust/kernel/module.rs`.
- Migrate binder from the `module!`-generated `THIS_MODULE` static to
`this_module::<LocalModule>()`.
- Reorganise the series so that every commit builds independently, and
drop the legacy `THIS_MODULE` static once all users are migrated.
- Link to v3: https://lore.kernel.org/r/20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev
Changes in v3:
- Renamed vtable associated type `ThisModule` to `OwnerModule`
- Added `this_module()` helper for ergonomic `THIS_MODULE` access
- Refined vtable macro implementation: one-liner detection and single `defined_items` set
- Reordered commits to place doctest fallback before vtable auto-insert
- Link to v2: https://lore.kernel.org/r/20260521-fix-fops-owner-v2-0-fd99079c5a04@linux.dev
Changes in v2:
- Merged old `static THIS_MODULE` and v1's `MODULE_PTR` into a single
`ModuleMetadata::THIS_MODULE` const
- `#[vtable]` macro now auto-inserts `type ThisModule`, removing all per-driver
manual patches from v1
- Added configfs & rnull usage site updates and doctest `LocalModule` fallback
- Link to v1: https://lore.kernel.org/r/20260519-fix-fops-owner-v1-0-2ded9830da14@linux.dev
---
Alvin Sun (10):
rust: module: move module types into `module.rs`
rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
rust: doctest: add LocalModule fallback for #[vtable] ThisModule
rust: macros: auto-insert OwnerModule in #[vtable]
rust: drm: set fops.owner from driver module pointer
rust: miscdevice: set fops.owner from driver module pointer
rust: configfs: use `LocalModule` for `THIS_MODULE`
rust: binder: use `LocalModule` for `THIS_MODULE`
rust: macros: remove `THIS_MODULE` static from `module!`
rust: module: update MAINTAINERS to cover module.rs
MAINTAINERS | 2 +-
drivers/android/binder/rust_binder_main.rs | 3 +-
drivers/block/rnull/configfs.rs | 6 +--
rust/kernel/auxiliary.rs | 2 +-
rust/kernel/configfs.rs | 8 +--
rust/kernel/drm/device.rs | 3 +-
rust/kernel/drm/gem/mod.rs | 4 +-
rust/kernel/i2c.rs | 2 +-
rust/kernel/lib.rs | 75 +++-------------------------
rust/kernel/miscdevice.rs | 4 +-
rust/kernel/module.rs | 80 ++++++++++++++++++++++++++++++
rust/kernel/net/phy.rs | 6 ++-
rust/kernel/pci.rs | 2 +-
rust/kernel/platform.rs | 2 +-
rust/kernel/usb.rs | 2 +-
rust/macros/lib.rs | 6 +++
rust/macros/module.rs | 34 ++++++-------
rust/macros/vtable.rs | 41 +++++++++++++--
scripts/rustdoc_test_gen.rs | 16 ++++++
19 files changed, 189 insertions(+), 109 deletions(-)
---
base-commit: b7e5ac83cb16f7ffd11dc23736f84276602100ed
change-id: 20260519-fix-fops-owner-e3a77bb27c6c
prerequisite-change-id: 20260519-miscdev-use-format-9ab7e83b1c11:v3
prerequisite-patch-id: 405b334ff0d48ad350014f05a2321bdbaa025400
prerequisite-patch-id: 604b631c81d5423f4ebb2e12ba2d22e9ce371bfc
prerequisite-patch-id: cb550d94cefe01920e0d3ced2b2bcbecd76f3907
prerequisite-patch-id: 3bc830839742591460cb86d9472c04f4686dc600
prerequisite-patch-id: 571058244bc8c7088638d2e3225713011246c7e9
prerequisite-patch-id: 347c5a3c6dbef9832bfce8419fc23e6e08ba477f
prerequisite-patch-id: 3e202d988b56b88446f7535e90d3f00cf5f15701
Best regards,
--
Alvin Sun <alvin.sun@linux.dev>
* [PATCH v6 04/10] rust: macros: auto-insert OwnerModule in #[vtable]
From: Alvin Sun @ 2026-06-24 15:00 UTC
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Auto-add `type OwnerModule: ::kernel::ModuleMetadata;` as a required
associated type on the trait side if not already defined, and
auto-insert `type OwnerModule = crate::LocalModule;` on the impl side
if not explicitly provided, eliminating the need to manually declare
and implement `OwnerModule` in every vtable trait and impl.
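For illustration, the result of the expansion can be written out by hand
roughly as follows. This is a standalone sketch, not the macro output: the
`ModuleMetadata` trait is reduced to a hypothetical `NAME` const, and
`Operations`, `MyDevice` and `owner_name` are made-up names.

```rust
/// Stand-in for `::kernel::ModuleMetadata` (assumption: reduced to a name).
trait ModuleMetadata {
    const NAME: &'static str;
}

/// Stand-in for the type that `module!` exposes as `crate::LocalModule`.
struct LocalModule;

impl ModuleMetadata for LocalModule {
    const NAME: &'static str = "my_driver";
}

// Hand-written equivalent of a `#[vtable]` trait after expansion.
trait Operations {
    // Added on the trait side when the trait does not define it itself.
    type OwnerModule: ModuleMetadata;
}

struct MyDevice;

// Hand-written equivalent of a `#[vtable]` impl after expansion.
impl Operations for MyDevice {
    // Added on the impl side when the impl does not provide it; the macro
    // emits `crate::LocalModule`, spelled `LocalModule` in this sketch.
    type OwnerModule = LocalModule;
}

/// Generic code can reach the owning module through the associated type.
fn owner_name<T: Operations>() -> &'static str {
    <T::OwnerModule as ModuleMetadata>::NAME
}

fn main() {
    println!("owner of MyDevice: {}", owner_name::<MyDevice>());
}
```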
Assisted-by: opencode:glm-5.2
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/all/DIMMWHUOLPSH.13JFRHDKDQJGO@garyguo.net
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/macros/lib.rs | 6 ++++++
rust/macros/vtable.rs | 41 ++++++++++++++++++++++++++++++++++++-----
2 files changed, 42 insertions(+), 5 deletions(-)
diff --git a/rust/macros/lib.rs b/rust/macros/lib.rs
index 2cfd59e0f9e7c..bc7ded353c5ca 100644
--- a/rust/macros/lib.rs
+++ b/rust/macros/lib.rs
@@ -176,6 +176,12 @@ pub fn module(input: TokenStream) -> TokenStream {
///
/// This macro should not be used when all functions are required.
///
+/// Additionally, this macro automatically handles the `OwnerModule`
+/// associated type: on the trait side, `type OwnerModule: ModuleMetadata;`
+/// is added as a required associated type if not already defined; on the
+/// impl side, `type OwnerModule = LocalModule;` is automatically inserted
+/// if not explicitly defined.
+///
/// # Examples
///
/// ```
diff --git a/rust/macros/vtable.rs b/rust/macros/vtable.rs
index c6510b0c4ea1d..be9a5ed8abe5e 100644
--- a/rust/macros/vtable.rs
+++ b/rust/macros/vtable.rs
@@ -30,6 +30,22 @@ fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
const USE_VTABLE_ATTR: ();
});
+ // Add `type OwnerModule: ModuleMetadata` as a required associated type if
+ // the trait does not already define it.
+ if !item
+ .items
+ .iter()
+ .any(|i| matches!(i, TraitItem::Type(t) if t.ident == "OwnerModule"))
+ {
+ gen_items.push(parse_quote! {
+ /// The module implementing this vtable trait.
+ ///
+ /// Automatically set to `crate::LocalModule` by the `#[vtable]`
+ /// impl macro.
+ type OwnerModule: ::kernel::ModuleMetadata;
+ });
+ }
+
for item in &item.items {
if let TraitItem::Fn(fn_item) = item {
let name = &fn_item.sig.ident;
@@ -57,12 +73,18 @@ fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
let mut gen_items = Vec::new();
- let mut defined_consts = HashSet::new();
+ let mut defined_items = HashSet::new();
- // Iterate over all user-defined constants to gather any possible explicit overrides.
+ // Iterate over all user-defined items to gather any possible explicit overrides.
for item in &item.items {
- if let ImplItem::Const(const_item) = item {
- defined_consts.insert(const_item.ident.clone());
+ match item {
+ ImplItem::Const(const_item) => {
+ defined_items.insert(const_item.ident.clone());
+ }
+ ImplItem::Type(type_item) => {
+ defined_items.insert(type_item.ident.clone());
+ }
+ _ => {}
}
}
@@ -70,6 +92,15 @@ fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
const USE_VTABLE_ATTR: () = ();
});
+ // Auto-insert `type OwnerModule = crate::LocalModule` if not explicitly defined.
+ // `crate::LocalModule` resolves to the real module type (via `module!`) or a
+ // dummy fallback in non-module contexts (e.g., doctests).
+ if !defined_items.contains(&parse_quote!(OwnerModule)) {
+ gen_items.push(parse_quote! {
+ type OwnerModule = crate::LocalModule;
+ });
+ }
+
for item in &item.items {
if let ImplItem::Fn(fn_item) = item {
let name = &fn_item.sig.ident;
@@ -78,7 +109,7 @@ fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
name.span(),
);
// Skip if it's declared already -- this allows user override.
- if defined_consts.contains(&gen_const_name) {
+ if defined_items.contains(&gen_const_name) {
continue;
}
let cfg_attrs = crate::helpers::gather_cfg_attrs(&fn_item.attrs);
--
2.43.0
* [PATCH v6 05/10] rust: drm: set fops.owner from driver module pointer
From: Alvin Sun @ 2026-06-24 15:00 UTC
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Change `create_fops()` to accept an owner module pointer instead of
hardcoding `null_mut()`, ensuring the kernel correctly tracks the
module owning the DRM device's file operations.
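The shape of the change can be sketched outside the kernel as below. All
types here (`Module`, `FileOperations`, `Driver`, `Device`) are simplified,
hypothetical stand-ins, and the owner is modelled as an `Option` reference
rather than a raw pointer; the point is only that a `const fn` taking the
owner lets each driver type get its own compile-time fops value.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `bindings::module`.
struct Module {
    name: &'static str,
}

/// Hypothetical stand-in for `bindings::file_operations`, reduced to `owner`.
struct FileOperations {
    owner: Option<&'static Module>,
}

/// Stand-in for `ModuleMetadata` carrying the module reference as a const.
trait ModuleMetadata {
    const THIS_MODULE: &'static Module;
}

/// Stand-in for a vtable trait with the auto-inserted `OwnerModule`.
trait Driver {
    type OwnerModule: ModuleMetadata;
}

/// The owner is a parameter instead of a hardcoded null value.
const fn create_fops(owner: Option<&'static Module>) -> FileOperations {
    FileOperations { owner }
}

struct Device<T: Driver>(PhantomData<T>);

impl<T: Driver> Device<T> {
    /// Evaluated at compile time, once per driver type `T`.
    const GEM_FOPS: FileOperations =
        create_fops(Some(<T::OwnerModule as ModuleMetadata>::THIS_MODULE));
}

struct LocalModule;

impl ModuleMetadata for LocalModule {
    const THIS_MODULE: &'static Module = &Module { name: "my_driver" };
}

struct MyDriver;

impl Driver for MyDriver {
    type OwnerModule = LocalModule;
}

fn main() {
    let owner = Device::<MyDriver>::GEM_FOPS.owner;
    println!("fops.owner = {:?}", owner.map(|m| m.name));
}
```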
Assisted-by: opencode:glm-5.2
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/kernel/drm/device.rs | 3 ++-
rust/kernel/drm/gem/mod.rs | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/rust/kernel/drm/device.rs b/rust/kernel/drm/device.rs
index 403fc35353c74..d92cacb665366 100644
--- a/rust/kernel/drm/device.rs
+++ b/rust/kernel/drm/device.rs
@@ -111,7 +111,8 @@ impl<T: drm::Driver> Device<T> {
fops: &Self::GEM_FOPS,
};
- const GEM_FOPS: bindings::file_operations = drm::gem::create_fops();
+ const GEM_FOPS: bindings::file_operations =
+ drm::gem::create_fops(crate::module::this_module::<T::OwnerModule>().as_ptr());
/// Create a new `drm::Device` for a `drm::Driver`.
pub fn new(dev: &device::Device, data: impl PinInit<T::Data, Error>) -> Result<ARef<Self>> {
diff --git a/rust/kernel/drm/gem/mod.rs b/rust/kernel/drm/gem/mod.rs
index 01b5bd47a3332..9a203efc59116 100644
--- a/rust/kernel/drm/gem/mod.rs
+++ b/rust/kernel/drm/gem/mod.rs
@@ -357,10 +357,10 @@ impl<T: DriverObject> AllocImpl for Object<T> {
};
}
-pub(super) const fn create_fops() -> bindings::file_operations {
+pub(super) const fn create_fops(owner: *mut bindings::module) -> bindings::file_operations {
let mut fops: bindings::file_operations = pin_init::zeroed();
- fops.owner = core::ptr::null_mut();
+ fops.owner = owner;
fops.open = Some(bindings::drm_open);
fops.release = Some(bindings::drm_release);
fops.unlocked_ioctl = Some(bindings::drm_ioctl);
--
2.43.0
* [PATCH v6 08/10] rust: binder: use `LocalModule` for `THIS_MODULE`
From: Alvin Sun @ 2026-06-24 15:00 UTC
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Replace the `THIS_MODULE` static reference in the binder fops with
`this_module::<LocalModule>()`, consistent with the move of
`THIS_MODULE` into the `ModuleMetadata` trait.
Assisted-by: opencode:glm-5.2
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
drivers/android/binder/rust_binder_main.rs | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/android/binder/rust_binder_main.rs b/drivers/android/binder/rust_binder_main.rs
index dc1941cd2407b..d6ceebbd5f94e 100644
--- a/drivers/android/binder/rust_binder_main.rs
+++ b/drivers/android/binder/rust_binder_main.rs
@@ -17,6 +17,7 @@
bindings::{self, seq_file},
fs::File,
list::{ListArc, ListArcSafe, ListLinksSelfPtr, TryNewListArc},
+ module::this_module,
prelude::*,
seq_file::SeqFile,
seq_print,
@@ -318,7 +319,7 @@ unsafe impl<T> Sync for AssertSync<T> {}
let zeroed_ops = unsafe { core::mem::MaybeUninit::zeroed().assume_init() };
let ops = kernel::bindings::file_operations {
- owner: THIS_MODULE.as_ptr(),
+ owner: this_module::<LocalModule>().as_ptr(),
poll: Some(rust_binder_poll),
unlocked_ioctl: Some(rust_binder_ioctl),
compat_ioctl: bindings::compat_ptr_ioctl,
--
2.43.0
* [PATCH v6 07/10] rust: configfs: use `LocalModule` for `THIS_MODULE`
From: Alvin Sun @ 2026-06-24 15:00 UTC
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Replace the `THIS_MODULE` static reference in the `configfs_attrs!`
macro with `this_module::<LocalModule>()`, and update
rnull to import `LocalModule` instead of `THIS_MODULE`, consistent
with the move of `THIS_MODULE` into the `ModuleMetadata` trait.
Assisted-by: opencode:glm-5.2
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
drivers/block/rnull/configfs.rs | 6 ++----
rust/kernel/configfs.rs | 8 +++++---
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index c10a55fc58948..b2547ad1e5ddd 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -1,9 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
-use super::{
- NullBlkDevice,
- THIS_MODULE, //
-};
+use super::NullBlkDevice;
+use crate::LocalModule;
use kernel::{
block::mq::gen_disk::{
GenDisk,
diff --git a/rust/kernel/configfs.rs b/rust/kernel/configfs.rs
index 2339c6467325d..c31d7882e216d 100644
--- a/rust/kernel/configfs.rs
+++ b/rust/kernel/configfs.rs
@@ -875,7 +875,7 @@ fn as_ptr(&self) -> *const bindings::config_item_type {
/// configfs::Subsystem<Configuration>,
/// Configuration
/// >::new_with_child_ctor::<N,Child>(
-/// &THIS_MODULE,
+/// ::kernel::module::this_module::<crate::LocalModule>(),
/// &CONFIGURATION_ATTRS
/// );
///
@@ -1021,7 +1021,8 @@ macro_rules! configfs_attrs {
static [< $data:upper _TPE >] : $crate::configfs::ItemType<$container, $data> =
$crate::configfs::ItemType::<$container, $data>::new::<N>(
- &THIS_MODULE, &[<$ data:upper _ATTRS >]
+ $crate::module::this_module::<LocalModule>(),
+ &[<$ data:upper _ATTRS >]
);
)?
@@ -1030,7 +1031,8 @@ macro_rules! configfs_attrs {
$crate::configfs::ItemType<$container, $data> =
$crate::configfs::ItemType::<$container, $data>::
new_with_child_ctor::<N, $child>(
- &THIS_MODULE, &[<$ data:upper _ATTRS >]
+ $crate::module::this_module::<LocalModule>(),
+ &[<$ data:upper _ATTRS >]
);
)?
--
2.43.0
* [PATCH v6 06/10] rust: miscdevice: set fops.owner from driver module pointer
From: Alvin Sun @ 2026-06-24 15:00 UTC
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Set the miscdevice fops owner field from the driver module pointer
via the `this_module::<T::OwnerModule>()` helper, instead of
defaulting to null.
Assisted-by: opencode:glm-5.2
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/kernel/miscdevice.rs | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs
index 83ce50def5ac9..2a4329f98614e 100644
--- a/rust/kernel/miscdevice.rs
+++ b/rust/kernel/miscdevice.rs
@@ -24,12 +24,13 @@
IovIterSource, //
},
mm::virt::VmaNew,
+ module::this_module,
prelude::*,
seq_file::SeqFile,
types::{
ForeignOwnable,
Opaque, //
- },
+ }, //
};
use core::marker::PhantomData;
@@ -430,6 +431,7 @@ impl<T: MiscDevice> MiscdeviceVTable<T> {
} else {
None
},
+ owner: this_module::<T::OwnerModule>().as_ptr(),
..pin_init::zeroed()
};
--
2.43.0
* [PATCH v6 09/10] rust: macros: remove `THIS_MODULE` static from `module!`
From: Alvin Sun @ 2026-06-24 15:00 UTC
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
All users have been migrated to the `ModuleMetadata::THIS_MODULE` const or
the `this_module::<LocalModule>()` helper. The `static THIS_MODULE`
generated by the `module!` macro is no longer referenced anywhere,
so remove it to avoid having two sources of the same `ThisModule`
pointer.
Assisted-by: opencode:glm-5.2
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/macros/module.rs | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/rust/macros/module.rs b/rust/macros/module.rs
index aa9a618d5d19e..23b6a1b456b80 100644
--- a/rust/macros/module.rs
+++ b/rust/macros/module.rs
@@ -497,22 +497,6 @@ pub(crate) fn module(info: ModuleInfo) -> Result<TokenStream> {
/// Used by the printing macros, e.g. [`info!`].
const __LOG_PREFIX: &[u8] = #name_cstr.to_bytes_with_nul();
- // SAFETY: `__this_module` is constructed by the kernel at load time and will not be
- // freed until the module is unloaded.
- #[cfg(MODULE)]
- static THIS_MODULE: ::kernel::ThisModule = unsafe {
- extern "C" {
- static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
- };
-
- ::kernel::ThisModule::from_ptr(__this_module.get())
- };
-
- #[cfg(not(MODULE))]
- static THIS_MODULE: ::kernel::ThisModule = unsafe {
- ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
- };
-
/// The `LocalModule` type is the type of the module created by `module!`,
/// `module_pci_driver!`, `module_platform_driver!`, etc.
type LocalModule = #type_;
--
2.43.0
* [PATCH v6 10/10] rust: module: update MAINTAINERS to cover module.rs
From: Alvin Sun @ 2026-06-24 15:00 UTC
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>
Module types now live in `rust/kernel/module.rs` alongside
`rust/kernel/module_param.rs`. Update the MODULE SUPPORT file pattern
from `rust/kernel/module_param.rs` to `rust/kernel/module*.rs` so both
files are covered.
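A simplified model of the pattern change, assuming `*` matches any run of
characters within a single path component (this `matches` helper is a
hypothetical sketch and is not how `get_maintainer.pl` is implemented):

```rust
/// Matches `name` against `pattern`, where `pattern` holds at most one `*`.
/// The `*` is assumed not to cross a `/` boundary.
fn matches(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        Some((prefix, suffix)) => name
            .strip_prefix(prefix)
            .and_then(|rest| rest.strip_suffix(suffix))
            .map_or(false, |middle| !middle.contains('/')),
        None => pattern == name,
    }
}

fn main() {
    let old = "rust/kernel/module_param.rs";
    let new = "rust/kernel/module*.rs";

    for file in ["rust/kernel/module.rs", "rust/kernel/module_param.rs"] {
        println!(
            "{file}: old pattern = {}, new pattern = {}",
            matches(old, file),
            matches(new, file)
        );
    }
}
```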
Assisted-by: opencode:glm-5.2
Link: https://lore.kernel.org/rust-for-linux/8ea21b29-9baf-4926-a16f-7d21c5a1a1b8@suse.com
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index e035a3be797c4..74733de3e41ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17984,7 +17984,7 @@ F: include/linux/module*.h
F: kernel/module/
F: lib/test_kmod.c
F: lib/tests/module/
-F: rust/kernel/module_param.rs
+F: rust/kernel/module*.rs
F: rust/macros/module.rs
F: scripts/module*
F: tools/testing/selftests/kmod/
--
2.43.0
* Re: [BUG] KFENCE: use-after-free read in udp_tunnel_nic_device_sync_work
From: Eric Dumazet @ 2026-06-24 15:00 UTC
To: Sam Sun
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
linux-kernel, syzkaller
In-Reply-To: <CAEkJfYMXsNuJjKWJ5nvvw0afSP77F0WWT0gfj2-sQM3VyZ0brQ@mail.gmail.com>
On Wed, Jun 24, 2026 at 7:46 AM Sam Sun <samsun1006219@gmail.com> wrote:
>
> So we are still freeing struct udp_tunnel_nic while its embedded work_struct
> is active. debugobjects catches this at kfree() before the active work gets a
> chance to run later and dereference the freed utn.
>
> My read is that the conversion from bitfields to atomic bitops removes the
> plain bitfield data race, but UDP_TUNNEL_NIC_WORK_PENDING is still only one
> boolean state. It can represent "some work is pending", but it cannot
> distinguish between:
> idle
> queued
> running
> running and queued again
>
> In particular, the workqueue core clears WORK_STRUCT_PENDING before invoking
> the worker. At that point the same work item can be queued again by
> udp_tunnel_nic_device_sync(). If an already running instance later executes:
>
> clear_bit(UDP_TUNNEL_NIC_WORK_PENDING, &utn->flags);
>
> it can still clear the bit that was set for the requeued instance. Then
> udp_tunnel_nic_unregister() may observe UDP_TUNNEL_NIC_WORK_PENDING clear and
> free utn, even though debugobjects still sees utn->work as active.
>
-ETOOMANYBUGS
OK, we could try to convert the pending bit to a refcount.
diff --git a/net/ipv4/udp_tunnel_nic.c b/net/ipv4/udp_tunnel_nic.c
index 9944ed923ddfd10f9adf6ad788c0740daeaf2adb..2e14686f35896cb0caba3f8f587ef8b369090fbf 100644
--- a/net/ipv4/udp_tunnel_nic.c
+++ b/net/ipv4/udp_tunnel_nic.c
@@ -3,6 +3,7 @@
#include <linux/ethtool_netlink.h>
#include <linux/netdevice.h>
+#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/workqueue.h>
@@ -30,9 +31,8 @@ struct udp_tunnel_nic_table_entry {
* @work: async work for talking to hardware from process context
* @dev: netdev pointer
* @lock: protects all fields
- * @need_sync: at least one port start changed
- * @need_replay: space was freed, we need a replay of all ports
- * @work_pending: @work is currently scheduled
+ * @flags: sync, replay flags
+ * @refcnt: reference count
* @n_tables: number of tables under @entries
* @missed: bitmap of tables which overflown
* @entries: table of tables of ports currently offloaded
@@ -44,9 +44,11 @@ struct udp_tunnel_nic {
struct mutex lock;
- u8 need_sync:1;
- u8 need_replay:1;
- u8 work_pending:1;
+ unsigned long flags;
+#define UDP_TUNNEL_NIC_NEED_SYNC 0
+#define UDP_TUNNEL_NIC_NEED_REPLAY 1
+
+ refcount_t refcnt;
unsigned int n_tables;
unsigned long missed;
@@ -116,7 +118,7 @@ udp_tunnel_nic_entry_queue(struct udp_tunnel_nic *utn,
unsigned int flag)
{
entry->flags |= flag;
- utn->need_sync = 1;
+ set_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
}
static void
@@ -283,7 +285,7 @@ udp_tunnel_nic_device_sync_by_table(struct net_device *dev,
static void
__udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
{
- if (!utn->need_sync)
+ if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
return;
if (dev->udp_tunnel_nic_info->sync_table)
@@ -291,21 +293,24 @@ __udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
else
udp_tunnel_nic_device_sync_by_port(dev, utn);
- utn->need_sync = 0;
+ clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
/* Can't replay directly here, in case we come from the tunnel driver's
* notification - trying to replay may deadlock inside tunnel driver.
*/
- utn->need_replay = udp_tunnel_nic_should_replay(dev, utn);
+ if (udp_tunnel_nic_should_replay(dev, utn))
+ set_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
+ else
+ clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
}
static void
udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
{
- if (!utn->need_sync)
+ if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
return;
- queue_work(udp_tunnel_nic_workqueue, &utn->work);
- utn->work_pending = 1;
+ if (queue_work(udp_tunnel_nic_workqueue, &utn->work))
+ refcount_inc(&utn->refcnt);
}
static bool
@@ -348,7 +353,7 @@ udp_tunnel_nic_has_collision(struct net_device *dev, struct udp_tunnel_nic *utn,
if (!udp_tunnel_nic_entry_is_free(entry) &&
entry->port == ti->port &&
entry->type != ti->type) {
- __set_bit(i, &utn->missed);
+ set_bit(i, &utn->missed);
return true;
}
}
@@ -483,7 +488,7 @@ udp_tunnel_nic_add_new(struct net_device *dev, struct udp_tunnel_nic *utn,
* are no devices currently which have multiple tables accepting
* the same tunnel type, and false positives are okay.
*/
- __set_bit(i, &utn->missed);
+ set_bit(i, &utn->missed);
}
return false;
@@ -552,7 +557,7 @@ static void __udp_tunnel_nic_reset_ntf(struct net_device *dev)
mutex_lock(&utn->lock);
- utn->need_sync = false;
+ clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
for (i = 0; i < utn->n_tables; i++)
for (j = 0; j < info->tables[i].n_entries; j++) {
struct udp_tunnel_nic_table_entry *entry;
@@ -696,8 +701,8 @@ udp_tunnel_nic_flush(struct net_device *dev, struct udp_tunnel_nic *utn)
for (i = 0; i < utn->n_tables; i++)
memset(utn->entries[i], 0, array_size(info->tables[i].n_entries,
sizeof(**utn->entries)));
- WARN_ON(utn->need_sync);
- utn->need_replay = 0;
+ WARN_ON(test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags));
+ clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
}
static void
@@ -713,8 +718,8 @@ udp_tunnel_nic_replay(struct net_device *dev, struct udp_tunnel_nic *utn)
for (i = 0; i < utn->n_tables; i++)
for (j = 0; j < info->tables[i].n_entries; j++)
udp_tunnel_nic_entry_freeze_used(&utn->entries[i][j]);
- utn->missed = 0;
- utn->need_replay = 0;
+ bitmap_zero(&utn->missed, UDP_TUNNEL_NIC_MAX_TABLES);
+ clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
if (!info->shared) {
udp_tunnel_get_rx_info(dev);
@@ -728,6 +733,25 @@ udp_tunnel_nic_replay(struct net_device *dev, struct udp_tunnel_nic *utn)
udp_tunnel_nic_entry_unfreeze(&utn->entries[i][j]);
}
+static void udp_tunnel_nic_free(struct udp_tunnel_nic *utn)
+{
+ unsigned int i;
+
+ for (i = 0; i < utn->n_tables; i++)
+ kfree(utn->entries[i]);
+
+ if (utn->dev)
+ dev_put(utn->dev);
+
+ kfree(utn);
+}
+
+static void udp_tunnel_nic_put(struct udp_tunnel_nic *utn)
+{
+ if (refcount_dec_and_test(&utn->refcnt))
+ udp_tunnel_nic_free(utn);
+}
+
static void udp_tunnel_nic_device_sync_work(struct work_struct *work)
{
struct udp_tunnel_nic *utn =
@@ -736,14 +760,15 @@ static void udp_tunnel_nic_device_sync_work(struct work_struct *work)
rtnl_lock();
mutex_lock(&utn->lock);
- utn->work_pending = 0;
__udp_tunnel_nic_device_sync(utn->dev, utn);
- if (utn->need_replay)
+ if (test_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags))
udp_tunnel_nic_replay(utn->dev, utn);
mutex_unlock(&utn->lock);
rtnl_unlock();
+
+ udp_tunnel_nic_put(utn);
}
static struct udp_tunnel_nic *
@@ -759,6 +784,7 @@ udp_tunnel_nic_alloc(const struct udp_tunnel_nic_info *info,
utn->n_tables = n_tables;
INIT_WORK(&utn->work, udp_tunnel_nic_device_sync_work);
mutex_init(&utn->lock);
+ refcount_set(&utn->refcnt, 1);
for (i = 0; i < n_tables; i++) {
utn->entries[i] = kzalloc_objs(*utn->entries[i],
@@ -776,15 +802,6 @@ udp_tunnel_nic_alloc(const struct udp_tunnel_nic_info *info,
return NULL;
}
-static void udp_tunnel_nic_free(struct udp_tunnel_nic *utn)
-{
- unsigned int i;
-
- for (i = 0; i < utn->n_tables; i++)
- kfree(utn->entries[i]);
- kfree(utn);
-}
-
static int udp_tunnel_nic_register(struct net_device *dev)
{
const struct udp_tunnel_nic_info *info = dev->udp_tunnel_nic_info;
@@ -863,6 +880,7 @@ static void
udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
{
const struct udp_tunnel_nic_info *info = dev->udp_tunnel_nic_info;
+ bool last = true;
udp_tunnel_nic_lock(dev);
@@ -889,6 +907,7 @@ udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
udp_tunnel_drop_rx_info(dev);
utn->dev = first->dev;
udp_tunnel_nic_unlock(dev);
+ last = false;
goto release_dev;
}
@@ -901,16 +920,11 @@ udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
udp_tunnel_nic_flush(dev, utn);
udp_tunnel_nic_unlock(dev);
- /* Wait for the work to be done using the state, netdev core will
- * retry unregister until we give up our reference on this device.
- */
- if (utn->work_pending)
- return;
-
- udp_tunnel_nic_free(utn);
+ udp_tunnel_nic_put(utn);
release_dev:
dev->udp_tunnel_nic = NULL;
- dev_put(dev);
+ if (!last)
+ dev_put(dev);
}
static int
^ permalink raw reply
* [PATCH net] dt-bindings: net: renesas,ether: Drop example "ethernet-phy-ieee802.3-c22" fallback
From: Rob Herring (Arm) @ 2026-06-24 15:02 UTC (permalink / raw)
To: Niklas Söderlund, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Krzysztof Kozlowski, Conor Dooley,
Geert Uytterhoeven, Magnus Damm, Sergei Shtylyov
Cc: netdev, linux-renesas-soc, devicetree, linux-kernel
Fix the Micrel PHY in the example which shouldn't have the
fallback "ethernet-phy-ieee802.3-c22" compatible:
Documentation/devicetree/bindings/net/renesas,ether.example.dtb: ethernet-phy@1 \
(ethernet-phy-id0022.1537): compatible: ['ethernet-phy-id0022.1537', 'ethernet-phy-ieee802.3-c22'] is too long
from schema $id: http://devicetree.org/schemas/net/micrel.yaml
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
Documentation/devicetree/bindings/net/renesas,ether.yaml | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/renesas,ether.yaml b/Documentation/devicetree/bindings/net/renesas,ether.yaml
index f0a52f47f95a..dd7187f12a67 100644
--- a/Documentation/devicetree/bindings/net/renesas,ether.yaml
+++ b/Documentation/devicetree/bindings/net/renesas,ether.yaml
@@ -121,8 +121,7 @@ examples:
#size-cells = <0>;
phy1: ethernet-phy@1 {
- compatible = "ethernet-phy-id0022.1537",
- "ethernet-phy-ieee802.3-c22";
+ compatible = "ethernet-phy-id0022.1537";
reg = <1>;
interrupt-parent = <&irqc0>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
--
2.53.0
^ permalink raw reply related
* Re: [PATCH net v2] net: sungem: fix probe error cleanup
From: Simon Horman @ 2026-06-24 15:06 UTC (permalink / raw)
To: Ruoyu Wang
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260623025759.3468566-1-ruoyuw560@gmail.com>
On Tue, Jun 23, 2026 at 10:57:59AM +0800, Ruoyu Wang wrote:
> gem_init_one() calls gem_remove_one() when register_netdev() fails.
> gem_remove_one() unregisters and frees resources owned by the net_device,
> including the DMA block, MMIO mapping, PCI regions, and the net_device
> itself. gem_init_one() then falls through to its own cleanup labels and
> frees the same resources again.
>
> Keep the register_netdev() error path in gem_init_one(): clear drvdata so
> PM/remove paths do not see a half-registered device, remove the NAPI
> instance added during probe, and let the existing cleanup labels release
> the resources once.
>
> The issue was found by a local static-analysis checker for probe error
> paths. The reported path was manually inspected before sending this fix.
>
> Compile-tested with CONFIG_SUNGEM=y. Runtime testing was not performed
> because no sungem hardware is available.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
> ---
> v2:
> - Add a Fixes tag.
> - Describe how the issue was found.
> - Add testing information.
>
> v1: https://lore.kernel.org/netdev/20260620155326.80582-1-ruoyuw560@gmail.com/
Thanks for the update.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* [PATCH net-next] openvswitch: conntrack: annotate ct limit hlist traversal
From: Runyu Xiao @ 2026-06-24 15:01 UTC (permalink / raw)
To: aconole, echaudro, i.maximets
Cc: davem, edumazet, kuba, pabeni, horms, netdev, dev, linux-kernel,
runyu.xiao, jianhao.xu
ct_limit_set() is documented as being called with ovs_mutex held. It
walks the ct limit hlist with hlist_for_each_entry_rcu(), but the
iterator does not currently pass the OVS lockdep condition used
elsewhere for RCU-protected OVS objects.
Pass lockdep_ovsl_is_held() to the iterator. This matches the function's
existing caller contract and lets CONFIG_PROVE_RCU_LIST distinguish the
ovs_mutex-protected update path from the RCU read-side ct_limit_get()
path.
This was found by our static analysis tool and then manually reviewed
against the current tree. In the reviewed CONFIG_PROVE_RCU_LIST triage
run, the writer-side ct limit update produced the expected "RCU-list
traversed in non-reader section!!" warning while ovs_mutex was held,
with the stack matching ct_limit_set() and ovs_ct_limit_set_zone_limit().
The change is limited to documenting the existing protection contract.
This is a lockdep annotation cleanup. It does not change the conntrack
limit list update or release behavior.
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
net/openvswitch/conntrack.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c6fd9c424e8f..95697d4e16e6 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -883,7 +883,8 @@ static void ct_limit_set(const struct ovs_ct_limit_info *info,
struct hlist_head *head;
head = ct_limit_hash_bucket(info, new_ct_limit->zone);
- hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+ hlist_for_each_entry_rcu(ct_limit, head, hlist_node,
+ lockdep_ovsl_is_held()) {
if (ct_limit->zone == new_ct_limit->zone) {
hlist_replace_rcu(&ct_limit->hlist_node,
&new_ct_limit->hlist_node);
--
2.34.1
^ permalink raw reply related
* Re: [PATCH net] net: pse-pd: scope pse_control regulator handle to kref lifetime
From: Simon Horman @ 2026-06-24 15:12 UTC (permalink / raw)
To: github
Cc: Simon Horman, o.rempel, kory.maincent, andrew+netdev, davem,
edumazet, kuba, pabeni, corey, hkallweit1, linux, netdev,
linux-kernel
In-Reply-To: <20260622192839.2508733-1-github@szelinsky.de>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: pse-pd: scope pse_control regulator handle to kref lifetime
This patch switches the pse_control consumer regulator handle from
devm_regulator_get_exclusive() / devm_regulator_put() to the plain
regulator_get_exclusive() / regulator_put() pair, so the put in
__pse_control_release() works regardless of whether the controller's
devres has already been unwound.
> diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
> index 69dbdbde9d71..a5e6d7b26b9f 100644
> --- a/drivers/net/pse-pd/pse_core.c
> +++ b/drivers/net/pse-pd/pse_core.c
> @@ -1367,7 +1367,7 @@ static void __pse_control_release(struct kref *kref)
>
> if (psec->pcdev->pi[psec->id].admin_state_enabled)
> regulator_disable(psec->ps);
> - devm_regulator_put(psec->ps);
> + regulator_put(psec->ps);
>
> module_put(psec->pcdev->owner);
[High]
Does this fix the use-after-free completely, or only the regulator-put
part of it?
The commit message describes the scenario where the controller is
unbound while a pse_control still has consumers, and explains that
pcdev->dev's devres is released LIFO before pse_controller_unregister()
runs.
In that same scenario, __pse_control_release() still reads
psec->pcdev->pi[psec->id].admin_state_enabled and psec->pcdev->owner
just before the regulator_put():
if (psec->pcdev->pi[psec->id].admin_state_enabled)
regulator_disable(psec->ps);
regulator_put(psec->ps);
module_put(psec->pcdev->owner);
pse_controller_unregister() calls pse_release_pis() which does
kfree(pcdev->pi), and pcdev->pi is plain kzalloc memory rather than
devres-managed. pcdev itself is typically a driver-owned allocation
tied to pcdev->dev's lifetime.
If a later pse_control_put() runs from the unbind path after
pse_controller_unregister() has freed pcdev->pi (and pcdev), would
these pcdev->pi[...] and pcdev->owner reads operate on already-freed
memory?
Similarly, would the regulator_put() then be called against an rdev
that has already been torn down through the controller's devres?
This pre-dates this patch, so it is not a regression introduced here,
but the commit message frames the change as making
__pse_control_release() correct "regardless of whether the
controller's devres has already been unwound", which seems to only
cover the regulator handle and not the pcdev->pi / pcdev->owner reads
on the same code path.
Would a more complete fix also need pse_controller_unregister() to
drain outstanding pse_control references, or have pse_control hold a
refcount on pcdev, so that psec cannot outlive pcdev->pi and pcdev?
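The second option raised above (the control holding a reference on the controller) can be sketched in userspace C as follows. This is only an illustrative model, not the pse_core.c implementation: `struct controller`, `struct control`, `controller_create()`, `controller_put()`, `control_get()`, `control_release()` and the `released` test hook are all hypothetical names, and a plain counter stands in for what the kernel would express with kref or refcount_t.

```c
#include <stdlib.h>

/* Hypothetical stand-in for pcdev; not the real struct pse_controller_dev. */
struct controller {
	unsigned int refs;	/* plain counter; the kernel would use kref/refcount_t */
	int *pi;		/* per-port state, models pcdev->pi */
	int *released;		/* test hook: incremented when the controller is freed */
};

/* Hypothetical stand-in for psec. */
struct control {
	struct controller *pcdev;
	unsigned int id;
};

static struct controller *controller_create(unsigned int n_pi, int *released)
{
	struct controller *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->pi = calloc(n_pi, sizeof(*c->pi));
	if (!c->pi) {
		free(c);
		return NULL;
	}
	c->refs = 1;		/* reference owned by the registering driver */
	c->released = released;
	return c;
}

/* Drop one reference; the last put frees pi and the controller itself. */
static void controller_put(struct controller *c)
{
	if (--c->refs)
		return;
	if (c->released)
		(*c->released)++;
	free(c->pi);
	free(c);
}

/* Each control pins the controller for as long as the control exists. */
static struct control *control_get(struct controller *c, unsigned int id)
{
	struct control *psec = malloc(sizeof(*psec));

	if (!psec)
		return NULL;
	c->refs++;
	psec->pcdev = c;
	psec->id = id;
	return psec;
}

/* Reads pi[] before dropping the reference, so the read cannot hit freed memory. */
static int control_release(struct control *psec)
{
	int state = psec->pcdev->pi[psec->id];

	controller_put(psec->pcdev);
	free(psec);
	return state;
}
```

In this model an unregister only drops the owner's reference, so a control released afterwards still reads valid per-port state, and the controller is freed by whichever put happens last.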
^ permalink raw reply
* Re: [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Runyu Xiao @ 2026-06-24 10:37 UTC (permalink / raw)
To: XIAO WU
Cc: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu
In-Reply-To: <tencent_BD4B709F8D16281265EDBC0DC9EFC8758808@qq.com>
Hi Xiao,
> the error path in smc_listen() does not restore icsk_af_ops when
> kernel_listen() fails
Thanks, this looks like a real error-path bug. I will prepare it as a
separate fix for smc_listen() rather than folding it into this
sk_callback_lock patch.
Runyu
^ permalink raw reply
* Re: [PATCH 0/18] pull request (net-next): ipsec-next 2026-06-12
From: Antony Antony @ 2026-06-24 15:10 UTC (permalink / raw)
To: Jakub Kicinski, Steffen Klassert, Nathan Harold, Yan Yan
Cc: Antony Antony, David Miller, Herbert Xu, netdev, Tobias Brunner,
Sabrina Dubroca
In-Reply-To: <ajDlFUhMfJP36qA8@Antony2201.local>
On Tue, Jun 16, 2026 at 07:54:29AM +0200, Antony Antony wrote:
> On Sat, Jun 13, 2026 at 01:15:52PM -0700, Jakub Kicinski wrote:
> > On Fri, 12 Jun 2026 09:46:16 +0200 Steffen Klassert wrote:
> > > 3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that
> > > allows migrating individual IPsec SAs independently of
> > > their policies. The existing XFRM_MSG_MIGRATE is tightly coupled
> > > to policy+SA migration, lacks SPI for unique SA identification,
> > > and cannot express reqid changes or migrate Transport mode
> > > selectors. The new interface identifies the SA via SPI and mark,
> > > supports reqid changes, address family changes, encap removal,
> > > and uses an atomic create+install flow under x->lock to prevent
> > > SN/IV reuse during AEAD SA migration.
> > > From Antony Antony.
> >
> > Hi! There are some Sashiko comments here, please follow up:
> >
> > https://sashiko.dev/#/patchset/20260612074725.1760473-8-steffen.klassert@secunet.com
> >
>
> Thanks Jakub. I have fixes and testing them now. And I will send fixes soon.
>
> The comments didn't click until I realized xfrm_user_state_lookup() only
> keys on mark.v & mark.m, so distinct (v, m) pairs collapse to the same
> masked value. A lookup key of {0, 0} matches a source SA with mark
> {0, 0xffffff} (both mask to 0), but reusing {0, 0} as the migrated mark
> turns "match only mark 0x00" into "match all traffic".
>
> Fix is copy from old SA than from old_mark passed along. This also pointed
> more issues.
I have fixes queued up for the issues Sashiko found, and will send them
once the ipsec tree has net-next merged. What Sashiko pointed out are
corner cases. IMO a typical IKE/IPsec daemon would not trigger them, but
they are worth fixing.
The fixes address all four High findings and the Medium in patch 16/18.
Finding 6 (patch 05/18, encap removal) was determined to be a false
positive (already reviewed).
One tricky part worth noting: xfrm allows two SAs with the same SPI,
src, dst, and proto, but different marks:
ip xfrm state add src 10.1.1.1 dst 10.1.1.2 spi 0x1000 .. mark 0x1 mask 0xff
ip xfrm state add src 10.1.1.1 dst 10.1.1.2 spi 0x1000 .. mark 0x2 mask 0xff
ip x s
src 10.1.1.1 dst 10.1.1.2
proto esp spi 0x00001000 reqid 100 mode tunnel
replay-window 0
mark 0x2/0xff
aead rfc4106(gcm(aes)) 0x1111111111111111111111111111111111111111 96
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
sel src 0.0.0.0/0 dst 0.0.0.0/0
src 10.1.1.1 dst 10.1.1.2
proto esp spi 0x00001000 reqid 100 mode tunnel
replay-window 0
mark 0x1/0xff
aead rfc4106(gcm(aes)) 0x1111111111111111111111111111111111111111 96
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
sel src 0.0.0.0/0 dst 0.0.0.0/0
Both are accepted: same SPI 0x1000, two distinct SAs with different
marks. Note that both SAs share the same key material and their
independent oseq counters both start at 0, so that the encrypted
packets from each produce an identical AES-GCM IV.
Does anyone know whether this is intentional or accidental? Is there a
use case that requires two SAs with identical crypto and replay counters
but different marks?
This is also what makes state migration with a mark complex. Since xfrm
permits two SAs to share the same SPI with different marks, migrating
a mark must check whether the target slot is already occupied.
The fix "xfrm: check mark changes for SA tuple collisions in XFRM_MSG_MIGRATE_STATE" does
exactly that, using the effective lookup key m->v & m->m to detect a
collision before proceeding.
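To make the masking behaviour concrete, here is a minimal userspace sketch of the effective-key comparison. It is not the code from the fix: `struct mark`, `struct sa`, `mark_key()`, `mark_change_collides()` and `mark_selfcheck()` are hypothetical names, and the SA tuple is reduced to the SPI only (the real tuple also covers addresses, proto and family).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a (value, mask) mark pair. */
struct mark {
	uint32_t v;	/* value */
	uint32_t m;	/* mask */
};

/* Hypothetical, reduced SA: only the fields needed for this check. */
struct sa {
	uint32_t spi;
	struct mark mark;
};

/* Effective lookup key: the value ANDed with the mask. */
static uint32_t mark_key(const struct mark *mk)
{
	return mk->v & mk->m;
}

/*
 * Return true if giving 'self' the mark 'new_mark' would land on a
 * (spi, effective key) slot already used by another SA in the table.
 */
static bool mark_change_collides(const struct sa *table, size_t n,
				 const struct sa *self,
				 const struct mark *new_mark)
{
	for (size_t i = 0; i < n; i++) {
		const struct sa *other = &table[i];

		if (other == self)
			continue;
		if (other->spi == self->spi &&
		    mark_key(&other->mark) == mark_key(new_mark))
			return true;
	}
	return false;
}

/* Walks through the two cases discussed in this thread. */
static void mark_selfcheck(void)
{
	/* Distinct (v, m) pairs that collapse to the same key, 0. */
	struct mark a = { 0, 0 }, b = { 0, 0xffffff };
	/* Two SAs with the same SPI and different marks, as in the listing. */
	struct sa table[] = {
		{ 0x1000, { 0x1, 0xff } },
		{ 0x1000, { 0x2, 0xff } },
	};
	struct mark to_taken = { 0x2, 0xff }, to_free = { 0x3, 0xff };

	assert(mark_key(&a) == mark_key(&b));
	assert(mark_change_collides(table, 2, &table[0], &to_taken));
	assert(!mark_change_collides(table, 2, &table[0], &to_free));
}
```

Comparing masked keys rather than raw (v, m) pairs matters because two marks that look different can still select the same SA at lookup time.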
Kernel selftests for this series are included in the tree. However,
extensive testing is difficult on my end, since *swan cannot easily
create these cases.
Yan/Nathan,
would you be able to run the Android test suite against this branch, to
test migrating an SA with a mark set?
https://github.com/antonyantony/linux/tree/migrate-state-fixes-v0
-antony
^ permalink raw reply
* Re: [PATCH 1/2] bug: Provide WARN_ON.*DEFERRED() macros for console deferred output
From: Sebastian Andrzej Siewior @ 2026-06-24 15:24 UTC (permalink / raw)
To: Petr Mladek
Cc: K Prateek Nayak, linux-arch, linux-kernel, sched-ext, netdev,
David S . Miller, Andrea Righi, Andrew Morton, Arnd Bergmann,
Ben Segall, Breno Leitao, Changwoo Min, David Vernet,
Dietmar Eggemann, Eric Dumazet, Ingo Molnar, Jakub Kicinski,
John Ogness, Juri Lelli, Paolo Abeni, Peter Zijlstra,
Sergey Senozhatsky, Simon Horman, Steven Rostedt, Tejun Heo,
Vincent Guittot, Vlad Poenaru
In-Reply-To: <ajugq8VAciqtMx9F@pathway.suse.cz>
On 2026-06-24 11:17:31 [+0200], Petr Mladek wrote:
> For Linus, it was a no-go, definitely.
…
> I would vote for adding the WARN_*DEFERRED() into the scheduler code
> at least until majority of console drivers are converted to nbcon API.
I see four nbcon serial console drivers (+netconsole, + drm_log). We
have at least four times that many console drivers. What is the
majority from your point of view? The 8250 should cover all of x86.
> Best Regards,
> Petr
Sebastian
^ permalink raw reply