* Re: [PATCH 0/9] bitfield: add FIELD_GET_SIGNED()
From: Andy Shevchenko @ 2026-04-17 18:23 UTC
To: Yury Norov
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
On Fri, Apr 17, 2026 at 01:36:11PM -0400, Yury Norov wrote:
> Bitfields are designed on the assumption that fields contain unsigned
> integer values, so extracting a value from a field implies
> zero-extension.
>
> Some drivers need to sign-extend their fields, and currently do it like:
>
> dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
> dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
>
> It's error-prone because it relies on the user to provide the correct
> index of the most significant bit.
>
> This series adds a signed version of FIELD_GET(), which is more
> convenient and compiles (on x86_64) to just a couple of instructions:
> shl and sar.
>
> Patch #1 adds FIELD_GET_SIGNED(), and the rest of the series applies it
> tree-wide.
The example is missing here.
Nevertheless, I looked at the implementation a bit and I am wondering how it
would work for a 64-bit mask of, say, GENMASK_ULL(63, 60). Wouldn't it overflow?
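A userspace sketch of the shl/sar idea, useful for working through the GENMASK_ULL(63, 60) question by hand. This is not the posted macro: field_get_signed64() and GENMASK_ULL_SKETCH() are hypothetical names, the mask is assumed to be contiguous and non-zero, and the code relies on GCC/Clang implementing `>>` on a negative signed value as an arithmetic shift. Doing the left shift on an unsigned type keeps it free of signed-overflow UB; whether the macro's `(long long)(reg) <<` form behaves the same for every mask is the open question above, not something this sketch settles.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for GENMASK_ULL(h, l). */
#define GENMASK_ULL_SKETCH(h, l) \
	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Hypothetical helper (not the kernel macro): sign-extend the field
 * selected by a contiguous, non-zero 64-bit mask. */
static int64_t field_get_signed64(uint64_t mask, uint64_t reg)
{
	int lead = __builtin_clzll(mask);	/* bits above the field */
	int trail = __builtin_ctzll(mask);	/* bits below the field */

	/* "shl": move the field's top bit into bit 63. Shifting the
	 * unsigned value avoids signed-overflow UB and discards the
	 * bits above the field. */
	uint64_t shifted = reg << lead;

	/* Reinterpret as signed (two's complement on GCC/Clang), then
	 * "sar": the arithmetic shift replicates the sign bit and drops
	 * the bits that were below the field. */
	return (int64_t)shifted >> (lead + trail);
}
```

With this sketch, field_get_signed64(0xfff000, 0x800000) gives -2048, and a top-nibble field such as GENMASK_ULL_SKETCH(63, 60) needs a left shift of 0 and a right shift of 60.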
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH] net: hsr: avoid synchronize_net() in hsr_del_port() under rtnl_mutex
From: Eric Dumazet @ 2026-04-17 18:18 UTC
To: Shardul Bankar
Cc: davem, kuba, pabeni, horms, netdev, liuhangbin, lukma, acsjakub,
kees, xiaoliang.yang_1, fmancera, linux-kernel, janak,
kalpan.jani, Shardul Bankar, syzbot+f2fbf7478a35a94c8b7c
In-Reply-To: <20260417175322.2701465-1-shardul.b@mpiricsoftware.com>
On Fri, Apr 17, 2026 at 10:53 AM Shardul Bankar <shardulsb08@gmail.com> wrote:
>
> hsr_del_port() calls netdev_rx_handler_unregister(), which calls
> synchronize_net() while rtnl_mutex is held. During netns teardown,
> cleanup_net() walks pernet_list and reaches default_device_exit_batch(),
> which takes rtnl_mutex and iterates devices calling
> rtnl_link_ops->dellink() for each. For HSR this resolves to
> hsr_dellink() -> hsr_del_ports() -> hsr_del_port(), so the
> synchronize_net() call runs under rtnl_mutex. Under contention this
> stalls other rtnl_mutex waiters long enough to trip the hung-task
> detector:
>
> kworker cleanup_net -> default_device_exit_batch -> hsr_del_port
> -> netdev_rx_handler_unregister -> synchronize_rcu_expedited
> [slow, holds rtnl_mutex]
synchronize_rcu_expedited() should be quite fast...
> syz-executor -> ksys_unshare -> setup_net -> ip_tunnel_init_net
> -> rtnl_lock [blocked on rtnl_mutex]
>
> Open-code netdev_rx_handler_unregister() in hsr_del_port() without
> the synchronize_net() step. This is safe because hsr_del_port()
> already defers the port free via kfree_rcu(port, rcu), so the
> rx_handler_data memory remains valid until an RCU grace period
> elapses naturally. hsr_handle_frame() additionally re-validates
> rx_handler via hsr_port_get_rcu() before dereferencing
> rx_handler_data, so a reader that observes rx_handler_data == NULL
> in the brief window between the two clears takes the NULL path and
> returns RX_HANDLER_PASS without touching the port.
>
> Reported-by: syzbot+f2fbf7478a35a94c8b7c@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?id=cb64c22a492202ca929e18262fdb8cb89e635c70
The signature looks like a bug that was recently fixed in wireguard:
commit 60a25ef8dacb3566b1a8c4de00572a498e2a3bf9
Author: Shardul Bankar <shardul.b@mpiricsoftware.com>
Date: Tue Apr 14 17:39:44 2026 +0200
wireguard: device: use exit_rtnl callback instead of manual
rtnl_lock in pre_exit
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
> Testing status:
>
> The hang was originally observed on 7.0-rc6 (commit 7ca6d1cfec80) with
> the syzkaller reproducer and a KASAN+LOCKDEP config. Subsequent
> reproduction attempts with the same reproducer on current mainline did
> not retrigger this specific signature, so I could not verify the fix
> against a live reproducer. The patch is submitted on the basis of
> code analysis.
>
> net/hsr/hsr_slave.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/net/hsr/hsr_slave.c b/net/hsr/hsr_slave.c
> index d9af9e65f72f..5e92f23fa1a5 100644
> --- a/net/hsr/hsr_slave.c
> +++ b/net/hsr/hsr_slave.c
> @@ -239,7 +239,18 @@ void hsr_del_port(struct hsr_port *port)
> if (port != master) {
> netdev_update_features(master->dev);
> dev_set_mtu(master->dev, hsr_get_max_mtu(hsr));
> - netdev_rx_handler_unregister(port->dev);
> + /* Open-code netdev_rx_handler_unregister() without the
> + * synchronize_net() step: holding rtnl_mutex across the
> + * grace-period wait stalls other rtnl_mutex waiters long
> + * enough to trip the hung-task detector under load.
> + * Skipping synchronize_net() is safe because the port is
> + * freed via kfree_rcu() below (so rx_handler_data memory
> + * outlives any in-flight reader), and hsr_handle_frame()
> + * re-validates rx_handler via hsr_port_get_rcu() before
> + * dereferencing rx_handler_data.
> + */
> + RCU_INIT_POINTER(port->dev->rx_handler, NULL);
> + RCU_INIT_POINTER(port->dev->rx_handler_data, NULL);
> if (!port->hsr->fwd_offloaded)
> dev_set_promiscuity(port->dev, -1);
> netdev_upper_dev_unlink(port->dev, master->dev);
You are saying that all netdev_rx_handler_unregister() uses are
potentially a problem.
This cannot be true.
* Re: [PATCH 1/9] bitfield: add FIELD_GET_SIGNED()
From: Andy Shevchenko @ 2026-04-17 18:12 UTC
To: Yury Norov
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
In-Reply-To: <20260417173621.368914-2-ynorov@nvidia.com>
On Fri, Apr 17, 2026 at 01:36:12PM -0400, Yury Norov wrote:
> Bitfields are designed on the assumption that fields contain unsigned
> integer values, so extracting a value from a field implies
> zero-extension.
>
> Some drivers need to sign-extend their fields, and currently do it like:
>
> dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
> dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
>
> It's error-prone because it relies on the user to provide the correct
> index of the most significant bit and the proper 32- vs 64-bit function flavor.
>
> Thus, introduce a FIELD_GET_SIGNED() macro, which is more
> convenient and compiles (on x86_64) to just a couple of instructions:
> shl and sar.
...
> +#define FIELD_GET_SIGNED(mask, reg) \
> + ({ \
> + __BF_FIELD_CHECK(mask, reg, 0U, "FIELD_GET_SIGNED: "); \
> + ((__signed_scalar_typeof(mask))((long long)(reg) << \
> + __builtin_clzll(mask) >> (__builtin_clzll(mask) + \
> + __builtin_ctzll(mask))));\
I would re-indent these lines as
((__signed_scalar_typeof(mask))
((long long)(reg) << __builtin_clzll(mask) >> \
(__builtin_clzll(mask) + __builtin_ctzll(mask)))); \
> + })
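To make the "error-prone index" argument concrete, the sketch below compares the existing sign_extend32(FIELD_GET(...), 11) idiom with a variant that derives the index from the mask. sign_extend32_sketch() copies the shift logic of the kernel's sign_extend32(); field_get32_sketch(), old_style() and mask_derived() are hypothetical userspace helpers, not kernel APIs, and a contiguous mask is assumed.

```c
#include <stdint.h>

/* Same shift logic as the kernel's sign_extend32(): index is the
 * bit position of the sign bit (field width - 1). */
static int32_t sign_extend32_sketch(uint32_t value, int index)
{
	uint8_t shift = 31 - index;

	return (int32_t)(value << shift) >> shift;
}

/* Hypothetical stand-in for FIELD_GET() on a 32-bit register. */
static uint32_t field_get32_sketch(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> __builtin_ctz(mask);
}

/* Today's idiom: the caller must supply the sign-bit index by hand. */
static int32_t old_style(uint32_t mask, uint32_t reg, int index)
{
	return sign_extend32_sketch(field_get32_sketch(mask, reg), index);
}

/* Mask-derived: the index comes from the field width, so it cannot
 * be mistyped (contiguous mask assumed). */
static int32_t mask_derived(uint32_t mask, uint32_t reg)
{
	int width = __builtin_popcount(mask);

	return sign_extend32_sketch(field_get32_sketch(mask, reg), width - 1);
}
```

Passing 10 instead of 11 for the 12-bit field turns 0x7ff (2047) into -1, which is the class of mistake a mask-derived helper rules out.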
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH net] net: dsa: mt7530: fix .get_stats64 sleeping in atomic context
From: Daniel Golle @ 2026-04-17 18:03 UTC
To: Breno Leitao
Cc: Chester A. Unal, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
AngeloGioacchino Del Regno, Russell King, Christian Marangi,
netdev, linux-kernel, linux-arm-kernel, linux-mediatek,
Frank Wunderlich, John Crispin
In-Reply-To: <aeJjcoHyezEgUG_Q@gmail.com>
On Fri, Apr 17, 2026 at 10:46:29AM -0700, Breno Leitao wrote:
> On Fri, Apr 17, 2026 at 04:55:57AM +0100, Daniel Golle wrote:
> > @@ -3404,6 +3449,9 @@ EXPORT_SYMBOL_GPL(mt7530_probe_common);
> > void
> > mt7530_remove_common(struct mt7530_priv *priv)
> > {
> > + if (priv->bus)
> > + cancel_delayed_work_sync(&priv->stats_work);
> > +
>
> Shouldn't you cancel the work later, after dsa_unregister_switch()?
>
> I am wondering whether the following race can happen:
>
>     mt7530_remove_common()            someone reading /proc/net/dev
>     cancel_delayed_work_sync()
>       /* returns: work neither pending
>          nor executing - true at this
>          instant */
>                                       mt7530_get_stats64()
>                                         mod_delayed_work(...)
>                                         /* work is queued again */
>     dsa_unregister_switch()
>     return
Thank you for pointing this out.
cancel_delayed_work_sync() should be moved after dsa_unregister_switch()
to avoid this kind of race.
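A deterministic userspace model of the race and of the suggested reordering. Everything here is hypothetical: the struct and functions only stand in for the driver state, and the model assumes that once dsa_unregister_switch() has returned, no further .get_stats64 call can re-arm the work.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state (not mt7530 code). */
struct model_priv {
	bool registered;	/* switch still registered with DSA */
	bool work_pending;	/* delayed stats work is queued */
};

/* Models .get_stats64: while the switch is registered, a reader may
 * re-arm the work, like the mod_delayed_work() in the race above. */
static void model_get_stats64(struct model_priv *p)
{
	if (p->registered)
		p->work_pending = true;
}

static void model_cancel_sync(struct model_priv *p)
{
	p->work_pending = false;
}

static void model_unregister(struct model_priv *p)
{
	p->registered = false;
}

/* Order in the posted patch, with a reader slipping in after the
 * cancel. Returns true if work is still queued after removal. */
static bool remove_cancel_first(struct model_priv *p)
{
	model_cancel_sync(p);
	model_get_stats64(p);
	model_unregister(p);
	return p->work_pending;
}

/* Suggested order: unregister first, then cancel whatever is queued. */
static bool remove_unregister_first(struct model_priv *p)
{
	model_get_stats64(p);	/* a reader before unregister is harmless */
	model_unregister(p);
	model_cancel_sync(p);
	return p->work_pending;
}
```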
* [PATCH] net: hsr: avoid synchronize_net() in hsr_del_port() under rtnl_mutex
From: Shardul Bankar @ 2026-04-17 17:53 UTC
To: davem, edumazet, kuba, pabeni, horms, netdev
Cc: liuhangbin, lukma, acsjakub, kees, xiaoliang.yang_1, fmancera,
linux-kernel, janak, kalpan.jani, shardulsb08, Shardul Bankar,
syzbot+f2fbf7478a35a94c8b7c
hsr_del_port() calls netdev_rx_handler_unregister(), which calls
synchronize_net() while rtnl_mutex is held. During netns teardown,
cleanup_net() walks pernet_list and reaches default_device_exit_batch(),
which takes rtnl_mutex and iterates devices calling
rtnl_link_ops->dellink() for each. For HSR this resolves to
hsr_dellink() -> hsr_del_ports() -> hsr_del_port(), so the
synchronize_net() call runs under rtnl_mutex. Under contention this
stalls other rtnl_mutex waiters long enough to trip the hung-task
detector:
kworker cleanup_net -> default_device_exit_batch -> hsr_del_port
-> netdev_rx_handler_unregister -> synchronize_rcu_expedited
[slow, holds rtnl_mutex]
syz-executor -> ksys_unshare -> setup_net -> ip_tunnel_init_net
-> rtnl_lock [blocked on rtnl_mutex]
Open-code netdev_rx_handler_unregister() in hsr_del_port() without
the synchronize_net() step. This is safe because hsr_del_port()
already defers the port free via kfree_rcu(port, rcu), so the
rx_handler_data memory remains valid until an RCU grace period
elapses naturally. hsr_handle_frame() additionally re-validates
rx_handler via hsr_port_get_rcu() before dereferencing
rx_handler_data, so a reader that observes rx_handler_data == NULL
in the brief window between the two clears takes the NULL path and
returns RX_HANDLER_PASS without touching the port.
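The ordering argument above can be replayed deterministically in userspace. This is a model, not kernel code: model_dev, model_handle_frame(), replay_window() and the rx_result values are hypothetical, C11 atomics stand in for RCU_INIT_POINTER()/rcu_dereference(), and the model covers neither kfree_rcu() nor the cost of synchronize_net() that the review questions. It only shows the claimed window: a reader that loaded rx_handler before the clears then finds rx_handler_data NULL and passes the frame.

```c
#include <stdatomic.h>
#include <stddef.h>

enum rx_result { RX_PASS, RX_CONSUMED };	/* hypothetical */

struct model_port { int id; };
struct model_dev;
typedef enum rx_result (*rx_handler_fn)(struct model_dev *);

/* Hypothetical stand-in for the two net_device fields. */
struct model_dev {
	_Atomic(rx_handler_fn) rx_handler;
	_Atomic(struct model_port *) rx_handler_data;
};

/* Like the described hsr_handle_frame() behaviour: re-read the port
 * pointer and take the NULL path instead of dereferencing it. */
static enum rx_result model_handle_frame(struct model_dev *dev)
{
	struct model_port *port = atomic_load(&dev->rx_handler_data);

	if (!port)
		return RX_PASS;
	return RX_CONSUMED;
}

/* Writer side from the patch: clear both pointers, no grace period. */
static void model_unregister(struct model_dev *dev)
{
	atomic_store(&dev->rx_handler, (rx_handler_fn)NULL);
	atomic_store(&dev->rx_handler_data, (struct model_port *)NULL);
}

/* Reader loads a non-NULL rx_handler, the writer runs, and only then
 * does the reader call the handler it loaded. */
static enum rx_result replay_window(struct model_dev *dev)
{
	rx_handler_fn h = atomic_load(&dev->rx_handler);

	model_unregister(dev);
	return h ? h(dev) : RX_PASS;
}
```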
Reported-by: syzbot+f2fbf7478a35a94c8b7c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=cb64c22a492202ca929e18262fdb8cb89e635c70
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
Testing status:
The hang was originally observed on 7.0-rc6 (commit 7ca6d1cfec80) with
the syzkaller reproducer and a KASAN+LOCKDEP config. Subsequent
reproduction attempts with the same reproducer on current mainline did
not retrigger this specific signature, so I could not verify the fix
against a live reproducer. The patch is submitted on the basis of
code analysis.
net/hsr/hsr_slave.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/net/hsr/hsr_slave.c b/net/hsr/hsr_slave.c
index d9af9e65f72f..5e92f23fa1a5 100644
--- a/net/hsr/hsr_slave.c
+++ b/net/hsr/hsr_slave.c
@@ -239,7 +239,18 @@ void hsr_del_port(struct hsr_port *port)
if (port != master) {
netdev_update_features(master->dev);
dev_set_mtu(master->dev, hsr_get_max_mtu(hsr));
- netdev_rx_handler_unregister(port->dev);
+ /* Open-code netdev_rx_handler_unregister() without the
+ * synchronize_net() step: holding rtnl_mutex across the
+ * grace-period wait stalls other rtnl_mutex waiters long
+ * enough to trip the hung-task detector under load.
+ * Skipping synchronize_net() is safe because the port is
+ * freed via kfree_rcu() below (so rx_handler_data memory
+ * outlives any in-flight reader), and hsr_handle_frame()
+ * re-validates rx_handler via hsr_port_get_rcu() before
+ * dereferencing rx_handler_data.
+ */
+ RCU_INIT_POINTER(port->dev->rx_handler, NULL);
+ RCU_INIT_POINTER(port->dev->rx_handler_data, NULL);
if (!port->hsr->fwd_offloaded)
dev_set_promiscuity(port->dev, -1);
netdev_upper_dev_unlink(port->dev, master->dev);
--
2.34.1
* [PATCH v2 3/3] mISDN: cache stable device names outside the kobject
From: Shuvam Pandey @ 2026-04-17 17:49 UTC
To: netdev; +Cc: linux-kernel, Simon Horman
In-Reply-To: <cover.1776446840.git.shuvampandey1@gmail.com>
mISDN prints and exports device names from several paths that are not tied
to the kobject rename internals. device_rename() replaces the kobject name
string, so reading it directly leaves mISDN dependent on that storage
remaining stable while those paths run.
Keep an mISDN-owned copy of the device name, update it on registration and
successful rename, and use that cached name from the socket, sysfs, stack,
and debug paths.
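A userspace sketch of the caching rule described above: the driver-owned copy is written at registration and refreshed only when the rename succeeded. model_mdev, MODEL_IDLEN, copy_name(), model_device_rename() and model_setdevname() are hypothetical; copy_name() imitates strscpy()'s truncate-and-terminate behaviour but not its -E2BIG return value, and the locking from the patch is omitted.

```c
#include <errno.h>
#include <string.h>

#define MODEL_IDLEN 20	/* hypothetical; the patch uses MISDN_MAX_IDLEN */

struct model_mdev {
	char kobj_name[64];	/* stands in for the kobject-owned name */
	char name[MODEL_IDLEN];	/* driver-owned cached copy */
};

/* Bounded copy that always NUL-terminates, in the spirit of strscpy(). */
static void copy_name(char *dst, const char *src, size_t size)
{
	size_t n = strlen(src);

	if (n >= size)
		n = size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}

/* Hypothetical rename that can fail, standing in for device_rename(). */
static int model_device_rename(struct model_mdev *d, const char *new_name)
{
	if (!new_name[0])
		return -EINVAL;
	copy_name(d->kobj_name, new_name, sizeof(d->kobj_name));
	return 0;
}

/* Mirrors the IMSETDEVNAME hunk: refresh the cache only on success. */
static int model_setdevname(struct model_mdev *d, const char *new_name)
{
	int err = model_device_rename(d, new_name);

	if (!err)
		copy_name(d->name, d->kobj_name, sizeof(d->name));
	return err;
}
```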
Assisted-by: Codex:GPT-5.3-Codex
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
---
drivers/isdn/mISDN/core.c | 18 +++++++++++++-----
drivers/isdn/mISDN/layer1.c | 2 +-
drivers/isdn/mISDN/socket.c | 12 ++++++++----
drivers/isdn/mISDN/stack.c | 36 ++++++++++++++++++------------------
drivers/isdn/mISDN/tei.c | 2 +-
include/linux/mISDNif.h | 1 +
6 files changed, 42 insertions(+), 29 deletions(-)
diff --git a/drivers/isdn/mISDN/core.c b/drivers/isdn/mISDN/core.c
index 4e2be8f03119b..d89c9e54cb5fa 100644
--- a/drivers/isdn/mISDN/core.c
+++ b/drivers/isdn/mISDN/core.c
@@ -89,8 +89,15 @@ static DEVICE_ATTR_RO(protocol);
static ssize_t name_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
- strcpy(buf, dev_name(dev));
- return strlen(buf);
+ struct mISDNdevice *mdev = dev_to_mISDN(dev);
+ ssize_t len;
+
+ if (!mdev)
+ return -ENODEV;
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s", mdev->name);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RO(name);
@@ -227,9 +234,10 @@ mISDN_register_device(struct mISDNdevice *dev,
dev_set_name(&dev->dev, "%s", name);
else
dev_set_name(&dev->dev, "mISDN%d", dev->id);
+ strscpy(dev->name, dev_name(&dev->dev), sizeof(dev->name));
if (debug & DEBUG_CORE)
printk(KERN_DEBUG "mISDN_register %s %d\n",
- dev_name(&dev->dev), dev->id);
+ dev->name, dev->id);
dev->dev.class = &mISDN_class;
err = create_stack(dev);
@@ -258,7 +266,7 @@ void
mISDN_unregister_device(struct mISDNdevice *dev) {
if (debug & DEBUG_CORE)
printk(KERN_DEBUG "mISDN_unregister %s %d\n",
- dev_name(&dev->dev), dev->id);
+ dev->name, dev->id);
/* sysfs_remove_link(&dev->dev.kobj, "device"); */
/*
* Remove the device from sysfs before taking dev->mutex so bind-side
@@ -358,7 +366,7 @@ const char *mISDNDevName4ch(struct mISDNchannel *ch)
return msg_no_stack;
if (!ch->st->dev)
return msg_no_stackdev;
- return dev_name(&ch->st->dev->dev);
+ return ch->st->dev->name;
};
EXPORT_SYMBOL(mISDNDevName4ch);
diff --git a/drivers/isdn/mISDN/layer1.c b/drivers/isdn/mISDN/layer1.c
index 3fbc170acf9ab..c5a2e9119e868 100644
--- a/drivers/isdn/mISDN/layer1.c
+++ b/drivers/isdn/mISDN/layer1.c
@@ -100,7 +100,7 @@ l1m_debug(struct FsmInst *fi, char *fmt, ...)
vaf.fmt = fmt;
vaf.va = &va;
- printk(KERN_DEBUG "%s: %pV\n", dev_name(&l1->dch->dev.dev), &vaf);
+ printk(KERN_DEBUG "%s: %pV\n", l1->dch->dev.name, &vaf);
va_end(va);
}
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 42cda5b8bbe16..bce71ae5eb7d4 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -499,7 +499,7 @@ data_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
memcpy(di.channelmap, dev->channelmap,
sizeof(di.channelmap));
di.nrbchan = dev->nrbchan;
- strscpy(di.name, dev_name(&dev->dev), sizeof(di.name));
+ strscpy(di.name, dev->name, sizeof(di.name));
device_unlock(&dev->dev);
if (copy_to_user((void __user *)arg, &di, sizeof(di)))
err = -EFAULT;
@@ -826,7 +826,7 @@ base_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
memcpy(di.channelmap, dev->channelmap,
sizeof(di.channelmap));
di.nrbchan = dev->nrbchan;
- strscpy(di.name, dev_name(&dev->dev), sizeof(di.name));
+ strscpy(di.name, dev->name, sizeof(di.name));
device_unlock(&dev->dev);
if (copy_to_user((void __user *)arg, &di, sizeof(di)))
err = -EFAULT;
@@ -846,10 +846,14 @@ base_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
dev = get_mdevice(dn.id);
if (dev) {
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev))
+ if (!device_is_registered(&dev->dev)) {
err = -ENODEV;
- else
+ } else {
err = device_rename(&dev->dev, dn.name);
+ if (!err)
+ strscpy(dev->name, dev_name(&dev->dev),
+ sizeof(dev->name));
+ }
device_unlock(&dev->dev);
put_device(&dev->dev);
} else {
diff --git a/drivers/isdn/mISDN/stack.c b/drivers/isdn/mISDN/stack.c
index 4e96684af0aac..9b39ddb4a0944 100644
--- a/drivers/isdn/mISDN/stack.c
+++ b/drivers/isdn/mISDN/stack.c
@@ -165,7 +165,7 @@ send_msg_to_layer(struct mISDNstack *st, struct sk_buff *skb)
else
printk(KERN_WARNING
"%s: dev(%s) prim(%x) id(%x) no channel\n",
- __func__, dev_name(&st->dev->dev), hh->prim,
+ __func__, st->dev->name, hh->prim,
hh->id);
} else if (lm == 0x8) {
WARN_ON(lm == 0x8);
@@ -175,12 +175,12 @@ send_msg_to_layer(struct mISDNstack *st, struct sk_buff *skb)
else
printk(KERN_WARNING
"%s: dev(%s) prim(%x) id(%x) no channel\n",
- __func__, dev_name(&st->dev->dev), hh->prim,
+ __func__, st->dev->name, hh->prim,
hh->id);
} else {
/* broadcast not handled yet */
printk(KERN_WARNING "%s: dev(%s) prim %x not delivered\n",
- __func__, dev_name(&st->dev->dev), hh->prim);
+ __func__, st->dev->name, hh->prim);
}
return -ESRCH;
}
@@ -202,7 +202,7 @@ mISDNStackd(void *data)
sigfillset(&current->blocked);
if (*debug & DEBUG_MSG_THREAD)
printk(KERN_DEBUG "mISDNStackd %s started\n",
- dev_name(&st->dev->dev));
+ st->dev->name);
if (st->notify != NULL) {
complete(st->notify);
@@ -238,7 +238,7 @@ mISDNStackd(void *data)
printk(KERN_DEBUG
"%s: %s prim(%x) id(%x) "
"send call(%d)\n",
- __func__, dev_name(&st->dev->dev),
+ __func__, st->dev->name,
mISDN_HEAD_PRIM(skb),
mISDN_HEAD_ID(skb), err);
dev_kfree_skb(skb);
@@ -281,7 +281,7 @@ mISDNStackd(void *data)
mISDN_STACK_ACTION_MASK));
if (*debug & DEBUG_MSG_THREAD)
printk(KERN_DEBUG "%s: %s wake status %08lx\n",
- __func__, dev_name(&st->dev->dev), st->status);
+ __func__, st->dev->name, st->status);
test_and_set_bit(mISDN_STACK_ACTIVE, &st->status);
test_and_clear_bit(mISDN_STACK_WAKEUP, &st->status);
@@ -296,17 +296,17 @@ mISDNStackd(void *data)
#ifdef MISDN_MSG_STATS
printk(KERN_DEBUG "mISDNStackd daemon for %s proceed %d "
"msg %d sleep %d stopped\n",
- dev_name(&st->dev->dev), st->msg_cnt, st->sleep_cnt,
+ st->dev->name, st->msg_cnt, st->sleep_cnt,
st->stopped_cnt);
task_cputime(st->thread, &utime, &stime);
printk(KERN_DEBUG
"mISDNStackd daemon for %s utime(%llu) stime(%llu)\n",
- dev_name(&st->dev->dev), utime, stime);
+ st->dev->name, utime, stime);
printk(KERN_DEBUG
"mISDNStackd daemon for %s nvcsw(%ld) nivcsw(%ld)\n",
- dev_name(&st->dev->dev), st->thread->nvcsw, st->thread->nivcsw);
+ st->dev->name, st->thread->nvcsw, st->thread->nivcsw);
printk(KERN_DEBUG "mISDNStackd daemon for %s killed now\n",
- dev_name(&st->dev->dev));
+ st->dev->name);
#endif
test_and_set_bit(mISDN_STACK_KILLED, &st->status);
test_and_clear_bit(mISDN_STACK_RUNNING, &st->status);
@@ -397,15 +397,15 @@ create_stack(struct mISDNdevice *dev)
newst->own.recv = mISDN_queue_message;
if (*debug & DEBUG_CORE_FUNC)
printk(KERN_DEBUG "%s: st(%s)\n", __func__,
- dev_name(&newst->dev->dev));
+ newst->dev->name);
newst->notify = &done;
newst->thread = kthread_run(mISDNStackd, (void *)newst, "mISDN_%s",
- dev_name(&newst->dev->dev));
+ newst->dev->name);
if (IS_ERR(newst->thread)) {
err = PTR_ERR(newst->thread);
printk(KERN_ERR
"mISDN:cannot create kernel thread for %s (%d)\n",
- dev_name(&newst->dev->dev), err);
+ newst->dev->name, err);
delete_teimanager(dev->teimgr);
kfree(newst);
} else
@@ -424,7 +424,7 @@ connect_layer1(struct mISDNdevice *dev, struct mISDNchannel *ch,
if (*debug & DEBUG_CORE_FUNC)
printk(KERN_DEBUG "%s: %s proto(%x) adr(%d %d %d %d)\n",
- __func__, dev_name(&dev->dev), protocol, adr->dev,
+ __func__, dev->name, protocol, adr->dev,
adr->channel, adr->sapi, adr->tei);
switch (protocol) {
case ISDN_P_NT_S0:
@@ -461,7 +461,7 @@ connect_Bstack(struct mISDNdevice *dev, struct mISDNchannel *ch,
if (*debug & DEBUG_CORE_FUNC)
printk(KERN_DEBUG "%s: %s proto(%x) adr(%d %d %d %d)\n",
- __func__, dev_name(&dev->dev), protocol,
+ __func__, dev->name, protocol,
adr->dev, adr->channel, adr->sapi,
adr->tei);
ch->st = dev->D.st;
@@ -517,7 +517,7 @@ create_l2entity(struct mISDNdevice *dev, struct mISDNchannel *ch,
if (*debug & DEBUG_CORE_FUNC)
printk(KERN_DEBUG "%s: %s proto(%x) adr(%d %d %d %d)\n",
- __func__, dev_name(&dev->dev), protocol,
+ __func__, dev->name, protocol,
adr->dev, adr->channel, adr->sapi,
adr->tei);
rq.protocol = ISDN_P_TE_S0;
@@ -570,7 +570,7 @@ delete_channel(struct mISDNchannel *ch)
}
if (*debug & DEBUG_CORE_FUNC)
printk(KERN_DEBUG "%s: st(%s) protocol(%x)\n", __func__,
- dev_name(&ch->st->dev->dev), ch->protocol);
+ ch->st->dev->name, ch->protocol);
if (ch->protocol >= ISDN_P_B_START) {
if (ch->peer) {
ch->peer->ctrl(ch->peer, CLOSE_CHANNEL, NULL);
@@ -623,7 +623,7 @@ delete_stack(struct mISDNdevice *dev)
if (*debug & DEBUG_CORE_FUNC)
printk(KERN_DEBUG "%s: st(%s)\n", __func__,
- dev_name(&st->dev->dev));
+ st->dev->name);
if (dev->teimgr)
delete_teimanager(dev->teimgr);
if (st->thread) {
diff --git a/drivers/isdn/mISDN/tei.c b/drivers/isdn/mISDN/tei.c
index 2bad3083be901..876c1194920ef 100644
--- a/drivers/isdn/mISDN/tei.c
+++ b/drivers/isdn/mISDN/tei.c
@@ -990,7 +990,7 @@ create_teimgr(struct manager *mgr, struct channel_req *crq)
if (*debug & DEBUG_L2_TEI)
printk(KERN_DEBUG "%s: %s proto(%x) adr(%d %d %d %d)\n",
- __func__, dev_name(&mgr->ch.st->dev->dev),
+ __func__, mgr->ch.st->dev->name,
crq->protocol, crq->adr.dev, crq->adr.channel,
crq->adr.sapi, crq->adr.tei);
if (crq->adr.tei > GROUP_TEI)
diff --git a/include/linux/mISDNif.h b/include/linux/mISDNif.h
index ce26d70c1ebfb..79f6d8f218b13 100644
--- a/include/linux/mISDNif.h
+++ b/include/linux/mISDNif.h
@@ -497,6 +497,7 @@ struct mISDNdevice {
u_int Bprotocols;
u_int nrbchan;
u_char channelmap[MISDN_CHMAP_SIZE];
+ char name[MISDN_MAX_IDLEN];
struct list_head bchannels;
struct mISDNchannel *teimgr;
struct completion released;
--
2.50.1 (Apple Git-155)
* [PATCH v2 2/3] mISDN: socket: drop temporary references from get_mdevice()
From: Shuvam Pandey @ 2026-04-17 17:49 UTC
To: netdev; +Cc: linux-kernel, Simon Horman
In-Reply-To: <cover.1776446840.git.shuvampandey1@gmail.com>
IMGETDEVINFO and IMSETDEVNAME only use get_mdevice() for a temporary
lookup, but neither path drops the reference returned by
class_find_device().
Drop the temporary reference once the ioctl finishes. Serialize the name
read and rename paths with device_lock() while doing so, and reject
lookups that raced with unregister after device_del().
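The reference rule this patch enforces is that every exit from the ioctl path drops the reference taken by the lookup, including the -ENODEV exit for a device that has already been through device_del(). A single-threaded userspace sketch of that shape; model_dev, model_get(), model_put() and model_getdevinfo() are hypothetical, and device_lock() is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a class device with a reference count. */
struct model_dev {
	int refcount;
	bool registered;
	int id;
};

static struct model_dev *model_get(struct model_dev *d)	/* like get_mdevice() */
{
	d->refcount++;
	return d;
}

static void model_put(struct model_dev *d)		/* like put_device() */
{
	d->refcount--;
}

/* Shape of the fixed IMGETDEVINFO path: both exits drop the reference. */
static int model_getdevinfo(struct model_dev *d, int *out_id)
{
	struct model_dev *dev = model_get(d);

	if (!dev->registered) {	/* raced with unregister */
		model_put(dev);
		return -ENODEV;
	}
	*out_id = dev->id;	/* copy out while the reference is held */
	model_put(dev);
	return 0;
}
```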
Fixes: b36b654a7e82 ("mISDN: Create /sys/class/mISDN")
Cc: stable@vger.kernel.org
Assisted-by: Codex:GPT-5.3-Codex
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
---
drivers/isdn/mISDN/socket.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index bf3ad0a2a42bc..42cda5b8bbe16 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -484,6 +484,13 @@ data_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
if (dev) {
struct mISDN_devinfo di;
+ device_lock(&dev->dev);
+ if (!device_is_registered(&dev->dev)) {
+ device_unlock(&dev->dev);
+ put_device(&dev->dev);
+ err = -ENODEV;
+ break;
+ }
memset(&di, 0, sizeof(di));
di.id = dev->id;
di.Dprotocols = dev->Dprotocols;
@@ -493,8 +500,10 @@ data_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
sizeof(di.channelmap));
di.nrbchan = dev->nrbchan;
strscpy(di.name, dev_name(&dev->dev), sizeof(di.name));
+ device_unlock(&dev->dev);
if (copy_to_user((void __user *)arg, &di, sizeof(di)))
err = -EFAULT;
+ put_device(&dev->dev);
} else
err = -ENODEV;
break;
@@ -802,6 +811,13 @@ base_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
if (dev) {
struct mISDN_devinfo di;
+ device_lock(&dev->dev);
+ if (!device_is_registered(&dev->dev)) {
+ device_unlock(&dev->dev);
+ put_device(&dev->dev);
+ err = -ENODEV;
+ break;
+ }
memset(&di, 0, sizeof(di));
di.id = dev->id;
di.Dprotocols = dev->Dprotocols;
@@ -811,8 +827,10 @@ base_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
sizeof(di.channelmap));
di.nrbchan = dev->nrbchan;
strscpy(di.name, dev_name(&dev->dev), sizeof(di.name));
+ device_unlock(&dev->dev);
if (copy_to_user((void __user *)arg, &di, sizeof(di)))
err = -EFAULT;
+ put_device(&dev->dev);
} else
err = -ENODEV;
break;
@@ -826,10 +844,17 @@ base_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
}
dn.name[sizeof(dn.name) - 1] = '\0';
dev = get_mdevice(dn.id);
- if (dev)
- err = device_rename(&dev->dev, dn.name);
- else
+ if (dev) {
+ device_lock(&dev->dev);
+ if (!device_is_registered(&dev->dev))
+ err = -ENODEV;
+ else
+ err = device_rename(&dev->dev, dn.name);
+ device_unlock(&dev->dev);
+ put_device(&dev->dev);
+ } else {
err = -ENODEV;
+ }
}
break;
default:
--
2.50.1 (Apple Git-155)
* [PATCH v2 1/3] mISDN: serialize socket teardown against device unregister
From: Shuvam Pandey @ 2026-04-17 17:49 UTC
To: netdev; +Cc: linux-kernel, Simon Horman
In-Reply-To: <cover.1776446840.git.shuvampandey1@gmail.com>
get_mdevice() returns a referenced struct device, and mISDN sockets keep
that reference in _pms(sk)->dev after bind.
That means mISDN_unregister_device() cannot tear the device down as if no
socket can still reach it. Several teardown paths can otherwise run after
delete_stack() has freed the stack, or after the driver has freed the
embedding object once mISDN_unregister_device() returns.
Close sockets that still point at the device before delete_stack() runs,
wait for the final device release before returning from
mISDN_unregister_device(), and serialize bind against unregister with the
device lock so a new socket cannot attach after unregister has started.
While tightening the close path, reset channel state after CLOSE_CHANNEL so
later socket release does not try to tear the same B-channel down twice,
and make recvmsg/getname tolerate sockets whose device pointer was cleared
by unregister.
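The "wait for the final device release" step is the usual release-callback-plus-completion pattern. Below is a userspace sketch with pthreads standing in for struct completion and the driver-core refcount; all names are hypothetical, and it assumes, as the commit message says, that the caller frees the embedding object only after unregister returns.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal userspace stand-in for struct completion (hypothetical). */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Device whose release callback fires on the last put. */
struct model_mdev {
	pthread_mutex_t ref_lock;
	int refs;
	struct model_completion released;
};

static void model_mdev_init(struct model_mdev *d, int refs)
{
	pthread_mutex_init(&d->ref_lock, NULL);
	d->refs = refs;
	pthread_mutex_init(&d->released.lock, NULL);
	pthread_cond_init(&d->released.cond, NULL);
	d->released.done = false;	/* like init_completion() */
}

static void model_put(struct model_mdev *d)
{
	bool last;

	pthread_mutex_lock(&d->ref_lock);
	last = --d->refs == 0;
	pthread_mutex_unlock(&d->ref_lock);
	if (last)
		model_complete(&d->released);	/* like mISDN_dev_release() */
}

/* A socket that still holds a reference drops it at some later point. */
static void *model_socket_close(void *arg)
{
	model_put(arg);
	return NULL;
}

/* Shape of the patched unregister: drop our ref, wait for the last one. */
static void model_unregister(struct model_mdev *d)
{
	model_put(d);
	model_wait_for_completion(&d->released);
	/* Only now may the caller free the object embedding d. */
}
```

Whether the socket thread drops its reference before or after model_unregister() starts waiting, unregister returns only once the count has reached zero.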
Fixes: b36b654a7e82 ("mISDN: Create /sys/class/mISDN")
Cc: stable@vger.kernel.org
Assisted-by: Codex:GPT-5.3-Codex
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
---
drivers/isdn/mISDN/core.c | 19 ++-
drivers/isdn/mISDN/core.h | 1 +
drivers/isdn/mISDN/socket.c | 224 +++++++++++++++++++++++++++++++-----
include/linux/mISDNif.h | 1 +
4 files changed, 216 insertions(+), 29 deletions(-)
diff --git a/drivers/isdn/mISDN/core.c b/drivers/isdn/mISDN/core.c
index 8ec2d4d4f1352..4e2be8f03119b 100644
--- a/drivers/isdn/mISDN/core.c
+++ b/drivers/isdn/mISDN/core.c
@@ -26,7 +26,9 @@ static DEFINE_RWLOCK(bp_lock);
static void mISDN_dev_release(struct device *dev)
{
- /* nothing to do: the device is part of its parent's data structure */
+ struct mISDNdevice *mdev = container_of(dev, struct mISDNdevice, dev);
+
+ complete(&mdev->released);
}
static ssize_t id_show(struct device *dev,
@@ -219,6 +221,7 @@ mISDN_register_device(struct mISDNdevice *dev,
return err;
dev->id = err;
+ init_completion(&dev->released);
device_initialize(&dev->dev);
if (name && name[0])
dev_set_name(&dev->dev, "%s", name);
@@ -257,12 +260,24 @@ mISDN_unregister_device(struct mISDNdevice *dev) {
printk(KERN_DEBUG "mISDN_unregister %s %d\n",
dev_name(&dev->dev), dev->id);
/* sysfs_remove_link(&dev->dev.kobj, "device"); */
+ /*
+ * Remove the device from sysfs before taking dev->mutex so bind-side
+ * get_mdevice() users will fail the later device_is_registered()
+ * recheck after they acquire device_lock().
+ */
device_del(&dev->dev);
dev_set_drvdata(&dev->dev, NULL);
-
+ device_lock(&dev->dev);
+ misdn_sock_release_device(dev);
test_and_clear_bit(dev->id, (u_long *)&device_ids);
delete_stack(dev);
+ device_unlock(&dev->dev);
put_device(&dev->dev);
+ /*
+ * Drivers free the enclosing object after unregister returns, so wait
+ * until the last outstanding device reference is dropped.
+ */
+ wait_for_completion(&dev->released);
}
EXPORT_SYMBOL(mISDN_unregister_device);
diff --git a/drivers/isdn/mISDN/core.h b/drivers/isdn/mISDN/core.h
index 5617c06de8e4d..2cd89293bc211 100644
--- a/drivers/isdn/mISDN/core.h
+++ b/drivers/isdn/mISDN/core.h
@@ -41,6 +41,7 @@ extern int connect_layer1(struct mISDNdevice *, struct mISDNchannel *,
u_int, struct sockaddr_mISDN *);
extern int create_l2entity(struct mISDNdevice *, struct mISDNchannel *,
u_int, struct sockaddr_mISDN *);
+void misdn_sock_release_device(struct mISDNdevice *dev);
extern int create_stack(struct mISDNdevice *);
extern int create_teimanager(struct mISDNdevice *);
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 77b900db1cac2..bf3ad0a2a42bc 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -7,6 +7,7 @@
*/
#include <linux/mISDNif.h>
+#include <linux/device.h>
#include <linux/slab.h>
#include <linux/export.h>
#include "core.h"
@@ -57,6 +58,97 @@ static void mISDN_sock_unlink(struct mISDN_sock_list *l, struct sock *sk)
write_unlock_bh(&l->lock);
}
+/*
+ * Socket teardown driven by unregister takes device_lock(dev) before
+ * lock_sock(sk). Bind paths take the same order so unregister can close a
+ * socket without racing a new bind onto the same device.
+ */
+static struct sock *
+misdn_sock_get(struct mISDN_sock_list *l, struct mISDNdevice *dev)
+{
+ struct sock *sk;
+
+ read_lock_bh(&l->lock);
+ sk_for_each(sk, &l->head) {
+ if (READ_ONCE(_pms(sk)->dev) != dev)
+ continue;
+ sock_hold(sk);
+ read_unlock_bh(&l->lock);
+ return sk;
+ }
+ read_unlock_bh(&l->lock);
+ return NULL;
+}
+
+static void data_sock_reset_channel(struct sock *sk)
+{
+ _pms(sk)->ch.protocol = ISDN_P_NONE;
+ _pms(sk)->ch.nr = 0;
+ _pms(sk)->ch.addr = 0;
+ _pms(sk)->ch.st = NULL;
+ _pms(sk)->ch.peer = NULL;
+ _pms(sk)->ch.recv = NULL;
+}
+
+static void data_sock_close(struct sock *sk)
+{
+ bool active = _pms(sk)->ch.protocol != ISDN_P_NONE;
+
+ sk->sk_state = MISDN_CLOSED;
+
+ if (active)
+ delete_channel(&_pms(sk)->ch);
+
+ data_sock_reset_channel(sk);
+
+ if (_pms(sk)->dev) {
+ put_device(&_pms(sk)->dev->dev);
+ _pms(sk)->dev = NULL;
+ }
+}
+
+static void base_sock_close(struct sock *sk)
+{
+ sk->sk_state = MISDN_CLOSED;
+
+ if (_pms(sk)->dev) {
+ put_device(&_pms(sk)->dev->dev);
+ _pms(sk)->dev = NULL;
+ }
+}
+
+void
+misdn_sock_release_device(struct mISDNdevice *dev)
+{
+ struct sock *sk;
+
+ if (dev->D.st) {
+ while ((sk = misdn_sock_get(&dev->D.st->l1sock, dev))) {
+ lock_sock(sk);
+ if (_pms(sk)->dev == dev)
+ data_sock_close(sk);
+ release_sock(sk);
+ sock_put(sk);
+ }
+ }
+
+ while ((sk = misdn_sock_get(&data_sockets, dev))) {
+ lock_sock(sk);
+ if (_pms(sk)->dev == dev)
+ data_sock_close(sk);
+ release_sock(sk);
+ sock_put(sk);
+ }
+
+ while ((sk = misdn_sock_get(&base_sockets, dev))) {
+ lock_sock(sk);
+ if (_pms(sk)->dev == dev)
+ base_sock_close(sk);
+ release_sock(sk);
+ sock_put(sk);
+ }
+}
+
static int
mISDN_send(struct mISDNchannel *ch, struct sk_buff *skb)
{
@@ -86,6 +178,14 @@ mISDN_ctrl(struct mISDNchannel *ch, u_int cmd, void *arg)
switch (cmd) {
case CLOSE_CHANNEL:
msk->sk.sk_state = MISDN_CLOSED;
+ if (msk->ch.protocol >= ISDN_P_B_START) {
+ msk->ch.protocol = ISDN_P_NONE;
+ msk->ch.nr = 0;
+ msk->ch.addr = 0;
+ msk->ch.st = NULL;
+ msk->ch.peer = NULL;
+ msk->ch.recv = NULL;
+ }
break;
}
return 0;
@@ -127,18 +227,30 @@ mISDN_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
if (msg->msg_name) {
DECLARE_SOCKADDR(struct sockaddr_mISDN *, maddr, msg->msg_name);
+ int dev_id, ch_nr, ch_addr;
+
+ lock_sock(sk);
+ if (!_pms(sk)->dev) {
+ release_sock(sk);
+ skb_free_datagram(sk, skb);
+ return -EBADFD;
+ }
+ dev_id = _pms(sk)->dev->id;
+ ch_nr = _pms(sk)->ch.nr;
+ ch_addr = _pms(sk)->ch.addr;
+ release_sock(sk);
maddr->family = AF_ISDN;
- maddr->dev = _pms(sk)->dev->id;
+ maddr->dev = dev_id;
if ((sk->sk_protocol == ISDN_P_LAPD_TE) ||
(sk->sk_protocol == ISDN_P_LAPD_NT)) {
maddr->channel = (mISDN_HEAD_ID(skb) >> 16) & 0xff;
maddr->tei = (mISDN_HEAD_ID(skb) >> 8) & 0xff;
maddr->sapi = mISDN_HEAD_ID(skb) & 0xff;
} else {
- maddr->channel = _pms(sk)->ch.nr;
- maddr->sapi = _pms(sk)->ch.addr & 0xFF;
- maddr->tei = (_pms(sk)->ch.addr >> 8) & 0xFF;
+ maddr->channel = ch_nr;
+ maddr->sapi = ch_addr & 0xFF;
+ maddr->tei = (ch_addr >> 8) & 0xFF;
}
msg->msg_namelen = sizeof(*maddr);
}
@@ -241,16 +353,14 @@ data_sock_release(struct socket *sock)
printk(KERN_DEBUG "%s(%p) sk=%p\n", __func__, sock, sk);
if (!sk)
return 0;
+
+ lock_sock(sk);
+
switch (sk->sk_protocol) {
case ISDN_P_TE_S0:
case ISDN_P_NT_S0:
case ISDN_P_TE_E1:
case ISDN_P_NT_E1:
- if (sk->sk_state == MISDN_BOUND)
- delete_channel(&_pms(sk)->ch);
- else
- mISDN_sock_unlink(&data_sockets, sk);
- break;
case ISDN_P_LAPD_TE:
case ISDN_P_LAPD_NT:
case ISDN_P_B_RAW:
@@ -259,13 +369,11 @@ data_sock_release(struct socket *sock)
case ISDN_P_B_L2DTMF:
case ISDN_P_B_L2DSP:
case ISDN_P_B_L2DSPHDLC:
- delete_channel(&_pms(sk)->ch);
+ data_sock_close(sk);
mISDN_sock_unlink(&data_sockets, sk);
break;
}
- lock_sock(sk);
-
sock_orphan(sk);
skb_queue_purge(&sk->sk_receive_queue);
@@ -466,6 +574,7 @@ data_sock_bind(struct socket *sock, struct sockaddr_unsized *addr, int addr_len)
{
struct sockaddr_mISDN *maddr = (struct sockaddr_mISDN *) addr;
struct sock *sk = sock->sk;
+ struct mISDNdevice *dev, *lockdev;
struct sock *csk;
int err = 0;
@@ -477,13 +586,35 @@ data_sock_bind(struct socket *sock, struct sockaddr_unsized *addr, int addr_len)
return -EINVAL;
lock_sock(sk);
+ if (sk->sk_state == MISDN_CLOSED) {
+ release_sock(sk);
+ return -EBADFD;
+ }
+ if (_pms(sk)->dev) {
+ release_sock(sk);
+ return -EALREADY;
+ }
+ release_sock(sk);
+
+ dev = get_mdevice(maddr->dev);
+ if (!dev)
+ return -ENODEV;
+
+ lockdev = dev;
+ device_lock(&dev->dev);
+ lock_sock(sk);
+
+ if (sk->sk_state == MISDN_CLOSED) {
+ err = -EBADFD;
+ goto done;
+ }
if (_pms(sk)->dev) {
err = -EALREADY;
goto done;
}
- _pms(sk)->dev = get_mdevice(maddr->dev);
- if (!_pms(sk)->dev) {
+
+ if (!device_is_registered(&dev->dev)) {
err = -ENODEV;
goto done;
}
@@ -493,7 +624,7 @@ data_sock_bind(struct socket *sock, struct sockaddr_unsized *addr, int addr_len)
sk_for_each(csk, &data_sockets.head) {
if (sk == csk)
continue;
- if (_pms(csk)->dev != _pms(sk)->dev)
+ if (READ_ONCE(_pms(csk)->dev) != dev)
continue;
if (csk->sk_protocol >= ISDN_P_B_START)
continue;
@@ -516,15 +647,15 @@ data_sock_bind(struct socket *sock, struct sockaddr_unsized *addr, int addr_len)
case ISDN_P_TE_E1:
case ISDN_P_NT_E1:
mISDN_sock_unlink(&data_sockets, sk);
- err = connect_layer1(_pms(sk)->dev, &_pms(sk)->ch,
- sk->sk_protocol, maddr);
+ err = connect_layer1(dev, &_pms(sk)->ch, sk->sk_protocol,
+ maddr);
if (err)
mISDN_sock_link(&data_sockets, sk);
break;
case ISDN_P_LAPD_TE:
case ISDN_P_LAPD_NT:
- err = create_l2entity(_pms(sk)->dev, &_pms(sk)->ch,
- sk->sk_protocol, maddr);
+ err = create_l2entity(dev, &_pms(sk)->ch, sk->sk_protocol,
+ maddr);
break;
case ISDN_P_B_RAW:
case ISDN_P_B_HDLC:
@@ -532,19 +663,26 @@ data_sock_bind(struct socket *sock, struct sockaddr_unsized *addr, int addr_len)
case ISDN_P_B_L2DTMF:
case ISDN_P_B_L2DSP:
case ISDN_P_B_L2DSPHDLC:
- err = connect_Bstack(_pms(sk)->dev, &_pms(sk)->ch,
- sk->sk_protocol, maddr);
+ err = connect_Bstack(dev, &_pms(sk)->ch, sk->sk_protocol,
+ maddr);
break;
default:
err = -EPROTONOSUPPORT;
}
- if (err)
+ if (err) {
+ data_sock_reset_channel(sk);
goto done;
+ }
+ _pms(sk)->dev = dev;
+ dev = NULL;
sk->sk_state = MISDN_BOUND;
_pms(sk)->ch.protocol = sk->sk_protocol;
done:
release_sock(sk);
+ device_unlock(&lockdev->dev);
+ if (dev)
+ put_device(&dev->dev);
return err;
}
@@ -555,10 +693,11 @@ data_sock_getname(struct socket *sock, struct sockaddr *addr,
struct sockaddr_mISDN *maddr = (struct sockaddr_mISDN *) addr;
struct sock *sk = sock->sk;
- if (!_pms(sk)->dev)
- return -EBADFD;
-
lock_sock(sk);
+ if (!_pms(sk)->dev) {
+ release_sock(sk);
+ return -EBADFD;
+ }
maddr->family = AF_ISDN;
maddr->dev = _pms(sk)->dev->id;
@@ -623,6 +762,10 @@ base_sock_release(struct socket *sock)
if (!sk)
return 0;
+ lock_sock(sk);
+ base_sock_close(sk);
+ release_sock(sk);
+
mISDN_sock_unlink(&base_sockets, sk);
sock_orphan(sk);
sock_put(sk);
@@ -700,6 +843,7 @@ base_sock_bind(struct socket *sock, struct sockaddr_unsized *addr, int addr_len)
{
struct sockaddr_mISDN *maddr = (struct sockaddr_mISDN *) addr;
struct sock *sk = sock->sk;
+ struct mISDNdevice *dev, *lockdev;
int err = 0;
if (addr_len < sizeof(struct sockaddr_mISDN))
@@ -709,21 +853,47 @@ base_sock_bind(struct socket *sock, struct sockaddr_unsized *addr, int addr_len)
return -EINVAL;
lock_sock(sk);
+ if (sk->sk_state == MISDN_CLOSED) {
+ release_sock(sk);
+ return -EBADFD;
+ }
+ if (_pms(sk)->dev) {
+ release_sock(sk);
+ return -EALREADY;
+ }
+ release_sock(sk);
+
+ dev = get_mdevice(maddr->dev);
+ if (!dev)
+ return -ENODEV;
+
+ lockdev = dev;
+ device_lock(&dev->dev);
+ lock_sock(sk);
+
+ if (sk->sk_state == MISDN_CLOSED) {
+ err = -EBADFD;
+ goto done;
+ }
if (_pms(sk)->dev) {
err = -EALREADY;
goto done;
}
- _pms(sk)->dev = get_mdevice(maddr->dev);
- if (!_pms(sk)->dev) {
+ if (!device_is_registered(&dev->dev)) {
err = -ENODEV;
goto done;
}
+ _pms(sk)->dev = dev;
+ dev = NULL;
sk->sk_state = MISDN_BOUND;
done:
release_sock(sk);
+ device_unlock(&lockdev->dev);
+ if (dev)
+ put_device(&dev->dev);
return err;
}
diff --git a/include/linux/mISDNif.h b/include/linux/mISDNif.h
index 7aab4a7697369..ce26d70c1ebfb 100644
--- a/include/linux/mISDNif.h
+++ b/include/linux/mISDNif.h
@@ -499,6 +499,7 @@ struct mISDNdevice {
u_char channelmap[MISDN_CHMAP_SIZE];
struct list_head bchannels;
struct mISDNchannel *teimgr;
+ struct completion released;
struct device dev;
};
--
2.50.1 (Apple Git-155)
^ permalink raw reply related
* [PATCH v2 0/3] mISDN: fix socket/device lifetime and naming races
From: Shuvam Pandey @ 2026-04-17 17:49 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, Simon Horman
In-Reply-To: <20260414071322.30851-1-shuvampandey1@gmail.com>
This is a respin of the original get_mdevice() reference leak fix.
Patch 1 makes unregister wait for the last device reference and closes
sockets that still point at the device before delete_stack() runs. It
also serializes bind against unregister so a new socket cannot attach
once teardown has started.
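The bind-side serialization described for Patch 1 follows a common pattern. Take the device reference before any lock, take the device lock before the socket lock, recheck every condition while holding both, and then either hand the reference to the socket or drop it. Below is a minimal userspace sketch of that pattern using pthread mutexes. The model_dev/model_sock types and model_* functions are hypothetical stand-ins for struct mISDNdevice, device_lock(), lock_sock() and get/put_device(); this is not the mISDN code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real mISDN structures. */
struct model_dev {
	pthread_mutex_t lock;	/* plays the role of device_lock() */
	int refs;		/* plays the role of get_device()/put_device() */
	bool registered;	/* cleared by unregister while holding lock */
};

struct model_sock {
	pthread_mutex_t lock;	/* plays the role of lock_sock() */
	struct model_dev *dev;	/* non-NULL once bound; owns one reference */
	bool closed;
};

static void model_dev_get(struct model_dev *d)
{
	__atomic_add_fetch(&d->refs, 1, __ATOMIC_RELAXED);
}

static void model_dev_put(struct model_dev *d)
{
	__atomic_sub_fetch(&d->refs, 1, __ATOMIC_ACQ_REL);
}

static int model_bind(struct model_sock *sk, struct model_dev *dev)
{
	struct model_dev *lockdev = dev;
	int err = 0;

	assert(dev != NULL);			/* lookup already succeeded */
	model_dev_get(dev);			/* take the reference before any lock */
	pthread_mutex_lock(&lockdev->lock);	/* lock order: device first ... */
	pthread_mutex_lock(&sk->lock);		/* ... then socket */

	/* Recheck under both locks; state may have changed since the lookup. */
	if (sk->closed) {
		err = -EBADFD;
		goto out;
	}
	if (sk->dev) {
		err = -EALREADY;
		goto out;
	}
	if (!dev->registered) {			/* unregister already started */
		err = -ENODEV;
		goto out;
	}
	sk->dev = dev;				/* the socket now owns the reference */
	dev = NULL;
out:
	pthread_mutex_unlock(&sk->lock);
	pthread_mutex_unlock(&lockdev->lock);
	if (dev)
		model_dev_put(dev);		/* failure: drop the reference */
	return err;
}
```

In this model, unregister is assumed to clear `registered` while holding the same device lock. A bind that takes the lock first therefore completes before teardown, and one that takes it later sees `registered == false` and backs out without keeping a reference.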
Patch 2 drops the temporary get_mdevice() references from IMGETDEVINFO
and IMSETDEVNAME, and serializes those ioctl paths against unregister.
Patch 3 keeps an mISDN-owned copy of the device name so ioctl, sysfs, and
debug paths no longer depend on the kobject name storage remaining stable
across device_rename().
Previous discussion:
https://lore.kernel.org/r/20260414071322.30851-1-shuvampandey1@gmail.com
This series was developed with AI assistance. I reviewed, revised, and
tested it, and I take responsibility for the submission.
---
Changes in v2:
- split the fix into three focused patches
- close sockets before delete_stack() and wait for the final device release
- serialize bind and ioctl lookup/rename paths against unregister
- cache stable device names for mISDN paths outside the kobject
- keep existing debug behavior while switching layer1 name reads to the cached name
- document the device_lock(dev) -> lock_sock(sk) ordering
- build-test the series on arm64 with W=1 for drivers/isdn/mISDN and
drivers/isdn/hardware/mISDN
Shuvam Pandey (3):
mISDN: serialize socket teardown against device unregister
mISDN: socket: drop temporary references from get_mdevice()
mISDN: cache stable device names outside the kobject
drivers/isdn/mISDN/core.c | 37 ++++-
drivers/isdn/mISDN/core.h | 1 +
drivers/isdn/mISDN/layer1.c | 2 +-
drivers/isdn/mISDN/socket.c | 263 +++++++++++++++++++++++++++++++-----
drivers/isdn/mISDN/stack.c | 36 ++---
drivers/isdn/mISDN/tei.c | 2 +-
include/linux/mISDNif.h | 2 +
7 files changed, 284 insertions(+), 59 deletions(-)
--
2.50.1 (Apple Git-155)
^ permalink raw reply
* Re: [PATCH net] net: dsa: mt7530: fix .get_stats64 sleeping in atomic context
From: Breno Leitao @ 2026-04-17 17:46 UTC (permalink / raw)
To: Daniel Golle
Cc: Chester A. Unal, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
AngeloGioacchino Del Regno, Russell King, Christian Marangi,
netdev, linux-kernel, linux-arm-kernel, linux-mediatek,
Frank Wunderlich, John Crispin
In-Reply-To: <79dc0ec5b6be698b14cb66339d6f63033ca2934a.1776397542.git.daniel@makrotopia.org>
On Fri, Apr 17, 2026 at 04:55:57AM +0100, Daniel Golle wrote:
> @@ -3404,6 +3449,9 @@ EXPORT_SYMBOL_GPL(mt7530_probe_common);
> void
> mt7530_remove_common(struct mt7530_priv *priv)
> {
> + if (priv->bus)
> + cancel_delayed_work_sync(&priv->stats_work);
> +
Shouldn't you cancel the work later, after dsa_unregister_switch()?
I am wondering whether the following race can happen:
  mt7530_remove_common()            someone reading /proc/net/dev
    cancel_delayed_work_sync()
      /* returns: work neither pending
         nor executing - true at this
         instant */
                                    mt7530_get_stats64()
                                      mod_delayed_work(...)
                                      /* work is queued again */
    dsa_unregister_switch()
    return
^ permalink raw reply
* RE:[net-next v2 4/5] net: stmmac: starfive: Add JHB100 SGMII interface
From: Sai Krishna Gajula @ 2026-04-17 17:44 UTC (permalink / raw)
To: Minda Chen, Alexandre Torgue, Andrew Lunn, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org,
linux-stm32@st-md-mailman.stormreply.com,
devicetree@vger.kernel.org
In-Reply-To: <20260417024523.107786-5-minda.chen@starfivetech.com>
> -----Original Message-----
> From: Minda Chen <minda.chen@starfivetech.com>
> Sent: Friday, April 17, 2026 8:15 AM
> To: Alexandre Torgue <alexandre.torgue@foss.st.com>; Andrew Lunn
> <andrew+netdev@lunn.ch>; David S . Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; Maxime Coquelin
> <mcoquelin.stm32@gmail.com>; Emil Renner Berthing
> <emil.renner.berthing@canonical.com>; Rob Herring <robh+dt@kernel.org>;
> Krzysztof Kozlowski <krzk+dt@kernel.org>; Conor Dooley
> <conor@kernel.org>; netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; linux-stm32@st-md-
> mailman.stormreply.com; devicetree@vger.kernel.org; Minda Chen
> <minda.chen@starfivetech.com>
> Subject: [net-next v2 4/5] net: stmmac: starfive: Add JHB100
> SGMII interface
>
> Add JHB100 compatible and SGMII support. The JHB100 SoC contains
> 2 SGMII interfaces integrated with a SerDes PHY. SGMII uses split TX/RX
> MAC clocks, which need to be set to 2.5M/25M/125M in
> 10M/100M/1000M speed mode.
>
> Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
> ---
> .../ethernet/stmicro/stmmac/dwmac-starfive.c | 54 ++++++++++++++-----
> 1 file changed, 42 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
> index 16b955a6d77b..91698c763dac 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
> @@ -26,6 +26,7 @@ struct starfive_dwmac_data {
> struct starfive_dwmac {
> struct device *dev;
> const struct starfive_dwmac_data *data;
> + struct clk *sgmii_rx;
> };
>
> static int starfive_dwmac_set_mode(struct plat_stmmacenet_data *plat_dat)
> @@ -68,6 +69,24 @@ static int starfive_dwmac_set_mode(struct plat_stmmacenet_data *plat_dat)
> return 0;
> }
>
> +static int stmmac_starfive_sgmii_set_clk_rate(void *bsp_priv, struct clk *clk_tx_i,
> + phy_interface_t interface, int speed)
> +{
The phy_interface_t interface parameter appears to be unused in stmmac_starfive_sgmii_set_clk_rate(), so it may need __maybe_unused or a (void)interface cast to avoid -Werror=unused-parameter on strict builds.
> + struct starfive_dwmac *dwmac = (void *)bsp_priv;
> + long rate = rgmii_clock(speed);
> + int ret;
> +
> + /* MAC clock rate the same as RGMII */
> + if (rate < 0)
> + return 0;
> +
> + ret = clk_set_rate(clk_tx_i, rate);
> + if (ret)
> + return ret;
> +
> + return clk_set_rate(dwmac->sgmii_rx, rate);
> +}
> +
> static int starfive_dwmac_probe(struct platform_device *pdev)
> {
> struct plat_stmmacenet_data *plat_dat;
> @@ -102,24 +121,34 @@ static int starfive_dwmac_probe(struct platform_device *pdev)
> return dev_err_probe(&pdev->dev, PTR_ERR(clk_gtx),
> "error getting gtx clock\n");
>
> - /* Generally, the rgmii_tx clock is provided by the internal clock,
> - * which needs to match the corresponding clock frequency according
> - * to different speeds. If the rgmii_tx clock is provided by the
> - * external rgmii_rxin, there is no need to configure the clock
> - * internally, because rgmii_rxin will be adaptively adjusted.
> - */
> - if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
> - plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
> + if (plat_dat->phy_interface == PHY_INTERFACE_MODE_SGMII) {
> + dwmac->sgmii_rx = devm_clk_get_enabled(&pdev->dev, "sgmii_rx");
> + if (IS_ERR(dwmac->sgmii_rx))
> + return dev_err_probe(&pdev->dev,
> + PTR_ERR(dwmac->sgmii_rx),
> + "error getting sgmii rx clock\n");
> + plat_dat->set_clk_tx_rate = stmmac_starfive_sgmii_set_clk_rate;
> + } else {
> + /*
> + * Generally, the rgmii_tx clock is provided by the internal clock,
> + * which needs to match the corresponding clock frequency according
> + * to different speeds. If the rgmii_tx clock is provided by the
> + * external rgmii_rxin, there is no need to configure the clock
> + * internally, because rgmii_rxin will be adaptively adjusted.
> + */
> + if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
> + plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
> +
> + err = starfive_dwmac_set_mode(plat_dat);
> + if (err)
> + return err;
> + }
>
> dwmac->dev = &pdev->dev;
> plat_dat->flags |= STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP;
> plat_dat->bsp_priv = dwmac;
> plat_dat->dma_cfg->dche = true;
>
> - err = starfive_dwmac_set_mode(plat_dat);
> - if (err)
> - return err;
> -
> return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
> }
>
> @@ -130,6 +159,7 @@ static const struct starfive_dwmac_data jh7100_data = {
> static const struct of_device_id starfive_dwmac_match[] = {
> { .compatible = "starfive,jh7100-dwmac", .data = &jh7100_data },
> { .compatible = "starfive,jh7110-dwmac" },
> + { .compatible = "starfive,jhb100-dwmac" },
> { /* sentinel */ }
> };
> MODULE_DEVICE_TABLE(of, starfive_dwmac_match);
> --
> 2.17.1
>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
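The quoted stmmac_starfive_sgmii_set_clk_rate() reuses the RGMII speed-to-clock mapping that it gets from rgmii_clock() (2.5/25/125 MHz for 10/100/1000 Mb/s, per the commit message). It programs the TX clock first and then sets the separate SGMII RX clock to the same rate. The following minimal sketch models only that control flow. The model_* names and struct model_clk are hypothetical, and the speed table is written out by hand rather than taken from the kernel helper.

```c
#include <assert.h>
#include <errno.h>

/* Hand-written model of the speed (Mb/s) to clock rate (Hz) mapping. */
static long model_rgmii_clock(int speed)
{
	switch (speed) {
	case 10:
		return 2500000;
	case 100:
		return 25000000;
	case 1000:
		return 125000000;
	default:
		return -EINVAL;
	}
}

/* Hypothetical clock handle: just remembers the last rate "set". */
struct model_clk {
	long rate;
};

/*
 * Mirrors the quoted callback: an unknown speed is silently ignored,
 * otherwise TX is set first and RX is set to the same rate.
 */
static int model_sgmii_set_clk_rate(struct model_clk *tx, struct model_clk *rx,
				    int speed)
{
	long rate = model_rgmii_clock(speed);

	assert(tx && rx);
	if (rate < 0)
		return 0;
	tx->rate = rate;
	rx->rate = rate;
	return 0;
}
```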
^ permalink raw reply
* [PATCH 9/9] ptp: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
Switch from sign_extend64(FIELD_GET()) to the dedicated
FIELD_GET_SIGNED() and stop calculating the field length explicitly.
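The old code in the hunk below passes sign-bit indices 12 and (39 - 13) by hand. For a contiguous mask, that index is simply the field width minus one, which is what a mask-driven helper can derive on its own. In the following minimal sketch, the masks are hypothetical and inferred only from those two indices (bits 12..0 and 39..13). model_sign_extend64() copies the shift arithmetic of the kernel's sign_extend64(), and the sketch assumes GCC/Clang builtins and semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks, inferred only from the old indices 12 and (39 - 13). */
#define MODEL_FINE_MEAS_MASK	0x0000000000001FFFULL	/* bits 12..0  */
#define MODEL_COARSE_MEAS_MASK	0x000000FFFFFFE000ULL	/* bits 39..13 */

/* Same arithmetic as sign_extend64(): 'index' is the sign bit. */
static int64_t model_sign_extend64(uint64_t value, int index)
{
	uint8_t shift = 63 - index;

	return (int64_t)(value << shift) >> shift;
}

/* The index the old code supplied by hand: field width minus one. */
static int model_sign_index_from_mask(uint64_t mask)
{
	assert(mask != 0);	/* assumes a non-empty, contiguous mask */
	return __builtin_popcountll(mask) - 1;
}
```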
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
drivers/ptp/ptp_fc3.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ptp/ptp_fc3.c b/drivers/ptp/ptp_fc3.c
index 70002500170e..f0e000428a3f 100644
--- a/drivers/ptp/ptp_fc3.c
+++ b/drivers/ptp/ptp_fc3.c
@@ -55,8 +55,8 @@ static s64 tdc_meas2offset(struct idtfc3 *idtfc3, u64 meas_read)
{
s64 coarse, fine;
- fine = sign_extend64(FIELD_GET(FINE_MEAS_MASK, meas_read), 12);
- coarse = sign_extend64(FIELD_GET(COARSE_MEAS_MASK, meas_read), (39 - 13));
+ fine = FIELD_GET_SIGNED(FINE_MEAS_MASK, meas_read);
+ coarse = FIELD_GET_SIGNED(COARSE_MEAS_MASK, meas_read);
fine = div64_s64(fine * NSEC_PER_SEC, idtfc3->tdc_apll_freq * 62LL);
coarse = div64_s64(coarse * NSEC_PER_SEC, idtfc3->time_ref_freq);
--
2.51.0
^ permalink raw reply related
* [PATCH 8/9] rtc: rv3032: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
Switch from sign_extend32(FIELD_GET()) to the dedicated
FIELD_GET_SIGNED() and stop calculating the field length explicitly.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
drivers/rtc/rtc-rv3032.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/rtc/rtc-rv3032.c b/drivers/rtc/rtc-rv3032.c
index 6c09da7738e1..6bafdec637ae 100644
--- a/drivers/rtc/rtc-rv3032.c
+++ b/drivers/rtc/rtc-rv3032.c
@@ -376,7 +376,7 @@ static int rv3032_read_offset(struct device *dev, long *offset)
if (ret < 0)
return ret;
- steps = sign_extend32(FIELD_GET(RV3032_OFFSET_MSK, value), 5);
+ steps = FIELD_GET_SIGNED(RV3032_OFFSET_MSK, value);
*offset = DIV_ROUND_CLOSEST(steps * OFFSET_STEP_PPT, 1000);
--
2.51.0
^ permalink raw reply related
* [PATCH 7/9] wifi: rtw89: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
Switch from sign_extend32(FIELD_GET()) to the dedicated
FIELD_GET_SIGNED() and stop calculating the field length explicitly.
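Both _decode_efuse_gain() hunks below decode a byte that packs two signed 4-bit gains. The old code sign-extended both nibbles with sign_extend32(..., 3). The following minimal sketch shows that decoding. It uses hypothetical model_* names and a shift-free sign-extension trick instead of the kernel helpers. It also makes clear why the low nibble needs sign extension too: 0xF must decode as -1, not 15.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 4-bit two's complement to int8_t: flip the sign bit, subtract its weight. */
static int8_t model_nibble_signed(uint8_t v)
{
	return (int8_t)(((v & 0xF) ^ 0x8) - 0x8);
}

/*
 * Hypothetical model of _decode_efuse_gain(): high gain in bits 7:4,
 * low gain in bits 3:0, both signed; returns false for 0xff as the
 * driver does.
 */
static bool model_decode_efuse_gain(uint8_t data, int8_t *high, int8_t *low)
{
	if (high)
		*high = model_nibble_signed(data >> 4);
	if (low)
		*low = model_nibble_signed(data);
	return data != 0xff;
}
```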
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c | 4 ++--
drivers/net/wireless/realtek/rtw89/rtw8852b_common.c | 4 ++--
drivers/net/wireless/realtek/rtw89/rtw8852b_rfk.c | 4 ++--
drivers/net/wireless/realtek/rtw89/rtw8852c.c | 4 ++--
4 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c b/drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c
index 463399413318..8679b21fd3fd 100644
--- a/drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c
+++ b/drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c
@@ -334,8 +334,8 @@ static void _check_addc(struct rtw89_dev *rtwdev, enum rtw89_rf_path path)
for (i = 0; i < ADDC_T_AVG; i++) {
tmp = rtw89_phy_read32_mask(rtwdev, R_DBG32_D, MASKDWORD);
- dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
- dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
+ dc_re += FIELD_GET_SIGNED(0xfff000, tmp);
+ dc_im += FIELD_GET_SIGNED(0xfff, tmp);
}
dc_re /= ADDC_T_AVG;
diff --git a/drivers/net/wireless/realtek/rtw89/rtw8852b_common.c b/drivers/net/wireless/realtek/rtw89/rtw8852b_common.c
index 65b839323e3e..7894834091fe 100644
--- a/drivers/net/wireless/realtek/rtw89/rtw8852b_common.c
+++ b/drivers/net/wireless/realtek/rtw89/rtw8852b_common.c
@@ -206,9 +206,9 @@ static void rtw8852bx_efuse_parsing_tssi(struct rtw89_dev *rtwdev,
static bool _decode_efuse_gain(u8 data, s8 *high, s8 *low)
{
if (high)
- *high = sign_extend32(FIELD_GET(GENMASK(7, 4), data), 3);
+ *high = FIELD_GET_SIGNED(GENMASK(7, 4), data);
if (low)
- *low = sign_extend32(FIELD_GET(GENMASK(3, 0), data), 3);
+ *low = FIELD_GET_SIGNED(GENMASK(3, 0), data);
return data != 0xff;
}
diff --git a/drivers/net/wireless/realtek/rtw89/rtw8852b_rfk.c b/drivers/net/wireless/realtek/rtw89/rtw8852b_rfk.c
index 70b1515c00fa..8db6ea475128 100644
--- a/drivers/net/wireless/realtek/rtw89/rtw8852b_rfk.c
+++ b/drivers/net/wireless/realtek/rtw89/rtw8852b_rfk.c
@@ -497,8 +497,8 @@ static void _check_addc(struct rtw89_dev *rtwdev, enum rtw89_rf_path path)
for (i = 0; i < ADDC_T_AVG; i++) {
tmp = rtw89_phy_read32_mask(rtwdev, R_DBG32_D, MASKDWORD);
- dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
- dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
+ dc_re += FIELD_GET_SIGNED(0xfff000, tmp);
+ dc_im += FIELD_GET_SIGNED(0xfff, tmp);
}
dc_re /= ADDC_T_AVG;
diff --git a/drivers/net/wireless/realtek/rtw89/rtw8852c.c b/drivers/net/wireless/realtek/rtw89/rtw8852c.c
index 40db7e3c0d97..528f9f4b1fc3 100644
--- a/drivers/net/wireless/realtek/rtw89/rtw8852c.c
+++ b/drivers/net/wireless/realtek/rtw89/rtw8852c.c
@@ -517,9 +517,9 @@ static void rtw8852c_efuse_parsing_tssi(struct rtw89_dev *rtwdev,
static bool _decode_efuse_gain(u8 data, s8 *high, s8 *low)
{
if (high)
- *high = sign_extend32(FIELD_GET(GENMASK(7, 4), data), 3);
+ *high = FIELD_GET_SIGNED(GENMASK(7, 4), data);
if (low)
- *low = sign_extend32(FIELD_GET(GENMASK(3, 0), data), 3);
+ *low = FIELD_GET_SIGNED(GENMASK(3, 0), data);
return data != 0xff;
}
--
2.51.0
^ permalink raw reply related
* [PATCH 4/9] iio: magnetometer: yas530: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
Switch from sign_extend32(FIELD_GET()) to the dedicated
FIELD_GET_SIGNED() and stop calculating the field length explicitly.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
drivers/iio/magnetometer/yamaha-yas530.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/magnetometer/yamaha-yas530.c b/drivers/iio/magnetometer/yamaha-yas530.c
index d49e37edcbed..6a80042602c6 100644
--- a/drivers/iio/magnetometer/yamaha-yas530.c
+++ b/drivers/iio/magnetometer/yamaha-yas530.c
@@ -859,9 +859,9 @@ static int yas530_get_calibration_data(struct yas5xx *yas5xx)
c->f[0] = FIELD_GET(GENMASK(22, 21), val);
c->f[1] = FIELD_GET(GENMASK(14, 13), val);
c->f[2] = FIELD_GET(GENMASK(6, 5), val);
- c->r[0] = sign_extend32(FIELD_GET(GENMASK(28, 23), val), 5);
- c->r[1] = sign_extend32(FIELD_GET(GENMASK(20, 15), val), 5);
- c->r[2] = sign_extend32(FIELD_GET(GENMASK(12, 7), val), 5);
+ c->r[0] = FIELD_GET_SIGNED(GENMASK(28, 23), val);
+ c->r[1] = FIELD_GET_SIGNED(GENMASK(20, 15), val);
+ c->r[2] = FIELD_GET_SIGNED(GENMASK(12, 7), val);
return 0;
}
@@ -914,9 +914,9 @@ static int yas532_get_calibration_data(struct yas5xx *yas5xx)
c->f[0] = FIELD_GET(GENMASK(24, 23), val);
c->f[1] = FIELD_GET(GENMASK(16, 15), val);
c->f[2] = FIELD_GET(GENMASK(8, 7), val);
- c->r[0] = sign_extend32(FIELD_GET(GENMASK(30, 25), val), 5);
- c->r[1] = sign_extend32(FIELD_GET(GENMASK(22, 17), val), 5);
- c->r[2] = sign_extend32(FIELD_GET(GENMASK(14, 7), val), 5);
+ c->r[0] = FIELD_GET_SIGNED(GENMASK(30, 25), val);
+ c->r[1] = FIELD_GET_SIGNED(GENMASK(22, 17), val);
+ c->r[2] = FIELD_GET_SIGNED(GENMASK(14, 7), val);
return 0;
}
--
2.51.0
^ permalink raw reply related
* [PATCH 6/9] iio: mcp9600: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
Switch from sign_extend32(FIELD_GET()) to the dedicated
FIELD_GET_SIGNED() and stop calculating the field length explicitly.
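The comment in the hunk below says the alert limit is stored in two's-complement format in bits 15:2, with 0.25 degree Celsius per LSB. The driver reports it as val/val2 with val2 = 4. The following minimal decoding sketch assumes the mask covers exactly bits 15:2, which is consistent with the old sign_extend32(..., 13) on a 14-bit field. model_alert_limit() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the alert limit in units of 0.25 degree Celsius. */
static int32_t model_alert_limit(uint16_t reg)
{
	uint32_t field = ((uint32_t)reg >> 2) & 0x3FFF;	/* 14-bit raw field, bits 15:2 */

	return (int32_t)(field ^ 0x2000) - 0x2000;	/* sign-extend from bit 13 */
}
```

Dividing the result by 4 gives degrees Celsius. For example, a register value of 0x0190 decodes to 100, i.e. 25.00 degrees, and 0xFFFC decodes to -1, i.e. -0.25 degrees.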
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
drivers/iio/temperature/mcp9600.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/temperature/mcp9600.c b/drivers/iio/temperature/mcp9600.c
index aa42c2b1a369..69baf654c9c0 100644
--- a/drivers/iio/temperature/mcp9600.c
+++ b/drivers/iio/temperature/mcp9600.c
@@ -297,7 +297,7 @@ static int mcp9600_read_thresh(struct iio_dev *indio_dev,
* Temperature is stored in two’s complement format in
* bits(15:2), LSB is 0.25 degree celsius.
*/
- *val = sign_extend32(FIELD_GET(MCP9600_ALERT_LIMIT_MASK, ret), 13);
+ *val = FIELD_GET_SIGNED(MCP9600_ALERT_LIMIT_MASK, ret);
*val2 = 4;
return IIO_VAL_FRACTIONAL;
case IIO_EV_INFO_HYSTERESIS:
--
2.51.0
^ permalink raw reply related
* [PATCH 3/9] iio: intel_dc_ti_adc: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
Switch from sign_extend32(FIELD_GET()) to the dedicated
FIELD_GET_SIGNED() and stop providing the field length explicitly.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
drivers/iio/adc/intel_dc_ti_adc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/intel_dc_ti_adc.c b/drivers/iio/adc/intel_dc_ti_adc.c
index 0fe34f1c338e..b5afad713e2d 100644
--- a/drivers/iio/adc/intel_dc_ti_adc.c
+++ b/drivers/iio/adc/intel_dc_ti_adc.c
@@ -290,8 +290,8 @@ static int dc_ti_adc_probe(struct platform_device *pdev)
if (ret)
return ret;
- info->vbat_zse = sign_extend32(FIELD_GET(DC_TI_VBAT_ZSE, val), 3);
- info->vbat_ge = sign_extend32(FIELD_GET(DC_TI_VBAT_GE, val), 3);
+ info->vbat_zse = FIELD_GET_SIGNED(DC_TI_VBAT_ZSE, val);
+ info->vbat_ge = FIELD_GET_SIGNED(DC_TI_VBAT_GE, val);
dev_dbg(dev, "vbat-zse %d vbat-ge %d\n", info->vbat_zse, info->vbat_ge);
--
2.51.0
^ permalink raw reply related
* [PATCH 5/9] iio: pressure: bmp280: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
Switch from sign_extend32(FIELD_GET()) to the dedicated
FIELD_GET_SIGNED() and stop calculating the field length explicitly.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
drivers/iio/pressure/bmp280-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/pressure/bmp280-core.c b/drivers/iio/pressure/bmp280-core.c
index d983ce9c0b99..f722aea16e0e 100644
--- a/drivers/iio/pressure/bmp280-core.c
+++ b/drivers/iio/pressure/bmp280-core.c
@@ -392,7 +392,7 @@ static int bme280_read_calib(struct bmp280_data *data)
h4_lower = FIELD_GET(BME280_COMP_H4_MASK_LOW, tmp_1);
calib->H4 = sign_extend32(h4_upper | h4_lower, 11);
tmp_3 = get_unaligned_le16(&data->bme280_humid_cal_buf[H5]);
- calib->H5 = sign_extend32(FIELD_GET(BME280_COMP_H5_MASK, tmp_3), 11);
+ calib->H5 = FIELD_GET_SIGNED(BME280_COMP_H5_MASK, tmp_3);
calib->H6 = data->bme280_humid_cal_buf[H6];
return 0;
--
2.51.0
^ permalink raw reply related
* [PATCH 2/9] x86/extable: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
The EX_DATA word is laid out such that EX_DATA_IMM occupies the MSBs.
This is done to make sure that FIELD_GET() sign-extends the IMM
field during extraction.
To enforce that, all EX_DATA masks are made signed integers. This
works, but relies on the particular implementation of FIELD_GET(),
i.e. masking then shifting, not vice versa, and on the particular
placement of the fields in the word.
Switch to using the dedicated FIELD_GET_SIGNED(), and relax those
limitations.
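The following minimal userspace sketch shows why the old masks had to be 'int' and why IMM had to sit at the top of the word. The model_* helpers are hypothetical and only mimic the mask-then-shift order of FIELD_GET(). The sketch assumes GCC/Clang semantics, as the kernel does: an arithmetic right shift of negative values and a modular conversion to signed types.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: an int mask makes (data & mask) an int, so >> is arithmetic. */
static int32_t model_imm_int_mask(uint32_t data)
{
	return ((int32_t)data & (int32_t)0xFFFF0000) >> 16;
}

/* Unsigned mask: the same mask-then-shift zero-extends. */
static int32_t model_imm_unsigned_mask(uint32_t data)
{
	return (int32_t)((data & 0xFFFF0000u) >> 16);
}

/* Int-mask trick with the field at bits 23:8: bit 31 is masked off,
 * so the result can never be negative. */
static int32_t model_mid_int_mask(uint32_t data)
{
	return ((int32_t)data & (int32_t)0x00FFFF00) >> 8;
}

/* Explicit sign extension of the 16-bit field works at any position. */
static int32_t model_mid_signed(uint32_t data)
{
	return (int16_t)((data & 0x00FFFF00u) >> 8);
}
```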
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
arch/x86/include/asm/extable_fixup_types.h | 13 ++++---------
arch/x86/mm/extable.c | 2 +-
2 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index 906b0d5541e8..fd0cfb472103 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -2,15 +2,10 @@
#ifndef _ASM_X86_EXTABLE_FIXUP_TYPES_H
#define _ASM_X86_EXTABLE_FIXUP_TYPES_H
-/*
- * Our IMM is signed, as such it must live at the top end of the word. Also,
- * since C99 hex constants are of ambiguous type, force cast the mask to 'int'
- * so that FIELD_GET() will DTRT and sign extend the value when it extracts it.
- */
-#define EX_DATA_TYPE_MASK ((int)0x000000FF)
-#define EX_DATA_REG_MASK ((int)0x00000F00)
-#define EX_DATA_FLAG_MASK ((int)0x0000F000)
-#define EX_DATA_IMM_MASK ((int)0xFFFF0000)
+#define EX_DATA_TYPE_MASK (0x000000FF)
+#define EX_DATA_REG_MASK (0x00000F00)
+#define EX_DATA_FLAG_MASK (0x0000F000)
+#define EX_DATA_IMM_MASK (0xFFFF0000)
#define EX_DATA_REG_SHIFT 8
#define EX_DATA_FLAG_SHIFT 12
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 6b9ff1c6cafa..ae663cf88a3c 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -322,7 +322,7 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
type = FIELD_GET(EX_DATA_TYPE_MASK, e->data);
reg = FIELD_GET(EX_DATA_REG_MASK, e->data);
- imm = FIELD_GET(EX_DATA_IMM_MASK, e->data);
+ imm = FIELD_GET_SIGNED(EX_DATA_IMM_MASK, e->data);
switch (type) {
case EX_TYPE_DEFAULT:
--
2.51.0
^ permalink raw reply related
* [PATCH 1/9] bitfield: add FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
In-Reply-To: <20260417173621.368914-1-ynorov@nvidia.com>
Bitfields are designed on the assumption that fields contain unsigned
integer values, so extracting a value from a field implies
zero-extension.
Some drivers need to sign-extend their fields, and currently do it like this:
dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
This is error-prone because it relies on the user to provide the correct
index of the most significant bit and to pick the proper 32- vs 64-bit
function flavor.
Thus, introduce a FIELD_GET_SIGNED() macro, which is more
convenient and compiles (on x86_64) to just a couple of instructions:
shl and sar.
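For illustration, a userspace sketch of the shl/sar idea as a hypothetical 64-bit helper (not the macro itself), next to a re-implementation modeled on the kernel's sign_extend32() for comparison. Unlike the macro, the sketch shifts an unsigned value left to stay clear of signed-overflow rules in ISO C, returns int64_t rather than the signed type of the mask, and relies on GCC/Clang wrapping the conversion to int64_t and doing an arithmetic right shift on negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled on the kernel's sign_extend32(): @index is the sign bit. */
static int32_t sign_extend32(uint32_t value, int index)
{
	uint8_t shift = 31 - index;

	return (int32_t)(value << shift) >> shift;
}

/* Zero-extending extraction, like FIELD_GET(). */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	assert(mask != 0);	/* __builtin_ctzll(0) is undefined */
	return (reg & mask) >> __builtin_ctzll(mask);
}

/*
 * shl/sar extraction: move the field's top bit into bit 63, then
 * arithmetic-shift it back down, dropping the bits below the field.
 * The sign-bit position comes from the mask, not from the caller.
 */
static int64_t field_get_signed(uint64_t mask, uint64_t reg)
{
	int lead, trail;

	assert(mask != 0);	/* __builtin_clzll(0) is undefined */
	lead = __builtin_clzll(mask);
	trail = __builtin_ctzll(mask);
	return (int64_t)(reg << lead) >> (lead + trail);
}
```

With tmp = 0x00FFF001, both approaches give -1 for the 0xfff000 field and 1 for the 0xfff field; only the sign_extend32() variant needs the caller to pass 11 separately.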
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
include/linux/bitfield.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
index 54aeeef1f0ec..35ef63972810 100644
--- a/include/linux/bitfield.h
+++ b/include/linux/bitfield.h
@@ -178,6 +178,22 @@
__FIELD_GET(_mask, _reg, "FIELD_GET: "); \
})
+/**
+ * FIELD_GET_SIGNED() - extract a signed bitfield element
+ * @mask: shifted mask defining the field's length and position
+ * @reg: value of entire bitfield
+ *
+ * Returns the sign-extended field specified by @mask from the
+ * bitfield passed in as @reg by shifting it up and then arithmetically down.
+ */
+#define FIELD_GET_SIGNED(mask, reg) \
+ ({ \
+ __BF_FIELD_CHECK(mask, reg, 0U, "FIELD_GET_SIGNED: "); \
+ ((__signed_scalar_typeof(mask))((long long)(reg) << \
+ __builtin_clzll(mask) >> (__builtin_clzll(mask) + \
+ __builtin_ctzll(mask))));\
+ })
+
/**
* FIELD_MODIFY() - modify a bitfield element
* @_mask: shifted mask defining the field's length and position
--
2.51.0
^ permalink raw reply related
* [PATCH 0/9] bitfield: add FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-17 17:36 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Jonathan Cameron,
David Lechner, Nuno Sá, Andy Shevchenko, Ping-Ke Shih,
Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexandre Belloni, Yury Norov,
Rasmus Villemoes, Hans de Goede, Linus Walleij, Sakari Ailus,
Salah Triki, Achim Gratz, Ben Collins, linux-kernel, linux-iio,
linux-wireless, netdev, linux-rtc
Cc: Yury Norov
Bitfields are designed on the assumption that fields contain unsigned
integer values, so extracting a value from a field implies
zero-extension.
Some drivers need to sign-extend their fields, and currently do it like this:
dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
This is error-prone because it relies on the user to provide the correct
index of the most significant bit.
This series adds a signed version of FIELD_GET(), which is more
convenient and compiles (on x86_64) to just a couple of instructions:
shl and sar.
Patch #1 adds FIELD_GET_SIGNED(), and the rest of the series applies it
tree-wide.
Yury Norov (9):
bitfield: add FIELD_GET_SIGNED()
x86/extable: switch to using FIELD_GET_SIGNED()
iio: intel_dc_ti_adc: switch to using
iio: magnetometer: yas530: switch to using FIELD_GET_SIGNED()
iio: pressure: bmp280: switch to using
iio: mcp9600: switch to using FIELD_GET_SIGNED()
wifi: rtw89: switch to using FIELD_GET_SIGNED()
rtc: rv3032: switch to using FIELD_GET_SIGNED()
ptp: switch to using FIELD_GET_SIGNED()
arch/x86/include/asm/extable_fixup_types.h | 13 ++++---------
arch/x86/mm/extable.c | 2 +-
drivers/iio/adc/intel_dc_ti_adc.c | 4 ++--
drivers/iio/magnetometer/yamaha-yas530.c | 12 ++++++------
drivers/iio/pressure/bmp280-core.c | 2 +-
drivers/iio/temperature/mcp9600.c | 2 +-
.../net/wireless/realtek/rtw89/rtw8852a_rfk.c | 4 ++--
.../net/wireless/realtek/rtw89/rtw8852b_common.c | 4 ++--
.../net/wireless/realtek/rtw89/rtw8852b_rfk.c | 4 ++--
drivers/net/wireless/realtek/rtw89/rtw8852c.c | 4 ++--
drivers/ptp/ptp_fc3.c | 4 ++--
drivers/rtc/rtc-rv3032.c | 2 +-
include/linux/bitfield.h | 16 ++++++++++++++++
13 files changed, 42 insertions(+), 31 deletions(-)
--
2.51.0
^ permalink raw reply
* Re: [PATCH net-next 5/6] net: stmmac: move PHY handling out of __stmmac_open()/release()
From: Andrew Lunn @ 2026-04-17 17:30 UTC (permalink / raw)
To: Alexander Stein
Cc: Russell King (Oracle), Maxime Chevallier, Heiner Kallweit,
Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, linux-arm-kernel, linux-stm32, Maxime Coquelin,
netdev, Paolo Abeni
In-Reply-To: <13939918.O9o76ZdvQC@steina-w>
> Thanks for confirming this for another PHY. What I'm wondering right now:
> Why is the PHY stopped in the first place? We are just changing the MTU, no?
It is not uncommon to see an MTU change destroying everything and
rebuilding it, especially when MTU changes were retrofitted into an
older driver which had a fixed MTU.
> I have a proof of concept running, but it needs more cleanup due
> to code duplication.
You probably also want to take a look at the ethtool code for
configuring rings. Oh, it is even worse:
int stmmac_reinit_ringparam(struct net_device *dev, u32 rx_size, u32 tx_size)
{
	struct stmmac_priv *priv = netdev_priv(dev);
	int ret = 0;

	if (netif_running(dev))
		stmmac_release(dev);

	priv->dma_conf.dma_rx_size = rx_size;
	priv->dma_conf.dma_tx_size = tx_size;

	if (netif_running(dev))
		ret = stmmac_open(dev);

	return ret;
}
So it does not even use __stmmac_release() or __stmmac_open(), and it
leaves you with a dead interface if there is not enough memory to
allocate the rings.
These paths should really share the same code.
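One common way to avoid the dead-interface failure mode is to allocate the new rings before releasing the old ones. A minimal userspace sketch of that ordering follows; the types, helpers and 16-byte descriptor size are hypothetical and are not stmmac's actual structures or API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a driver's DMA ring state. */
struct demo_rings {
	uint32_t rx_size, tx_size;
	void *rx, *tx;
};

static int demo_rings_alloc(struct demo_rings *r, uint32_t rx_size,
			    uint32_t tx_size)
{
	assert(rx_size && tx_size);
	r->rx = calloc(rx_size, 16);	/* 16-byte descriptors, hypothetical */
	r->tx = calloc(tx_size, 16);
	if (!r->rx || !r->tx) {
		free(r->rx);
		free(r->tx);
		r->rx = r->tx = NULL;
		return -ENOMEM;
	}
	r->rx_size = rx_size;
	r->tx_size = tx_size;
	return 0;
}

static void demo_rings_free(struct demo_rings *r)
{
	free(r->rx);
	free(r->tx);
	r->rx = r->tx = NULL;
}

/*
 * Allocate first, swap second: if allocation fails, *cur is untouched
 * and the interface keeps running on its previous configuration.
 */
static int demo_reinit_rings(struct demo_rings *cur, uint32_t rx_size,
			     uint32_t tx_size)
{
	struct demo_rings next = { 0 };
	int ret = demo_rings_alloc(&next, rx_size, tx_size);

	if (ret)
		return ret;
	/* A real driver would quiesce DMA here before swapping. */
	demo_rings_free(cur);
	*cur = next;
	return 0;
}
```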
Andrew
^ permalink raw reply
* [PATCH net v2] ibmveth: Disable GSO for packets with small MSS
From: Mingming Cao @ 2026-04-17 17:29 UTC (permalink / raw)
To: netdev
Cc: davem, kuba, edumazet, pabeni, horms, bjking1, haren, ricklind,
maddy, mpe, linuxppc-dev, stable, Mingming Cao, Shaik Abdulla,
Naveed Ahmed
Some physical adapters on Power systems do not support segmentation
offload when the MSS is less than 224 bytes. Attempting to send such
packets causes the adapter to freeze, stopping all traffic until it is
manually reset.
Implement ndo_features_check to disable GSO for packets with small MSS
values. The network stack will perform software segmentation instead.
The 224-byte minimum matches ibmvnic commit f10b09ef687f
("ibmvnic: Enforce stronger sanity checks on GSO packets"),
which uses the same physical adapters in SEA configurations.
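The decision made in ibmveth_features_check() can be modeled in userspace as below. This is a sketch only: the feature bit positions are hypothetical stand-ins, not the kernel's netdev_features_t layout. It uses the fact that skb_is_gso() tests for a non-zero gso_size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the kernel's netdev_features_t bits. */
typedef uint64_t demo_features_t;
#define DEMO_F_SG		(1ULL << 0)
#define DEMO_F_GSO_MASK		(0xFFULL << 16)
#define IBMVETH_MIN_LSO_MSS	224	/* value taken from the patch */

static_assert((DEMO_F_SG & DEMO_F_GSO_MASK) == 0, "feature bits overlap");

/*
 * A packet is GSO when gso_size is non-zero. If its MSS is below the
 * minimum, drop the GSO features so the stack segments in software;
 * all other features pass through unchanged.
 */
static demo_features_t demo_features_check(unsigned int gso_size,
					   demo_features_t features)
{
	if (gso_size && gso_size < IBMVETH_MIN_LSO_MSS)
		features &= ~DEMO_F_GSO_MASK;
	return features;
}
```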
Validated using iptables to force small MSS values. Without the fix,
the adapter freezes. With the fix, packets are segmented in software
and transmission succeeds.
Fixes: 8641dd85799f ("ibmveth: Add support for TSO")
Cc: stable@vger.kernel.org
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Tested-by: Shaik Abdulla <shaik.abdulla1@ibm.com>
Tested-by: Naveed Ahmed <naveedaus@in.ibm.com>
Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
---
v2: Add Fixes tag as requested by automated checks
drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
drivers/net/ethernet/ibm/ibmveth.h | 1 +
2 files changed, 21 insertions(+)
diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 58cc3147afe2..7935c9384ef4 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1756,6 +1756,25 @@ static int ibmveth_set_mac_addr(struct net_device *dev, void *p)
return 0;
}
+static netdev_features_t ibmveth_features_check(struct sk_buff *skb,
+ struct net_device *dev,
+ netdev_features_t features)
+{
+ /* Some physical adapters do not support segmentation offload with
+ * MSS < 224. Disable GSO for such packets to avoid adapter freeze.
+ */
+ if (skb_is_gso(skb)) {
+ if (skb_shinfo(skb)->gso_size < IBMVETH_MIN_LSO_MSS) {
+ netdev_warn_once(dev,
+ "MSS %u too small for LSO, disabling GSO\n",
+ skb_shinfo(skb)->gso_size);
+ features &= ~NETIF_F_GSO_MASK;
+ }
+ }
+
+ return features;
+}
+
static const struct net_device_ops ibmveth_netdev_ops = {
.ndo_open = ibmveth_open,
.ndo_stop = ibmveth_close,
@@ -1767,6 +1786,7 @@ static const struct net_device_ops ibmveth_netdev_ops = {
.ndo_set_features = ibmveth_set_features,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = ibmveth_set_mac_addr,
+ .ndo_features_check = ibmveth_features_check,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = ibmveth_poll_controller,
#endif
diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 068f99df133e..d87713668ed3 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -37,6 +37,7 @@
#define IBMVETH_ILLAN_IPV4_TCP_CSUM 0x0000000000000002UL
#define IBMVETH_ILLAN_ACTIVE_TRUNK 0x0000000000000001UL
+#define IBMVETH_MIN_LSO_MSS 224 /* Minimum MSS for LSO */
/* hcall macros */
#define h_register_logical_lan(ua, buflst, rxq, fltlst, mac) \
plpar_hcall_norets(H_REGISTER_LOGICAL_LAN, ua, buflst, rxq, fltlst, mac)
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [PATCH net] ipv6: Implement limits on extension header parsing
From: Daniel Borkmann @ 2026-04-17 17:18 UTC (permalink / raw)
To: kuba; +Cc: edumazet, dsahern, tom, willemdebruijn.kernel, idosch, pabeni,
netdev
ipv6_{skip_exthdr,find_hdr}() and ip6_tnl_parse_tlv_enc_lim() iterate
over IPv6 extension headers until they find a non-extension-header
protocol or run out of packet data. The loops have no iteration counter,
relying solely on the packet length to bound them. For a crafted packet
with 8-byte extension headers filling a 64KB jumbogram, this means a
worst case of up to ~8k iterations, each with a skb_header_pointer() call.
ipv6_skip_exthdr() is used, for example, to parse the inner quoted
packet inside an incoming ICMPv6 error:
- icmpv6_rcv
  - checksum validation
  - case ICMPV6_DEST_UNREACH
    - icmpv6_notify
      - pskb_may_pull()        <- pull inner IPv6 header
      - ipv6_skip_exthdr()     <- iterates here
      - pskb_may_pull()
      - ipprot->err_handler()  <- sk lookup (matching sk not required)
The per-iteration cost of ipv6_skip_exthdr() itself is generally light,
but skb_header_pointer() becomes more costly on reassembled packets: the
first ~1KB of the inner packet is in the skb's linear area, but the
remaining ~63KB is in the frag_list, where skb_copy_bits() is needed to
read the data.
Add a configurable limit via a new sysctl net.ipv6.max_ext_hdrs_number
(default 32, minimum 1). All three extension header walking functions
are bound by this limit. The sysctl is in line with commit 47d3d7ac656a
("ipv6: Implement limits on Hop-by-Hop and Destination options").
init_net is used because plumbing a struct net * through all helpers
would touch a lot of call sites.
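The bounded walk can be sketched in userspace over a flat byte buffer. This is a simplified model, not the kernel code: it handles only Hop-by-Hop, Routing and Destination Options headers (Fragment and AH use a different length encoding), and the helper name is hypothetical. The counter check sits at the top of the loop, as in the patch, so exactly max_hdrs headers may be walked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA next-header values (subset). */
enum {
	NH_HOPOPTS = 0,
	NH_TCP     = 6,
	NH_ROUTING = 43,
	NH_NONE    = 59,
	NH_DSTOPTS = 60,
};

static int is_ext_hdr(uint8_t nh)
{
	return nh == NH_HOPOPTS || nh == NH_ROUTING ||
	       nh == NH_DSTOPTS || nh == NH_NONE;
}

/*
 * Returns the offset of the first non-extension header (its type is
 * stored in *nexthdrp), or -1 on malformed input or once more than
 * max_hdrs extension headers would be walked.
 */
static long skip_exthdr_bounded(const uint8_t *pkt, size_t len, size_t start,
				uint8_t *nexthdrp, int max_hdrs)
{
	uint8_t nh = *nexthdrp;
	int cnt = 0;

	assert(max_hdrs >= 1);	/* the sysctl's minimum is 1 */

	while (is_ext_hdr(nh)) {
		if (cnt++ >= max_hdrs)
			return -1;
		if (nh == NH_NONE)
			return -1;
		if (start + 2 > len)	/* need next-header and length bytes */
			return -1;
		nh = pkt[start];
		/* Length in 8-octet units, excluding the first 8 octets. */
		start += ((size_t)pkt[start + 1] + 1) * 8;
	}
	*nexthdrp = nh;
	return (long)start;
}
```

For example, a chain of 40 minimal 8-byte Destination Options headers followed by TCP is rejected with a limit of 32 but walked to offset 320 with a limit of 40.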
There is an ongoing IETF draft, draft-ietf-6man-eh-limits-18, which states
that 8 extension headers before the transport header is the baseline that
routers MUST handle; its section 7 also details why limits are needed.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
Documentation/networking/ip-sysctl.rst | 7 +++++++
include/net/ipv6.h | 2 ++
include/net/netns/ipv6.h | 1 +
net/ipv6/af_inet6.c | 1 +
net/ipv6/exthdrs_core.c | 11 +++++++++++
net/ipv6/ip6_tunnel.c | 5 +++++
net/ipv6/sysctl_net_ipv6.c | 8 ++++++++
7 files changed, 35 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 6921d8594b84..4559a956bbd9 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -2503,6 +2503,13 @@ max_hbh_length - INTEGER
Default: INT_MAX (unlimited)
+max_ext_hdrs_number - INTEGER
+ Maximum number of IPv6 extension headers allowed in a packet.
+ Limits how many extension headers will be traversed. The value
+ is read from the initial netns.
+
+ Default: 32
+
skip_notify_on_dev_down - BOOLEAN
Controls whether an RTM_DELROUTE message is generated for routes
removed when a device is taken down or deleted. IPv4 does not
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 53c5056508be..d7f0d55e6918 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -90,6 +90,8 @@ struct ip_tunnel_info;
#define IP6_DEFAULT_MAX_DST_OPTS_LEN INT_MAX /* No limit */
#define IP6_DEFAULT_MAX_HBH_OPTS_LEN INT_MAX /* No limit */
+#define IP6_DEFAULT_MAX_EXT_HDRS_CNT 32
+
/*
* Addr type
*
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 34bdb1308e8f..5be4dd1c9ae8 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -54,6 +54,7 @@ struct netns_sysctl_ipv6 {
int max_hbh_opts_cnt;
int max_dst_opts_len;
int max_hbh_opts_len;
+ int max_ext_hdrs_cnt;
int seg6_flowlabel;
u32 ioam6_id;
u64 ioam6_id_wide;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 4cbd45b68088..ed7fe6e4a6bd 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -965,6 +965,7 @@ static int __net_init inet6_net_init(struct net *net)
net->ipv6.sysctl.flowlabel_state_ranges = 0;
net->ipv6.sysctl.max_dst_opts_cnt = IP6_DEFAULT_MAX_DST_OPTS_CNT;
net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT;
+ net->ipv6.sysctl.max_ext_hdrs_cnt = IP6_DEFAULT_MAX_EXT_HDRS_CNT;
net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN;
net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN;
net->ipv6.sysctl.fib_notify_on_flag_change = 0;
diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index 49e31e4ae7b7..917307877cbb 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -4,6 +4,8 @@
* not configured or static.
*/
#include <linux/export.h>
+
+#include <net/net_namespace.h>
#include <net/ipv6.h>
/*
@@ -72,7 +74,9 @@ EXPORT_SYMBOL(ipv6_ext_hdr);
int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp,
__be16 *frag_offp)
{
+ int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
u8 nexthdr = *nexthdrp;
+ int exthdr_cnt = 0;
*frag_offp = 0;
@@ -80,6 +84,8 @@ int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp,
struct ipv6_opt_hdr _hdr, *hp;
int hdrlen;
+ if (unlikely(exthdr_cnt++ >= exthdr_max))
+ return -1;
if (nexthdr == NEXTHDR_NONE)
return -1;
hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
@@ -188,8 +194,10 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
int target, unsigned short *fragoff, int *flags)
{
+ int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+ int exthdr_cnt = 0;
bool found;
if (fragoff)
@@ -216,6 +224,9 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
return -ENOENT;
}
+ if (unlikely(exthdr_cnt++ >= exthdr_max))
+ return -EBADMSG;
+
hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
if (!hp)
return -EBADMSG;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 0b53488a9229..78e849e167ca 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -396,15 +396,20 @@ ip6_tnl_dev_uninit(struct net_device *dev)
__u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw)
{
+ int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
const struct ipv6hdr *ipv6h = (const struct ipv6hdr *)raw;
unsigned int nhoff = raw - skb->data;
unsigned int off = nhoff + sizeof(*ipv6h);
u8 nexthdr = ipv6h->nexthdr;
+ int exthdr_cnt = 0;
while (ipv6_ext_hdr(nexthdr) && nexthdr != NEXTHDR_NONE) {
struct ipv6_opt_hdr *hdr;
u16 optlen;
+ if (unlikely(exthdr_cnt++ >= exthdr_max))
+ break;
+
if (!pskb_may_pull(skb, off + sizeof(*hdr)))
break;
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index d2cd33e2698d..93f865545a7c 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -135,6 +135,14 @@ static struct ctl_table ipv6_table_template[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = &flowlabel_reflect_max,
},
+ {
+ .procname = "max_ext_hdrs_number",
+ .data = &init_net.ipv6.sysctl.max_ext_hdrs_cnt,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ONE,
+ },
{
.procname = "max_dst_opts_number",
.data = &init_net.ipv6.sysctl.max_dst_opts_cnt,
--
2.43.0
^ permalink raw reply related
* Re: [PATCH] rds: zero per-item info buffer before handing it to visitors
From: Sharath Srinivasan @ 2026-04-17 16:53 UTC (permalink / raw)
To: Michael Bommarito, Allison Henderson, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, linux-rdma, rds-devel, linux-kernel
In-Reply-To: <20260417141916.494761-1-michael.bommarito@gmail.com>
On 2026-04-17 7:19 a.m., Michael Bommarito wrote:
> Yet another from my "clanker." This only applies to people who
> don't use CONFIG_INIT_STACK_ALL_ZERO, but I presume that's
> still enough people that it's worth backporting since it can
> be chained through leaked addresses to defeat KASLR.
>
> rds_for_each_conn_info() and rds_walk_conn_path_info() both hand a
> caller-allocated on-stack u64 buffer to a per-connection visitor and
> then copy the full item_len bytes back to user space via
> rds_info_copy() regardless of how much of the buffer the visitor
> actually wrote.
>
> rds_ib_conn_info_visitor() and rds6_ib_conn_info_visitor() only
> write a subset of their output struct when the underlying
> rds_connection is not in state RDS_CONN_UP (src/dst addr, tos, sl
> and the two GIDs via explicit memsets). Several u32 fields
> (max_send_wr, max_recv_wr, max_send_sge, rdma_mr_max, rdma_mr_size,
> cache_allocs) and the 2-byte alignment hole between sl and
> cache_allocs remain as whatever stack contents preceded the visitor
> call and are then memcpy_to_user()'d out to user space.
>
> struct rds_info_rdma_connection and struct rds6_info_rdma_connection
> are the only rds_info_* structs in include/uapi/linux/rds.h that are
> not marked __attribute__((packed)), so they have a real alignment
> hole. The other info visitors (rds_conn_info_visitor,
> rds6_conn_info_visitor, rds_tcp_tc_info, ...) write all fields of
> their packed output struct today and are not known to be vulnerable,
> but a future visitor that adds a conditional write-path would have
> the same bug.
>
> Reproduction on a kernel built without CONFIG_INIT_STACK_ALL_ZERO=y:
> a local unprivileged user opens AF_RDS, sets SO_RDS_TRANSPORT=IB,
> binds to a local address on an RDMA-capable netdev (rxe soft-RoCE on
> any netdev is sufficient), sendto()'s any peer on the same subnet
> (fails cleanly but installs an rds_connection in the global hash in
> RDS_CONN_CONNECTING), then calls getsockopt(SOL_RDS,
> RDS_INFO_IB_CONNECTIONS). The returned 68-byte item contains 26
> bytes of stack garbage including kernel text/data pointers:
>
> 0..7 0a 63 00 01 0a 63 00 02 src=10.99.0.1 dst=10.99.0.2
> 8..39 00 ... gids (memset-zeroed)
> 40..47 e0 92 a3 81 ff ff ff ff kernel pointer (max_send_wr)
> 48..55 7f 37 b5 81 ff ff ff ff kernel pointer (rdma_mr_max)
> 56..59 01 00 08 00 rdma_mr_size (garbage)
> 60..61 00 00 tos, sl
> 62..63 00 00 alignment padding
> 64..67 18 00 00 00 cache_allocs (garbage)
>
> Fix by zeroing the per-item buffer in both rds_for_each_conn_info()
> and rds_walk_conn_path_info() before invoking the visitor. This
> covers the IPv4/IPv6 IB visitors and hardens all current and future
> visitors against the same class of bug.
>
> No functional change for visitors that fully populate their output.
>
> Fixes: ec16227e1414 ("RDS/IB: Infiniband transport")
LGTM. Reviewed-by: Sharath Srinivasan <sharath.srinivasan@oracle.com>
Thanks,
Sharath
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
> ---
> net/rds/connection.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index 412441aaa298..c10b7ed06c49 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -701,6 +701,13 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
> i++, head++) {
> hlist_for_each_entry_rcu(conn, head, c_hash_node) {
>
> + /* Zero the per-item buffer before handing it to the
> + * visitor so any field the visitor does not write -
> + * including implicit alignment padding - cannot leak
> + * stack contents to user space via rds_info_copy().
> + */
> + memset(buffer, 0, item_len);
> +
> /* XXX no c_lock usage.. */
> if (!visitor(conn, buffer))
> continue;
> @@ -750,6 +757,13 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
> */
> cp = conn->c_path;
>
> + /* Zero the per-item buffer for the same reason as
> + * rds_for_each_conn_info(): any byte the visitor
> + * does not write (including alignment padding) must
> + * not leak stack contents via rds_info_copy().
> + */
> + memset(buffer, 0, item_len);
> +
> /* XXX no cp_lock usage.. */
> if (!visitor(cp, buffer))
> continue;
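The leak described in the report can be reproduced in a small userspace sketch. The struct below is hypothetical and only mirrors the field order quoted above; 0xAA stands in for stale stack contents. ISO C leaves padding bytes unspecified after member stores, but on common ABIs such as x86_64 with GCC/Clang the unwritten bytes keep their old values, which matches the 26 leaked bytes in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the layout quoted above (not the uapi header). */
struct demo_rdma_conn {
	uint32_t src_addr, dst_addr;
	uint8_t  src_gid[16], dst_gid[16];
	uint32_t max_send_wr, max_recv_wr, max_send_sge;
	uint32_t rdma_mr_max, rdma_mr_size;
	uint8_t  tos, sl;		/* followed by 2 bytes of padding */
	uint32_t cache_allocs;
};

static_assert(offsetof(struct demo_rdma_conn, cache_allocs) == 64,
	      "expected 2 bytes of padding after sl");

/* Models the not-RDS_CONN_UP path: only a subset of fields is written. */
static void demo_visit_not_up(struct demo_rdma_conn *c)
{
	c->src_addr = 0x0a630001;
	c->dst_addr = 0x0a630002;
	memset(c->src_gid, 0, sizeof(c->src_gid));
	memset(c->dst_gid, 0, sizeof(c->dst_gid));
	c->tos = 0;
	c->sl = 0;
}

/* Counts bytes that still hold the stale 0xAA pattern after the visit. */
static size_t leaked_bytes(int zero_first)
{
	struct demo_rdma_conn item;
	const unsigned char *p = (const unsigned char *)&item;
	size_t n = 0;

	memset(&item, 0xAA, sizeof(item));	/* stale stack contents */
	if (zero_first)
		memset(&item, 0, sizeof(item));	/* what the patch adds */
	demo_visit_not_up(&item);
	for (size_t i = 0; i < sizeof(item); i++)
		n += (p[i] == 0xAA);
	return n;
}
```

Without the memset, 20 bytes of unwritten u32 fields, 2 padding bytes and 4 bytes of cache_allocs remain stale; with it, none do.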
^ permalink raw reply
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox