* [PATCH net v2 1/1] net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF
@ 2026-06-24 22:40 Jamal Hadi Salim
2026-06-26 10:16 ` Jamal Hadi Salim
0 siblings, 1 reply; 4+ messages in thread
From: Jamal Hadi Salim @ 2026-06-24 22:40 UTC (permalink / raw)
To: netdev
Cc: davem, edumazet, kuba, pabeni, horms, jiri, victor, security,
zdi-disclosures, stable, Jamal Hadi Salim, kernel test robot
The teql master->slaves singly linked list is not protected against
multiple writes. It can be mod'ed concurently from teql_master_xmit(),
teql_dequeue(), teql_init() and teql_destroy() without holding any list
lock or RCU protection.
zdi-disclosures@trendmicro.com has demonstrated that the qdisc is freed
after an RCU grace period, but teql_master_xmit() running on another
CPU can still hold a stale pointer into the list, resulting in a
slab-use-after-free:
BUG: KASAN: slab-use-after-free in teql_destroy+0x3ca/0x440 linux/net/sched/sch_teql.c:142
Read of size 8 at addr ffff88802923aa80 by task ip/10024
The zdi-disclosures@trendmicro.com repro created concurrent AF_PACKET
senders on a teql device against a thread that repeatedly adds/deletes the
slave qdisc, together with a SLUB spray that reclaims the freed slot; the
resulting UAF is controllable enough to be turned into a read/write
primitive against the freed qdisc object.
The fix?
Add a per-master slaves_lock spinlock that serializes all mutations of
master->slaves and the NEXT_SLAVE() links in teql_destroy() and
teql_qdisc_init(). teql_master_xmit() also takes the same slaves_lock
around those updates.
Annotate master->slaves and the per-slave ->next pointer with __rcu and
use the appropriate RCU accessors everywhere they are touched:
rcu_assign_pointer() on the writer side (under slaves_lock),
rcu_dereference_protected() for the writer-side loads (also under
slaves_lock), rcu_dereference_bh() for the loads in teql_master_xmit() and
rtnl_dereference() for the loads in teql_master_open()/teql_master_mtu(),
which run under RTNL.
Pair this with rcu_read_lock_bh()/rcu_read_unlock_bh() around the list
traversal in teql_master_xmit(), so that readers either observe a fully
linked list or are deferred until the in-flight mutation completes. The two
early-return paths in teql_master_xmit() are updated to release the RCU-bh
read-side critical section before returning, since leaving it held would
disable BH on that CPU for good.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: zdi-disclosures@trendmicro.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606241501.XQBMu4b8-lkp@intel.com/
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
Changes in v2:
Address sparse issues found by kernel test robot <lkp@intel.com>
- rcu annotated struct teql_master->slaves and struct teql_sched_data->next
- teql_destroy/teql_qdisc_init: replace all READ/WRITE_ONCE() with rcu_dereference_protected()/rcu_assign_pointer()
---
net/sched/sch_teql.c | 119 ++++++++++++++++++++++++++++++++++-----------------
1 file changed, 80 insertions(+), 39 deletions(-)
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index e7bbc9e5174d..f4654f1b1200 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -52,7 +52,8 @@
struct teql_master {
struct Qdisc_ops qops;
struct net_device *dev;
- struct Qdisc *slaves;
+ struct Qdisc __rcu *slaves;
+ spinlock_t slaves_lock; /* serializes writes to ->slaves */
struct list_head master_list;
unsigned long tx_bytes;
unsigned long tx_packets;
@@ -61,7 +62,7 @@ struct teql_master {
};
struct teql_sched_data {
- struct Qdisc *next;
+ struct Qdisc __rcu *next;
struct teql_master *m;
struct sk_buff_head q;
};
@@ -101,7 +102,9 @@ teql_dequeue(struct Qdisc *sch)
if (skb == NULL) {
struct net_device *m = qdisc_dev(q);
if (m) {
- dat->m->slaves = sch;
+ spin_lock_bh(&dat->m->slaves_lock);
+ rcu_assign_pointer(dat->m->slaves, sch);
+ spin_unlock_bh(&dat->m->slaves_lock);
netif_wake_queue(m);
}
} else {
@@ -132,34 +135,49 @@ teql_destroy(struct Qdisc *sch)
struct Qdisc *q, *prev;
struct teql_sched_data *dat = qdisc_priv(sch);
struct teql_master *master = dat->m;
+ struct netdev_queue *txq = NULL;
+ bool reset_master_queue = false;
if (!master)
return;
- prev = master->slaves;
+ spin_lock_bh(&master->slaves_lock);
+ prev = rcu_dereference_protected(master->slaves,
+ lockdep_is_held(&master->slaves_lock));
if (prev) {
do {
- q = NEXT_SLAVE(prev);
- if (q == sch) {
- NEXT_SLAVE(prev) = NEXT_SLAVE(q);
- if (q == master->slaves) {
- master->slaves = NEXT_SLAVE(q);
- if (q == master->slaves) {
- struct netdev_queue *txq;
-
- txq = netdev_get_tx_queue(master->dev, 0);
- master->slaves = NULL;
-
- dev_reset_queue(master->dev,
- txq, NULL);
- }
- }
- skb_queue_purge(&dat->q);
- break;
+ struct Qdisc *head, *next;
+
+ q = rcu_dereference_protected(NEXT_SLAVE(prev),
+ lockdep_is_held(&master->slaves_lock));
+ if (q != sch) {
+ prev = q;
+ continue;
}
- } while ((prev = q) != master->slaves);
+ next = rcu_dereference_protected(NEXT_SLAVE(q),
+ lockdep_is_held(&master->slaves_lock));
+ rcu_assign_pointer(NEXT_SLAVE(prev), next);
+
+ head = rcu_dereference_protected(master->slaves,
+ lockdep_is_held(&master->slaves_lock));
+ if (q == head) {
+ rcu_assign_pointer(master->slaves, next);
+ if (q == next) {
+ txq = netdev_get_tx_queue(master->dev, 0);
+ rcu_assign_pointer(master->slaves, NULL);
+ reset_master_queue = true;
+ }
+ }
+ skb_queue_purge(&dat->q);
+ break;
+ } while (prev != rcu_dereference_protected(master->slaves,
+ lockdep_is_held(&master->slaves_lock)));
}
+ spin_unlock_bh(&master->slaves_lock);
+
+ if (reset_master_queue)
+ dev_reset_queue(master->dev, txq, NULL);
}
static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
@@ -168,6 +186,7 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
struct net_device *dev = qdisc_dev(sch);
struct teql_master *m = (struct teql_master *)sch->ops;
struct teql_sched_data *q = qdisc_priv(sch);
+ struct Qdisc *first;
if (dev->hard_header_len > m->dev->hard_header_len)
return -EINVAL;
@@ -184,7 +203,9 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
skb_queue_head_init(&q->q);
- if (m->slaves) {
+ spin_lock_bh(&m->slaves_lock);
+ first = rcu_dereference_protected(m->slaves, lockdep_is_held(&m->slaves_lock));
+ if (first) {
if (m->dev->flags & IFF_UP) {
if ((m->dev->flags & IFF_POINTOPOINT &&
!(dev->flags & IFF_POINTOPOINT)) ||
@@ -192,8 +213,10 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
!(dev->flags & IFF_BROADCAST)) ||
(m->dev->flags & IFF_MULTICAST &&
!(dev->flags & IFF_MULTICAST)) ||
- dev->mtu < m->dev->mtu)
+ dev->mtu < m->dev->mtu) {
+ spin_unlock_bh(&m->slaves_lock);
return -EINVAL;
+ }
} else {
if (!(dev->flags&IFF_POINTOPOINT))
m->dev->flags &= ~IFF_POINTOPOINT;
@@ -204,14 +227,17 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
if (dev->mtu < m->dev->mtu)
m->dev->mtu = dev->mtu;
}
- q->next = NEXT_SLAVE(m->slaves);
- NEXT_SLAVE(m->slaves) = sch;
+ rcu_assign_pointer(q->next,
+ rcu_dereference_protected(NEXT_SLAVE(first),
+ lockdep_is_held(&m->slaves_lock)));
+ rcu_assign_pointer(NEXT_SLAVE(first), sch);
} else {
- q->next = sch;
- m->slaves = sch;
+ rcu_assign_pointer(q->next, sch);
+ rcu_assign_pointer(m->slaves, sch);
m->dev->mtu = dev->mtu;
m->dev->flags = (m->dev->flags&~FMASK)|(dev->flags&FMASK);
}
+ spin_unlock_bh(&m->slaves_lock);
return 0;
}
@@ -285,7 +311,9 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
int subq = skb_get_queue_mapping(skb);
struct sk_buff *skb_res = NULL;
- start = master->slaves;
+ rcu_read_lock_bh();
+
+ start = rcu_dereference_bh(master->slaves);
restart:
nores = 0;
@@ -317,10 +345,14 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
netdev_start_xmit(skb, slave, slave_txq, false) ==
NETDEV_TX_OK) {
__netif_tx_unlock(slave_txq);
- master->slaves = NEXT_SLAVE(q);
+ spin_lock_bh(&master->slaves_lock);
+ rcu_assign_pointer(master->slaves,
+ rcu_dereference_bh(NEXT_SLAVE(q)));
+ spin_unlock_bh(&master->slaves_lock);
netif_wake_queue(dev);
master->tx_packets++;
master->tx_bytes += length;
+ rcu_read_unlock_bh();
return NETDEV_TX_OK;
}
__netif_tx_unlock(slave_txq);
@@ -329,14 +361,18 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
busy = 1;
break;
case 1:
- master->slaves = NEXT_SLAVE(q);
+ spin_lock_bh(&master->slaves_lock);
+ rcu_assign_pointer(master->slaves,
+ rcu_dereference_bh(NEXT_SLAVE(q)));
+ spin_unlock_bh(&master->slaves_lock);
+ rcu_read_unlock_bh();
return NETDEV_TX_OK;
default:
nores = 1;
break;
}
__skb_pull(skb, skb_network_offset(skb));
- } while ((q = NEXT_SLAVE(q)) != start);
+ } while ((q = rcu_dereference_bh(NEXT_SLAVE(q))) != start);
if (nores && skb_res == NULL) {
skb_res = skb;
@@ -345,29 +381,32 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
if (busy) {
netif_stop_queue(dev);
+ rcu_read_unlock_bh();
return NETDEV_TX_BUSY;
}
master->tx_errors++;
drop:
master->tx_dropped++;
+ rcu_read_unlock_bh();
dev_kfree_skb(skb);
return NETDEV_TX_OK;
}
static int teql_master_open(struct net_device *dev)
{
- struct Qdisc *q;
+ struct Qdisc *q, *first;
struct teql_master *m = netdev_priv(dev);
int mtu = 0xFFFE;
unsigned int flags = IFF_NOARP | IFF_MULTICAST;
- if (m->slaves == NULL)
+ first = rtnl_dereference(m->slaves);
+ if (!first)
return -EUNATCH;
flags = FMASK;
- q = m->slaves;
+ q = first;
do {
struct net_device *slave = qdisc_dev(q);
@@ -389,7 +428,7 @@ static int teql_master_open(struct net_device *dev)
flags &= ~IFF_BROADCAST;
if (!(slave->flags&IFF_MULTICAST))
flags &= ~IFF_MULTICAST;
- } while ((q = NEXT_SLAVE(q)) != m->slaves);
+ } while ((q = rtnl_dereference(NEXT_SLAVE(q))) != first);
m->dev->mtu = mtu;
m->dev->flags = (m->dev->flags&~FMASK) | flags;
@@ -417,14 +456,15 @@ static void teql_master_stats64(struct net_device *dev,
static int teql_master_mtu(struct net_device *dev, int new_mtu)
{
struct teql_master *m = netdev_priv(dev);
- struct Qdisc *q;
+ struct Qdisc *q, *first;
- q = m->slaves;
+ first = rtnl_dereference(m->slaves);
+ q = first;
if (q) {
do {
if (new_mtu > qdisc_dev(q)->mtu)
return -EINVAL;
- } while ((q = NEXT_SLAVE(q)) != m->slaves);
+ } while ((q = rtnl_dereference(NEXT_SLAVE(q))) != first);
}
WRITE_ONCE(dev->mtu, new_mtu);
@@ -444,6 +484,7 @@ static __init void teql_master_setup(struct net_device *dev)
struct teql_master *master = netdev_priv(dev);
struct Qdisc_ops *ops = &master->qops;
+ spin_lock_init(&master->slaves_lock);
master->dev = dev;
ops->priv_size = sizeof(struct teql_sched_data);
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH net v2 1/1] net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF
2026-06-24 22:40 [PATCH net v2 1/1] net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF Jamal Hadi Salim
@ 2026-06-26 10:16 ` Jamal Hadi Salim
2026-06-26 14:15 ` Simon Horman
0 siblings, 1 reply; 4+ messages in thread
From: Jamal Hadi Salim @ 2026-06-26 10:16 UTC (permalink / raw)
To: netdev
Cc: davem, edumazet, kuba, pabeni, horms, jiri, victor, security,
zdi-disclosures, stable, kernel test robot
"
On Wed, Jun 24, 2026 at 6:40 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> The teql master->slaves singly linked list is not protected against
> multiple writes. It can be mod'ed concurently from teql_master_xmit(),
> teql_dequeue(), teql_init() and teql_destroy() without holding any list
> lock or RCU protection.
>
> zdi-disclosures@trendmicro.com has demonstrated that the qdisc is freed
> after an RCU grace period, but teql_master_xmit() running on another
> CPU can still hold a stale pointer into the list, resulting in a
> slab-use-after-free:
>
> BUG: KASAN: slab-use-after-free in teql_destroy+0x3ca/0x440 linux/net/sched/sch_teql.c:142
> Read of size 8 at addr ffff88802923aa80 by task ip/10024
>
> The zdi-disclosures@trendmicro.com repro created concurrent AF_PACKET
> senders on a teql device against a thread that repeatedly adds/deletes the
> slave qdisc, together with a SLUB spray that reclaims the freed slot; the
> resulting UAF is controllable enough to be turned into a read/write
> primitive against the freed qdisc object.
>
> The fix?
> Add a per-master slaves_lock spinlock that serializes all mutations of
> master->slaves and the NEXT_SLAVE() links in teql_destroy() and
> teql_qdisc_init(). teql_master_xmit() also takes the same slaves_lock
> around those updates.
> Annotate master->slaves and the per-slave ->next pointer with __rcu and
> use the appropriate RCU accessors everywhere they are touched:
> rcu_assign_pointer() on the writer side (under slaves_lock),
> rcu_dereference_protected() for the writer-side loads (also under
> slaves_lock), rcu_dereference_bh() for the loads in teql_master_xmit() and
> rtnl_dereference() for the loads in teql_master_open()/teql_master_mtu(),
> which run under RTNL.
> Pair this with rcu_read_lock_bh()/rcu_read_unlock_bh() around the list
> traversal in teql_master_xmit(), so that readers either observe a fully
> linked list or are deferred until the in-flight mutation completes. The two
> early-return paths in teql_master_xmit() are updated to release the RCU-bh
> read-side critical section before returning, since leaving it held would
> disable BH on that CPU for good.
>
sashiko-gemini's complaints:
https://sashiko.dev/#/patchset/20260624224016.24018-1-jhs%40mojatatu.com
seem bogus to me (someone correct me if i am wrong). I am only going
to address the first claim of "TOCTOU / "resurrection" race in
teql_master_xmit()"
teql_master_xmit() holds rcu_read_lock_bh() across the entire
traversal. teql_destroy() freeing can only proceed once the qdisc's
RCU grace period has elapsed - so where is this TOCTOU? Let's say this
were true: both calls hold the slaves_lock.
The other issues are of similar nature.
OTOH, sashiko-claude
(https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260624224016.24018-1-jhs%40mojatatu.com)
does make some valid claims which are low value, so not sure a resend
is worth it.
For example in claim 1 it says "Should the changelog mention this
teql_dequeue() site too?" Sure I can - but just because I provided
extra information in the commit log, which I could have omitted, now I
have to add more info? ;-> The second claim is "rcu_dereference_bh()
should be rcu_dereference_protected() on writer side". Sparse didnt
complain and i dont see this as breakage rather a consistency measure.
Unless I am missing something ..
cheers,
jamal
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net v2 1/1] net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF
2026-06-26 10:16 ` Jamal Hadi Salim
@ 2026-06-26 14:15 ` Simon Horman
2026-06-26 16:11 ` Jamal Hadi Salim
0 siblings, 1 reply; 4+ messages in thread
From: Simon Horman @ 2026-06-26 14:15 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: netdev, davem, edumazet, kuba, pabeni, jiri, victor, security,
zdi-disclosures, stable, kernel test robot
On Fri, Jun 26, 2026 at 06:16:43AM -0400, Jamal Hadi Salim wrote:
> "
>
> On Wed, Jun 24, 2026 at 6:40 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > The teql master->slaves singly linked list is not protected against
> > multiple writes. It can be mod'ed concurently from teql_master_xmit(),
> > teql_dequeue(), teql_init() and teql_destroy() without holding any list
> > lock or RCU protection.
> >
> > zdi-disclosures@trendmicro.com has demonstrated that the qdisc is freed
> > after an RCU grace period, but teql_master_xmit() running on another
> > CPU can still hold a stale pointer into the list, resulting in a
> > slab-use-after-free:
> >
> > BUG: KASAN: slab-use-after-free in teql_destroy+0x3ca/0x440 linux/net/sched/sch_teql.c:142
> > Read of size 8 at addr ffff88802923aa80 by task ip/10024
> >
> > The zdi-disclosures@trendmicro.com repro created concurrent AF_PACKET
> > senders on a teql device against a thread that repeatedly adds/deletes the
> > slave qdisc, together with a SLUB spray that reclaims the freed slot; the
> > resulting UAF is controllable enough to be turned into a read/write
> > primitive against the freed qdisc object.
> >
> > The fix?
> > Add a per-master slaves_lock spinlock that serializes all mutations of
> > master->slaves and the NEXT_SLAVE() links in teql_destroy() and
> > teql_qdisc_init(). teql_master_xmit() also takes the same slaves_lock
> > around those updates.
> > Annotate master->slaves and the per-slave ->next pointer with __rcu and
> > use the appropriate RCU accessors everywhere they are touched:
> > rcu_assign_pointer() on the writer side (under slaves_lock),
> > rcu_dereference_protected() for the writer-side loads (also under
> > slaves_lock), rcu_dereference_bh() for the loads in teql_master_xmit() and
> > rtnl_dereference() for the loads in teql_master_open()/teql_master_mtu(),
> > which run under RTNL.
> > Pair this with rcu_read_lock_bh()/rcu_read_unlock_bh() around the list
> > traversal in teql_master_xmit(), so that readers either observe a fully
> > linked list or are deferred until the in-flight mutation completes. The two
> > early-return paths in teql_master_xmit() are updated to release the RCU-bh
> > read-side critical section before returning, since leaving it held would
> > disable BH on that CPU for good.
> >
>
> sashiko-gemini's complaints:
> https://sashiko.dev/#/patchset/20260624224016.24018-1-jhs%40mojatatu.com
> seem bogus to me (someone correct me if i am wrong). I am only going
> to address the first claim of "TOCTOU / "resurrection" race in
> teql_master_xmit()"
> teql_master_xmit() holds rcu_read_lock_bh() across the entire
> traversal. teql_destroy() freeing can only proceed once the qdisc's
> RCU grace period has elapsed - so where is this TOCTOU? Let's say this
> were true: both calls hold the slaves_lock.
> The other issues are of similar nature.
Hi Jamal,
I think the central question here is about the protection offered by RCU
in these code paths. And while I agree it protects the use of elements
of the list, I think the problem flagged relates to the management of
the list itself.
The example AI gave me when I asked is like this:
Assume a TEQL master has one slave, `q`.
The list is circular: `q->next == q`.
1. CPU A (Transmitting): Enters `teql_master_xmit()`.
It reads `master->sla ves` and gets a local pointer to `q`.
2. CPU B (Destroying): Calls `teql_destroy(q)`.
It takes the lock, unlinks `q`, and sets `master->slaves = NULL`.
The list is now logically empty.
3. CPU A: Finishes its work and prepares to rotate the list head
to the next slave.
It takes the lock.
4. CPU A (The "Use" / The Resurrection):
It executes: `rcu_assign_pointer(master->slaves, NEXT_SLAVE(q));`
Because `q` was circular, `NEXT_SLAVE(q)` is still `q`.
5. CPU A: Releases the lock.
**The global `master->slaves` is now `q` again.**
6. The System: The RCU grace period expires. CPU B finishes
`teql_destroy()` and the memory for `q` is freed.
The global `master->slaves` pointer is now a **dangling pointer**
pointing to freed memory.
> OTOH, sashiko-claude
> (https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260624224016.24018-1-jhs%40mojatatu.com)
> does make some valid claims which are low value, so not sure a resend
> is worth it.
> For example in claim 1 it says "Should the changelog mention this
> teql_dequeue() site too?" Sure I can - but just because I provided
> extra information in the commit log, which I could have omitted, now I
> have to add more info? ;->
FWIIW, I think there is a value in tightening up the commit message.
E.g. so it's accurate when we look at again in two years time.
But I also lean towards it not being necessary to post an update
only to address this.
> The second claim is "rcu_dereference_bh()
> should be rcu_dereference_protected() on writer side". Sparse didnt
> complain and i dont see this as breakage rather a consistency measure.
I think it would be good to address in the long run. But as per my comment
immediately above, I also lean towards it not being necessary to post an
update only to address this.
> Unless I am missing something ..
>
> cheers,
> jamal
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net v2 1/1] net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF
2026-06-26 14:15 ` Simon Horman
@ 2026-06-26 16:11 ` Jamal Hadi Salim
0 siblings, 0 replies; 4+ messages in thread
From: Jamal Hadi Salim @ 2026-06-26 16:11 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, davem, edumazet, kuba, pabeni, jiri, victor, security,
zdi-disclosures, stable, kernel test robot
Hi Simon,
On Fri, Jun 26, 2026 at 10:15 AM Simon Horman <horms@kernel.org> wrote:
>
> On Fri, Jun 26, 2026 at 06:16:43AM -0400, Jamal Hadi Salim wrote:
> > "
> >
> > On Wed, Jun 24, 2026 at 6:40 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > >
> > > The teql master->slaves singly linked list is not protected against
> > > multiple writes. It can be mod'ed concurently from teql_master_xmit(),
> > > teql_dequeue(), teql_init() and teql_destroy() without holding any list
> > > lock or RCU protection.
> > >
> > > zdi-disclosures@trendmicro.com has demonstrated that the qdisc is freed
> > > after an RCU grace period, but teql_master_xmit() running on another
> > > CPU can still hold a stale pointer into the list, resulting in a
> > > slab-use-after-free:
> > >
> > > BUG: KASAN: slab-use-after-free in teql_destroy+0x3ca/0x440 linux/net/sched/sch_teql.c:142
> > > Read of size 8 at addr ffff88802923aa80 by task ip/10024
> > >
> > > The zdi-disclosures@trendmicro.com repro created concurrent AF_PACKET
> > > senders on a teql device against a thread that repeatedly adds/deletes the
> > > slave qdisc, together with a SLUB spray that reclaims the freed slot; the
> > > resulting UAF is controllable enough to be turned into a read/write
> > > primitive against the freed qdisc object.
> > >
> > > The fix?
> > > Add a per-master slaves_lock spinlock that serializes all mutations of
> > > master->slaves and the NEXT_SLAVE() links in teql_destroy() and
> > > teql_qdisc_init(). teql_master_xmit() also takes the same slaves_lock
> > > around those updates.
> > > Annotate master->slaves and the per-slave ->next pointer with __rcu and
> > > use the appropriate RCU accessors everywhere they are touched:
> > > rcu_assign_pointer() on the writer side (under slaves_lock),
> > > rcu_dereference_protected() for the writer-side loads (also under
> > > slaves_lock), rcu_dereference_bh() for the loads in teql_master_xmit() and
> > > rtnl_dereference() for the loads in teql_master_open()/teql_master_mtu(),
> > > which run under RTNL.
> > > Pair this with rcu_read_lock_bh()/rcu_read_unlock_bh() around the list
> > > traversal in teql_master_xmit(), so that readers either observe a fully
> > > linked list or are deferred until the in-flight mutation completes. The two
> > > early-return paths in teql_master_xmit() are updated to release the RCU-bh
> > > read-side critical section before returning, since leaving it held would
> > > disable BH on that CPU for good.
> > >
> >
> > sashiko-gemini's complaints:
> > https://sashiko.dev/#/patchset/20260624224016.24018-1-jhs%40mojatatu.com
> > seem bogus to me (someone correct me if i am wrong). I am only going
> > to address the first claim of "TOCTOU / "resurrection" race in
> > teql_master_xmit()"
> > teql_master_xmit() holds rcu_read_lock_bh() across the entire
> > traversal. teql_destroy() freeing can only proceed once the qdisc's
> > RCU grace period has elapsed - so where is this TOCTOU? Let's say this
> > were true: both calls hold the slaves_lock.
> > The other issues are of similar nature.
>
> Hi Jamal,
>
> I think the central question here is about the protection offered by RCU
> in these code paths. And while I agree it protects the use of elements
> of the list, I think the problem flagged relates to the management of
> the list itself.
>
> The example AI gave me when I asked is like this:
>
> Assume a TEQL master has one slave, `q`.
> The list is circular: `q->next == q`.
>
> 1. CPU A (Transmitting): Enters `teql_master_xmit()`.
> It reads `master->sla ves` and gets a local pointer to `q`.
>
> 2. CPU B (Destroying): Calls `teql_destroy(q)`.
> It takes the lock, unlinks `q`, and sets `master->slaves = NULL`.
> The list is now logically empty.
>
> 3. CPU A: Finishes its work and prepares to rotate the list head
> to the next slave.
> It takes the lock.
>
> 4. CPU A (The "Use" / The Resurrection):
> It executes: `rcu_assign_pointer(master->slaves, NEXT_SLAVE(q));`
> Because `q` was circular, `NEXT_SLAVE(q)` is still `q`.
>
> 5. CPU A: Releases the lock.
> **The global `master->slaves` is now `q` again.**
>
> 6. The System: The RCU grace period expires. CPU B finishes
> `teql_destroy()` and the memory for `q` is freed.
>
> The global `master->slaves` pointer is now a **dangling pointer**
> pointing to freed memory.
>
Yeah, thats the same earlier claim of TOCTOU (what sashiko-gemini
claimed was "resurrecting the freed q")
My view is rcu read lock blocks the subsequent call_rcu free - and
destroy() and xmit() already serialize on slaves_lock.
I could be totaly wrong, but it's almost like sashiko-gemini thinks
that the list-mutation lock _alone_ governs the object lifetime.
The rcu read-side critical section prevents the UAF, not just the
slaves_lock alone
Only reason i added slaves_lock was to prevent corrupting the list
state (whereas the RCU read lock prevents premature free).
In step #4 above this thing somehow leaves out any mention of the rcu
read lock entirely and places the free in step 6 as if it was
independent of CPU A's critical section.
I am not sure how to improve it.
> > OTOH, sashiko-claude
> > (https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260624224016.24018-1-jhs%40mojatatu.com)
> > does make some valid claims which are low value, so not sure a resend
> > is worth it.
> > For example in claim 1 it says "Should the changelog mention this
> > teql_dequeue() site too?" Sure I can - but just because I provided
> > extra information in the commit log, which I could have omitted, now I
> > have to add more info? ;->
>
> FWIIW, I think there is a value in tightening up the commit message.
> E.g. so it's accurate when we look at again in two years time.
> But I also lean towards it not being necessary to post an update
> only to address this.
>
>
> > The second claim is "rcu_dereference_bh()
> > should be rcu_dereference_protected() on writer side". Sparse didnt
> > complain and i dont see this as breakage rather a consistency measure.
>
> I think it would be good to address in the long run. But as per my comment
> immediately above, I also lean towards it not being necessary to post an
> update only to address this.
I can resend with these two taken care of - but i am skeptical of what
sashiko-gemini is claiming (and i admit as a human the AI may see
something i am totally missing).
cheers,
jamal
>
> > Unless I am missing something ..
> >
> > cheers,
> > jamal
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-06-26 16:11 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-24 22:40 [PATCH net v2 1/1] net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF Jamal Hadi Salim
2026-06-26 10:16 ` Jamal Hadi Salim
2026-06-26 14:15 ` Simon Horman
2026-06-26 16:11 ` Jamal Hadi Salim
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox