* Re: [PATCH v5.15-v6.1] netfilter: nft_set_pipapo: do not rely on ZERO_SIZE_PTR
From: Greg KH @ 2026-04-13 11:59 UTC
To: Keerthana K
Cc: stable, pablo, kadlec, fw, davem, edumazet, kuba, pabeni,
netfilter-devel, coreteam, netdev, linux-kernel, ajay.kaher,
alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
tapas.kundu, Stefano Brivio, Mukul Sikka, Brennan Lamoreaux
In-Reply-To: <20260413043247.3327855-1-keerthana.kalyanasundaram@broadcom.com>
On Mon, Apr 13, 2026 at 04:32:47AM +0000, Keerthana K wrote:
> From: Florian Westphal <fw@strlen.de>
>
> commit 07ace0bbe03b3d8e85869af1dec5e4087b1d57b8 upstream
>
> pipapo relies on kmalloc(0) returning ZERO_SIZE_PTR (i.e., not NULL
> but pointer is invalid).
>
> Rework this to not call slab allocator when we'd request a 0-byte
> allocation.
>
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Mukul Sikka <mukul.sikka@broadcom.com>
> Signed-off-by: Brennan Lamoreaux <brennan.lamoreaux@broadcom.com>
> [Keerthana: In older stable branches (v6.6 and earlier), the allocation logic in
> pipapo_clone() still relies on `src->rules` rather than `src->rules_alloc`
> (introduced in v6.9 via 9f439bd6ef4f). Consequently, the previously
> backported INT_MAX clamping check uses `src->rules`. This patch correctly
> moves that `src->rules > (INT_MAX / ...)` check inside the new
> `if (src->rules > 0)` block]
> Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com>
> ---
> net/netfilter/nft_set_pipapo.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
Does not apply to 5.15.y :(
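For readers following the commit message, here is a minimal user-space sketch of the pattern it describes. It is not the actual nft_set_pipapo.c code: the allocator is skipped when the element count is zero, the pointer stays NULL, and the overflow clamp sits inside the non-zero branch as the backport note says. struct field_sketch, clone_field() and the use of malloc() in place of the kernel allocator are assumptions made for illustration only.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one pipapo field's mapping table. */
struct field_sketch {
	unsigned int rules;	/* number of entries in use */
	int *mt;		/* stays NULL when rules == 0 */
};

/* Returns 0 on success, -1 on overflow or allocation failure. */
static int clone_field(struct field_sketch *dst, const struct field_sketch *src)
{
	dst->rules = src->rules;
	dst->mt = NULL;

	/* Only touch the allocator for a non-zero request. */
	if (src->rules > 0) {
		/* Clamp check lives inside the non-zero branch. */
		if (src->rules > INT_MAX / sizeof(*src->mt))
			return -1;

		dst->mt = malloc(src->rules * sizeof(*src->mt));
		if (!dst->mt)
			return -1;

		for (unsigned int i = 0; i < src->rules; i++)
			dst->mt[i] = src->mt[i];
	}

	return 0;
}
```

With this shape, callers test the pointer against NULL (or the count against zero) and never depend on what a zero-byte allocation returns.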
* Re: [RFC] Proposal: Add sysfs interface for PCIe TPH Steering Tag retrieval and configuration
From: fengchengwen @ 2026-04-13 12:04 UTC
To: Leon Romanovsky
Cc: Jason Gunthorpe, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
In-Reply-To: <20260413100152.GG21470@unreal>
On 4/13/2026 6:01 PM, Leon Romanovsky wrote:
> On Fri, Apr 10, 2026 at 10:30:52PM +0800, fengchengwen wrote:
>> Hi all,
>>
>> I'm writing to propose adding a sysfs interface to expose and configure the
>> PCIe TPH
>> Steering Tag for PCIe devices, which is retrieved inside the kernel.
>>
>>
>> Background: The TPH Steering Tag is tightly coupled with both a PCIe device
>> (identified
>> by its BDF) and a CPU core. It can only be obtained in kernel mode. To allow
>> user-space
>> applications to fetch and set this value securely and conveniently, we need
>> a standard
>> kernel-to-user interface.
>>
>>
>> Proposed Solution: Add several sysfs attributes under each PCIe device's
>> sysfs directory:
>> 1. /sys/bus/pci/devices/<BDF>/tph_mode to query the TPH mode (interrupt or
>> device specific)
>> 2. /sys/bus/pci/devices/<BDF>/tph_enable to control the TPH feature
>> 3. /sys/bus/pci/devices/<BDF>/tph_st to support both read and write
>> operations, e.g.:
>> Read operation:
>> echo "cpu=3" > /sys/bus/pci/devices/0000:01:00.0/tph_st
>> cat /sys/bus/pci/devices/0000:01:00.0/tph_st
>> Write operation:
>> echo "index=10 st=123" > /sys/bus/pci/devices/0000:01:00.0/tph_st
>>
>>
>> The design strictly follows PCI subsystem sysfs standards and has the
>> following key properties:
>>
>> 1. Dynamic Visibility: The sysfs attributes will only be present for PCIe
>> devices that
>> support TPH Steering Tag. Devices without TPH capability will not show
>> these nodes,
>> avoiding unnecessary user confusion.
>>
>> 2. Permission Control: The attributes will use 0600 file permissions,
>> ensuring only
>> privileged root users can read or write them, which satisfies security
>> requirements
>> for hardware configuration interfaces.
>>
>> 3. Standard Implementation Location: The interface will be implemented in
>> drivers/pci/pci-sysfs.c, the canonical location for all PCI device sysfs
>> attributes,
>> ensuring consistency and maintainability within the PCI subsystem.
>>
>>
>> Why sysfs instead of alternatives like VFIO-PCI ioctl:
>>
>> - Universality: sysfs does not require binding the device to a special
>> driver such as
>> vfio-pci. It is available to any privileged user-space component,
>> including system
>> utilities, daemons, and monitoring tools.
>>
>> - Simplicity: Both user-space usage (cat/echo) and kernel implementation are
>> straightforward, reducing code complexity and long-term maintenance cost.
>>
>> - Design Alignment: TPH Steering Tag is a generic PCIe device feature, not
>> specific to
>> user-space drivers like DPDK or VFIO. Exposing it via sysfs matches the
>> kernel's
>> standard pattern for hardware capabilities.
>>
>>
>> I look forward to your comments about this design before submitting the
>> final patch.
>
> You need to explain more clearly why this write functionality is useful
> and necessary outside the VFIO/RDMA context:
> https://lore.kernel.org/all/20260324234615.3731237-1-zhipingz@meta.com/
>
> AFAIK, for non-VFIO TPH callers, kernel has enough knowledge to set
> right ST values.
>
> There are several comments regarding the implementation, but those can wait
> until the rationale behind the proposal is fully clarified.
Thanks for your review and comments.
Let me clarify the rationale behind this user-space sysfs interface:
1. VFIO is just one of the user-space device access frameworks.
There are many other in-kernel frameworks that expose devices
to user space, such as UIO, UACCE, etc., which may also require
TPH Steering Tag support.
2. The kernel can automatically program Steering Tags only when
the device provides a standard ST table in MSI-X or config space.
However, many devices implement vendor-specific or platform-specific
Steering Tag programming methods that cannot be fully handled
by the generic kernel code.
3. For such devices, user-space applications or framework drivers
need to retrieve and configure TPH Steering Tags directly.
A unified sysfs interface allows all user-space frameworks
(not just VFIO) to use a common, standard way to manage
TPH Steering Tags, rather than implementing duplicated logic
in each subsystem.
This interface provides a uniform method for any user-space
device access solution to work with TPH, which is why I believe
it is useful and necessary beyond the VFIO/RDMA case.
Thanks
>
> Thanks
>
>>
>> Best regards,
>> Chengwen Feng
>>
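To make the proposed write format concrete, here is a small user-space sketch of how a string such as "index=10 st=123" could be parsed. The format is taken from the RFC text only. parse_tph_st_write() is a hypothetical helper, not an existing kernel or library function. The sketch does not handle the "cpu=3" selection write, and it does no range checking on either value.

```c
#include <stdio.h>

/*
 * Parse "index=<n> st=<n>" as proposed for writes to tph_st.
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if the input does not match exactly.
 */
static int parse_tph_st_write(const char *buf, unsigned int *index,
			      unsigned int *st)
{
	char trailing;
	int n;

	/* The final " %c" only converts if junk follows the two fields. */
	n = sscanf(buf, "index=%u st=%u %c", index, st, &trailing);

	return n == 2 ? 0 : -1;
}
```

A trailing newline, as produced by echo, is accepted. Any other trailing text is rejected.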
* Re: [PATCH v3 net-next 13/15] net/sched: sch_cake: annotate data-races in cake_dump_stats()
From: Toke Høiland-Jørgensen @ 2026-04-13 12:07 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
In-Reply-To: <20260410182257.774311-14-edumazet@google.com>
Eric Dumazet <edumazet@google.com> writes:
> cake_dump_stats() and cake_dump_class_stats() run without qdisc
> spinlock being held.
>
> Add READ_ONCE()/WRITE_ONCE() annotations.
>
> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>
> ---
> net/sched/sch_cake.c | 404 ++++++++++++++++++++++++-------------------
> 1 file changed, 225 insertions(+), 179 deletions(-)
One of these diffstats is not like the others - thanks for tackling this :)
A few nits below:
> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> index 32e672820c00a88c6d8fe77a6308405e016525ea..f523f0aa4d830e9d3ec4d43bb123e1dc4f8f289d 100644
> --- a/net/sched/sch_cake.c
> +++ b/net/sched/sch_cake.c
> @@ -399,14 +399,14 @@ static void cake_configure_rates(struct Qdisc *sch, u64 rate, bool rate_adjust);
> * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
> */
>
> -static void cobalt_newton_step(struct cobalt_vars *vars)
> +static void cobalt_newton_step(struct cobalt_vars *vars, u32 count)
> {
> u32 invsqrt, invsqrt2;
> u64 val;
>
> invsqrt = vars->rec_inv_sqrt;
> invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
> - val = (3LL << 32) - ((u64)vars->count * invsqrt2);
> + val = (3LL << 32) - ((u64)count * invsqrt2);
>
> val >>= 2; /* avoid overflow in following multiply */
> val = (val * invsqrt) >> (32 - 2 + 1);
> @@ -414,12 +414,12 @@ static void cobalt_newton_step(struct cobalt_vars *vars)
> vars->rec_inv_sqrt = val;
> }
>
> -static void cobalt_invsqrt(struct cobalt_vars *vars)
> +static void cobalt_invsqrt(struct cobalt_vars *vars, u32 count)
> {
> - if (vars->count < REC_INV_SQRT_CACHE)
> - vars->rec_inv_sqrt = inv_sqrt_cache[vars->count];
> + if (count < REC_INV_SQRT_CACHE)
> + vars->rec_inv_sqrt = inv_sqrt_cache[count];
> else
> - cobalt_newton_step(vars);
> + cobalt_newton_step(vars, count);
> }
>
> static void cobalt_vars_init(struct cobalt_vars *vars)
> @@ -449,16 +449,19 @@ static bool cobalt_queue_full(struct cobalt_vars *vars,
> bool up = false;
>
> if (ktime_to_ns(ktime_sub(now, vars->blue_timer)) > p->target) {
> - up = !vars->p_drop;
> - vars->p_drop += p->p_inc;
> - if (vars->p_drop < p->p_inc)
> - vars->p_drop = ~0;
> - vars->blue_timer = now;
> - }
> - vars->dropping = true;
> - vars->drop_next = now;
> + u32 p_drop = vars->p_drop;
> +
> + up = !p_drop;
> + p_drop += p->p_inc;
> + if (p_drop < p->p_inc)
> + p_drop = ~0;
> + WRITE_ONCE(vars->p_drop, p_drop);
> + WRITE_ONCE(vars->blue_timer, now);
> + }
> + WRITE_ONCE(vars->dropping, true);
> + WRITE_ONCE(vars->drop_next, now);
> if (!vars->count)
> - vars->count = 1;
> + WRITE_ONCE(vars->count, 1);
>
> return up;
> }
> @@ -474,21 +477,25 @@ static bool cobalt_queue_empty(struct cobalt_vars *vars,
>
> if (vars->p_drop &&
> ktime_to_ns(ktime_sub(now, vars->blue_timer)) > p->target) {
> - if (vars->p_drop < p->p_dec)
> - vars->p_drop = 0;
> + u32 p_drop = vars->p_drop;
> +
> + if (p_drop < p->p_dec)
> + p_drop = 0;
> else
> - vars->p_drop -= p->p_dec;
> - vars->blue_timer = now;
> - down = !vars->p_drop;
> + p_drop -= p->p_dec;
> + WRITE_ONCE(vars->p_drop, p_drop);
> + WRITE_ONCE(vars->blue_timer, now);
> + down = !p_drop;
> }
> - vars->dropping = false;
> + WRITE_ONCE(vars->dropping, false);
>
> if (vars->count && ktime_to_ns(ktime_sub(now, vars->drop_next)) >= 0) {
> - vars->count--;
> - cobalt_invsqrt(vars);
> - vars->drop_next = cobalt_control(vars->drop_next,
> - p->interval,
> - vars->rec_inv_sqrt);
> + WRITE_ONCE(vars->count, vars->count - 1);
> + cobalt_invsqrt(vars, vars->count);
> + WRITE_ONCE(vars->drop_next,
> + cobalt_control(vars->drop_next,
> + p->interval,
> + vars->rec_inv_sqrt));
> }
>
> return down;
> @@ -507,6 +514,7 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
> bool next_due, over_target;
> ktime_t schedule;
> u64 sojourn;
> + u32 count;
>
> /* The 'schedule' variable records, in its sign, whether 'now' is before or
> * after 'drop_next'. This allows 'drop_next' to be updated before the next
> @@ -528,45 +536,50 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
> over_target = sojourn > p->target &&
> sojourn > p->mtu_time * bulk_flows * 2 &&
> sojourn > p->mtu_time * 4;
> - next_due = vars->count && ktime_to_ns(schedule) >= 0;
> + count = vars->count;
> + next_due = count && ktime_to_ns(schedule) >= 0;
>
> vars->ecn_marked = false;
>
> if (over_target) {
> if (!vars->dropping) {
> - vars->dropping = true;
> - vars->drop_next = cobalt_control(now,
> - p->interval,
> - vars->rec_inv_sqrt);
> + WRITE_ONCE(vars->dropping, true);
> + WRITE_ONCE(vars->drop_next,
> + cobalt_control(now,
> + p->interval,
> + vars->rec_inv_sqrt));
> }
> - if (!vars->count)
> - vars->count = 1;
> + if (!count)
> + count = 1;
> } else if (vars->dropping) {
> - vars->dropping = false;
> + WRITE_ONCE(vars->dropping, false);
> }
>
> if (next_due && vars->dropping) {
> /* Use ECN mark if possible, otherwise drop */
> - if (!(vars->ecn_marked = INET_ECN_set_ce(skb)))
> + vars->ecn_marked = INET_ECN_set_ce(skb);
> + if (!vars->ecn_marked)
> reason = QDISC_DROP_CONGESTED;
>
> - vars->count++;
> - if (!vars->count)
> - vars->count--;
> - cobalt_invsqrt(vars);
> - vars->drop_next = cobalt_control(vars->drop_next,
> - p->interval,
> - vars->rec_inv_sqrt);
> + count++;
> + if (!count)
> + count--;
> + cobalt_invsqrt(vars, count);
> + WRITE_ONCE(vars->drop_next,
> + cobalt_control(vars->drop_next,
> + p->interval,
> + vars->rec_inv_sqrt));
> schedule = ktime_sub(now, vars->drop_next);
> } else {
> while (next_due) {
> - vars->count--;
> - cobalt_invsqrt(vars);
> - vars->drop_next = cobalt_control(vars->drop_next,
> - p->interval,
> - vars->rec_inv_sqrt);
> + count--;
> + cobalt_invsqrt(vars, count);
> + WRITE_ONCE(vars->drop_next,
> + cobalt_control(vars->drop_next,
> + p->interval,
> + vars->rec_inv_sqrt));
> schedule = ktime_sub(now, vars->drop_next);
> - next_due = vars->count && ktime_to_ns(schedule) >= 0;
> + next_due = count && ktime_to_ns(schedule) >= 0;
> }
> }
>
> @@ -575,11 +588,12 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
> get_random_u32() < vars->p_drop)
> reason = QDISC_DROP_FLOOD_PROTECTION;
>
> + WRITE_ONCE(vars->count, count);
> /* Overload the drop_next field as an activity timeout */
> - if (!vars->count)
> - vars->drop_next = ktime_add_ns(now, p->interval);
> + if (count)
This seems to reverse the conditional?
> + WRITE_ONCE(vars->drop_next, ktime_add_ns(now, p->interval));
> else if (ktime_to_ns(schedule) > 0 && reason == QDISC_DROP_UNSPEC)
> - vars->drop_next = now;
> + WRITE_ONCE(vars->drop_next, now);
>
> return reason;
> }
> @@ -813,7 +827,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> i++, k = (k + 1) % CAKE_SET_WAYS) {
> if (q->tags[outer_hash + k] == flow_hash) {
> if (i)
> - q->way_hits++;
> + WRITE_ONCE(q->way_hits, q->way_hits + 1);
>
> if (!q->flows[outer_hash + k].set) {
> /* need to increment host refcnts */
> @@ -831,7 +845,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> for (i = 0; i < CAKE_SET_WAYS;
> i++, k = (k + 1) % CAKE_SET_WAYS) {
> if (!q->flows[outer_hash + k].set) {
> - q->way_misses++;
> + WRITE_ONCE(q->way_misses, q->way_misses + 1);
> allocate_src = cake_dsrc(flow_mode);
> allocate_dst = cake_ddst(flow_mode);
> goto found;
> @@ -841,7 +855,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> /* With no empty queues, default to the original
> * queue, accept the collision, update the host tags.
> */
> - q->way_collisions++;
> + WRITE_ONCE(q->way_collisions, q->way_collisions + 1);
> allocate_src = cake_dsrc(flow_mode);
> allocate_dst = cake_ddst(flow_mode);
>
> @@ -875,7 +889,8 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> q->flows[reduced_hash].srchost = srchost_idx;
>
> if (q->flows[reduced_hash].set == CAKE_SET_BULK)
> - cake_inc_srchost_bulk_flow_count(q, &q->flows[reduced_hash], flow_mode);
> + cake_inc_srchost_bulk_flow_count(q, &q->flows[reduced_hash],
> + flow_mode);
> }
>
> if (allocate_dst) {
> @@ -899,7 +914,8 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> q->flows[reduced_hash].dsthost = dsthost_idx;
>
> if (q->flows[reduced_hash].set == CAKE_SET_BULK)
> - cake_inc_dsthost_bulk_flow_count(q, &q->flows[reduced_hash], flow_mode);
> + cake_inc_dsthost_bulk_flow_count(q, &q->flows[reduced_hash],
> + flow_mode);
> }
> }
>
> @@ -1379,9 +1395,9 @@ static u32 cake_calc_overhead(struct cake_sched_data *qd, u32 len, u32 off)
> len -= off;
>
> if (qd->max_netlen < len)
> - qd->max_netlen = len;
> + WRITE_ONCE(qd->max_netlen, len);
> if (qd->min_netlen > len)
> - qd->min_netlen = len;
> + WRITE_ONCE(qd->min_netlen, len);
>
> len += q->rate_overhead;
>
> @@ -1401,9 +1417,9 @@ static u32 cake_calc_overhead(struct cake_sched_data *qd, u32 len, u32 off)
> }
>
> if (qd->max_adjlen < len)
> - qd->max_adjlen = len;
> + WRITE_ONCE(qd->max_adjlen, len);
> if (qd->min_adjlen > len)
> - qd->min_adjlen = len;
> + WRITE_ONCE(qd->min_adjlen, len);
>
> return len;
> }
> @@ -1416,7 +1432,7 @@ static u32 cake_overhead(struct cake_sched_data *q, const struct sk_buff *skb)
> u16 segs = qdisc_pkt_segs(skb);
> u32 len = qdisc_pkt_len(skb);
>
> - q->avg_netoff = cake_ewma(q->avg_netoff, off << 16, 8);
> + WRITE_ONCE(q->avg_netoff, cake_ewma(q->avg_netoff, off << 16, 8));
>
> if (segs == 1)
> return cake_calc_overhead(q, len, off);
> @@ -1590,16 +1606,17 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
> }
>
> if (cobalt_queue_full(&flow->cvars, &b->cparams, now))
> - b->unresponsive_flow_count++;
> + WRITE_ONCE(b->unresponsive_flow_count,
> + b->unresponsive_flow_count + 1);
>
> len = qdisc_pkt_len(skb);
> q->buffer_used -= skb->truesize;
> - b->backlogs[idx] -= len;
> - b->tin_backlog -= len;
> + WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] - len);
> + WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
> qstats_backlog_sub(sch, len);
>
> - flow->dropped++;
> - b->tin_dropped++;
> + WRITE_ONCE(flow->dropped, flow->dropped + 1);
> + WRITE_ONCE(b->tin_dropped, b->tin_dropped + 1);
>
> if (q->config->rate_flags & CAKE_FLAG_INGRESS)
> cake_advance_shaper(q, b, skb, now, true);
> @@ -1795,7 +1812,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> }
>
> if (unlikely(len > b->max_skblen))
> - b->max_skblen = len;
> + WRITE_ONCE(b->max_skblen, len);
>
> if (qdisc_pkt_segs(skb) > 1 && q->config->rate_flags & CAKE_FLAG_SPLIT_GSO) {
> struct sk_buff *segs, *nskb;
> @@ -1819,13 +1836,13 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> numsegs++;
> slen += segs->len;
> q->buffer_used += segs->truesize;
> - b->packets++;
Right above this hunk we do sch->q.qlen++; - does that need changing as
well?
> }
>
> /* stats */
> - b->bytes += slen;
> - b->backlogs[idx] += slen;
> - b->tin_backlog += slen;
> + WRITE_ONCE(b->bytes, b->bytes + slen);
> + WRITE_ONCE(b->packets, b->packets + numsegs);
> + WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] + slen);
> + WRITE_ONCE(b->tin_backlog, b->tin_backlog + slen);
> qstats_backlog_add(sch, slen);
> q->avg_window_bytes += slen;
>
> @@ -1843,10 +1860,10 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> ack = cake_ack_filter(q, flow);
>
> if (ack) {
> - b->ack_drops++;
> + WRITE_ONCE(b->ack_drops, b->ack_drops + 1);
> qdisc_qstats_drop(sch);
> ack_pkt_len = qdisc_pkt_len(ack);
> - b->bytes += ack_pkt_len;
> + WRITE_ONCE(b->bytes, b->bytes + ack_pkt_len);
> q->buffer_used += skb->truesize - ack->truesize;
> if (q->config->rate_flags & CAKE_FLAG_INGRESS)
> cake_advance_shaper(q, b, ack, now, true);
> @@ -1859,10 +1876,10 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> }
>
> /* stats */
> - b->packets++;
> - b->bytes += len - ack_pkt_len;
> - b->backlogs[idx] += len - ack_pkt_len;
> - b->tin_backlog += len - ack_pkt_len;
> + WRITE_ONCE(b->packets, b->packets + 1);
> + WRITE_ONCE(b->bytes, b->bytes + len - ack_pkt_len);
> + WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] + len - ack_pkt_len);
> + WRITE_ONCE(b->tin_backlog, b->tin_backlog + len - ack_pkt_len);
> qstats_backlog_add(sch, len - ack_pkt_len);
> q->avg_window_bytes += len - ack_pkt_len;
> }
> @@ -1894,9 +1911,9 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> u64 b = q->avg_window_bytes * (u64)NSEC_PER_SEC;
>
> b = div64_u64(b, window_interval);
> - q->avg_peak_bandwidth =
> - cake_ewma(q->avg_peak_bandwidth, b,
> - b > q->avg_peak_bandwidth ? 2 : 8);
> + WRITE_ONCE(q->avg_peak_bandwidth,
> + cake_ewma(q->avg_peak_bandwidth, b,
> + b > q->avg_peak_bandwidth ? 2 : 8));
> q->avg_window_bytes = 0;
> q->avg_window_begin = now;
>
> @@ -1917,27 +1934,30 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> if (!flow->set) {
> list_add_tail(&flow->flowchain, &b->new_flows);
> } else {
> - b->decaying_flow_count--;
> + WRITE_ONCE(b->decaying_flow_count,
> + b->decaying_flow_count - 1);
> list_move_tail(&flow->flowchain, &b->new_flows);
> }
> flow->set = CAKE_SET_SPARSE;
> - b->sparse_flow_count++;
> + WRITE_ONCE(b->sparse_flow_count,
> + b->sparse_flow_count + 1);
>
> - flow->deficit = cake_get_flow_quantum(b, flow, q->config->flow_mode);
> + WRITE_ONCE(flow->deficit,
> + cake_get_flow_quantum(b, flow, q->config->flow_mode));
> } else if (flow->set == CAKE_SET_SPARSE_WAIT) {
> /* this flow was empty, accounted as a sparse flow, but actually
> * in the bulk rotation.
> */
> flow->set = CAKE_SET_BULK;
> - b->sparse_flow_count--;
> - b->bulk_flow_count++;
> + WRITE_ONCE(b->sparse_flow_count, b->sparse_flow_count - 1);
> + WRITE_ONCE(b->bulk_flow_count, b->bulk_flow_count + 1);
>
> cake_inc_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
> cake_inc_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
> }
>
> if (q->buffer_used > q->buffer_max_used)
> - q->buffer_max_used = q->buffer_used;
> + WRITE_ONCE(q->buffer_max_used, q->buffer_used);
>
> if (q->buffer_used <= q->buffer_limit)
> return NET_XMIT_SUCCESS;
> @@ -1976,8 +1996,8 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
> if (flow->head) {
> skb = dequeue_head(flow);
> len = qdisc_pkt_len(skb);
> - b->backlogs[q->cur_flow] -= len;
> - b->tin_backlog -= len;
> + WRITE_ONCE(b->backlogs[q->cur_flow], b->backlogs[q->cur_flow] - len);
> + WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
> qstats_backlog_sub(sch, len);
> q->buffer_used -= skb->truesize;
> qdisc_qlen_dec(sch);
> @@ -2042,7 +2062,7 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
>
> cake_configure_rates(sch, new_rate, true);
> q->last_checked_active = now;
> - q->active_queues = num_active_qs;
> + WRITE_ONCE(q->active_queues, num_active_qs);
> }
>
> begin:
> @@ -2149,8 +2169,10 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> */
> if (flow->set == CAKE_SET_SPARSE) {
> if (flow->head) {
> - b->sparse_flow_count--;
> - b->bulk_flow_count++;
> + WRITE_ONCE(b->sparse_flow_count,
> + b->sparse_flow_count - 1);
> + WRITE_ONCE(b->bulk_flow_count,
> + b->bulk_flow_count + 1);
>
> cake_inc_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
> cake_inc_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
> @@ -2165,7 +2187,8 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> }
> }
>
> - flow->deficit += cake_get_flow_quantum(b, flow, q->config->flow_mode);
> + WRITE_ONCE(flow->deficit,
> + flow->deficit + cake_get_flow_quantum(b, flow, q->config->flow_mode));
> list_move_tail(&flow->flowchain, &b->old_flows);
>
> goto retry;
> @@ -2177,7 +2200,8 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> if (!skb) {
> /* this queue was actually empty */
> if (cobalt_queue_empty(&flow->cvars, &b->cparams, now))
> - b->unresponsive_flow_count--;
> + WRITE_ONCE(b->unresponsive_flow_count,
> + b->unresponsive_flow_count - 1);
>
> if (flow->cvars.p_drop || flow->cvars.count ||
> ktime_before(now, flow->cvars.drop_next)) {
> @@ -2187,16 +2211,22 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> list_move_tail(&flow->flowchain,
> &b->decaying_flows);
> if (flow->set == CAKE_SET_BULK) {
> - b->bulk_flow_count--;
> + WRITE_ONCE(b->bulk_flow_count,
> + b->bulk_flow_count - 1);
>
> - cake_dec_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
> - cake_dec_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
> + cake_dec_srchost_bulk_flow_count(b, flow,
> + q->config->flow_mode);
> + cake_dec_dsthost_bulk_flow_count(b, flow,
> + q->config->flow_mode);
These seem like unnecessary whitespace changes?
>
> - b->decaying_flow_count++;
> + WRITE_ONCE(b->decaying_flow_count,
> + b->decaying_flow_count + 1);
> } else if (flow->set == CAKE_SET_SPARSE ||
> flow->set == CAKE_SET_SPARSE_WAIT) {
> - b->sparse_flow_count--;
> - b->decaying_flow_count++;
> + WRITE_ONCE(b->sparse_flow_count,
> + b->sparse_flow_count - 1);
> + WRITE_ONCE(b->decaying_flow_count,
> + b->decaying_flow_count + 1);
> }
> flow->set = CAKE_SET_DECAYING;
> } else {
> @@ -2204,14 +2234,20 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> list_del_init(&flow->flowchain);
> if (flow->set == CAKE_SET_SPARSE ||
> flow->set == CAKE_SET_SPARSE_WAIT)
> - b->sparse_flow_count--;
> + WRITE_ONCE(b->sparse_flow_count,
> + b->sparse_flow_count - 1);
> else if (flow->set == CAKE_SET_BULK) {
> - b->bulk_flow_count--;
> + WRITE_ONCE(b->bulk_flow_count,
> + b->bulk_flow_count - 1);
>
> - cake_dec_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
> - cake_dec_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
Same here?
-Toke
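As a side illustration of the annotation pattern under review, here is a user-space sketch. It is not the sch_cake.c code. The writer works on a local copy of the counter and publishes it once with WRITE_ONCE(), and the lockless reader uses READ_ONCE(). The macros below are simplified approximations of the kernel ones for scalar types, and they rely on the GNU __typeof__ extension. struct vars_sketch, writer_update() and dump_count() are hypothetical names. The final test keeps the sense of the pre-patch "if (!vars->count)" check, on the assumption that the reversal questioned above was not intended.

```c
#include <stdint.h>

/* Simplified user-space approximations of the kernel macros (scalars only). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for the fields of struct cobalt_vars used here. */
struct vars_sketch {
	uint32_t count;
	int64_t drop_next;
};

/* Writer side: assumed to run with the qdisc lock held. */
static void writer_update(struct vars_sketch *v, int64_t now, int64_t interval)
{
	uint32_t count = v->count;	/* plain read is fine under the lock */

	if (count)
		count--;

	/* Publish the final value once. */
	WRITE_ONCE(v->count, count);

	/* Same sense as the original "if (!vars->count)" test. */
	if (!count)
		WRITE_ONCE(v->drop_next, now + interval);
}

/* Reader side: the stats dump runs without the lock. */
static uint32_t dump_count(struct vars_sketch *v)
{
	return READ_ONCE(v->count);
}
```

Converting a field test to a local-variable test should leave the condition's polarity unchanged. That is the property the review comment is asking about.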
* Re: [PATCH v11 net-next 4/7] devlink: Implement devlink param multi attribute nested data values
From: Jiri Pirko @ 2026-04-13 12:08 UTC
To: Ratheesh Kannoth
Cc: netdev, linux-kernel, linux-rdma, sgoutham, andrew+netdev, davem,
edumazet, kuba, pabeni, donald.hunter, horms, chuck.lever,
matttbe, cjubran, saeedm, leon, tariqt, mbloch, dtatulea
In-Reply-To: <20260409025055.1664053-5-rkannoth@marvell.com>
Thu, Apr 09, 2026 at 04:50:52AM +0200, rkannoth@marvell.com wrote:
>From: Saeed Mahameed <saeedm@nvidia.com>
[...]
>diff --git a/net/devlink/param.c b/net/devlink/param.c
>index 4595fffbd825..8c9165797b32 100644
>--- a/net/devlink/param.c
>+++ b/net/devlink/param.c
>@@ -252,6 +252,14 @@ devlink_nl_param_value_put(struct sk_buff *msg, enum devlink_param_type type,
> return -EMSGSIZE;
> }
> break;
>+ case DEVLINK_PARAM_TYPE_U64_ARRAY:
>+ if (val->u64arr.size > __DEVLINK_PARAM_MAX_ARRAY_SIZE)
From UAPI perspective, what's the motivation for such limitation? I
don't think we need it. Whatever kernel/user fits into skb is okay, no?
>+ return -EMSGSIZE;
>+
>+ for (int i = 0; i < val->u64arr.size; i++)
>+ if (nla_put_uint(msg, nla_type, val->u64arr.val[i]))
>+ return -EMSGSIZE;
>+ break;
> }
> return 0;
> }
[...]
* Re: [PATCH net] net: usb: cdc_ncm: reject negative chained NDP offsets
From: Oliver Neukum @ 2026-04-13 12:11 UTC
To: Greg Kroah-Hartman, Oliver Neukum
Cc: linux-usb, netdev, linux-kernel, Oliver Neukum, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
stable
In-Reply-To: <2026041325-giggly-wrecking-e6ef@gregkh>
On 13.04.26 12:43, Greg Kroah-Hartman wrote:
> On Mon, Apr 13, 2026 at 10:36:19AM +0200, Oliver Neukum wrote:
>>
>>
>> On 11.04.26 12:53, Greg Kroah-Hartman wrote:
>>> cdc_ncm_rx_fixup() reads dwNextNdpIndex from each NDP32 to chain to the
>>> next one. The 32-bit value from the device is stored into the signed
>>> int ndpoffset so that means values with the high bit set become
>>
>> Well, then isn't the problem rather that you should not store an
>> unsigned value in a signed variable?
>
> No. well, yes. but no.
>
> cdc_ncm_rx_verify_nth16() returns an int, and is negative if something
> went wrong, so we need it that way, and then we need to check it, like
> we properly do at the top of the loop, it's just that at the bottom of
> the loop we also need to do the same exact thing.
Doesn't that suggest that cdc_ncm_rx_verify_nth16() is the problem?
To be precise, the way it indicates errors?
As this is an offset into a buffer and the header must be at the start
of the buffer, isn't 0 the natural indication of an error?
Regards
Oliver
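To illustrate the signedness issue being discussed, here is a user-space sketch that validates a device-supplied 32-bit offset while it is still unsigned and only then hands it back as an int. It is not the cdc_ncm driver code. checked_ndp_offset() and its parameters are hypothetical names, and the bounds rule (the whole NDP must fit inside the buffer) is an assumption made for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the offset as a non-negative int,
 * or -1 if the value would be negative as an int or would place
 * an NDP of ndp_len bytes outside a buffer of buf_len bytes.
 */
static int checked_ndp_offset(uint32_t dw_next, size_t buf_len, size_t ndp_len)
{
	/* High bit set: this would turn negative once stored in an int. */
	if (dw_next > (uint32_t)INT_MAX)
		return -1;

	/* The NDP must fit entirely inside the buffer. */
	if (dw_next > buf_len || ndp_len > buf_len - dw_next)
		return -1;

	return (int)dw_next;
}
```

Doing the range check on the unsigned value avoids relying on how an out-of-range unsigned-to-signed conversion behaves.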
* [PATCH net v2 1/1] af_unix: Reject SIOCATMARK on non-stream sockets
From: Ren Wei @ 2026-04-13 12:29 UTC
To: netdev
Cc: kuniyu, davem, edumazet, kuba, pabeni, horms, rao.shoaib,
yifanwucs, tomapufckgml, yuantan098, bird, enjou1224z,
wangjiexun2025, n05ec
From: Jiexun Wang <wangjiexun2025@gmail.com>
SIOCATMARK reports whether the receive queue is at the urgent mark for
MSG_OOB.
In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
so they should not support SIOCATMARK either.
Return -EOPNOTSUPP for non-stream sockets before checking the receive
queue.
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Ren Wei <enjou1224z@gmail.com>
Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
Changes in v2:
- Rework the fix based on maintainer feedback.
- Drop the receive-queue locking approach and reject SIOCATMARK on
non-stream sockets instead, since it is only meaningful for MSG_OOB.
- V1 link: https://lore.kernel.org/netdev/f6cbbc8da90e95584847b5ceb60aae830d1631c2.1775731983.git.wangjiexun2025@gmail.com/
net/unix/af_unix.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b23c33df8b46..09d43b4813b1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3300,6 +3300,9 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
struct sk_buff *skb;
int answ = 0;
+ if (sk->sk_type != SOCK_STREAM)
+ return -EOPNOTSUPP;
+
mutex_lock(&u->iolock);
skb = skb_peek(&sk->sk_receive_queue);
--
2.34.1
* Re: [PATCH net 1/2] netfilter: skip recording stale or retransmitted INIT
From: Marcelo Ricardo Leitner @ 2026-04-13 12:35 UTC
To: Xin Long
Cc: network dev, linux-sctp, davem, kuba, Eric Dumazet, Paolo Abeni,
Simon Horman, Florian Westphal, Yi Chen
In-Reply-To: <6e09f9a8d1f13f3ce691c696d3dd7b2a2e6c6184.1775847557.git.lucien.xin@gmail.com>
On Fri, Apr 10, 2026 at 02:59:16PM -0400, Xin Long wrote:
> An INIT whose init_tag matches the peer's vtag does not provide new state
> information. It indicates either:
>
> - a stale INIT (after INIT-ACK has already been seen on the same side), or
> - a retransmitted INIT (after INIT has already been recorded on the same
> side).
>
> In both cases, the INIT must not update ct->proto.sctp.init[] state, since
> it does not advance the handshake tracking and may otherwise corrupt
> INIT/INIT-ACK validation logic.
>
> Allow INIT processing only when the conntrack entry is newly created
> (SCTP_CONNTRACK_NONE), or when the init_tag differs from the stored peer
> vtag.
>
> Note it skips the check for the ct with old_state SCTP_CONNTRACK_NONE in
> nf_conntrack_sctp_packet(), as it is just created in sctp_new() where it
> set ct->proto.sctp.vtag[IP_CT_DIR_REPLY] = ih->init_tag.
>
> Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
* Re: [PATCH net 2/2] sctp: discard stale INIT after handshake completion
From: Marcelo Ricardo Leitner @ 2026-04-13 12:36 UTC
To: Xin Long
Cc: network dev, linux-sctp, davem, kuba, Eric Dumazet, Paolo Abeni,
Simon Horman, Florian Westphal, Yi Chen
In-Reply-To: <bea8a0dfcc56b9980cb914b54cffa9dd9948ba75.1775847557.git.lucien.xin@gmail.com>
On Fri, Apr 10, 2026 at 02:59:17PM -0400, Xin Long wrote:
> After an association reaches ESTABLISHED, the peer’s init_tag is already
> known from the handshake. Any subsequent INIT with the same init_tag is
> not a valid restart, but a delayed or duplicate INIT.
>
> Drop such INIT chunks in sctp_sf_do_unexpected_init() instead of
> processing them as new association attempts.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
* Re: [PATCH net] sctp: fix missing encap_port propagation for GSO fragments
From: Marcelo Ricardo Leitner @ 2026-04-13 12:37 UTC
To: Xin Long
Cc: network dev, linux-sctp, davem, kuba, Eric Dumazet, Paolo Abeni,
Simon Horman
In-Reply-To: <ea65ed61b3598d8b4940f0170b9aa1762307e6c3.1776017631.git.lucien.xin@gmail.com>
On Sun, Apr 12, 2026 at 02:13:51PM -0400, Xin Long wrote:
> encap_port in SCTP_INPUT_CB(skb) is used by sctp_vtag_verify() for
> SCTP-over-UDP processing. In the GSO case, it is only set on the head
> skb, while fragment skbs leave it 0.
>
> This results in fragment skbs seeing encap_port == 0, breaking
> SCTP-over-UDP connections.
>
> Fix it by propagating encap_port from the head skb cb when initializing
> fragment skbs in sctp_inq_pop().
>
> Fixes: 046c052b475e ("sctp: enable udp tunneling socks")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
^ permalink raw reply
* Re: [PATCH net] sctp: disable BH before calling udp_tunnel_xmit_skb()
From: Marcelo Ricardo Leitner @ 2026-04-13 12:39 UTC (permalink / raw)
To: Xin Long
Cc: network dev, linux-sctp, davem, kuba, Eric Dumazet, Paolo Abeni,
Simon Horman, Weiming Shi
In-Reply-To: <c874a8548221dcd56ff03c65ba75a74e6cf99119.1776017727.git.lucien.xin@gmail.com>
On Sun, Apr 12, 2026 at 02:15:27PM -0400, Xin Long wrote:
> udp_tunnel_xmit_skb() / udp_tunnel6_xmit_skb() are expected to run with
> BH disabled. After commit 6f1a9140ecda ("add xmit recursion limit to
> tunnel xmit functions"), on the path:
>
> udp(6)_tunnel_xmit_skb() -> ip(6)tunnel_xmit()
>
> dev_xmit_recursion_inc()/dec() must stay balanced on the same CPU.
>
> Without local_bh_disable(), the context may move between CPUs, which can
> break the inc/dec pairing. This may lead to incorrect recursion level
> detection and cause packets to be dropped in ip(6)_tunnel_xmit() or
> __dev_queue_xmit().
>
> Fix it by disabling BH around both IPv4 and IPv6 SCTP UDP xmit paths.
>
> In my testing, after enabling the SCTP over UDP:
>
> # ip net exec ha sysctl -w net.sctp.udp_port=9899
> # ip net exec ha sysctl -w net.sctp.encap_port=9899
> # ip net exec hb sysctl -w net.sctp.udp_port=9899
> # ip net exec hb sysctl -w net.sctp.encap_port=9899
>
> # ip net exec ha iperf3 -s
>
> - without this patch:
>
> # ip net exec hb iperf3 -c 192.168.0.1 --sctp
> [ 5] 0.00-10.00 sec 37.2 MBytes 31.2 Mbits/sec sender
> [ 5] 0.00-10.00 sec 37.1 MBytes 31.1 Mbits/sec receiver
>
> - with this patch:
>
> # ip net exec hb iperf3 -c 192.168.0.1 --sctp
> [ 5] 0.00-10.00 sec 3.14 GBytes 2.69 Gbits/sec sender
> [ 5] 0.00-10.00 sec 3.14 GBytes 2.69 Gbits/sec receiver
>
> Fixes: 6f1a9140ecda ("add xmit recursion limit to tunnel xmit functions")
> Fixes: 046c052b475e ("sctp: enable udp tunneling socks")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
Nice catch!
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
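The inc/dec pairing argument in the quoted commit message can be illustrated with a small userspace sketch. It is only a model under stated assumptions: the array `xmit_recursion`, the constant `NR_CPUS_DEMO`, and the helpers `recursion_inc()`, `recursion_dec()` and `xmit_once()` are hypothetical stand-ins, not the kernel's dev_xmit_recursion_inc()/dec() implementation. The `migrate` flag models the context moving to another CPU between the increment and the decrement, which is what running with BH disabled is meant to rule out.

```c
#include <assert.h>

#define NR_CPUS_DEMO 2

/* Stand-in for a per-CPU recursion counter; not the kernel's data structure. */
static int xmit_recursion[NR_CPUS_DEMO];

static void recursion_inc(int cpu) { xmit_recursion[cpu]++; }
static void recursion_dec(int cpu) { xmit_recursion[cpu]--; }

/* Simulate one transmit; 'migrate' models moving CPUs between inc and dec. */
static void xmit_once(int start_cpu, int migrate)
{
	int cpu = start_cpu;

	recursion_inc(cpu);
	if (migrate)
		cpu = (cpu + 1) % NR_CPUS_DEMO;
	recursion_dec(cpu);
}

/* Small self-check; call it from main() to run the sketch. */
static void recursion_demo(void)
{
	/* Same CPU for inc and dec: both counters return to zero. */
	xmit_once(0, 0);
	assert(xmit_recursion[0] == 0 && xmit_recursion[1] == 0);

	/* Migration in between: CPU 0 stays elevated, CPU 1 goes negative. */
	xmit_once(0, 1);
	assert(xmit_recursion[0] == 1 && xmit_recursion[1] == -1);
}
```

In this model, a counter left elevated on one CPU is what would later be misread as a deeper recursion level than actually exists.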
^ permalink raw reply
* Re: [PATCH net] net: usb: cdc_ncm: reject negative chained NDP offsets
From: Greg Kroah-Hartman @ 2026-04-13 12:24 UTC (permalink / raw)
To: Oliver Neukum
Cc: linux-usb, netdev, linux-kernel, Oliver Neukum, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
stable
In-Reply-To: <198c1240-80a6-456c-8b12-25158c90c965@suse.com>
On Mon, Apr 13, 2026 at 02:11:50PM +0200, Oliver Neukum wrote:
> On 13.04.26 12:43, Greg Kroah-Hartman wrote:
> > On Mon, Apr 13, 2026 at 10:36:19AM +0200, Oliver Neukum wrote:
> > >
> > >
> > > On 11.04.26 12:53, Greg Kroah-Hartman wrote:
> > > > cdc_ncm_rx_fixup() reads dwNextNdpIndex from each NDP32 to chain to the
> > > > next one. The 32-bit value from the device is stored into the signed
> > > > int ndpoffset so that means values with the high bit set become
> > >
> > > Well, then isn't the problem rather that you should not store an
> > > unsigned value in a signed variable?
> >
> > No. well, yes. but no.
> >
> > cdc_ncm_rx_verify_nth16() returns an int, and is negative if something
> > went wrong, so we need it that way, and then we need to check it, like
> > we properly do at the top of the loop, it's just that at the bottom of
> > the loop we also need to do the same exact thing.
>
> Doesn't that suggest that cdc_ncm_rx_verify_nth16() is the problem?
> To be precise, the way it indicates errors?
> As this is an offset into a buffer and the header must be at the start
> of the buffer, isn't 0 the natural indication of an error?
Maybe? I really don't know, sorry, parsing the cdc_ncm buffer is not
something I looked too deeply into :)
greg k-h
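The sign issue debated above can be sketched in userspace C. This is a minimal illustration, not the cdc_ncm driver code: it assumes a 32-bit two's-complement `int`, and the helpers `store_as_int()` and `ndp_offset_ok()` as well as the buffer length 2048 are hypothetical. It shows how a device-supplied 32-bit value with the high bit set becomes negative once stored in a signed `int`, so a `< 0` check is needed each time the offset is reloaded.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(int) == sizeof(uint32_t), "sketch assumes 32-bit int");

/* Store a device-supplied 32-bit value into a signed int, bit for bit. */
static int store_as_int(uint32_t from_device)
{
	int ndpoffset;

	memcpy(&ndpoffset, &from_device, sizeof(ndpoffset));
	return ndpoffset;
}

/* Hypothetical check: reject negative or out-of-range offsets. */
static int ndp_offset_ok(int ndpoffset, int buf_len)
{
	return ndpoffset >= 0 && ndpoffset < buf_len;
}

/* Small self-check; call it from main() to run the sketch. */
static void ndp_offset_demo(void)
{
	/* High bit set: the stored value is negative and must be rejected. */
	assert(store_as_int(0x80000010u) < 0);
	assert(!ndp_offset_ok(store_as_int(0x80000010u), 2048));

	/* A small in-range offset passes. */
	assert(ndp_offset_ok(store_as_int(16u), 2048));
}
```

Whether a negative value or 0 is the better error indication, as asked in the thread, is a separate design question that this sketch does not settle.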
^ permalink raw reply
* Re: [PATCH v11 net-next 5/7] octeontx2-af: npc: cn20k: add subbank search order control
From: Paolo Abeni @ 2026-04-13 12:56 UTC (permalink / raw)
To: Ratheesh Kannoth, netdev, linux-kernel, linux-rdma
Cc: sgoutham, andrew+netdev, davem, edumazet, kuba, donald.hunter,
horms, jiri, chuck.lever, matttbe, cjubran, saeedm, leon, tariqt,
mbloch, dtatulea
In-Reply-To: <20260409025055.1664053-6-rkannoth@marvell.com>
On 4/9/26 4:50 AM, Ratheesh Kannoth wrote:
> CN20K NPC MCAM is split into 32 subbanks that are searched in a
> predefined order during allocation. Lower-numbered subbanks have
> higher priority than higher-numbered ones.
>
> Add a runtime devlink parameter "srch_order" (
> DEVLINK_PARAM_TYPE_U32_ARRAY) to control the order in which
> subbanks are searched during MCAM allocation.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
> ---
> .../ethernet/marvell/octeontx2/af/cn20k/npc.c | 91 +++++++++++++++++-
> .../ethernet/marvell/octeontx2/af/cn20k/npc.h | 2 +
> .../marvell/octeontx2/af/rvu_devlink.c | 92 +++++++++++++++++--
> 3 files changed, 173 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
> index e854b85ced9e..153765b3e504 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
> @@ -3317,7 +3317,7 @@ rvu_mbox_handler_npc_cn20k_get_kex_cfg(struct rvu *rvu,
> return 0;
> }
>
> -static int *subbank_srch_order;
> +static u32 *subbank_srch_order;
>
> static void npc_populate_restricted_idxs(int num_subbanks)
> {
> @@ -3329,7 +3329,7 @@ static int npc_create_srch_order(int cnt)
> {
> int val = 0;
>
> - subbank_srch_order = kcalloc(cnt, sizeof(int),
> + subbank_srch_order = kcalloc(cnt, sizeof(u32),
> GFP_KERNEL);
> if (!subbank_srch_order)
> return -ENOMEM;
> @@ -3809,6 +3809,93 @@ static void npc_unlock_all_subbank(void)
> mutex_unlock(&npc_priv.sb[i].lock);
> }
>
> +int npc_cn20k_search_order_set(struct rvu *rvu,
> + u64 arr[MAX_NUM_SUB_BANKS], int cnt)
> +{
> + struct npc_mcam *mcam = &rvu->hw->mcam;
> + u32 fslots[MAX_NUM_SUB_BANKS][2];
> + u32 uslots[MAX_NUM_SUB_BANKS][2];
> + int fcnt = 0, ucnt = 0;
> + struct npc_subbank *sb;
> + int idx, val, rc = 0;
> +
> + unsigned long index;
> + void *v;
> +
> + if (cnt != npc_priv.num_subbanks) {
> + dev_err(rvu->dev, "Number of entries(%u) != %u\n",
> + cnt, npc_priv.num_subbanks);
> + return -EINVAL;
> + }
> +
> + mutex_lock(&mcam->lock);
> + npc_lock_all_subbank();
> + restrict_valid = false;
> +
> + for (int i = 0; i < cnt; i++)
> + subbank_srch_order[i] = (u32)arr[i];
> +
> + xa_for_each(&npc_priv.xa_sb_used, index, v) {
> + val = xa_to_value(v);
> + uslots[ucnt][0] = index;
> + uslots[ucnt][1] = val;
> + xa_erase(&npc_priv.xa_sb_used, index);
> + ucnt++;
> + }
> +
> + xa_for_each(&npc_priv.xa_sb_free, index, v) {
> + val = xa_to_value(v);
> + fslots[fcnt][0] = index;
> + fslots[fcnt][1] = val;
> + xa_erase(&npc_priv.xa_sb_free, index);
> + fcnt++;
> + }
> +
> + /* xa_store() is done under lock. If xa_store fails
> + * ,no rollback is planned as it might also fail.
Why do you need to go through the erase-and-add loop? Why can't you directly
xa_store() the new value? Note that xa_store() can fail due to memory
pressure.
Avoiding the previous erase will prevent deallocation and reallocation
and will avoid any reasonable xa_store() failure.
AFAICS there are a few more items reported by sashiko, please have a look:
https://sashiko.dev/#/patchset/20260409025055.1664053-1-rkannoth%40marvell.com
/P
^ permalink raw reply
* Re: [net,PATCH v2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Sebastian Andrzej Siewior @ 2026-04-13 12:57 UTC (permalink / raw)
To: Jakub Kicinski, Marek Vasut
Cc: netdev, stable, David S. Miller, Andrew Lunn, Eric Dumazet,
Nicolai Buchwitz, Paolo Abeni, Ronald Wahl, Yicong Hui,
linux-kernel, Thomas Gleixner
In-Reply-To: <20260412105125.48f0c58f@kernel.org>
On 2026-04-12 10:51:25 [-0700], Jakub Kicinski wrote:
> > Does the backtrace make the problem clearer, with the annotation above ?
>
> Sebastian, do you have any recommendation here? tl;dr is that the driver does
…
What about this:
--- a/drivers/net/ethernet/micrel/ks8851_par.c
+++ b/drivers/net/ethernet/micrel/ks8851_par.c
@@ -63,7 +63,7 @@ static void ks8851_lock_par(struct ks8851_net *ks, unsigned long *flags)
{
struct ks8851_net_par *ksp = to_ks8851_par(ks);
- spin_lock_irqsave(&ksp->lock, *flags);
+ spin_lock_bh(&ksp->lock);
}
/**
@@ -77,7 +77,7 @@ static void ks8851_unlock_par(struct ks8851_net *ks, unsigned long *flags)
{
struct ks8851_net_par *ksp = to_ks8851_par(ks);
- spin_unlock_irqrestore(&ksp->lock, *flags);
+ spin_unlock_bh(&ksp->lock);
}
/**
I don't see why it needs to disable interrupts. This seems to be used by
the _par driver and the _common part. The comments refer to DMA but I
see only FIFO access.
And while at it, I would recommend to
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 8048770958d60..f1c662887646c 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -378,9 +378,12 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
if (status & IRQ_LCI)
mii_check_link(&ks->mii);
- if (status & IRQ_RXI)
+ if (status & IRQ_RXI) {
+ local_bh_disable();
while ((skb = __skb_dequeue(&rxq)))
netif_rx(skb);
+ local_bh_enable();
+ }
return IRQ_HANDLED;
}
Because otherwise it will kick off backlog NAPI after every packet if
multiple packets are available.
Sebastian
^ permalink raw reply related
* Re: [PATCH net-next 0/3] Follow-ups to nk_qlease net selftests
From: Daniel Borkmann @ 2026-04-13 13:02 UTC (permalink / raw)
To: kuba; +Cc: pabeni, dw, razor, netdev
In-Reply-To: <20260413114011.588162-1-daniel@iogearbox.net>
On 4/13/26 1:40 PM, Daniel Borkmann wrote:
> This is a set of follow-ups addressing [0]:
>
> - Split netdevsim tests from HW tests in nk_qlease and move the SW
> tests under selftests/net/
> - Remove multiple ksft_run()s to fix the recently enforced hard-fail
> - Move all the setup inside the test cases for the ones under
> selftests/net/ (I'll defer the HW ones to David)
> - Add more test coverage related to queue leasing behavior and corner
> cases, so now we have 45 tests in nk_qlease.py with netdevsim
> which does not need special HW
>
> [0] https://lore.kernel.org/netdev/20260409181950.7e099b6c@kernel.org
Few comments on the sashiko and ruff review [1,2]:
- re: "the socket would stay open until the cyclic garbage collector runs"
imho that is fine since this would mean there's an error somewhere and
test does not run as expected / would fail, and socket is still being
closed eventually
- re: "del test_ns" with sleep to wait for cleanup_net: this was done similarly
to the already merged patches; I can think of a different/better way with
a wait loop where applicable to remove any potential for flakiness
- The other things flagged by Gemini also make sense
- Missed the ruff one, "[E741] Ambiguous variable name: `l`"; will fix
I'm planning to address these in a v2 of the series, but as per netdev rule
will wait 24h before resend unless you'd like me to explicitly resend earlier
(given merge win timing).
Thanks,
Daniel
[1] https://sashiko.dev/#/patchset/20260413114011.588162-1-daniel%40iogearbox.net
[2] https://patchwork.kernel.org/project/netdevbpf/list/?series=1080682
^ permalink raw reply
* Re: [patch 14/38] slub: Use prandom instead of get_cycles()
From: hu.shengming @ 2026-04-13 13:02 UTC (permalink / raw)
To: harry
Cc: tglx, linux-kernel, vbabka, linux-mm, arnd, x86, baolu.lu, iommu,
m.grzeschik, netdev, linux-wireless, herbert, linux-crypto, dwmw2,
bernie, linux-fbdev, tytso, linux-ext4, akpm, urezki, elver,
dvyukov, kasan-dev, ryabinin.a.a, t.sailer, linux-hams, Jason,
richard.henderson, linux-alpha, linux, linux-arm-kernel,
catalin.marinas, chenhuacai, loongarch, geert, linux-m68k,
dinguyen, jonas, linux-openrisc, deller, linux-parisc, mpe,
linuxppc-dev, pjw, linux-riscv, hca, linux-s390, davem,
sparclinux, hao.li, cl, rientjes, roman.gushchin
In-Reply-To: <adyyNeVTkXQlnh_2@hyeyoo>
Harry wrote:
> [Resending after fixing broken email headers]
>
> On Fri, Apr 10, 2026 at 02:19:37PM +0200, Thomas Gleixner wrote:
> > The decision whether to scan remote nodes is based on a 'random' number
> > retrieved via get_cycles(). get_cycles() is about to be removed.
> >
> > There is already prandom state in the code, so use that instead.
> >
> > Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> > Cc: Vlastimil Babka <vbabka@kernel.org>
> > Cc: linux-mm@kvack.org
> > ---
>
> Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
>
> Is this for this merge window?
>
> This may conflict with upcoming changes on freelist shuffling [1]
> (not queued for slab/for-next yet though), but it should be easy to
> resolve.
>
Hi Harry,
Would you like me to wait for this patch to land in linux-next and then
rebase and send v6 on top?
Thanks,
--
With Best Regards,
Shengming
> [Cc'ing Shengming and SLAB ALLOCATOR folks]
> [1] https://lore.kernel.org/linux-mm/20260409204352095kKWVYKtZImN59ybO6iRNj@zte.com.cn
>
> --
> Cheers,
> Harry / Hyeonggon
>
> > mm/slub.c | 37 +++++++++++++++++++++++--------------
> > 1 file changed, 23 insertions(+), 14 deletions(-)
> >
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3302,6 +3302,25 @@ static inline struct slab *alloc_slab_pa
> > return slab;
> > }
> >
> > +#if defined(CONFIG_SLAB_FREELIST_RANDOM) || defined(CONFIG_NUMA)
> > +static DEFINE_PER_CPU(struct rnd_state, slab_rnd_state);
> > +
> > +static unsigned int slab_get_prandom_state(unsigned int limit)
> > +{
> > + struct rnd_state *state;
> > + unsigned int res;
> > +
> > + /*
> > + * An interrupt or NMI handler might interrupt and change
> > + * the state in the middle, but that's safe.
> > + */
> > + state = &get_cpu_var(slab_rnd_state);
> > + res = prandom_u32_state(state) % limit;
> > + put_cpu_var(slab_rnd_state);
> > + return res;
> > +}
> > +#endif
> > +
> > #ifdef CONFIG_SLAB_FREELIST_RANDOM
> > /* Pre-initialize the random sequence cache */
> > static int init_cache_random_seq(struct kmem_cache *s)
> > @@ -3365,8 +3384,6 @@ static void *next_freelist_entry(struct
> > return (char *)start + idx;
> > }
> >
> > -static DEFINE_PER_CPU(struct rnd_state, slab_rnd_state);
> > -
> > /* Shuffle the single linked freelist based on a random pre-computed sequence */
> > static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab,
> > bool allow_spin)
> > @@ -3383,15 +3400,7 @@ static bool shuffle_freelist(struct kmem
> > if (allow_spin) {
> > pos = get_random_u32_below(freelist_count);
> > } else {
> > - struct rnd_state *state;
> > -
> > - /*
> > - * An interrupt or NMI handler might interrupt and change
> > - * the state in the middle, but that's safe.
> > - */
> > - state = &get_cpu_var(slab_rnd_state);
> > - pos = prandom_u32_state(state) % freelist_count;
> > - put_cpu_var(slab_rnd_state);
> > + pos = slab_get_prandom_state(freelist_count);
> > }
> >
> > page_limit = slab->objects * s->size;
> > @@ -3882,7 +3891,7 @@ static void *get_from_any_partial(struct
> > * with available objects.
> > */
> > if (!s->remote_node_defrag_ratio ||
> > - get_cycles() % 1024 > s->remote_node_defrag_ratio)
> > + slab_get_prandom_state(1024) > s->remote_node_defrag_ratio)
> > return NULL;
> >
> > do {
> > @@ -7102,7 +7111,7 @@ static unsigned int
> >
> > /* see get_from_any_partial() for the defrag ratio description */
> > if (!s->remote_node_defrag_ratio ||
> > - get_cycles() % 1024 > s->remote_node_defrag_ratio)
> > + slab_get_prandom_state(1024) > s->remote_node_defrag_ratio)
> > return 0;
> >
> > do {
> > @@ -8421,7 +8430,7 @@ void __init kmem_cache_init_late(void)
> > flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM | WQ_PERCPU,
> > 0);
> > WARN_ON(!flushwq);
> > -#ifdef CONFIG_SLAB_FREELIST_RANDOM
> > +#if defined(CONFIG_SLAB_FREELIST_RANDOM) || defined(CONFIG_NUMA)
> > prandom_init_once(&slab_rnd_state);
> > #endif
> > }
> >
> >
^ permalink raw reply
* Re: [PATCH net-next v7 04/10] selftests: net: Add tests for failover of team-aggregated ports
From: Paolo Abeni @ 2026-04-13 13:05 UTC (permalink / raw)
To: Marc Harvey, Jiri Pirko, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Shuah Khan, Simon Horman
Cc: netdev, linux-kernel, linux-kselftest, Kuniyuki Iwashima
In-Reply-To: <20260409-teaming-driver-internal-v7-4-f47e7589685d@google.com>
On 4/9/26 4:59 AM, Marc Harvey wrote:
> There are currently no kernel tests that verify the effect of setting
> the enabled team driver option. In a followup patch, there will be
> changes to this option, so it will be important to make sure it still
> behaves as it does now.
>
> The test verifies that tcp continues to work across two different team
> devices in separate network namespaces, even when member links are
> manually disabled.
>
> Signed-off-by: Marc Harvey <marcharvey@google.com>
> ---
> Changes in v6:
> - Use a tcp port with no associated service.
> - Make tcpdump helper function not string-replace port numbers with
> associated service names, even on Fedora, which has a tcpdump patch
> that changes the required flag.
> - Link to v5: https://lore.kernel.org/netdev/20260406-teaming-driver-internal-v5-4-e8a3f348a1c5@google.com/
>
> Changes in v5:
> - Use tcpdump for collecting traffic, rather than reading rx counters.
> - Link to v4: https://lore.kernel.org/netdev/20260403-teaming-driver-internal-v4-4-d3032f33ca25@google.com/
>
> Changes in v2:
> - Fix shellcheck failures.
> - Remove dependency on net forwarding lib and pipe viewer tools.
> - Use iperf3 for tcp instead of netcat.
> - Link to v1: https://lore.kernel.org/all/20260331053353.2504254-5-marcharvey@google.com/
> ---
> tools/testing/selftests/drivers/net/team/Makefile | 2 +
> tools/testing/selftests/drivers/net/team/config | 4 +
> .../testing/selftests/drivers/net/team/team_lib.sh | 148 +++++++++++++++++++
> .../drivers/net/team/transmit_failover.sh | 158 +++++++++++++++++++++
> tools/testing/selftests/net/forwarding/lib.sh | 9 +-
> 5 files changed, 319 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/drivers/net/team/Makefile b/tools/testing/selftests/drivers/net/team/Makefile
> index 02d6f51d5a06..777da2e0429e 100644
> --- a/tools/testing/selftests/drivers/net/team/Makefile
> +++ b/tools/testing/selftests/drivers/net/team/Makefile
> @@ -7,9 +7,11 @@ TEST_PROGS := \
> options.sh \
> propagation.sh \
> refleak.sh \
> + transmit_failover.sh \
> # end of TEST_PROGS
>
> TEST_INCLUDES := \
> + team_lib.sh \
> ../bonding/lag_lib.sh \
> ../../../net/forwarding/lib.sh \
> ../../../net/in_netns.sh \
> diff --git a/tools/testing/selftests/drivers/net/team/config b/tools/testing/selftests/drivers/net/team/config
> index 5d36a22ef080..8f04ae419c53 100644
> --- a/tools/testing/selftests/drivers/net/team/config
> +++ b/tools/testing/selftests/drivers/net/team/config
> @@ -6,4 +6,8 @@ CONFIG_NETDEVSIM=m
> CONFIG_NET_IPGRE=y
> CONFIG_NET_TEAM=y
> CONFIG_NET_TEAM_MODE_ACTIVEBACKUP=y
> +CONFIG_NET_TEAM_MODE_BROADCAST=y
> CONFIG_NET_TEAM_MODE_LOADBALANCE=y
> +CONFIG_NET_TEAM_MODE_RANDOM=y
> +CONFIG_NET_TEAM_MODE_ROUNDROBIN=y
> +CONFIG_VETH=y
> diff --git a/tools/testing/selftests/drivers/net/team/team_lib.sh b/tools/testing/selftests/drivers/net/team/team_lib.sh
> new file mode 100644
> index 000000000000..2057f5edee79
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/team/team_lib.sh
> @@ -0,0 +1,148 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +test_dir="$(dirname "$0")"
> +export REQUIRE_MZ=no
> +export NUM_NETIFS=0
> +# shellcheck disable=SC1091
> +source "${test_dir}/../../../net/forwarding/lib.sh"
> +
> +TCP_PORT="43434"
> +
> +# Create a team interface inside of a given network namespace with a given
> +# mode, members, and IP address.
> +# Arguments:
> +# namespace - Network namespace to put the team interface into.
> +# team - The name of the team interface to setup.
> +# mode - The team mode of the interface.
> +# ip_address - The IP address to assign to the team interface.
> +# prefix_length - The prefix length for the IP address subnet.
> +# $@ - members - The member interfaces of the aggregation.
> +setup_team()
> +{
> + local namespace=$1
> + local team=$2
> + local mode=$3
> + local ip_address=$4
> + local prefix_length=$5
> + shift 5
> + local members=("$@")
> +
> + # Prerequisite: team must have no members
> + for member in "${members[@]}"; do
> + ip -n "${namespace}" link set "${member}" nomaster
> + done
> +
> + # Prerequisite: team must have no address in order to set it
> + # shellcheck disable=SC2086
> + ip -n "${namespace}" addr del "${ip_address}/${prefix_length}" \
> + ${NODAD} dev "${team}"
> +
> + echo "Setting team in ${namespace} to mode ${mode}"
> +
> + if ! ip -n "${namespace}" link set "${team}" down; then
> + echo "Failed to bring team device down"
> + return 1
> + fi
> + if ! ip netns exec "${namespace}" teamnl "${team}" setoption mode \
> + "${mode}"; then
> + echo "Failed to set ${team} mode to '${mode}'"
> + return 1
> + fi
> +
> + # Aggregate the members into teams.
> + for member in "${members[@]}"; do
> + ip -n "${namespace}" link set "${member}" master "${team}"
> + done
> +
> + # Bring team devices up and give them addresses.
> + if ! ip -n "${namespace}" link set "${team}" up; then
> + echo "Failed to set ${team} up"
> + return 1
> + fi
> +
> + # shellcheck disable=SC2086
> + if ! ip -n "${namespace}" addr add "${ip_address}/${prefix_length}" \
> + ${NODAD} dev "${team}"; then
> + echo "Failed to give ${team} IP address in ${namespace}"
> + return 1
> + fi
> +}
> +
> +# This is global used to keep track of the sender's iperf3 process, so that it
> +# can be terminated.
> +declare sender_pid
> +
> +# Start sending and receiving TCP traffic with iperf3.
> +# Globals:
> +# sender_pid - The process ID of the iperf3 sender process. Used to kill it
> +# later.
> +start_listening_and_sending()
> +{
> + ip netns exec "${NS2}" iperf3 -s -p "${TCP_PORT}" --logfile /dev/null &
> + # Wait for server to become reachable before starting client.
> + slowwait 5 ip netns exec "${NS1}" iperf3 -c "${NS2_IP}" -p \
> + "${TCP_PORT}" -t 1 --logfile /dev/null
Note for a possible follow-up: the iperf3 server is apparently never
stopped. You could use the wait_local_port_listen helper and
the `--one-off` iperf3 command line argument to avoid that (or explicitly
kill the server pid at cleanup time).
/P
^ permalink raw reply
* Re: [PATCH net-next v7 05/10] selftests: net: Add test for enablement of ports with teamd
From: Paolo Abeni @ 2026-04-13 13:07 UTC (permalink / raw)
To: Marc Harvey, Jiri Pirko, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Shuah Khan, Simon Horman
Cc: netdev, linux-kernel, linux-kselftest, Kuniyuki Iwashima
In-Reply-To: <20260409-teaming-driver-internal-v7-5-f47e7589685d@google.com>
On 4/9/26 4:59 AM, Marc Harvey wrote:
> There are no tests that verify enablement and disablement of team driver
> ports with teamd. This should work even with changes to the enablement
> option, so it is important to test.
>
> This test sets up an active-backup network configuration across two
> network namespaces, and tries to send traffic while changing which
> link is the active one.
>
> Also increase the team test timeout to 300 seconds, because gracefully
> killing teamd can take 30 seconds for each instance.
>
> Signed-off-by: Marc Harvey <marcharvey@google.com>
> ---
> Changes in v7:
> - Increase test timeout to 300 seconds, since terminating teamd can
> take 30 seconds during test cleanup.
> - Link to v6: https://lore.kernel.org/netdev/20260408-teaming-driver-internal-v6-5-e5bcdcf72504@google.com/
>
> Changes in v6:
> - Remove manual changing of member port states to UP, not needed.
> - Link to v5: https://lore.kernel.org/netdev/20260406-teaming-driver-internal-v5-5-e8a3f348a1c5@google.com/
>
> Changes in v5:
> - Make test wait for inactive link to stop receiving traffic after
> setting it to inactive, since there was a race condition.
> - Change test teardown to try graceful shutdown first, then use
> sigkill if needed.
> - Manually delete leftover teamd files during teardown.
> - Use tcpdump instead of checking rx counters.
> - Link to v4: https://lore.kernel.org/netdev/20260403-teaming-driver-internal-v4-5-d3032f33ca25@google.com/
>
> Changed in v3:
> - Make test cleanup kill teamd instead of terminate.
> - Link to v2: https://lore.kernel.org/netdev/20260401-teaming-driver-internal-v2-5-f80c1291727b@google.com/
>
> Changes in v2:
> - Fix shellcheck failures.
> - Remove dependency on net forwarding lib and pipe viewer tools.
> - Use iperf3 for tcp instead of netcat.
> - Link to v1: https://lore.kernel.org/all/20260331053353.2504254-6-marcharvey@google.com/
> ---
> tools/testing/selftests/drivers/net/team/Makefile | 1 +
> tools/testing/selftests/drivers/net/team/settings | 1 +
> .../testing/selftests/drivers/net/team/team_lib.sh | 26 +++
> .../drivers/net/team/teamd_activebackup.sh | 246 +++++++++++++++++++++
> tools/testing/selftests/net/lib.sh | 13 ++
> 5 files changed, 287 insertions(+)
>
> diff --git a/tools/testing/selftests/drivers/net/team/Makefile b/tools/testing/selftests/drivers/net/team/Makefile
> index 777da2e0429e..dab922d7f83d 100644
> --- a/tools/testing/selftests/drivers/net/team/Makefile
> +++ b/tools/testing/selftests/drivers/net/team/Makefile
> @@ -7,6 +7,7 @@ TEST_PROGS := \
> options.sh \
> propagation.sh \
> refleak.sh \
> + teamd_activebackup.sh \
> transmit_failover.sh \
> # end of TEST_PROGS
>
> diff --git a/tools/testing/selftests/drivers/net/team/settings b/tools/testing/selftests/drivers/net/team/settings
> new file mode 100644
> index 000000000000..694d70710ff0
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/team/settings
> @@ -0,0 +1 @@
> +timeout=300
> diff --git a/tools/testing/selftests/drivers/net/team/team_lib.sh b/tools/testing/selftests/drivers/net/team/team_lib.sh
> index 2057f5edee79..02ef0ee02d6a 100644
> --- a/tools/testing/selftests/drivers/net/team/team_lib.sh
> +++ b/tools/testing/selftests/drivers/net/team/team_lib.sh
> @@ -146,3 +146,29 @@ did_interface_receive()
> false
> fi
> }
> +
> +# Return true if the given interface in the given namespace does NOT receive
> +# traffic over a 1 second period.
> +# Arguments:
> +# interface - The name of the interface.
> +# ip_address - The destination IP address.
> +# namespace - The name of the namespace that the interface is in.
> +check_no_traffic()
> +{
> + local interface="$1"
> + local ip_address="$2"
> + local namespace="$3"
> + local rc
> +
> + save_tcpdump_outputs "${namespace}" "${interface}"
> + did_interface_receive "${interface}" "${ip_address}"
> + rc=$?
> +
> + clear_tcpdump_outputs "${interface}"
> +
> + if [[ "${rc}" -eq 0 ]]; then
> + return 1
> + else
> + return 0
> + fi
> +}
> diff --git a/tools/testing/selftests/drivers/net/team/teamd_activebackup.sh b/tools/testing/selftests/drivers/net/team/teamd_activebackup.sh
> new file mode 100755
> index 000000000000..2b26a697e179
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/team/teamd_activebackup.sh
> @@ -0,0 +1,246 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# These tests verify that teamd is able to enable and disable ports via the
> +# active backup runner.
> +#
> +# Topology:
> +#
> +# +-------------------------+ NS1
> +# | test_team1 |
> +# | + |
> +# | eth0 | eth1 |
> +# | +---+---+ |
> +# | | | |
> +# +-------------------------+
> +# | |
> +# +-------------------------+ NS2
> +# | | | |
> +# | +-------+ |
> +# | eth0 | eth1 |
> +# | + |
> +# | test_team2 |
> +# +-------------------------+
> +
> +export ALL_TESTS="teamd_test_active_backup"
> +
> +test_dir="$(dirname "$0")"
> +# shellcheck disable=SC1091
> +source "${test_dir}/../../../net/lib.sh"
> +# shellcheck disable=SC1091
> +source "${test_dir}/team_lib.sh"
> +
> +NS1=""
> +NS2=""
> +export NODAD="nodad"
> +PREFIX_LENGTH="64"
> +NS1_IP="fd00::1"
> +NS2_IP="fd00::2"
> +NS1_IP4="192.168.0.1"
> +NS2_IP4="192.168.0.2"
> +NS1_TEAMD_CONF=""
> +NS2_TEAMD_CONF=""
> +NS1_TEAMD_PID=""
> +NS2_TEAMD_PID=""
> +
> +while getopts "4" opt; do
> + case $opt in
> + 4)
> + echo "IPv4 mode selected."
> + export NODAD=
> + PREFIX_LENGTH="24"
> + NS1_IP="${NS1_IP4}"
> + NS2_IP="${NS2_IP4}"
> + ;;
> + \?)
> + echo "Invalid option: -${OPTARG}" >&2
> + exit 1
> + ;;
> + esac
> +done
> +
> +teamd_config_create()
> +{
> + local runner=$1
> + local dev=$2
> + local conf
> +
> + conf=$(mktemp)
> +
> + cat > "${conf}" <<-EOF
> + {
> + "device": "${dev}",
> + "runner": {"name": "${runner}"},
> + "ports": {
> + "eth0": {},
> + "eth1": {}
> + }
> + }
> + EOF
> + echo "${conf}"
> +}
> +
> +# Create the network namespaces, veth pair, and team devices in the specified
> +# runner.
> +# Globals:
> +# RET - Used by test infra, set by `check_err` functions.
> +# Arguments:
> +# runner - The Teamd runner to use for the Team devices.
> +environment_create()
> +{
> + local runner=$1
> +
> + echo "Setting up two-link aggregation for runner ${runner}"
> + echo "Teamd version is: $(teamd --version)"
> + trap environment_destroy EXIT
> +
> + setup_ns ns1 ns2
> + NS1="${NS_LIST[0]}"
> + NS2="${NS_LIST[1]}"
> +
> + for link in $(seq 0 1); do
> + ip -n "${NS1}" link add "eth${link}" type veth peer name \
> + "eth${link}" netns "${NS2}"
> + check_err $? "Failed to create veth pair"
> + done
> +
> + NS1_TEAMD_CONF=$(teamd_config_create "${runner}" "test_team1")
> + NS2_TEAMD_CONF=$(teamd_config_create "${runner}" "test_team2")
> + echo "Conf files are ${NS1_TEAMD_CONF} and ${NS2_TEAMD_CONF}"
> +
> + ip netns exec "${NS1}" teamd -d -f "${NS1_TEAMD_CONF}"
> + check_err $? "Failed to create team device in ${NS1}"
> + NS1_TEAMD_PID=$(pgrep -f "teamd -d -f ${NS1_TEAMD_CONF}")
> +
> + ip netns exec "${NS2}" teamd -d -f "${NS2_TEAMD_CONF}"
> + check_err $? "Failed to create team device in ${NS2}"
> + NS2_TEAMD_PID=$(pgrep -f "teamd -d -f ${NS2_TEAMD_CONF}")
> +
> + echo "Created team devices"
> + echo "Teamd PIDs are ${NS1_TEAMD_PID} and ${NS2_TEAMD_PID}"
> +
> + ip -n "${NS1}" link set test_team1 up
> + check_err $? "Failed to set test_team1 up in ${NS1}"
> + ip -n "${NS2}" link set test_team2 up
> + check_err $? "Failed to set test_team2 up in ${NS2}"
> +
> + ip -n "${NS1}" addr add "${NS1_IP}/${PREFIX_LENGTH}" "${NODAD}" dev \
> + test_team1
Note for a possible follow-up: it looks like the above will fail with:
Error: either "local" is duplicate, or "" is garbage.
when running in ipv4 mode (not invoked by the CI/self-test infra), due
to the quotes around ${NODAD}.
/P
^ permalink raw reply
* Re: [PATCH v3 net-next 13/15] net/sched: sch_cake: annotate data-races in cake_dump_stats()
From: Eric Dumazet @ 2026-04-13 13:11 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet
In-Reply-To: <87se8zcbcy.fsf@toke.dk>
On Mon, Apr 13, 2026 at 5:07 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Eric Dumazet <edumazet@google.com> writes:
>
> > cake_dump_stats() and cake_dump_class_stats() run without qdisc
> > spinlock being held.
> >
> > Add READ_ONCE()/WRITE_ONCE() annotations.
> >
> > Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>
> > ---
> > net/sched/sch_cake.c | 404 ++++++++++++++++++++++++-------------------
> > 1 file changed, 225 insertions(+), 179 deletions(-)
>
> One of these diffstats is not like the others - thanks for tackling this :)
>
> A few nits below:
>
> > diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> > index 32e672820c00a88c6d8fe77a6308405e016525ea..f523f0aa4d830e9d3ec4d43bb123e1dc4f8f289d 100644
> > --- a/net/sched/sch_cake.c
> > +++ b/net/sched/sch_cake.c
> > @@ -399,14 +399,14 @@ static void cake_configure_rates(struct Qdisc *sch, u64 rate, bool rate_adjust);
> > * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
> > */
> >
> > -static void cobalt_newton_step(struct cobalt_vars *vars)
> > +static void cobalt_newton_step(struct cobalt_vars *vars, u32 count)
> > {
> > u32 invsqrt, invsqrt2;
> > u64 val;
> >
> > invsqrt = vars->rec_inv_sqrt;
> > invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
> > - val = (3LL << 32) - ((u64)vars->count * invsqrt2);
> > + val = (3LL << 32) - ((u64)count * invsqrt2);
> >
> > val >>= 2; /* avoid overflow in following multiply */
> > val = (val * invsqrt) >> (32 - 2 + 1);
> > @@ -414,12 +414,12 @@ static void cobalt_newton_step(struct cobalt_vars *vars)
> > vars->rec_inv_sqrt = val;
> > }
> >
> > -static void cobalt_invsqrt(struct cobalt_vars *vars)
> > +static void cobalt_invsqrt(struct cobalt_vars *vars, u32 count)
> > {
> > - if (vars->count < REC_INV_SQRT_CACHE)
> > - vars->rec_inv_sqrt = inv_sqrt_cache[vars->count];
> > + if (count < REC_INV_SQRT_CACHE)
> > + vars->rec_inv_sqrt = inv_sqrt_cache[count];
> > else
> > - cobalt_newton_step(vars);
> > + cobalt_newton_step(vars, count);
> > }
> >
> > static void cobalt_vars_init(struct cobalt_vars *vars)
> > @@ -449,16 +449,19 @@ static bool cobalt_queue_full(struct cobalt_vars *vars,
> > bool up = false;
> >
> > if (ktime_to_ns(ktime_sub(now, vars->blue_timer)) > p->target) {
> > - up = !vars->p_drop;
> > - vars->p_drop += p->p_inc;
> > - if (vars->p_drop < p->p_inc)
> > - vars->p_drop = ~0;
> > - vars->blue_timer = now;
> > - }
> > - vars->dropping = true;
> > - vars->drop_next = now;
> > + u32 p_drop = vars->p_drop;
> > +
> > + up = !p_drop;
> > + p_drop += p->p_inc;
> > + if (p_drop < p->p_inc)
> > + p_drop = ~0;
> > + WRITE_ONCE(vars->p_drop, p_drop);
> > + WRITE_ONCE(vars->blue_timer, now);
> > + }
> > + WRITE_ONCE(vars->dropping, true);
> > + WRITE_ONCE(vars->drop_next, now);
> > if (!vars->count)
> > - vars->count = 1;
> > + WRITE_ONCE(vars->count, 1);
> >
> > return up;
> > }
> > @@ -474,21 +477,25 @@ static bool cobalt_queue_empty(struct cobalt_vars *vars,
> >
> > if (vars->p_drop &&
> > ktime_to_ns(ktime_sub(now, vars->blue_timer)) > p->target) {
> > - if (vars->p_drop < p->p_dec)
> > - vars->p_drop = 0;
> > + u32 p_drop = vars->p_drop;
> > +
> > + if (p_drop < p->p_dec)
> > + p_drop = 0;
> > else
> > - vars->p_drop -= p->p_dec;
> > - vars->blue_timer = now;
> > - down = !vars->p_drop;
> > + p_drop -= p->p_dec;
> > + WRITE_ONCE(vars->p_drop, p_drop);
> > + WRITE_ONCE(vars->blue_timer, now);
> > + down = !p_drop;
> > }
> > - vars->dropping = false;
> > + WRITE_ONCE(vars->dropping, false);
> >
> > if (vars->count && ktime_to_ns(ktime_sub(now, vars->drop_next)) >= 0) {
> > - vars->count--;
> > - cobalt_invsqrt(vars);
> > - vars->drop_next = cobalt_control(vars->drop_next,
> > - p->interval,
> > - vars->rec_inv_sqrt);
> > + WRITE_ONCE(vars->count, vars->count - 1);
> > + cobalt_invsqrt(vars, vars->count);
> > + WRITE_ONCE(vars->drop_next,
> > + cobalt_control(vars->drop_next,
> > + p->interval,
> > + vars->rec_inv_sqrt));
> > }
> >
> > return down;
> > @@ -507,6 +514,7 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
> > bool next_due, over_target;
> > ktime_t schedule;
> > u64 sojourn;
> > + u32 count;
> >
> > /* The 'schedule' variable records, in its sign, whether 'now' is before or
> > * after 'drop_next'. This allows 'drop_next' to be updated before the next
> > @@ -528,45 +536,50 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
> > over_target = sojourn > p->target &&
> > sojourn > p->mtu_time * bulk_flows * 2 &&
> > sojourn > p->mtu_time * 4;
> > - next_due = vars->count && ktime_to_ns(schedule) >= 0;
> > + count = vars->count;
> > + next_due = count && ktime_to_ns(schedule) >= 0;
> >
> > vars->ecn_marked = false;
> >
> > if (over_target) {
> > if (!vars->dropping) {
> > - vars->dropping = true;
> > - vars->drop_next = cobalt_control(now,
> > - p->interval,
> > - vars->rec_inv_sqrt);
> > + WRITE_ONCE(vars->dropping, true);
> > + WRITE_ONCE(vars->drop_next,
> > + cobalt_control(now,
> > + p->interval,
> > + vars->rec_inv_sqrt));
> > }
> > - if (!vars->count)
> > - vars->count = 1;
> > + if (!count)
> > + count = 1;
> > } else if (vars->dropping) {
> > - vars->dropping = false;
> > + WRITE_ONCE(vars->dropping, false);
> > }
> >
> > if (next_due && vars->dropping) {
> > /* Use ECN mark if possible, otherwise drop */
> > - if (!(vars->ecn_marked = INET_ECN_set_ce(skb)))
> > + vars->ecn_marked = INET_ECN_set_ce(skb);
> > + if (!vars->ecn_marked)
> > reason = QDISC_DROP_CONGESTED;
> >
> > - vars->count++;
> > - if (!vars->count)
> > - vars->count--;
> > - cobalt_invsqrt(vars);
> > - vars->drop_next = cobalt_control(vars->drop_next,
> > - p->interval,
> > - vars->rec_inv_sqrt);
> > + count++;
> > + if (!count)
> > + count--;
> > + cobalt_invsqrt(vars, count);
> > + WRITE_ONCE(vars->drop_next,
> > + cobalt_control(vars->drop_next,
> > + p->interval,
> > + vars->rec_inv_sqrt));
> > schedule = ktime_sub(now, vars->drop_next);
> > } else {
> > while (next_due) {
> > - vars->count--;
> > - cobalt_invsqrt(vars);
> > - vars->drop_next = cobalt_control(vars->drop_next,
> > - p->interval,
> > - vars->rec_inv_sqrt);
> > + count--;
> > + cobalt_invsqrt(vars, count);
> > + WRITE_ONCE(vars->drop_next,
> > + cobalt_control(vars->drop_next,
> > + p->interval,
> > + vars->rec_inv_sqrt));
> > schedule = ktime_sub(now, vars->drop_next);
> > - next_due = vars->count && ktime_to_ns(schedule) >= 0;
> > + next_due = count && ktime_to_ns(schedule) >= 0;
> > }
> > }
> >
> > @@ -575,11 +588,12 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
> > get_random_u32() < vars->p_drop)
> > reason = QDISC_DROP_FLOOD_PROTECTION;
> >
> > + WRITE_ONCE(vars->count, count);
> > /* Overload the drop_next field as an activity timeout */
> > - if (!vars->count)
> > - vars->drop_next = ktime_add_ns(now, p->interval);
> > + if (count)
>
> This seems to reverse the conditional?
Ah right, thanks !
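For reference, a minimal user-space sketch of the intended tail of cobalt_should_drop(), assuming the pre-patch semantics are kept (the activity timeout is armed only when count has reached zero). The struct and function names below are hypothetical stand-ins, not kernel code, and plain stores replace WRITE_ONCE():

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two cobalt_vars fields touched here. */
struct model_vars {
	uint32_t count;
	int64_t drop_next;	/* nanoseconds */
};

/*
 * Models the end of cobalt_should_drop(): publish the local count once,
 * then reuse drop_next as an activity timeout only when count == 0.
 * 'no_reason' stands for reason == QDISC_DROP_UNSPEC.
 */
static void model_tail(struct model_vars *v, uint32_t count, int64_t now,
		       int64_t interval, int64_t schedule, bool no_reason)
{
	v->count = count;
	if (!count)		/* not "if (count)" */
		v->drop_next = now + interval;
	else if (schedule > 0 && no_reason)
		v->drop_next = now;
}
```

With the test inverted, a flow that still has a non-zero count would get its drop_next pushed out by a full interval on every call, which is not what the original code did.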
>
> > + WRITE_ONCE(vars->drop_next, ktime_add_ns(now, p->interval));
> > else if (ktime_to_ns(schedule) > 0 && reason == QDISC_DROP_UNSPEC)
> > - vars->drop_next = now;
> > + WRITE_ONCE(vars->drop_next, now);
> >
> > return reason;
> > }
> > @@ -813,7 +827,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> > i++, k = (k + 1) % CAKE_SET_WAYS) {
> > if (q->tags[outer_hash + k] == flow_hash) {
> > if (i)
> > - q->way_hits++;
> > + WRITE_ONCE(q->way_hits, q->way_hits + 1);
> >
> > if (!q->flows[outer_hash + k].set) {
> > /* need to increment host refcnts */
> > @@ -831,7 +845,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> > for (i = 0; i < CAKE_SET_WAYS;
> > i++, k = (k + 1) % CAKE_SET_WAYS) {
> > if (!q->flows[outer_hash + k].set) {
> > - q->way_misses++;
> > + WRITE_ONCE(q->way_misses, q->way_misses + 1);
> > allocate_src = cake_dsrc(flow_mode);
> > allocate_dst = cake_ddst(flow_mode);
> > goto found;
> > @@ -841,7 +855,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> > /* With no empty queues, default to the original
> > * queue, accept the collision, update the host tags.
> > */
> > - q->way_collisions++;
> > + WRITE_ONCE(q->way_collisions, q->way_collisions + 1);
> > allocate_src = cake_dsrc(flow_mode);
> > allocate_dst = cake_ddst(flow_mode);
> >
> > @@ -875,7 +889,8 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> > q->flows[reduced_hash].srchost = srchost_idx;
> >
> > if (q->flows[reduced_hash].set == CAKE_SET_BULK)
> > - cake_inc_srchost_bulk_flow_count(q, &q->flows[reduced_hash], flow_mode);
> > + cake_inc_srchost_bulk_flow_count(q, &q->flows[reduced_hash],
> > + flow_mode);
> > }
> >
> > if (allocate_dst) {
> > @@ -899,7 +914,8 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
> > q->flows[reduced_hash].dsthost = dsthost_idx;
> >
> > if (q->flows[reduced_hash].set == CAKE_SET_BULK)
> > - cake_inc_dsthost_bulk_flow_count(q, &q->flows[reduced_hash], flow_mode);
> > + cake_inc_dsthost_bulk_flow_count(q, &q->flows[reduced_hash],
> > + flow_mode);
> > }
> > }
> >
> > @@ -1379,9 +1395,9 @@ static u32 cake_calc_overhead(struct cake_sched_data *qd, u32 len, u32 off)
> > len -= off;
> >
> > if (qd->max_netlen < len)
> > - qd->max_netlen = len;
> > + WRITE_ONCE(qd->max_netlen, len);
> > if (qd->min_netlen > len)
> > - qd->min_netlen = len;
> > + WRITE_ONCE(qd->min_netlen, len);
> >
> > len += q->rate_overhead;
> >
> > @@ -1401,9 +1417,9 @@ static u32 cake_calc_overhead(struct cake_sched_data *qd, u32 len, u32 off)
> > }
> >
> > if (qd->max_adjlen < len)
> > - qd->max_adjlen = len;
> > + WRITE_ONCE(qd->max_adjlen, len);
> > if (qd->min_adjlen > len)
> > - qd->min_adjlen = len;
> > + WRITE_ONCE(qd->min_adjlen, len);
> >
> > return len;
> > }
> > @@ -1416,7 +1432,7 @@ static u32 cake_overhead(struct cake_sched_data *q, const struct sk_buff *skb)
> > u16 segs = qdisc_pkt_segs(skb);
> > u32 len = qdisc_pkt_len(skb);
> >
> > - q->avg_netoff = cake_ewma(q->avg_netoff, off << 16, 8);
> > + WRITE_ONCE(q->avg_netoff, cake_ewma(q->avg_netoff, off << 16, 8));
> >
> > if (segs == 1)
> > return cake_calc_overhead(q, len, off);
> > @@ -1590,16 +1606,17 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
> > }
> >
> > if (cobalt_queue_full(&flow->cvars, &b->cparams, now))
> > - b->unresponsive_flow_count++;
> > + WRITE_ONCE(b->unresponsive_flow_count,
> > + b->unresponsive_flow_count + 1);
> >
> > len = qdisc_pkt_len(skb);
> > q->buffer_used -= skb->truesize;
> > - b->backlogs[idx] -= len;
> > - b->tin_backlog -= len;
> > + WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] - len);
> > + WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
> > qstats_backlog_sub(sch, len);
> >
> > - flow->dropped++;
> > - b->tin_dropped++;
> > + WRITE_ONCE(flow->dropped, flow->dropped + 1);
> > + WRITE_ONCE(b->tin_dropped, b->tin_dropped + 1);
> >
> > if (q->config->rate_flags & CAKE_FLAG_INGRESS)
> > cake_advance_shaper(q, b, skb, now, true);
> > @@ -1795,7 +1812,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > }
> >
> > if (unlikely(len > b->max_skblen))
> > - b->max_skblen = len;
> > + WRITE_ONCE(b->max_skblen, len);
> >
> > if (qdisc_pkt_segs(skb) > 1 && q->config->rate_flags & CAKE_FLAG_SPLIT_GSO) {
> > struct sk_buff *segs, *nskb;
> > @@ -1819,13 +1836,13 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > numsegs++;
> > slen += segs->len;
> > q->buffer_used += segs->truesize;
> > - b->packets++;
>
> Right above this hunk we do sch->q.qlen++; - does that need changing as
> well?
This was changed to qdisc_qlen_inc() in a prior commit in this series
("net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()").
>
> > }
> >
> > /* stats */
> > - b->bytes += slen;
> > - b->backlogs[idx] += slen;
> > - b->tin_backlog += slen;
> > + WRITE_ONCE(b->bytes, b->bytes + slen);
> > + WRITE_ONCE(b->packets, b->packets + numsegs);
> > + WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] + slen);
> > + WRITE_ONCE(b->tin_backlog, b->tin_backlog + slen);
> > qstats_backlog_add(sch, slen);
> > q->avg_window_bytes += slen;
> >
> > @@ -1843,10 +1860,10 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > ack = cake_ack_filter(q, flow);
> >
> > if (ack) {
> > - b->ack_drops++;
> > + WRITE_ONCE(b->ack_drops, b->ack_drops + 1);
> > qdisc_qstats_drop(sch);
> > ack_pkt_len = qdisc_pkt_len(ack);
> > - b->bytes += ack_pkt_len;
> > + WRITE_ONCE(b->bytes, b->bytes + ack_pkt_len);
> > q->buffer_used += skb->truesize - ack->truesize;
> > if (q->config->rate_flags & CAKE_FLAG_INGRESS)
> > cake_advance_shaper(q, b, ack, now, true);
> > @@ -1859,10 +1876,10 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > }
> >
> > /* stats */
> > - b->packets++;
> > - b->bytes += len - ack_pkt_len;
> > - b->backlogs[idx] += len - ack_pkt_len;
> > - b->tin_backlog += len - ack_pkt_len;
> > + WRITE_ONCE(b->packets, b->packets + 1);
> > + WRITE_ONCE(b->bytes, b->bytes + len - ack_pkt_len);
> > + WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] + len - ack_pkt_len);
> > + WRITE_ONCE(b->tin_backlog, b->tin_backlog + len - ack_pkt_len);
> > qstats_backlog_add(sch, len - ack_pkt_len);
> > q->avg_window_bytes += len - ack_pkt_len;
> > }
> > @@ -1894,9 +1911,9 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > u64 b = q->avg_window_bytes * (u64)NSEC_PER_SEC;
> >
> > b = div64_u64(b, window_interval);
> > - q->avg_peak_bandwidth =
> > - cake_ewma(q->avg_peak_bandwidth, b,
> > - b > q->avg_peak_bandwidth ? 2 : 8);
> > + WRITE_ONCE(q->avg_peak_bandwidth,
> > + cake_ewma(q->avg_peak_bandwidth, b,
> > + b > q->avg_peak_bandwidth ? 2 : 8));
> > q->avg_window_bytes = 0;
> > q->avg_window_begin = now;
> >
> > @@ -1917,27 +1934,30 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > if (!flow->set) {
> > list_add_tail(&flow->flowchain, &b->new_flows);
> > } else {
> > - b->decaying_flow_count--;
> > + WRITE_ONCE(b->decaying_flow_count,
> > + b->decaying_flow_count - 1);
> > list_move_tail(&flow->flowchain, &b->new_flows);
> > }
> > flow->set = CAKE_SET_SPARSE;
> > - b->sparse_flow_count++;
> > + WRITE_ONCE(b->sparse_flow_count,
> > + b->sparse_flow_count + 1);
> >
> > - flow->deficit = cake_get_flow_quantum(b, flow, q->config->flow_mode);
> > + WRITE_ONCE(flow->deficit,
> > + cake_get_flow_quantum(b, flow, q->config->flow_mode));
> > } else if (flow->set == CAKE_SET_SPARSE_WAIT) {
> > /* this flow was empty, accounted as a sparse flow, but actually
> > * in the bulk rotation.
> > */
> > flow->set = CAKE_SET_BULK;
> > - b->sparse_flow_count--;
> > - b->bulk_flow_count++;
> > + WRITE_ONCE(b->sparse_flow_count, b->sparse_flow_count - 1);
> > + WRITE_ONCE(b->bulk_flow_count, b->bulk_flow_count + 1);
> >
> > cake_inc_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
> > cake_inc_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
> > }
> >
> > if (q->buffer_used > q->buffer_max_used)
> > - q->buffer_max_used = q->buffer_used;
> > + WRITE_ONCE(q->buffer_max_used, q->buffer_used);
> >
> > if (q->buffer_used <= q->buffer_limit)
> > return NET_XMIT_SUCCESS;
> > @@ -1976,8 +1996,8 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
> > if (flow->head) {
> > skb = dequeue_head(flow);
> > len = qdisc_pkt_len(skb);
> > - b->backlogs[q->cur_flow] -= len;
> > - b->tin_backlog -= len;
> > + WRITE_ONCE(b->backlogs[q->cur_flow], b->backlogs[q->cur_flow] - len);
> > + WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
> > qstats_backlog_sub(sch, len);
> > q->buffer_used -= skb->truesize;
> > qdisc_qlen_dec(sch);
> > @@ -2042,7 +2062,7 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> >
> > cake_configure_rates(sch, new_rate, true);
> > q->last_checked_active = now;
> > - q->active_queues = num_active_qs;
> > + WRITE_ONCE(q->active_queues, num_active_qs);
> > }
> >
> > begin:
> > @@ -2149,8 +2169,10 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> > */
> > if (flow->set == CAKE_SET_SPARSE) {
> > if (flow->head) {
> > - b->sparse_flow_count--;
> > - b->bulk_flow_count++;
> > + WRITE_ONCE(b->sparse_flow_count,
> > + b->sparse_flow_count - 1);
> > + WRITE_ONCE(b->bulk_flow_count,
> > + b->bulk_flow_count + 1);
> >
> > cake_inc_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
> > cake_inc_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
> > @@ -2165,7 +2187,8 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> > }
> > }
> >
> > - flow->deficit += cake_get_flow_quantum(b, flow, q->config->flow_mode);
> > + WRITE_ONCE(flow->deficit,
> > + flow->deficit + cake_get_flow_quantum(b, flow, q->config->flow_mode));
> > list_move_tail(&flow->flowchain, &b->old_flows);
> >
> > goto retry;
> > @@ -2177,7 +2200,8 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> > if (!skb) {
> > /* this queue was actually empty */
> > if (cobalt_queue_empty(&flow->cvars, &b->cparams, now))
> > - b->unresponsive_flow_count--;
> > + WRITE_ONCE(b->unresponsive_flow_count,
> > + b->unresponsive_flow_count - 1);
> >
> > if (flow->cvars.p_drop || flow->cvars.count ||
> > ktime_before(now, flow->cvars.drop_next)) {
> > @@ -2187,16 +2211,22 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
> > list_move_tail(&flow->flowchain,
> > &b->decaying_flows);
> > if (flow->set == CAKE_SET_BULK) {
> > - b->bulk_flow_count--;
> > + WRITE_ONCE(b->bulk_flow_count,
> > + b->bulk_flow_count - 1);
> >
> > - cake_dec_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
> > - cake_dec_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
> > + cake_dec_srchost_bulk_flow_count(b, flow,
> > + q->config->flow_mode);
> > + cake_dec_dsthost_bulk_flow_count(b, flow,
> > + q->config->flow_mode);
>
> These seem like unnecessary whitespace changes?
Line length was 105 ... a bit over the recommended limit.
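As a rough illustration of the annotation pattern used throughout this patch, here is a user-space sketch. MODEL_READ_ONCE()/MODEL_WRITE_ONCE() are simplified stand-ins for the kernel macros (they rely on the GCC/Clang __typeof__ extension), and struct model_tin is a hypothetical cut-down version of the per-tin stats. The writer is still assumed to be serialized by the qdisc lock, so the read-modify-write itself stays plain; only the final store and the lockless load are marked:

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define MODEL_READ_ONCE(x)	(*(volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, val) \
	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical subset of the per-tin counters. */
struct model_tin {
	uint32_t packets;
	uint64_t bytes;
};

/* Data path: runs under the (modelled) qdisc lock, one store per field. */
static void model_account(struct model_tin *b, uint32_t numsegs, uint64_t slen)
{
	MODEL_WRITE_ONCE(b->bytes, b->bytes + slen);
	MODEL_WRITE_ONCE(b->packets, b->packets + numsegs);
}

/* Dump path: runs without the lock, so each field is loaded exactly once. */
static uint32_t model_dump_packets(struct model_tin *b)
{
	return MODEL_READ_ONCE(b->packets);
}

static uint64_t model_dump_bytes(struct model_tin *b)
{
	return MODEL_READ_ONCE(b->bytes);
}
```

Note that this only prevents torn or repeated accesses to each individual field; the dump can still observe bytes and packets from different moments, which is acceptable for statistics.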
^ permalink raw reply
* [PATCH v2] vsock/virtio: fix accept queue count leak on transport mismatch
From: Dudu Lu @ 2026-04-13 13:14 UTC (permalink / raw)
To: netdev; +Cc: stefanha, sgarzare, mst, jasowang, Dudu Lu
virtio_transport_recv_listen() calls sk_acceptq_added() before
vsock_assign_transport(). If vsock_assign_transport() fails or
selects a different transport, the error path returns without
calling sk_acceptq_removed(), permanently incrementing
sk_ack_backlog.
After approximately backlog+1 such failures, sk_acceptq_is_full()
returns true, causing the listener to reject all new connections.
Fix by moving sk_acceptq_added() to after the transport validation,
matching the pattern used by vmci_transport and hyperv_transport.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Signed-off-by: Dudu Lu <phx0fer@gmail.com>
---
net/vmw_vsock/virtio_transport_common.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 8a9fb23c6e85..e01d983488e5 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1560,8 +1560,6 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
return -ENOMEM;
}
- sk_acceptq_added(sk);
-
lock_sock_nested(child, SINGLE_DEPTH_NESTING);
child->sk_state = TCP_ESTABLISHED;
@@ -1583,6 +1581,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
return ret;
}
+ sk_acceptq_added(sk);
if (virtio_transport_space_update(child, skb))
child->sk_write_space(child);
--
2.39.3 (Apple Git-145)
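To make the leak easier to see, here is a small user-space model of the two orderings. All names are hypothetical, the return values are placeholders rather than the real error codes, and the "full" test assumes the listener compares sk_ack_backlog > sk_max_ack_backlog:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the listener's accept-queue accounting. */
struct model_listener {
	unsigned int ack_backlog;
	unsigned int max_ack_backlog;
};

static bool model_acceptq_is_full(const struct model_listener *l)
{
	return l->ack_backlog > l->max_ack_backlog;
}

/* Old ordering: count the child first, then validate the transport. */
static int model_recv_listen_old(struct model_listener *l, bool transport_ok)
{
	if (model_acceptq_is_full(l))
		return -1;		/* connection rejected */
	l->ack_backlog++;
	if (!transport_ok)
		return -2;		/* error path: counter is never undone */
	return 0;
}

/* New ordering: validate the transport first, then count the child. */
static int model_recv_listen_new(struct model_listener *l, bool transport_ok)
{
	if (model_acceptq_is_full(l))
		return -1;
	if (!transport_ok)
		return -2;		/* nothing to undo */
	l->ack_backlog++;
	return 0;
}
```

In this model, with max_ack_backlog = 2, three transport mismatches are enough for the old ordering to start rejecting valid connections, while the new ordering leaves the counter at zero.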
^ permalink raw reply related
* Re: [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps
From: Eric Dumazet @ 2026-04-13 13:16 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>
On Fri, Apr 10, 2026 at 11:23 AM Eric Dumazet <edumazet@google.com> wrote:
>
> We add annotations for data-races, so that most dump methods
> can run in parallel with data path.
>
> Then change mq and mqprio to no longer acquire each child
> qdisc's spinlock.
>
> Next round of patches will wait for linux-7.2.
>
> v2/v3: addressed most of the sashiko.dev feedback.
> I think remaining problems (in red offloads) are minor
> and can be fixed later.
1) An issue was spotted in sch_cake.c (patch 13/15).
2) net-next has been closed.
Therefore, I will send a V4 in 2 weeks.
pw-bot: cr
^ permalink raw reply
* Re: [PATCH v11 net-next 4/7] devlink: Implement devlink param multi attribute nested data values
From: Paolo Abeni @ 2026-04-13 13:18 UTC (permalink / raw)
To: Ratheesh Kannoth
Cc: netdev, linux-kernel, linux-rdma, sgoutham, andrew+netdev, davem,
edumazet, kuba, donald.hunter, horms, jiri, chuck.lever, matttbe,
cjubran, saeedm, leon, tariqt, mbloch, dtatulea
In-Reply-To: <adzMvyIr7-uBtGlI@rkannoth-OptiPlex-7090>
On 4/13/26 1:00 PM, Ratheesh Kannoth wrote:
> On 2026-04-13 at 16:24:41, Paolo Abeni (pabeni@redhat.com) wrote:
>> On 4/9/26 4:50 AM, Ratheesh Kannoth wrote:
>>> @@ -441,6 +448,7 @@ union devlink_param_value {
>>> u64 vu64;
>>> char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE];
>>> bool vbool;
>>> + struct devlink_param_u64_array u64arr;
>>
>> You mentioned that you intend to handle the possible CONFIG_FRAME_WARN
>> with a separate patch. IMHO such a patch needs to be part of this series,
>> or things will stay broken for an undefined amount of time until such a
>> patch is merged separately.
>
> Patch no: 3 in the same series.
> https://lore.kernel.org/netdev/20260409025055.1664053-4-rkannoth@marvell.com/#t
I fear that is not enough. What about
devl_param_driverinit_value_set()? devlink_param->validate is likely
called with enough stack space available that the huge argument does not
matter, but the mentioned helper is called much deeper in the call chain.
/P
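A rough user-space sketch of why the new union member matters for stack usage. The array bound MODEL_U64_ARRAY_MAX and the layout of the array struct are assumptions for illustration only (the real devlink_param_u64_array definition is not shown in this thread), and the string size of 32 is likewise assumed:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_STRING	32	/* assumed string buffer size */
#define MODEL_U64_ARRAY_MAX	32	/* hypothetical array bound */

/* Hypothetical layout of the new array member. */
struct model_u64_array {
	uint64_t size;
	uint64_t val[MODEL_U64_ARRAY_MAX];
};

/* The union before the patch: the string buffer dominates its size. */
union model_value_old {
	uint8_t vu8;
	uint16_t vu16;
	uint32_t vu32;
	uint64_t vu64;
	char vstr[MODEL_MAX_STRING];
	bool vbool;
};

/* The union after the patch: every user now pays for the array. */
union model_value_new {
	uint8_t vu8;
	uint16_t vu16;
	uint32_t vu32;
	uint64_t vu64;
	char vstr[MODEL_MAX_STRING];
	bool vbool;
	struct model_u64_array u64arr;
};

/* Bytes copied onto the stack each time the union is passed by value. */
static size_t model_value_size_old(void)
{
	return sizeof(union model_value_old);
}

static size_t model_value_size_new(void)
{
	return sizeof(union model_value_new);
}
```

Any helper that takes the union by value, or keeps one in a local variable, grows its frame by the difference, so a deep call chain can trip CONFIG_FRAME_WARN even when the top-level caller is fine. Passing a pointer instead would keep the per-frame cost at pointer size.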
^ permalink raw reply
* [GIT PULL] bluetooth-next 2026-04-13
From: Luiz Augusto von Dentz @ 2026-04-13 13:22 UTC (permalink / raw)
To: davem, kuba; +Cc: linux-bluetooth, netdev
The following changes since commit 42f9b4c6ef19e71d2c7d9bfd3c5037d4fe434ad7:
tools: ynl: tests: fix leading space on Makefile target (2026-04-09 20:41:40 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git tags/for-net-next-2026-04-13
for you to fetch changes up to c347ca17d62a32c25564fee0ca3a2a7bc2d5fd6f:
Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling (2026-04-13 09:19:42 -0400)
----------------------------------------------------------------
bluetooth-next pull request for net-next:
core:
- hci_core: Rate limit the logging of invalid ISO handle
- hci_sync: make hci_cmd_sync_run_once return -EEXIST if exists
- hci_event: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER
- hci_event: fix potential UAF in SSP passkey handlers
- HCI: Avoid a couple -Wflex-array-member-not-at-end warnings
- L2CAP: CoC: Disconnect if received packet size exceeds MPS
- L2CAP: Add missing chan lock in l2cap_ecred_reconf_rsp
- L2CAP: Fix printing wrong information if SDU length exceeds MTU
- SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec
drivers:
- btusb: MT7922: Add VID/PID 0489/e174
- btusb: Add Lite-On 04ca:3807 for MediaTek MT7921
- btusb: Add MT7927 IDs ASUS ROG Crosshair X870E Hero, Lenovo Legion Pro 7
16ARX9, Gigabyte Z790 AORUS MASTER X, MSI X870E Ace Max, TP-Link
Archer TBE550E, ASUS X870E / ProArt X870E-Creator.
- btusb: Add MT7902 IDs 13d3/3579, 13d3/3580, 13d3/3594, 13d3/3596, 0e8d/1ede
- btusb: MediaTek MT7922: Add VID 0489 & PID e11d
- btintel: Add support for Scorpious Peak2 support
- btintel: Add support for Scorpious Peak2F support
- btintel_pcie: Add device id of Scorpius Peak2, Nova Lake-PCD-H
- btintel_pcie: Add device id of Scorpious2, Nova Lake-PCD-S
- btmtk: Add reset mechanism if downloading firmware failed
- btmtk: Add MT6639 (MT7927) Bluetooth support
- btmtk: fix ISO interface setup for single alt setting
- btmtk: add MT7902 SDIO support
- btmtk: add MT7902 MCU support
- btbcm: Add entry for BCM4343A2 UART Bluetooth
- qca: enable pwrseq support for wcn39xx devices
- hci_qca: Fix BT not getting powered-off on rmmod
- hci_qca: disable power control for WCN7850 when bt_en is not defined
- hci_qca: Fix missing wakeup during SSR memdump handling
- hci_ldisc: Clear HCI_UART_PROTO_INIT on error
- mmc: sdio: add MediaTek MT7902 SDIO device ID
- hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x
----------------------------------------------------------------
Arnd Bergmann (1):
Bluetooth: btmtk: hide unused btmtk_mt6639_devs[] array
Chris Lu (4):
Bluetooth: btusb: MT7922: Add VID/PID 0489/e174
Bluetooth: btmtk: improve mt79xx firmware setup retry flow
Bluetooth: btmtk: add status check in mt79xx firmware setup
Bluetooth: btmtk: Add reset mechanism if downloading firmware failed
Christian Eggers (1):
Bluetooth: L2CAP: CoC: Disconnect if received packet size exceeds MPS
Dmitry Baryshkov (1):
Bluetooth: qca: enable pwrseq support for WCN39xx devices
Dongyang Jin (1):
Bluetooth: btbcm: remove done label in btbcm_patchram
Dudu Lu (1):
Bluetooth: l2cap: Add missing chan lock in l2cap_ecred_reconf_rsp
Dylan Eray (1):
Bluetooth: btusb: Add Lite-On 04ca:3807 for MediaTek MT7921
Gustavo A. R. Silva (1):
Bluetooth: hci.h: Avoid a couple -Wflex-array-member-not-at-end warnings
Hans de Goede (2):
Bluetooth: hci_qca: Fix confusing shutdown() and power_off() naming
Bluetooth: hci_qca: Fix BT not getting powered-off on rmmod
Javier Tia (8):
Bluetooth: btmtk: Add MT6639 (MT7927) Bluetooth support
Bluetooth: btmtk: fix ISO interface setup for single alt setting
Bluetooth: btusb: Add MT7927 ID for ASUS ROG Crosshair X870E Hero
Bluetooth: btusb: Add MT7927 ID for Lenovo Legion Pro 7 16ARX9
Bluetooth: btusb: Add MT7927 ID for Gigabyte Z790 AORUS MASTER X
Bluetooth: btusb: Add MT7927 ID for MSI X870E Ace Max
Bluetooth: btusb: Add MT7927 ID for TP-Link Archer TBE550E
Bluetooth: btusb: Add MT7927 ID for ASUS X870E / ProArt X870E-Creator
Johan Hovold (2):
Bluetooth: btusb: refactor endpoint lookup
Bluetooth: btmtk: refactor endpoint lookup
Jonathan Rissanen (1):
Bluetooth: hci_ldisc: Clear HCI_UART_PROTO_INIT on error
Kamiyama Chiaki (1):
Bluetooth: btusb: MediaTek MT7922: Add VID 0489 & PID e11d
Kiran K (10):
Bluetooth: btintel: Add support for hybrid signature for ScP2 onwards
Bluetooth: btintel: Replace CNVi id with hardware variant
Bluetooth: btintel: Add support for Scorpious Peak2 support
Bluetooth: btintel: Add DSBR support for ScP2 onwards
Bluetooth: btintel_pcie: Add support for exception dump for ScP2
Bluetooth: btintel: Add support for Scorpious Peak2F support
Bluetooth: btintel_pcie: Add support for exception dump for ScP2F
Bluetooth: btintel_pcie: Add device id of Scorpius Peak2, Nova Lake-PCD-H
Bluetooth: btintel_pcie: Add device id of Scorpious2, Nova Lake-PCD-S
Bluetooth: btintel_pcie: Align shared DMA memory to 128 bytes
Luiz Augusto von Dentz (2):
Bluetooth: btintel_pci: Fix btintel_pcie_read_hwexp code style
Bluetooth: L2CAP: Fix printing wrong information if SDU length exceeds MTU
Lukas Kraft (1):
bluetooth: btusb: Fix whitespace in btusb.c
Marek Vasut (1):
Bluetooth: btbcm: Add entry for BCM4343A2 UART Bluetooth
Pauli Virtanen (3):
Bluetooth: hci_core: Rate limit the logging of invalid ISO handle
Bluetooth: hci_sync: make hci_cmd_sync_run_once return -EEXIST if exists
Bluetooth: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER
Sean Wang (8):
mmc: sdio: add MediaTek MT7902 SDIO device ID
Bluetooth: btmtk: add MT7902 MCU support
Bluetooth: btusb: Add new VID/PID 13d3/3579 for MT7902
Bluetooth: btusb: Add new VID/PID 13d3/3580 for MT7902
Bluetooth: btusb: Add new VID/PID 13d3/3594 for MT7902
Bluetooth: btusb: Add new VID/PID 13d3/3596 for MT7902
Bluetooth: btusb: Add new VID/PID 0e8d/1ede for MT7902
Bluetooth: btmtk: add MT7902 SDIO support
Shuai Zhang (2):
Bluetooth: hci_qca: disable power control for WCN7850 when bt_en is not defined
Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling
Shuvam Pandey (1):
Bluetooth: hci_event: fix potential UAF in SSP passkey handlers
Stefan Metzmacher (1):
Bluetooth: SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec
Stefano Radaelli (1):
Bluetooth: hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x
Thorsten Blum (3):
Bluetooth: btintel_pcie: Replace snprintf("%s") with strscpy
Bluetooth: btintel_pcie: Use struct_size to improve hci_drv_read_info
Bluetooth: btintel_pcie: use strscpy to copy plain strings
Vivek Sahu (1):
Bluetooth: qca: Refactor code on the basis of chipset names
drivers/bluetooth/btbcm.c | 11 ++--
drivers/bluetooth/btintel.c | 109 ++++++++++++++++++++++++++++------
drivers/bluetooth/btintel.h | 20 +++++--
drivers/bluetooth/btintel_pcie.c | 122 ++++++++++++++++++++++++---------------
drivers/bluetooth/btintel_pcie.h | 3 -
drivers/bluetooth/btmtk.c | 115 ++++++++++++++++++++++++++----------
drivers/bluetooth/btmtk.h | 9 ++-
drivers/bluetooth/btmtksdio.c | 44 +++++++++-----
drivers/bluetooth/btqca.c | 37 ++++++------
drivers/bluetooth/btusb.c | 84 +++++++++++++--------------
drivers/bluetooth/hci_ldisc.c | 3 +
drivers/bluetooth/hci_ll.c | 10 ++++
drivers/bluetooth/hci_qca.c | 84 ++++++++++++++++-----------
include/linux/mmc/sdio_ids.h | 1 +
include/net/bluetooth/hci.h | 16 +++--
net/bluetooth/hci_conn.c | 4 +-
net/bluetooth/hci_core.c | 4 +-
net/bluetooth/hci_event.c | 21 ++++---
net/bluetooth/hci_sync.c | 2 +-
net/bluetooth/l2cap_core.c | 15 ++++-
net/bluetooth/sco.c | 3 +-
21 files changed, 476 insertions(+), 241 deletions(-)
^ permalink raw reply
* Re: [PATCH net-next v7 00/10] Decouple receive and transmit enablement in team driver
From: patchwork-bot+netdevbpf @ 2026-04-13 13:30 UTC (permalink / raw)
To: Marc Harvey
Cc: jiri, andrew+netdev, davem, edumazet, kuba, pabeni, shuah, horms,
netdev, linux-kernel, linux-kselftest, kuniyu
In-Reply-To: <20260409-teaming-driver-internal-v7-0-f47e7589685d@google.com>
Hello:
This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Thu, 09 Apr 2026 02:59:22 +0000 you wrote:
> Allow independent control over receive and transmit enablement states
> for aggregated ports in the team driver.
>
> The motivation is that IEEE 802.3ad LACP "independent control" can't
> be implemented for the team driver currently. This was added to the
> bonding driver in commit 240fd405528b ("bonding: Add independent
> control state machine").
>
> [...]
Here is the summary with links:
- [net-next,v7,01/10] net: team: Annotate reads and writes for mixed lock accessed values
https://git.kernel.org/netdev/net-next/c/3faf0ce6e499
- [net-next,v7,02/10] net: team: Remove unused team_mode_op, port_enabled
https://git.kernel.org/netdev/net-next/c/014f249121d7
- [net-next,v7,03/10] net: team: Rename port_disabled team mode op to port_tx_disabled
https://git.kernel.org/netdev/net-next/c/cfa477df2cc6
- [net-next,v7,04/10] selftests: net: Add tests for failover of team-aggregated ports
https://git.kernel.org/netdev/net-next/c/05e352444b24
- [net-next,v7,05/10] selftests: net: Add test for enablement of ports with teamd
https://git.kernel.org/netdev/net-next/c/10407eebe886
- [net-next,v7,06/10] net: team: Rename enablement functions and struct members to tx
https://git.kernel.org/netdev/net-next/c/fa6ed31dd913
- [net-next,v7,07/10] net: team: Track rx enablement separately from tx enablement
https://git.kernel.org/netdev/net-next/c/68f0833f279a
- [net-next,v7,08/10] net: team: Add new rx_enabled team port option
https://git.kernel.org/netdev/net-next/c/0e47569a574d
- [net-next,v7,09/10] net: team: Add new tx_enabled team port option
https://git.kernel.org/netdev/net-next/c/bb9215a98179
- [net-next,v7,10/10] selftests: net: Add tests for team driver decoupled tx and rx control
https://git.kernel.org/netdev/net-next/c/d3870724eb16
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH v10 1/2] net: mhi: Enable Ethernet interface support
From: Paolo Abeni @ 2026-04-13 13:31 UTC (permalink / raw)
To: Vivek Pernamitta, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski
Cc: netdev, linux-kernel
In-Reply-To: <20260409-vdev_b1_eth_b1_next-20260408-v10-1-6d44ca48f189@oss.qualcomm.com>
On 4/9/26 8:08 AM, Vivek Pernamitta wrote:
> @@ -208,17 +235,20 @@ static void mhi_net_dl_callback(struct mhi_device *mhi_dev,
> skb = mhi_net_skb_agg(mhi_netdev, skb);
> mhi_netdev->skbagg_head = NULL;
> }
> -
> - switch (skb->data[0] & 0xf0) {
> - case 0x40:
> - skb->protocol = htons(ETH_P_IP);
> - break;
> - case 0x60:
> - skb->protocol = htons(ETH_P_IPV6);
> - break;
> - default:
> - skb->protocol = htons(ETH_P_MAP);
> - break;
> + if (mhi_netdev->ndev->type == ARPHRD_ETHER) {
> + skb->protocol = eth_type_trans(skb, mhi_netdev->ndev);
Sashiko says:
Is there a risk of an out-of-bounds read or kernel panic here if a malformed
or fragmented packet is received?
eth_type_trans() assumes the SKB has at least a 14-byte MAC header and calls
skb_pull_inline(). If the linear portion of the SKB is smaller than 14 bytes,
__skb_pull() will trigger a BUG_ON(skb->len < skb->data_len).
Should we call pskb_may_pull(skb, ETH_HLEN) before parsing the Ethernet header?
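For illustration only, here is a minimal userspace sketch of the guard being
asked about: refuse to parse the EtherType unless a full 14-byte header is
directly readable. struct rx_buf and rx_parse_ethertype() are hypothetical
names invented for this sketch and are not part of the patch or the kernel.
In the driver itself the equivalent check would presumably be the
pskb_may_pull(skb, ETH_HLEN) call mentioned above, with the short frame
dropped on failure; how the drop is accounted for is left open here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14 /* same value as the kernel's ETH_HLEN */

/* Hypothetical userspace stand-in for a received buffer; not struct sk_buff. */
struct rx_buf {
	const uint8_t *data;
	size_t linear_len; /* bytes directly readable at data */
};

/*
 * Store the EtherType (host byte order) and return true only when a
 * complete Ethernet header is present. Return false otherwise so the
 * caller can drop the frame instead of reading past the buffer.
 */
static bool rx_parse_ethertype(const struct rx_buf *buf, uint16_t *ethertype)
{
	if (buf->linear_len < ETH_HLEN)
		return false;

	/* EtherType occupies bytes 12 and 13, in network byte order. */
	*ethertype = (uint16_t)((buf->data[12] << 8) | buf->data[13]);
	return true;
}
```

A caller would treat a false return as "drop and count an rx error" rather
than continuing to protocol classification.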
> + } else {
> + switch (skb->data[0] & 0xf0) {
> + case 0x40:
> + skb->protocol = htons(ETH_P_IP);
> + break;
> + case 0x60:
> + skb->protocol = htons(ETH_P_IPV6);
> + break;
> + default:
> + skb->protocol = htons(ETH_P_MAP);
> + break;
> + }
> }
>
> u64_stats_update_begin(&mhi_netdev->stats.rx_syncp);
> @@ -306,6 +336,9 @@ static int mhi_net_newlink(struct mhi_device *mhi_dev, struct net_device *ndev)
> struct mhi_net_dev *mhi_netdev;
> int err;
>
> + if (ndev->header_ops)
> + eth_hw_addr_random(ndev);
> +
> mhi_netdev = netdev_priv(ndev);
>
> dev_set_drvdata(&mhi_dev->dev, mhi_netdev);
> @@ -356,7 +389,8 @@ static int mhi_net_probe(struct mhi_device *mhi_dev,
> int err;
>
> ndev = alloc_netdev(sizeof(struct mhi_net_dev), info->netname,
> - NET_NAME_PREDICTABLE, mhi_net_setup);
> + NET_NAME_ENUM, info->ethernet_if ?
> + mhi_ethernet_setup : mhi_net_setup);
Sashiko says:
Does changing the name assignment type from NET_NAME_PREDICTABLE to
NET_NAME_ENUM break backwards compatibility for existing legacy interfaces?
NET_NAME_PREDICTABLE instructs userspace to leave the kernel-provided
name alone, while NET_NAME_ENUM signals that the interface is a generic
enumeration and should be renamed. Applying this to existing interfaces
like mhi_hwip0 and mhi_swip0 might cause them to be unexpectedly renamed
on boot, potentially breaking existing userspace network configurations.
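One possible way to address this, sketched under assumptions and not taken
from the patch: keep NET_NAME_PREDICTABLE for the existing raw-IP interfaces
and use NET_NAME_ENUM only for the new Ethernet ones. The helper name
mhi_pick_name_assign_type() is hypothetical; the enum values mirror
NET_NAME_ENUM and NET_NAME_PREDICTABLE from include/uapi/linux/netdevice.h
and are redefined locally so the sketch builds in userspace.

```c
#include <stdbool.h>

/* Local mirror of the uapi values, for a standalone build only. */
enum name_assign {
	NAME_ENUM = 1,        /* NET_NAME_ENUM: enumerated by the kernel */
	NAME_PREDICTABLE = 2, /* NET_NAME_PREDICTABLE: predictably named by the kernel */
};

/*
 * Hypothetical helper: legacy (non-Ethernet) interfaces keep the
 * name_assign_type they had before the patch; only Ethernet
 * interfaces are marked as enumerated.
 */
static enum name_assign mhi_pick_name_assign_type(bool ethernet_if)
{
	return ethernet_if ? NAME_ENUM : NAME_PREDICTABLE;
}
```

In the driver, the result would be passed as the name_assign_type argument of
alloc_netdev(), keyed on info->ethernet_if as the setup callback already is.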
Please have a look at the full report:
https://sashiko.dev/#/patchset/20260409-vdev_b1_eth_b1_next-20260408-v10-0-6d44ca48f189%40oss.qualcomm.com
/P
^ permalink raw reply
* Re: [PATCH] xfrm: fix memory leak in xfrm_add_policy()
From: Sabrina Dubroca @ 2026-04-13 13:32 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: steffen.klassert, herbert, davem, edumazet, kuba, pabeni, horms,
leon, netdev, linux-kernel, syzbot+901d48e0b95aed4a2548
In-Reply-To: <20260412020809.35465-1-kartikey406@gmail.com>
2026-04-12, 07:38:09 +0530, Deepanshu Kartikey wrote:
> When xfrm_policy_insert() fails, the error path performs manual
> cleanup by calling xfrm_dev_policy_free(), security_xfrm_policy_free()
> and kfree() directly. This is incorrect because xfrm_policy_destroy()
> already handles all of these, causing a memory leak detected by
> kmemleak.
What is missing in the current code? "We have a better way to do this"
is not a bugfix, it's a cleanup. The kmemleak report says that we're
leaking the xfrm_policy struct on this codepath, which doesn't make
sense: that's covered by the existing kfree(xp).
Also, please use "PATCH ipsec" for fixes to net/xfrm and the rest of
the IPsec implementation.
--
Sabrina
^ permalink raw reply