* [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() [not found] <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucas1p1.samsung.com> @ 2025-05-30 10:34 ` e.kubanski [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p4> ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: e.kubanski @ 2025-05-30 10:34 UTC (permalink / raw) To: netdev, linux-kernel Cc: bjorn, magnus.karlsson, maciej.fijalkowski, jonathan.lemon, e.kubanski Move xsk completion queue descriptor write-back to destructor. Fix xsk descriptor management in completion queue. Descriptor management mechanism didn't take care of situations where completion queue submission can happen out-of-order to descriptor write-back. __xsk_generic_xmit() was assigning descriptor to slot right after completion queue slot reservation. If multiple CPUs access the same completion queue after xmit, this can result in out-of-order submission of invalid descriptor batch. SKB destructor call can submit descriptor batch that is currently in use by other CPU, instead of correct transmitted ones. This could result in User-Space <-> Kernel-Space data race. Forbid possible out-of-order submissions: CPU A: Reservation + Descriptor Write CPU B: Reservation + Descriptor Write CPU B: Submit (submitted first batch reserved by CPU A) CPU A: Submit (submitted second batch reserved by CPU B) Move Descriptor Write to submission phase: CPU A: Reservation (only moves local writer) CPU B: Reservation (only moves local writer) CPU B: Descriptor Write + Submit CPU A: Descriptor Write + Submit This solves potential out-of-order free of xsk buffers. Signed-off-by: Eryk Kubanski <e.kubanski@partner.samsung.com> Fixes: e6c4047f5122 ("xsk: Use xsk_buff_pool directly for cq functions") --- include/linux/skbuff.h | 2 ++ net/xdp/xsk.c | 17 +++++++++++------ net/xdp/xsk_queue.h | 11 +++++++++++ 3 files changed, 24 insertions(+), 6 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5520524c93bf..cc37b62638cd 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -624,6 +624,8 @@ struct skb_shared_info { void *destructor_arg; }; + u64 xsk_descs[MAX_SKB_FRAGS]; + /* must be last field, see pskb_expand_head() */ skb_frag_t frags[MAX_SKB_FRAGS]; }; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 72c000c0ae5f..2987e81482d7 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -528,24 +528,24 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags) return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags); } -static int xsk_cq_reserve_addr_locked(struct xsk_buff_pool *pool, u64 addr) +static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool) { unsigned long flags; int ret; spin_lock_irqsave(&pool->cq_lock, flags); - ret = xskq_prod_reserve_addr(pool->cq, addr); + ret = xskq_prod_reserve(pool->cq); spin_unlock_irqrestore(&pool->cq_lock, flags); return ret; } -static void xsk_cq_submit_locked(struct xsk_buff_pool *pool, u32 n) +static void xsk_cq_submit_locked(struct xsk_buff_pool *pool, u64 *descs, u32 n) { unsigned long flags; spin_lock_irqsave(&pool->cq_lock, flags); - xskq_prod_submit_n(pool->cq, n); + xskq_prod_write_submit_addr_n(pool->cq, descs, n); spin_unlock_irqrestore(&pool->cq_lock, flags); } @@ -572,7 +572,9 @@ static void xsk_destruct_skb(struct sk_buff *skb) *compl->tx_timestamp = ktime_get_tai_fast_ns(); } - xsk_cq_submit_locked(xdp_sk(skb->sk)->pool, xsk_get_num_desc(skb)); + xsk_cq_submit_locked(xdp_sk(skb->sk)->pool, + 
skb_shinfo(skb)->xsk_descs, + xsk_get_num_desc(skb)); sock_wfree(skb); } @@ -754,7 +756,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, skb->priority = READ_ONCE(xs->sk.sk_priority); skb->mark = READ_ONCE(xs->sk.sk_mark); skb->destructor = xsk_destruct_skb; + xsk_tx_metadata_to_compl(meta, &skb_shinfo(skb)->xsk_meta); + skb_shinfo(skb)->xsk_descs[xsk_get_num_desc(skb)] = desc->addr; xsk_set_destructor_arg(skb); return skb; @@ -765,6 +769,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, if (err == -EOVERFLOW) { /* Drop the packet */ + skb_shinfo(xs->skb)->xsk_descs[xsk_get_num_desc(xs->skb)] = desc->addr; xsk_set_destructor_arg(xs->skb); xsk_drop_skb(xs->skb); xskq_cons_release(xs->tx); @@ -807,7 +812,7 @@ static int __xsk_generic_xmit(struct sock *sk) * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ - err = xsk_cq_reserve_addr_locked(xs->pool, desc.addr); + err = xsk_cq_reserve_locked(xs->pool); if (err) { err = -EAGAIN; goto out; diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 46d87e961ad6..06ce89aae217 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -436,6 +436,17 @@ static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries) __xskq_prod_submit(q, q->ring->producer + nb_entries); } +static inline void xskq_prod_write_submit_addr_n(struct xsk_queue *q, u64 *addrs, u32 nb_entries) +{ + struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring; + u32 prod = q->ring->producer; + + for (u32 i = 0; i < nb_entries; ++i) + ring->desc[prod++ & q->ring_mask] = addrs[i]; + + __xskq_prod_submit(q, prod); +} + static inline bool xskq_prod_is_empty(struct xsk_queue *q) { /* No barriers needed since data is not accessed */ -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
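For reference, the pre-patch completion-queue helpers that the commit message refers to look roughly like this (a simplified paraphrase of net/xdp/xsk_queue.h, not verbatim kernel source). The reserve step writes the addr and only advances the locally cached producer; the submit step publishes the global producer by a plain count, so it has no say in which previously written entries become visible to user space:

	/* Simplified sketch of the pre-patch helpers; names follow
	 * net/xdp/xsk_queue.h but details are trimmed.
	 */
	static inline int xskq_prod_reserve_addr(struct xsk_queue *q, u64 addr)
	{
		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;

		if (xskq_prod_is_full(q))
			return -ENOSPC;

		/* write the descriptor, advance only the cached producer */
		ring->desc[q->cached_prod++ & q->ring_mask] = addr;
		return 0;
	}

	static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries)
	{
		/* publish by count only: whatever sits between the global
		 * producer and producer + nb_entries becomes visible
		 */
		__xskq_prod_submit(q, q->ring->producer + nb_entries);
	}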
* RE: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit()
  [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p4>
@ 2025-05-30 11:56 ` Eryk Kubanski
  0 siblings, 0 replies; 18+ messages in thread
From: Eryk Kubanski @ 2025-05-30 11:56 UTC (permalink / raw)
  To: Eryk Kubanski, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
  Cc: bjorn@kernel.org, magnus.karlsson@intel.com,
	maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com

It seems that the CI tests have failed:

First test_progs failure (test_progs-aarch64-gcc-14):
#610/3 xdp_adjust_tail/xdp_adjust_tail_grow2
...
test_xdp_adjust_tail_grow2:FAIL:case-128 retval unexpected case-128 retval: actual 1 != expected 3
test_xdp_adjust_tail_grow2:FAIL:case-128 data_size_out unexpected case-128 data_size_out: actual 128 != expected 3520
test_xdp_adjust_tail_grow2:FAIL:case-128-data cnt unexpected case-128-data cnt: actual 0 != expected 3392
test_xdp_adjust_tail_grow2:FAIL:case-128-data data_size_out unexpected case-128-data data_size_out: actual 128 != expected 3520
...
#620 xdp_do_redirect
...
test_max_pkt_size:FAIL:prog_run_max_size unexpected error: -22 (errno 22)

But I'm not sure why. My changes only touch the AF_XDP generic xmit functions, and these bpf tests don't exercise the AF_XDP socket xmit path at all. The changes should only affect the sendmsg() socket syscall, and most of them are translation-unit local to the xsk module. The only thing I can think of that could have an effect is skb_shared_info, but why? The xsk tests didn't fail.

Could you help me figure it out? Should I be worried? Maybe the tests are broken?

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-05-30 10:34 ` [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() e.kubanski [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p4> @ 2025-05-30 16:07 ` Stanislav Fomichev [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p1> 2025-06-04 14:41 ` kernel test robot 3 siblings, 0 replies; 18+ messages in thread From: Stanislav Fomichev @ 2025-05-30 16:07 UTC (permalink / raw) To: e.kubanski Cc: netdev, linux-kernel, bjorn, magnus.karlsson, maciej.fijalkowski, jonathan.lemon On 05/30, e.kubanski wrote: > Move xsk completion queue descriptor write-back to destructor. > > Fix xsk descriptor management in completion queue. Descriptor > management mechanism didn't take care of situations where > completion queue submission can happen out-of-order to > descriptor write-back. > > __xsk_generic_xmit() was assigning descriptor to slot right > after completion queue slot reservation. If multiple CPUs > access the same completion queue after xmit, this can result > in out-of-order submission of invalid descriptor batch. > SKB destructor call can submit descriptor batch that is > currently in use by other CPU, instead of correct transmitted > ones. This could result in User-Space <-> Kernel-Space data race. > > Forbid possible out-of-order submissions: > CPU A: Reservation + Descriptor Write > CPU B: Reservation + Descriptor Write > CPU B: Submit (submitted first batch reserved by CPU A) > CPU A: Submit (submitted second batch reserved by CPU B) > > Move Descriptor Write to submission phase: > CPU A: Reservation (only moves local writer) > CPU B: Reservation (only moves local writer) > CPU B: Descriptor Write + Submit > CPU A: Descriptor Write + Submit > > This solves potential out-of-order free of xsk buffers. I'm not sure I understand what's the issue here. If you're using the same XSK from different CPUs, you should take care of the ordering yourself on the userspace side? > Signed-off-by: Eryk Kubanski <e.kubanski@partner.samsung.com> > Fixes: e6c4047f5122 ("xsk: Use xsk_buff_pool directly for cq functions") > --- > include/linux/skbuff.h | 2 ++ > net/xdp/xsk.c | 17 +++++++++++------ > net/xdp/xsk_queue.h | 11 +++++++++++ > 3 files changed, 24 insertions(+), 6 deletions(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 5520524c93bf..cc37b62638cd 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -624,6 +624,8 @@ struct skb_shared_info { > void *destructor_arg; > }; > > + u64 xsk_descs[MAX_SKB_FRAGS]; This is definitely a no-go (sk_buff and skb_shared_info space is precious). ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p1> @ 2025-06-02 9:27 ` Eryk Kubanski 2025-06-02 15:28 ` Stanislav Fomichev ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Eryk Kubanski @ 2025-06-02 9:27 UTC (permalink / raw) To: Stanislav Fomichev Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com > I'm not sure I understand what's the issue here. If you're using the > same XSK from different CPUs, you should take care of the ordering > yourself on the userspace side? It's not a problem with user-space Completion Queue READER side. Im talking exclusively about kernel-space Completion Queue WRITE side. This problem can occur when multiple sockets are bound to the same umem, device, queue id. In this situation Completion Queue is shared. This means it can be accessed by multiple threads on kernel-side. Any use is indeed protected by spinlock, however any write sequence (Acquire write slot as writer, write to slot, submit write slot to reader) isn't atomic in any way and it's possible to submit not-yet-sent packet descriptors back to user-space as TX completed. Up untill now, all write-back operations had two phases, each phase locks the spinlock and unlocks it: 1) Acquire slot + Write descriptor (increase cached-writer by N + write values) 2) Submit slot to the reader (increase writer by N) Slot submission was solely based on the timing. Let's consider situation, where two different threads issue a syscall for two different AF_XDP sockets that are bound to the same umem, dev, queue-id. AF_XDP setup: kernel-space Write Read +--+ +--+ | | | | | | | | | | | | Completion | | | | Fill Queue | | | | Queue | | | | | | | | | | | | | | | | +--+ +--+ Read Write user-space +--------+ +--------+ | AF_XDP | | AF_XDP | +--------+ +--------+ Possible out-of-order scenario: writer cached_writer1 cached_writer2 | | | | | | | | | | | | +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ | | | | | | | | | Completion Queue | | | | | | | | | | | | | | | | | | +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ | | | | | | |-----------------| | A) T1 syscall | | writes 2 | | descriptors |-----------------------------------| B) T2 syscall writes 4 descriptors Notes: 1) T1 and T2 AF_XDP sockets are two different sockets, __xsk_generic_xmit will obtain two different mutexes. 2) T1 and T2 can be executed simultaneously, there is no critical section whatsoever between them. 3) T1 and T2 will obtain Completion Queue Lock for acquire + write, only slot acquire + write are under lock. 4) T1 and T2 completion (skb destructor) doesn't need to be the same order as A) and B). 5) What if T1 fails after T2 acquires slots? cached_writer will be decreased by 2, T2 will submit failed descriptors of T1 (they shall be retransmitted in next TX). Submission of writer will move writer by 4 slots 2 of these slots have failed T1 values. Last two slots of T2 will be missing, descriptor leak. 6) What if T2 completes before T1? writer will be moved by 4 slots. 2 of them are slots filled by T1. T2 will complete 2 own slots and 2 slots of T1, It's bad. T1 will complete last 2 slots of T2, also bad. 
This out-of-order completion can effectively cause a User-space <-> Kernel-space data race. This patch solves that by only acquiring the cached_writer slots first and doing the completion (submission: write + increase writer) afterwards. This is the only way to make it bulletproof against multithreaded access, failures and out-of-order skb completions.

> This is definitely a no-go (sk_buff and skb_shared_info space is
> precious).

Okay, so where should I store it? Can you give me some advice?

I put it there because that is where all the information related to skb destruction already lives. Additionally, this is the only place in skb-related code that defines anything related to xsk: metadata, number of descriptors. SKBUFF doesn't. I need to hold this information somewhere, and skbuff or skb_shared_info are the only places I can store it. It needs to be invariant across all skb fragments, and be released after the skb completes.

^ permalink raw reply	[flat|nested] 18+ messages in thread
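To make the failure case in note 5 above concrete, here is a rough sketch of the relevant pre-patch flow in __xsk_generic_xmit() (paraphrased and heavily trimmed from net/xdp/xsk.c; the real error handling has more branches):

	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
		/* Phase 1: reserve a CQ slot and write desc.addr into it.
		 * Only cached_prod moves; the global producer is untouched.
		 */
		err = xsk_cq_reserve_addr_locked(xs->pool, desc.addr);
		if (err)
			goto out;

		skb = xsk_build_skb(xs, &desc);
		...
		err = __dev_direct_xmit(skb, xs->queue_id);
		if (err == NETDEV_TX_BUSY) {
			/* Failure after the descriptor write: the slot is
			 * "given back" by decrementing cached_prod, but the
			 * addr already written stays in the ring. Another
			 * socket sharing this CQ may have reserved slots
			 * after ours, and its skb destructor (phase 2,
			 * xskq_prod_submit_n()) publishes by count only, so
			 * it can expose this stale addr to user space.
			 */
			xsk_consume_skb(skb);	/* -> xsk_cq_cancel_locked() */
			err = -EAGAIN;
			goto out;
		}
		/* on success, phase 2 happens later in xsk_destruct_skb() */
	}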
* Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-06-02 9:27 ` Eryk Kubanski @ 2025-06-02 15:28 ` Stanislav Fomichev 2025-06-02 16:03 ` Maciej Fijalkowski [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p2> [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p3> 2025-07-03 23:37 ` Jason Xing 2 siblings, 2 replies; 18+ messages in thread From: Stanislav Fomichev @ 2025-06-02 15:28 UTC (permalink / raw) To: Eryk Kubanski Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com On 06/02, Eryk Kubanski wrote: > > I'm not sure I understand what's the issue here. If you're using the > > same XSK from different CPUs, you should take care of the ordering > > yourself on the userspace side? > > It's not a problem with user-space Completion Queue READER side. > Im talking exclusively about kernel-space Completion Queue WRITE side. > > This problem can occur when multiple sockets are bound to the same > umem, device, queue id. In this situation Completion Queue is shared. > This means it can be accessed by multiple threads on kernel-side. > Any use is indeed protected by spinlock, however any write sequence > (Acquire write slot as writer, write to slot, submit write slot to reader) > isn't atomic in any way and it's possible to submit not-yet-sent packet > descriptors back to user-space as TX completed. > > Up untill now, all write-back operations had two phases, each phase > locks the spinlock and unlocks it: > 1) Acquire slot + Write descriptor (increase cached-writer by N + write values) > 2) Submit slot to the reader (increase writer by N) > > Slot submission was solely based on the timing. Let's consider situation, > where two different threads issue a syscall for two different AF_XDP sockets > that are bound to the same umem, dev, queue-id. > > AF_XDP setup: > > kernel-space > > Write Read > +--+ +--+ > | | | | > | | | | > | | | | > Completion | | | | Fill > Queue | | | | Queue > | | | | > | | | | > | | | | > | | | | > +--+ +--+ > Read Write > user-space > > > +--------+ +--------+ > | AF_XDP | | AF_XDP | > +--------+ +--------+ > > > > > > Possible out-of-order scenario: > > > writer cached_writer1 cached_writer2 > | | | > | | | > | | | > | | | > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > | | | | | | | | | > Completion Queue | | | | | | | | | > | | | | | | | | | > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > | | | > | | | > |-----------------| | > A) T1 syscall | | > writes 2 | | > descriptors |-----------------------------------| > B) T2 syscall writes 4 descriptors > > > > > Notes: > 1) T1 and T2 AF_XDP sockets are two different sockets, > __xsk_generic_xmit will obtain two different mutexes. > 2) T1 and T2 can be executed simultaneously, there is no > critical section whatsoever between them. XSK represents a single queue and each queue is single producer single consumer. The fact that you can dup a socket and call sendmsg from different threads/processes does not lift that restriction. I think if you add synchronization on the userspace (lock(); sendmsg(); unlock();), that should help, right? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-06-02 15:28 ` Stanislav Fomichev @ 2025-06-02 16:03 ` Maciej Fijalkowski [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p2> 1 sibling, 0 replies; 18+ messages in thread From: Maciej Fijalkowski @ 2025-06-02 16:03 UTC (permalink / raw) To: Stanislav Fomichev Cc: Eryk Kubanski, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com On Mon, Jun 02, 2025 at 08:28:51AM -0700, Stanislav Fomichev wrote: > On 06/02, Eryk Kubanski wrote: > > > I'm not sure I understand what's the issue here. If you're using the > > > same XSK from different CPUs, you should take care of the ordering > > > yourself on the userspace side? > > > > It's not a problem with user-space Completion Queue READER side. > > Im talking exclusively about kernel-space Completion Queue WRITE side. > > > > This problem can occur when multiple sockets are bound to the same > > umem, device, queue id. In this situation Completion Queue is shared. > > This means it can be accessed by multiple threads on kernel-side. > > Any use is indeed protected by spinlock, however any write sequence > > (Acquire write slot as writer, write to slot, submit write slot to reader) > > isn't atomic in any way and it's possible to submit not-yet-sent packet > > descriptors back to user-space as TX completed. > > > > Up untill now, all write-back operations had two phases, each phase > > locks the spinlock and unlocks it: > > 1) Acquire slot + Write descriptor (increase cached-writer by N + write values) > > 2) Submit slot to the reader (increase writer by N) > > > > Slot submission was solely based on the timing. Let's consider situation, > > where two different threads issue a syscall for two different AF_XDP sockets > > that are bound to the same umem, dev, queue-id. > > > > AF_XDP setup: > > > > kernel-space > > > > Write Read > > +--+ +--+ > > | | | | > > | | | | > > | | | | > > Completion | | | | Fill > > Queue | | | | Queue > > | | | | > > | | | | > > | | | | > > | | | | > > +--+ +--+ > > Read Write > > user-space > > > > > > +--------+ +--------+ > > | AF_XDP | | AF_XDP | > > +--------+ +--------+ > > > > > > > > > > > > Possible out-of-order scenario: > > > > > > writer cached_writer1 cached_writer2 > > | | | > > | | | > > | | | > > | | | > > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > > | | | | | | | | | > > Completion Queue | | | | | | | | | > > | | | | | | | | | > > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > > | | | > > | | | > > |-----------------| | > > A) T1 syscall | | > > writes 2 | | > > descriptors |-----------------------------------| > > B) T2 syscall writes 4 descriptors > > > > > > > > > > Notes: > > 1) T1 and T2 AF_XDP sockets are two different sockets, > > __xsk_generic_xmit will obtain two different mutexes. > > 2) T1 and T2 can be executed simultaneously, there is no > > critical section whatsoever between them. > > XSK represents a single queue and each queue is single producer single > consumer. The fact that you can dup a socket and call sendmsg from > different threads/processes does not lift that restriction. I think > if you add synchronization on the userspace (lock(); sendmsg(); > unlock();), that should help, right? Eryk, can you tell us a bit more about HW you're using? 
The problem you described simply cannot happen for HW with in-order completions. You can't complete the descriptor from slot 5 without going through the completion of slot 3. So our assumption is that you're using HW with out-of-order completions, correct?

If that is the case then we have to think about possible solutions, which probably won't be straightforward. As Stan said, the current fix is a no-go.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* RE: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit()
  [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p2>
@ 2025-06-02 16:18   ` Eryk Kubanski
  2025-06-04 13:50     ` Maciej Fijalkowski
  2025-06-04 14:15   ` Eryk Kubanski
  2025-06-10  9:35   ` Eryk Kubanski
  2 siblings, 1 reply; 18+ messages in thread
From: Eryk Kubanski @ 2025-06-02 16:18 UTC (permalink / raw)
  To: Maciej Fijalkowski, Stanislav Fomichev
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com

> Eryk, can you tell us a bit more about HW you're using? The problem you
> described simply cannot happen for HW with in-order completions. You
> can't complete the descriptor from slot 5 without going through the completion of
> slot 3. So our assumption is that you're using HW with out-of-order
> completions, correct?

Maciej, this isn't reproduced on any hardware. I found this bug while working on generic AF_XDP.

We're using a MACVLAN deployment where two or more sockets share a single MACVLAN device queue. It doesn't even need to go out of the host... The SKB doesn't even need to complete in this case to observe the bug. It's enough if the earlier writer just fails after the descriptor write. This case is written up in note 5) under my diagram.

Are you sure that __dev_direct_xmit() will keep the packets on the same thread? What about NAPI, XPS, IRQs, etc.? If sendmsg() is issued by two threads, you don't know which one will complete faster. You can still have completion that is out of order with respect to the descriptor CQ write.

This isn't a problem of out-of-order HW completion, but of completion that is out of order with respect to the sendmsg() call and the descriptor write. And nothing even needs to be sent: as I explained above, a situation where one of the threads fails is more than enough to hit the bug.

> If that is the case then we have to think about possible solutions, which
> probably won't be straightforward. As Stan said, the current fix is a no-go.

Okay, what is your idea? In my opinion the only thing I can do is to push the descriptors back before or after __dev_direct_xmit() and keep them in some stack array. However, this won't be compatible with the behaviour of DRV-mode AF_XDP: descriptors would be returned right after the copy into the SKB instead of after the SKB is sent. If this is fine for you, it's fine for me.

Otherwise this needs to be tied to the SKB lifetime, but how?

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-06-02 16:18 ` Eryk Kubanski @ 2025-06-04 13:50 ` Maciej Fijalkowski 0 siblings, 0 replies; 18+ messages in thread From: Maciej Fijalkowski @ 2025-06-04 13:50 UTC (permalink / raw) To: Eryk Kubanski Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com On Mon, Jun 02, 2025 at 06:18:57PM +0200, Eryk Kubanski wrote: > > Eryk, can you tell us a bit more about HW you're using? The problem you > > described simply can not happen for HW with in-order completions. You > > can't complete descriptor from slot 5 without going through completion of > > slot 3. So our assumption is you're using HW with out-of-order > > completions, correct? > > Maciej this isn't reproduced on any hardware. > I found this bug while working on generic AF_XDP. > > We're using MACVLAN deployment where, two or more > sockets share single MACVLAN device queue. > It doesn't even need to go out of host... > > SKB doesn't even need to complete in this case > to observe this bug. It's enough if earlier writer > just fails after descriptor write. This case is > writen in my diagram Notes 5). Thanks for shedding a bit more light on it. In the future it would be nice if you would be able to come up with a reproducer of a bug that others could use on their side. Plus the overview of your deployment from the beginning would also help with people understanding the issue :) > > Are you sure that __dev_direct_xmit will keep > the packets on the same thread? What's about > NAPI, XPS, IRQs, etc? > > If sendmsg() is issued by two threads, you don't > know which one will complete faster. You can still > have out-of-order completion in relation to > descrpitor CQ write. > > This isn't problem with out-of-order HW completion, > but the problem with out-of-order completion in relation > to sendmsg() call and descriptor write. > > But this doesn't even need to be sent, as I > explained above, situation where one of threads > fails is more than enough to catch that bug. > > > If that is the case then we have to think about possible solutions which > > probably won't be straight-forward. As Stan said current fix is a no-go. > > Okay what is your idea? In my opinion the only > thing I can do is to just push the descriptors > before or after __dev_direct_xmit() and keep > these descriptors in some stack array. > However this won't be compatible with behaviour > of DRV deployed AF_XDP. Descriptors will be returned > right after copy to SKB instead of after SKB is sent. > If this is fine for you, It's fine for me. > > Otherwise this need to be tied to SKB lifetime, > but how? I'm looking into it, bottom line is that we discussed it with Magnus and agree that issue you're reporting needs to be addressed. I'll get back to you to discuss potential way of attacking it. Thanks! ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Re: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p2> 2025-06-02 16:18 ` Eryk Kubanski @ 2025-06-04 14:15 ` Eryk Kubanski 2025-06-09 19:41 ` Maciej Fijalkowski 2025-06-10 9:35 ` Eryk Kubanski 2 siblings, 1 reply; 18+ messages in thread From: Eryk Kubanski @ 2025-06-04 14:15 UTC (permalink / raw) To: Maciej Fijalkowski Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com > Thanks for shedding a bit more light on it. In the future it would be nice > if you would be able to come up with a reproducer of a bug that others > could use on their side. Plus the overview of your deployment from the > beginning would also help with people understanding the issue :) Sure, sorry for not giving that in advance, I found this issue during code analysis, not during deployment. It's not that simple to catch. I thought that in finite time we will agree :D. Next patchsets from me will have more information up-front. > I'm looking into it, bottom line is that we discussed it with Magnus and > agree that issue you're reporting needs to be addressed. > I'll get back to you to discuss potential way of attacking it. > Thanks! Thank you. Will this be discussed in the same mailing chain? Technically we need to tie descriptor write-back with skb lifetime. xsk_build_skb() function builds skb for TX, if i understand correctly this can work both ways either we perform zero-copy, so specific buffer page is attached to skb with given offset and size. OR perform the copy. If there was no zerocopy case, we could store it on stack array and simply recycle descriptor back right away without waiting for SKB completion. This zero-copy case makes it impossible right? We need to store these descriptors somewhere else and tie it to SKB destruction :(. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Re: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-06-04 14:15 ` Eryk Kubanski @ 2025-06-09 19:41 ` Maciej Fijalkowski 0 siblings, 0 replies; 18+ messages in thread From: Maciej Fijalkowski @ 2025-06-09 19:41 UTC (permalink / raw) To: Eryk Kubanski Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com On Wed, Jun 04, 2025 at 04:15:21PM +0200, Eryk Kubanski wrote: > > Thanks for shedding a bit more light on it. In the future it would be nice > > if you would be able to come up with a reproducer of a bug that others > > could use on their side. Plus the overview of your deployment from the > > beginning would also help with people understanding the issue :) > > Sure, sorry for not giving that in advance, I found this issue > during code analysis, not during deployment. > It's not that simple to catch. > I thought that in finite time we will agree :D. > Next patchsets from me will have more information up-front. > > > I'm looking into it, bottom line is that we discussed it with Magnus and > > agree that issue you're reporting needs to be addressed. > > I'll get back to you to discuss potential way of attacking it. > > Thanks! > > Thank you. > Will this be discussed in the same mailing chain? I've come with something as below. Idea is to embed addr at the end of linear part of skb/at the end of page frag. For first case we account 8 more bytes when calling sock_alloc_send_skb(), for the latter we alloc whole page anyways so we can just use the last 8 bytes. then in destructor we have access to addrs used during xmit descriptor production. This solution is free of additional struct members so performance-wise it should not be as impactful as previous approach. 
--- net/xdp/xsk.c | 37 ++++++++++++++++++++++++++++++------- net/xdp/xsk_queue.h | 8 ++++++++ 2 files changed, 38 insertions(+), 7 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 72c000c0ae5f..22f314ea9dc2 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -528,24 +528,39 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags) return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags); } -static int xsk_cq_reserve_addr_locked(struct xsk_buff_pool *pool, u64 addr) +static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool) { unsigned long flags; int ret; spin_lock_irqsave(&pool->cq_lock, flags); - ret = xskq_prod_reserve_addr(pool->cq, addr); + ret = xskq_prod_reserve(pool->cq); spin_unlock_irqrestore(&pool->cq_lock, flags); return ret; } -static void xsk_cq_submit_locked(struct xsk_buff_pool *pool, u32 n) +static void xsk_cq_submit_locked(struct xsk_buff_pool *pool, struct sk_buff *skb) { + size_t addr_sz = sizeof(((struct xdp_desc *)0)->addr); unsigned long flags; + int nr_frags, i; + u64 addr; spin_lock_irqsave(&pool->cq_lock, flags); - xskq_prod_submit_n(pool->cq, n); + + addr = *(u64 *)(skb->head + skb->end - addr_sz); + xskq_prod_write_addr(pool->cq, addr); + + nr_frags = skb_shinfo(skb)->nr_frags; + + for (i = 0; i < nr_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + + addr = *(u64 *)(skb_frag_address(frag) + PAGE_SIZE - addr_sz); + xskq_prod_write_addr(pool->cq, addr); + } + spin_unlock_irqrestore(&pool->cq_lock, flags); } @@ -572,7 +587,7 @@ static void xsk_destruct_skb(struct sk_buff *skb) *compl->tx_timestamp = ktime_get_tai_fast_ns(); } - xsk_cq_submit_locked(xdp_sk(skb->sk)->pool, xsk_get_num_desc(skb)); + xsk_cq_submit_locked(xdp_sk(skb->sk)->pool, skb); sock_wfree(skb); } @@ -656,6 +671,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, struct xdp_desc *desc) { + size_t addr_sz = sizeof(desc->addr); struct xsk_tx_metadata *meta = NULL; struct net_device *dev = xs->dev; struct sk_buff *skb = xs->skb; @@ -671,6 +687,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, } else { u32 hr, tr, len; void *buffer; + u8 *trailer; buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); len = desc->len; @@ -680,7 +697,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)); tr = dev->needed_tailroom; - skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err); + skb = sock_alloc_send_skb(&xs->sk, + hr + len + tr + addr_sz, + 1, &err); if (unlikely(!skb)) goto free_err; @@ -690,6 +709,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, err = skb_store_bits(skb, 0, buffer, len); if (unlikely(err)) goto free_err; + trailer = skb->head + skb->end - addr_sz; + memcpy(trailer, &desc->addr, addr_sz); + } else { int nr_frags = skb_shinfo(skb)->nr_frags; struct page *page; @@ -708,6 +730,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, vaddr = kmap_local_page(page); memcpy(vaddr, buffer, len); + memcpy(vaddr + PAGE_SIZE - addr_sz, &desc->addr, addr_sz); kunmap_local(vaddr); skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE); @@ -807,7 +830,7 @@ static int __xsk_generic_xmit(struct sock *sk) * if there is space in it. This avoids having to implement * any buffering in the Tx path. 
*/ - err = xsk_cq_reserve_addr_locked(xs->pool, desc.addr); + err = xsk_cq_reserve_locked(xs->pool); if (err) { err = -EAGAIN; goto out; diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 46d87e961ad6..9cd65d1bc81b 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -390,6 +390,14 @@ static inline int xskq_prod_reserve_addr(struct xsk_queue *q, u64 addr) return 0; } +static inline void xskq_prod_write_addr(struct xsk_queue *q, u64 addr) +{ + struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring; + + /* A, matches D */ + ring->desc[q->ring->producer++ & q->ring_mask] = addr; +} + static inline void xskq_prod_write_addr_batch(struct xsk_queue *q, struct xdp_desc *descs, u32 nb_entries) { > > Technically we need to tie descriptor write-back > with skb lifetime. > xsk_build_skb() function builds skb for TX, > if i understand correctly this can work both ways > either we perform zero-copy, so specific buffer > page is attached to skb with given offset and size. > OR perform the copy. > > If there was no zerocopy case, we could store it > on stack array and simply recycle descriptor back > right away without waiting for SKB completion. > > This zero-copy case makes it impossible right? > We need to store these descriptors somewhere else > and tie it to SKB destruction :(. ^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: Re: Re: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit()
  [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p2>
  2025-06-02 16:18   ` Eryk Kubanski
  2025-06-04 14:15   ` Eryk Kubanski
@ 2025-06-10  9:35   ` Eryk Kubanski
  2 siblings, 0 replies; 18+ messages in thread
From: Eryk Kubanski @ 2025-06-10 9:35 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Stanislav Fomichev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bjorn@kernel.org,
	magnus.karlsson@intel.com, jonathan.lemon@gmail.com

xsk_build_skb() doesn't seem to handle the zero-copy case (the IFF_TX_SKB_NO_LINEAR device flag) in this situation. How are the descriptors returned after building the skb in zero-copy mode? How does it work in this situation?

^ permalink raw reply	[flat|nested] 18+ messages in thread
* RE: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit()
  [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p3>
@ 2025-06-02 15:58   ` Eryk Kubanski
  2025-06-10  9:11   ` Re: " Eryk Kubanski
  1 sibling, 0 replies; 18+ messages in thread
From: Eryk Kubanski @ 2025-06-02 15:58 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bjorn@kernel.org, magnus.karlsson@intel.com,
	maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com

> XSK represents a single queue and each queue is single producer single
> consumer. The fact that you can dup a socket and call sendmsg from
> different threads/processes does not lift that restriction. I think
> if you add synchronization on the userspace (lock(); sendmsg();
> unlock();), that should help, right?

It's not a dup() of the fd; it's a perfectly legal AF_XDP setup. You can share a single device queue between multiple AF_XDP sockets. The RX and TX queues are then per socket, while FILL / COMP are per device queue id. Access to FILL and COMP must be synchronized on both the user-space and kernel-space side.

https://docs.kernel.org/networking/af_xdp.html
> XDP_SHARED_UMEM bind flag
> This flag enables you to bind multiple sockets to the same UMEM.
> It works on the same queue id, between queue ids and between netdevs/devices.
> In this mode, each socket has their own RX and TX rings as usual, but you are going
> to have one or more FILL and COMPLETION ring pairs. You have to create one of these
> pairs per unique netdev and queue id tuple that you bind to.

I'm not using sendmsg() on a dupped socket descriptor. It's just another socket bound to the same netdev, queue id pair. Even if that were the case, calling sendmsg() from multiple threads should be perfectly legal in this situation. But I don't do that; each socket is handled exclusively by a single thread.

This is simply one of the available AF_XDP deployments. It should just work. I don't need to introduce any locking scheme of my own except the FILL/COMP queue locking, which I did. Definitely, holding a single netdevice-wide lock for every RX/TX operation isn't an option. It should just work.

This problem is definitely in the kernel. The explanation provided clearly proves it; it's caught red-handed. Please analyze the code and see it for yourself.

^ permalink raw reply	[flat|nested] 18+ messages in thread
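For illustration, the deployment described above can be set up along these lines with libxdp's xsk_socket__create_shared() (a minimal sketch; the "macvlan0" interface name, queue id and all error handling are illustrative only):

	/* Two AF_XDP sockets bound to the same <netdev, queue id>, sharing
	 * one umem and therefore one FILL and one COMPLETION ring.
	 * Assumes libxdp; umem was created earlier with xsk_umem__create().
	 */
	struct xsk_ring_prod fill;		/* shared FILL ring */
	struct xsk_ring_cons comp;		/* shared COMPLETION ring */
	struct xsk_ring_cons rx1, rx2;		/* per-socket RX rings */
	struct xsk_ring_prod tx1, tx2;		/* per-socket TX rings */
	struct xsk_socket *xsk1, *xsk2;

	xsk_socket__create_shared(&xsk1, "macvlan0", 0 /* queue id */, umem,
				  &rx1, &tx1, &fill, &comp, NULL);
	xsk_socket__create_shared(&xsk2, "macvlan0", 0 /* queue id */, umem,
				  &rx2, &tx2, &fill, &comp, NULL);

	/* Thread 1 then drives sendmsg() on xsk1 and thread 2 on xsk2; both
	 * kernel-side completions land in the single shared COMPLETION ring,
	 * which is the write path discussed in this thread.
	 */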
* RE: Re: Re: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit()
  [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p3>
  2025-06-02 15:58   ` Eryk Kubanski
@ 2025-06-10  9:11   ` Re: " Eryk Kubanski
  2025-06-11 13:10     ` Maciej Fijalkowski
  1 sibling, 1 reply; 18+ messages in thread
From: Eryk Kubanski @ 2025-06-10 9:11 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Stanislav Fomichev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bjorn@kernel.org,
	magnus.karlsson@intel.com, jonathan.lemon@gmail.com

> I've come with something as below. Idea is to embed addr at the end of
> linear part of skb/at the end of page frag.

Are you sure that this is safe for other components? So instead of storing the entire array in skb_shared_info (at skb->end), we store it as 8 bytes per page fragment plus 8 bytes at skb->end. Technically no one should touch the skb past the end, so it looks good to me.

In xsk_cq_submit_locked() you only use xskq_prod_write_addr(). I think this may cause synchronization issues on the reader side. You don't perform an ATOMIC_RELEASE, so the producer increment is atomic (u32) but it doesn't order the address writes. I think you should accumulate a local producer index and store it with ATOMIC_RELEASE after writing the descriptors. In the current situation someone may observe the producer increment while the address stored in that slot is not yet guaranteed to be visible.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Re: Re: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-06-10 9:11 ` Re: " Eryk Kubanski @ 2025-06-11 13:10 ` Maciej Fijalkowski 0 siblings, 0 replies; 18+ messages in thread From: Maciej Fijalkowski @ 2025-06-11 13:10 UTC (permalink / raw) To: Eryk Kubanski Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com On Tue, Jun 10, 2025 at 11:11:25AM +0200, Eryk Kubanski wrote: > > I've come with something as below. Idea is to embed addr at the end of > > linear part of skb/at the end of page frag. > > Are you sure that this is safe for other components? > > So instead of storing entire array at the skb_shared_info (skb->end), > we store It 8-bytes per PAGE fragment and 8-byte at skb->end. > Technically noone should edit skb past-the-end, it > looks good to me. > > In xsk_cq_submit_locked() you use only xskq_prod_write_addr. > I think that this may cause synchronization issues on reader side. > You don't perform ATOMIC_RELEASE, so this producer incrementation > is atomic (u32) but it doesn't synchronize address write. > > I think that you should accumulate local producer and > store it with ATOMIC_RELEASE after writing descriptors. > In current situation someone may see producer incrementation, > but address stored in this bank doesn't need to be synchronized yet. Hi Eryk, yes I missed the smp_store_release that __xskq_prod_submit does - magic of late night refactors :) but main point was to share the approach in terms of addr storage. As you also said IFF_TX_SKB_NO_LINEAR case needs to be addressed as well as it uses the same skb destructor. Since these pages come directly from umem we don't need this quirk, we should be able to get the addr from page. ^ permalink raw reply [flat|nested] 18+ messages in thread
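For completeness, the publish step being referred to is roughly the following (paraphrased from net/xdp/xsk_queue.h): the release store on the producer index is what orders the descriptor writes before the index update for the user-space consumer.

	static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
	{
		/* release store pairs with the consumer's acquire load of
		 * the producer index, so descriptor writes are visible
		 * before the new producer value is observed
		 */
		smp_store_release(&q->ring->producer, idx);
	}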
* Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-06-02 9:27 ` Eryk Kubanski 2025-06-02 15:28 ` Stanislav Fomichev [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p3> @ 2025-07-03 23:37 ` Jason Xing 2025-07-04 12:34 ` Maciej Fijalkowski 2 siblings, 1 reply; 18+ messages in thread From: Jason Xing @ 2025-07-03 23:37 UTC (permalink / raw) To: e.kubanski Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com On Mon, Jun 2, 2025 at 5:28 PM Eryk Kubanski <e.kubanski@partner.samsung.com> wrote: > > > I'm not sure I understand what's the issue here. If you're using the > > same XSK from different CPUs, you should take care of the ordering > > yourself on the userspace side? > > It's not a problem with user-space Completion Queue READER side. > Im talking exclusively about kernel-space Completion Queue WRITE side. > > This problem can occur when multiple sockets are bound to the same > umem, device, queue id. In this situation Completion Queue is shared. > This means it can be accessed by multiple threads on kernel-side. > Any use is indeed protected by spinlock, however any write sequence > (Acquire write slot as writer, write to slot, submit write slot to reader) > isn't atomic in any way and it's possible to submit not-yet-sent packet > descriptors back to user-space as TX completed. > > Up untill now, all write-back operations had two phases, each phase > locks the spinlock and unlocks it: > 1) Acquire slot + Write descriptor (increase cached-writer by N + write values) > 2) Submit slot to the reader (increase writer by N) > > Slot submission was solely based on the timing. Let's consider situation, > where two different threads issue a syscall for two different AF_XDP sockets > that are bound to the same umem, dev, queue-id. > > AF_XDP setup: > > kernel-space > > Write Read > +--+ +--+ > | | | | > | | | | > | | | | > Completion | | | | Fill > Queue | | | | Queue > | | | | > | | | | > | | | | > | | | | > +--+ +--+ > Read Write > user-space > > > +--------+ +--------+ > | AF_XDP | | AF_XDP | > +--------+ +--------+ > > > > > > Possible out-of-order scenario: > > > writer cached_writer1 cached_writer2 > | | | > | | | > | | | > | | | > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > | | | | | | | | | > Completion Queue | | | | | | | | | > | | | | | | | | | > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > | | | > | | | > |-----------------| | > A) T1 syscall | | > writes 2 | | > descriptors |-----------------------------------| > B) T2 syscall writes 4 descriptors > Hi ALL, Since Maciej posted a related patch to fix this issue, it took me a little while to trace back to this thread. So here we are. > Notes: > 1) T1 and T2 AF_XDP sockets are two different sockets, > __xsk_generic_xmit will obtain two different mutexes. > 2) T1 and T2 can be executed simultaneously, there is no > critical section whatsoever between them. > 3) T1 and T2 will obtain Completion Queue Lock for acquire + write, > only slot acquire + write are under lock. > 4) T1 and T2 completion (skb destructor) > doesn't need to be the same order as A) and B). > 5) What if T1 fails after T2 acquires slots? What does it mean by 'fails'. Could you point out the accurate function you said? 
> cached_writer will be decreased by 2, T2 will > submit failed descriptors of T1 (they shall be > retransmitted in next TX). > Submission of writer will move writer by 4 slots > 2 of these slots have failed T1 values. Last two > slots of T2 will be missing, descriptor leak. I wonder why the leak problem happens? IIUC, in the __xsk_generic_xmit() + copy mode, xsk only tries to send the descriptor from its own tx ring to the driver, like virtio_net as an example. As you said, there are two xsks running in parallel. Why could T2 send the descriptors that T1 puts into the completion queue? __dev_direct_xmit() only passes the @skb that is built based on the addr from per xsk tx ring. Here are some maps related to the process you talked about: case 1) // T1 writes 2 descs in cq [--1--][--2--][-null-][-null-][-null-][-null-][-null-] | cached_prod // T1 fails because of NETDEV_TX_BUSY, and cq.cached_prod is decreased by 2. [-null-][-null-][-null-][-null-][-null-][-null-][-null-] | cached_prod // T2 starts to write at the first unused descs [--1--][--2--][--3--][--4--][-null-][-null-][-null-] | cached_prod So why can T2 send out the descs belonging to T1? In __xsk_generic_xmit(), xsk_cq_reserve_addr_locked() initialises the addr of acquired desc so it overwrites the invalid one previously owned by T1. The addr is from per xsk tx ring... I'm lost. Could you please share the detailed/key functions to shed more lights on this? Thanks in advance. I know you're not running on the (virtual) nic actually, but I still want to know the possibility of the issue with normal end-to-end transmission. In the virtio_net driver, __dev_direct_xmit() returns BUSY only if the BQL takes effect, so your case might not happen here? The reason why I asked is that I have a similar use case with virtio_net and I am trying to understand whether it can happen in the future. Thanks, Jason > 6) What if T2 completes before T1? writer will be > moved by 4 slots. 2 of them are slots filled by T1. > T2 will complete 2 own slots and 2 slots of T1, It's bad. > T1 will complete last 2 slots of T2, also bad. > > This out-of-order completion can effectively cause User-space <-> Kernel-space > data race. This patch solves that, by only acquiring cached_writer first and > do the completion (sumission (write + increase writer)) after. This is the only > way to make that bulletproof for multithreaded access, failures and > out-of-order skb completions. > > > This is definitely a no-go (sk_buff and skb_shared_info space is > > precious). > > Okay so where should I store It? Can you give me some advice? > > I left that there, because there is every information related to > skb desctruction. Additionally this is the only place in skb related > code that defines anything related to xsk: metadata, number of descriptors. > SKBUFF doesn't. I need to hold this information somewhere, and skbuff or > skb_shared_info are the only place I can store it. This need to be invariant > across all skb fragments, and be released after skb completes. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-07-03 23:37 ` Jason Xing @ 2025-07-04 12:34 ` Maciej Fijalkowski 2025-07-04 15:29 ` Jason Xing 0 siblings, 1 reply; 18+ messages in thread From: Maciej Fijalkowski @ 2025-07-04 12:34 UTC (permalink / raw) To: Jason Xing Cc: e.kubanski, Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com On Fri, Jul 04, 2025 at 07:37:22AM +0800, Jason Xing wrote: > On Mon, Jun 2, 2025 at 5:28 PM Eryk Kubanski > <e.kubanski@partner.samsung.com> wrote: > > > > > I'm not sure I understand what's the issue here. If you're using the > > > same XSK from different CPUs, you should take care of the ordering > > > yourself on the userspace side? > > > > It's not a problem with user-space Completion Queue READER side. > > Im talking exclusively about kernel-space Completion Queue WRITE side. > > > > This problem can occur when multiple sockets are bound to the same > > umem, device, queue id. In this situation Completion Queue is shared. > > This means it can be accessed by multiple threads on kernel-side. > > Any use is indeed protected by spinlock, however any write sequence > > (Acquire write slot as writer, write to slot, submit write slot to reader) > > isn't atomic in any way and it's possible to submit not-yet-sent packet > > descriptors back to user-space as TX completed. > > > > Up untill now, all write-back operations had two phases, each phase > > locks the spinlock and unlocks it: > > 1) Acquire slot + Write descriptor (increase cached-writer by N + write values) > > 2) Submit slot to the reader (increase writer by N) > > > > Slot submission was solely based on the timing. Let's consider situation, > > where two different threads issue a syscall for two different AF_XDP sockets > > that are bound to the same umem, dev, queue-id. > > > > AF_XDP setup: > > > > kernel-space > > > > Write Read > > +--+ +--+ > > | | | | > > | | | | > > | | | | > > Completion | | | | Fill > > Queue | | | | Queue > > | | | | > > | | | | > > | | | | > > | | | | > > +--+ +--+ > > Read Write > > user-space > > > > > > +--------+ +--------+ > > | AF_XDP | | AF_XDP | > > +--------+ +--------+ > > > > > > > > > > > > Possible out-of-order scenario: > > > > > > writer cached_writer1 cached_writer2 > > | | | > > | | | > > | | | > > | | | > > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > > | | | | | | | | | > > Completion Queue | | | | | | | | | > > | | | | | | | | | > > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > > | | | > > | | | > > |-----------------| | > > A) T1 syscall | | > > writes 2 | | > > descriptors |-----------------------------------| > > B) T2 syscall writes 4 descriptors > > > > Hi ALL, > > Since Maciej posted a related patch to fix this issue, it took me a > little while to trace back to this thread. So here we are. > > > Notes: > > 1) T1 and T2 AF_XDP sockets are two different sockets, > > __xsk_generic_xmit will obtain two different mutexes. > > 2) T1 and T2 can be executed simultaneously, there is no > > critical section whatsoever between them. > > 3) T1 and T2 will obtain Completion Queue Lock for acquire + write, > > only slot acquire + write are under lock. > > 4) T1 and T2 completion (skb destructor) > > doesn't need to be the same order as A) and B). 
> > 5) What if T1 fails after T2 acquires slots? > > What does it mean by 'fails'. Could you point out the accurate > function you said? > > > cached_writer will be decreased by 2, T2 will > > submit failed descriptors of T1 (they shall be > > retransmitted in next TX). > > Submission of writer will move writer by 4 slots > > 2 of these slots have failed T1 values. Last two > > slots of T2 will be missing, descriptor leak. > > I wonder why the leak problem happens? IIUC, in the > __xsk_generic_xmit() + copy mode, xsk only tries to send the > descriptor from its own tx ring to the driver, like virtio_net as an > example. As you said, there are two xsks running in parallel. Why > could T2 send the descriptors that T1 puts into the completion queue? > __dev_direct_xmit() only passes the @skb that is built based on the > addr from per xsk tx ring. I admit it is non-trivial case. Per my understanding before, based on Eryk's example, if T1 failed xmit and reduced the cached_prod, T2 in its skb destructor would release two T1 umem addresses and two T2 addrs instead of 4 T2 addrs. Putting this aside though, we had *correct* behavior before xsk multi-buffer support, we should not let that change make it into kernel in the first place. Hence my motivation to restore it.$ > > Here are some maps related to the process you talked about: > case 1) > // T1 writes 2 descs in cq > [--1--][--2--][-null-][-null-][-null-][-null-][-null-] > | > cached_prod > > // T1 fails because of NETDEV_TX_BUSY, and cq.cached_prod is decreased by 2. > [-null-][-null-][-null-][-null-][-null-][-null-][-null-] > | > cached_prod > > // T2 starts to write at the first unused descs > [--1--][--2--][--3--][--4--][-null-][-null-][-null-] > | > cached_prod > So why can T2 send out the descs belonging to T1? In > __xsk_generic_xmit(), xsk_cq_reserve_addr_locked() initialises the > addr of acquired desc so it overwrites the invalid one previously > owned by T1. The addr is from per xsk tx ring... I'm lost. Could you > please share the detailed/key functions to shed more lights on this? > Thanks in advance. Take another look at Eryk's example. The case he was providing was when t1 produced smaller amount of addrs followed by t2 with bigger count. Then due to t1 failure, t2 was providing addrs produced by t1. Your example talks about immediate failure of t1 whereas Eryk talked about: 1. t1 produces addrs to cq 2. t2 produces addrs to cq 3. t2 starts xmit 4. t1 fails for some reason down in __xsk_generic_xmit() 4a. t1 reduces cached_prod 5. t2 completes, updates global state of cq's producer and exposing addrs produced by t1 and misses part of addrs produced by t2 > > I know you're not running on the (virtual) nic actually, but I still > want to know the possibility of the issue with normal end-to-end > transmission. In the virtio_net driver, __dev_direct_xmit() returns > BUSY only if the BQL takes effect, so your case might not happen here? > The reason why I asked is that I have a similar use case with > virtio_net and I am trying to understand whether it can happen in the > future. > > Thanks, > Jason > > > > 6) What if T2 completes before T1? writer will be > > moved by 4 slots. 2 of them are slots filled by T1. > > T2 will complete 2 own slots and 2 slots of T1, It's bad. > > T1 will complete last 2 slots of T2, also bad. > > > > This out-of-order completion can effectively cause User-space <-> Kernel-space > > data race. 
This patch solves that, by only acquiring cached_writer first and > > do the completion (sumission (write + increase writer)) after. This is the only > > way to make that bulletproof for multithreaded access, failures and > > out-of-order skb completions. > > > > > This is definitely a no-go (sk_buff and skb_shared_info space is > > > precious). > > > > Okay so where should I store It? Can you give me some advice? > > > > I left that there, because there is every information related to > > skb desctruction. Additionally this is the only place in skb related > > code that defines anything related to xsk: metadata, number of descriptors. > > SKBUFF doesn't. I need to hold this information somewhere, and skbuff or > > skb_shared_info are the only place I can store it. This need to be invariant > > across all skb fragments, and be released after skb completes. > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-07-04 12:34 ` Maciej Fijalkowski @ 2025-07-04 15:29 ` Jason Xing 0 siblings, 0 replies; 18+ messages in thread From: Jason Xing @ 2025-07-04 15:29 UTC (permalink / raw) To: Maciej Fijalkowski Cc: e.kubanski, Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com On Fri, Jul 4, 2025 at 8:35 PM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > On Fri, Jul 04, 2025 at 07:37:22AM +0800, Jason Xing wrote: > > On Mon, Jun 2, 2025 at 5:28 PM Eryk Kubanski > > <e.kubanski@partner.samsung.com> wrote: > > > > > > > I'm not sure I understand what's the issue here. If you're using the > > > > same XSK from different CPUs, you should take care of the ordering > > > > yourself on the userspace side? > > > > > > It's not a problem with user-space Completion Queue READER side. > > > Im talking exclusively about kernel-space Completion Queue WRITE side. > > > > > > This problem can occur when multiple sockets are bound to the same > > > umem, device, queue id. In this situation Completion Queue is shared. > > > This means it can be accessed by multiple threads on kernel-side. > > > Any use is indeed protected by spinlock, however any write sequence > > > (Acquire write slot as writer, write to slot, submit write slot to reader) > > > isn't atomic in any way and it's possible to submit not-yet-sent packet > > > descriptors back to user-space as TX completed. > > > > > > Up untill now, all write-back operations had two phases, each phase > > > locks the spinlock and unlocks it: > > > 1) Acquire slot + Write descriptor (increase cached-writer by N + write values) > > > 2) Submit slot to the reader (increase writer by N) > > > > > > Slot submission was solely based on the timing. Let's consider situation, > > > where two different threads issue a syscall for two different AF_XDP sockets > > > that are bound to the same umem, dev, queue-id. > > > > > > AF_XDP setup: > > > > > > kernel-space > > > > > > Write Read > > > +--+ +--+ > > > | | | | > > > | | | | > > > | | | | > > > Completion | | | | Fill > > > Queue | | | | Queue > > > | | | | > > > | | | | > > > | | | | > > > | | | | > > > +--+ +--+ > > > Read Write > > > user-space > > > > > > > > > +--------+ +--------+ > > > | AF_XDP | | AF_XDP | > > > +--------+ +--------+ > > > > > > > > > > > > > > > > > > Possible out-of-order scenario: > > > > > > > > > writer cached_writer1 cached_writer2 > > > | | | > > > | | | > > > | | | > > > | | | > > > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > > > | | | | | | | | | > > > Completion Queue | | | | | | | | | > > > | | | | | | | | | > > > +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+ > > > | | | > > > | | | > > > |-----------------| | > > > A) T1 syscall | | > > > writes 2 | | > > > descriptors |-----------------------------------| > > > B) T2 syscall writes 4 descriptors > > > > > > > Hi ALL, > > > > Since Maciej posted a related patch to fix this issue, it took me a > > little while to trace back to this thread. So here we are. > > > > > Notes: > > > 1) T1 and T2 AF_XDP sockets are two different sockets, > > > __xsk_generic_xmit will obtain two different mutexes. > > > 2) T1 and T2 can be executed simultaneously, there is no > > > critical section whatsoever between them. 
> > > 3) T1 and T2 will obtain Completion Queue Lock for acquire + write, > > > only slot acquire + write are under lock. > > > 4) T1 and T2 completion (skb destructor) > > > doesn't need to be the same order as A) and B). > > > 5) What if T1 fails after T2 acquires slots? > > > > What does it mean by 'fails'. Could you point out the accurate > > function you said? > > > > > cached_writer will be decreased by 2, T2 will > > > submit failed descriptors of T1 (they shall be > > > retransmitted in next TX). > > > Submission of writer will move writer by 4 slots > > > 2 of these slots have failed T1 values. Last two > > > slots of T2 will be missing, descriptor leak. > > > > I wonder why the leak problem happens? IIUC, in the > > __xsk_generic_xmit() + copy mode, xsk only tries to send the > > descriptor from its own tx ring to the driver, like virtio_net as an > > example. As you said, there are two xsks running in parallel. Why > > could T2 send the descriptors that T1 puts into the completion queue? > > __dev_direct_xmit() only passes the @skb that is built based on the > > addr from per xsk tx ring. > > I admit it is non-trivial case. > > Per my understanding before, based on Eryk's example, if T1 failed xmit > and reduced the cached_prod, T2 in its skb destructor would release two T1 > umem addresses and two T2 addrs instead of 4 T2 addrs. > > Putting this aside though, we had *correct* behavior before xsk > multi-buffer support, we should not let that change make it into kernel in > the first place. Hence my motivation to restore it.$ > > > > > Here are some maps related to the process you talked about: > > case 1) > > // T1 writes 2 descs in cq > > [--1--][--2--][-null-][-null-][-null-][-null-][-null-] > > | > > cached_prod > > > > // T1 fails because of NETDEV_TX_BUSY, and cq.cached_prod is decreased by 2. > > [-null-][-null-][-null-][-null-][-null-][-null-][-null-] > > | > > cached_prod > > > > // T2 starts to write at the first unused descs > > [--1--][--2--][--3--][--4--][-null-][-null-][-null-] > > | > > cached_prod > > So why can T2 send out the descs belonging to T1? In > > __xsk_generic_xmit(), xsk_cq_reserve_addr_locked() initialises the > > addr of acquired desc so it overwrites the invalid one previously > > owned by T1. The addr is from per xsk tx ring... I'm lost. Could you > > please share the detailed/key functions to shed more lights on this? > > Thanks in advance. > > Take another look at Eryk's example. The case he was providing was when t1 > produced smaller amount of addrs followed by t2 with bigger count. Then > due to t1 failure, t2 was providing addrs produced by t1. > > Your example talks about immediate failure of t1 whereas Eryk talked > about: > 1. t1 produces addrs to cq > 2. t2 produces addrs to cq > 3. t2 starts xmit > 4. t1 fails for some reason down in __xsk_generic_xmit() > 4a. t1 reduces cached_prod > 5. t2 completes, updates global state of cq's producer and exposing addrs > produced by t1 and misses part of addrs produced by t2 Wow, thanks for sharing your understanding on this. It's very clear and easy to understand to me. Thanks, Jason ^ permalink raw reply [flat|nested] 18+ messages in thread
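[Editorial illustration, continuing the toy model shown earlier in the thread.
This variant follows the single-phase scheme the discussion converges on:
reservation only moves cached_prod, and each completion writes the addresses it
owns immediately before advancing the global producer (in the kernel that write
plus submit step runs under the completion queue spinlock, which this scripted
example omits). cq_reserve() and cq_write_and_submit() are illustrative names,
not the actual kernel helpers.]

#include <stdio.h>
#include <stdint.h>

#define RING_SIZE 8
#define RING_MASK (RING_SIZE - 1)

struct cq {
	uint64_t desc[RING_SIZE];
	uint32_t producer;
	uint32_t cached_prod;
};

/* Reservation only accounts for the slot; no address is written yet. */
static void cq_reserve(struct cq *q)
{
	q->cached_prod++;
}

/* Completion path: write the addresses this skb owns and publish them in the
 * same step, so the published slots always match the completing skb. */
static void cq_write_and_submit(struct cq *q, const uint64_t *addrs, uint32_t n)
{
	uint32_t prod = q->producer;

	for (uint32_t i = 0; i < n; i++)
		q->desc[prod++ & RING_MASK] = addrs[i];
	q->producer = prod;
}

int main(void)
{
	struct cq q = { 0 };
	uint64_t t1_addrs[] = { 0x1000, 0x2000 };
	uint64_t t2_addrs[] = { 0xA000, 0xB000, 0xC000, 0xD000 };

	/* T1 reserves 2 slots, T2 reserves 4; nothing is visible yet. */
	for (int i = 0; i < 2; i++)
		cq_reserve(&q);
	for (int i = 0; i < 4; i++)
		cq_reserve(&q);

	/* Even if T2's completion runs first, it only publishes its own addrs,
	 * and a later T1 failure would simply leave T1's slots unused. */
	cq_write_and_submit(&q, t2_addrs, 4);
	cq_write_and_submit(&q, t1_addrs, 2);

	for (uint32_t i = 0; i < q.producer; i++)
		printf("completion %u -> addr 0x%llx\n", i,
		       (unsigned long long)q.desc[i & RING_MASK]);
	return 0;
}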
* Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() 2025-05-30 10:34 ` [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() e.kubanski ` (2 preceding siblings ...) [not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p1> @ 2025-06-04 14:41 ` kernel test robot 3 siblings, 0 replies; 18+ messages in thread From: kernel test robot @ 2025-06-04 14:41 UTC (permalink / raw) To: e.kubanski Cc: oe-lkp, lkp, netdev, bpf, linux-kernel, bjorn, magnus.karlsson, maciej.fijalkowski, jonathan.lemon, e.kubanski, oliver.sang Hello, kernel test robot noticed a 16.6% regression of hackbench.throughput on: commit: 2adc2445a5ae93efed1e2e6646a37a3afff8c0e9 ("[PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit()") url: https://github.com/intel-lab-lkp/linux/commits/e-kubanski/xsk-Fix-out-of-order-segment-free-in-__xsk_generic_xmit/20250530-183723 base: https://git.kernel.org/cgit/linux/kernel/git/bpf/bpf.git master patch link: https://lore.kernel.org/all/20250530103456.53564-1-e.kubanski@partner.samsung.com/ patch subject: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() testcase: hackbench config: x86_64-rhel-9.4 compiler: gcc-12 test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory parameters: nr_threads: 50% iterations: 4 mode: threads ipc: socket cpufreq_governor: performance In addition to that, the commit also has significant impact on the following tests: +------------------+---------------------------------------------------------------------------------------------+ | testcase: change | netperf: netperf.Throughput_Mbps 93440.1% improvement | | test machine | 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory | | test parameters | cluster=cs-localhost | | | cpufreq_governor=performance | | | ip=ipv4 | | | nr_threads=50% | | | runtime=300s | | | test=SCTP_STREAM | +------------------+---------------------------------------------------------------------------------------------+ If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202506042255.a3161554-lkp@intel.com Details are as below: --------------------------------------------------------------------------------------------------> The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20250604/202506042255.a3161554-lkp@intel.com ========================================================================================= compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase: gcc-12/performance/socket/4/x86_64-rhel-9.4/threads/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench commit: 90b83efa67 ("Merge tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next") 2adc2445a5 ("xsk: Fix out of order segment free in __xsk_generic_xmit()") 90b83efa6701656e 2adc2445a5ae93efed1e2e6646a ---------------- --------------------------- %stddev %change %stddev \ | \ 8018 -11.4% 7101 sched_debug.cpu.curr->pid.avg 195.95 +14.6% 224.54 uptime.boot 0.01 -16.8% 0.01 vmstat.swap.so 764179 ± 3% -8.2% 701708 vmstat.system.in 2.13 ± 13% -0.5 1.63 ± 2% mpstat.cpu.all.idle% 0.53 -0.1 0.44 mpstat.cpu.all.irq% 7.09 -1.1 5.95 mpstat.cpu.all.usr% 558.00 ± 9% +31.5% 733.67 ± 26% perf-c2c.DRAM.local 23177 ± 6% +17.3% 27181 ± 6% perf-c2c.DRAM.remote 161171 +3.4% 166696 perf-c2c.HITM.total 92761 +4.0% 96435 proc-vmstat.nr_slab_reclaimable 3143593 ± 4% +9.5% 3442455 ± 6% proc-vmstat.nr_unaccepted 2154280 ± 8% +19.0% 2563937 ± 11% proc-vmstat.numa_interleave 3260150 ± 6% +63.8% 5339352 ± 6% proc-vmstat.pgalloc_dma32 34526 ± 3% +30.6% 45105 ± 18% proc-vmstat.pgrefill 1700255 +119.7% 3735986 ± 2% proc-vmstat.pgskip_device 423595 -16.6% 353120 hackbench.throughput 418312 -17.0% 347114 hackbench.throughput_avg 423595 -16.6% 353120 hackbench.throughput_best 408057 -17.1% 338325 hackbench.throughput_worst 144.25 +20.3% 173.60 hackbench.time.elapsed_time 144.25 +20.3% 173.60 hackbench.time.elapsed_time.max 1.098e+08 ± 2% +17.6% 1.291e+08 hackbench.time.involuntary_context_switches 81915 +1.7% 83317 hackbench.time.minor_page_faults 16768 +22.4% 20519 hackbench.time.system_time 4.043e+08 ± 4% +24.1% 5.016e+08 hackbench.time.voluntary_context_switches 4.874e+10 -11.8% 4.301e+10 perf-stat.i.branch-instructions 0.45 +0.1 0.54 perf-stat.i.branch-miss-rate% 2.13e+08 +5.9% 2.255e+08 perf-stat.i.branch-misses 7.45 ± 2% +0.2 7.69 perf-stat.i.cache-miss-rate% 1.501e+08 ± 4% -16.6% 1.253e+08 perf-stat.i.cache-misses 2.054e+09 -19.6% 1.652e+09 perf-stat.i.cache-references 1.38 +14.9% 1.59 perf-stat.i.cpi 3.224e+11 +1.3% 3.265e+11 perf-stat.i.cpu-cycles 404652 ± 3% +12.2% 454021 perf-stat.i.cpu-migrations 2182 ± 4% +20.8% 2635 perf-stat.i.cycles-between-cache-misses 2.333e+11 -11.9% 2.054e+11 perf-stat.i.instructions 0.73 -12.8% 0.63 perf-stat.i.ipc 0.44 +0.1 0.52 perf-stat.overall.branch-miss-rate% 7.31 ± 2% +0.3 7.58 perf-stat.overall.cache-miss-rate% 1.38 +15.0% 1.59 perf-stat.overall.cpi 2151 ± 4% +21.2% 2607 perf-stat.overall.cycles-between-cache-misses 0.72 -13.0% 0.63 perf-stat.overall.ipc 4.841e+10 -11.7% 4.277e+10 perf-stat.ps.branch-instructions 2.114e+08 +6.0% 2.241e+08 perf-stat.ps.branch-misses 1.491e+08 ± 4% -16.5% 1.245e+08 perf-stat.ps.cache-misses 2.039e+09 -19.5% 1.642e+09 perf-stat.ps.cache-references 3.201e+11 +1.4% 3.246e+11 perf-stat.ps.cpu-cycles 401515 ± 3% +12.3% 450975 perf-stat.ps.cpu-migrations 2.317e+11 
-11.8% 2.042e+11 perf-stat.ps.instructions 3.368e+13 +6.0% 3.569e+13 perf-stat.total.instructions 1.64 ±205% -98.2% 0.03 ±183% perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit 65.76 ± 92% -73.9% 17.16 ± 93% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe 5.76 ± 75% -73.3% 1.54 ± 42% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown] 100.68 ±112% +329.5% 432.45 ± 77% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio 3.19 ±212% -99.0% 0.03 ±183% perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit 1402 ± 35% -62.6% 524.30 ± 93% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe 5.57 ± 14% -88.9% 0.62 ±223% perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags 4.98 ± 12% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb 224.22 ± 65% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64 475.81 ± 41% -89.0% 52.54 ± 81% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait 5.33 ±124% +628.1% 38.83 ± 41% perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc 12463 ± 15% -84.9% 1881 ±223% perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags 23237 ± 16% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb 13.33 ±216% +1085.0% 158.00 ± 54% perf-sched.wait_and_delay.count.devkmsg_read.vfs_read.ksys_read.do_syscall_64 14.17 ± 61% -100.0% 0.00 perf-sched.wait_and_delay.count.do_task_dead.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64 610.67 ± 21% -25.9% 452.67 ± 27% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown] 24.50 ±157% +597.3% 170.83 ± 37% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait 1018 ± 33% -88.8% 113.91 ±223% perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags 1165 ± 29% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb 1285 ± 70% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64 1.64 ±205% -98.2% 0.03 ±183% perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit 7.82 ±216% -99.1% 0.07 ± 70% perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0 224.22 ± 65% -100.0% 0.00 perf-sched.wait_time.avg.ms.do_task_dead.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64 473.32 ± 42% -90.2% 46.32 ± 78% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait 100.68 ±112% +329.5% 432.45 ± 77% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio 3.19 ±212% -99.0% 0.03 ±183% 
perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit 169.23 ±219% -99.9% 0.24 ± 65% perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0 1285 ± 70% -100.0% 0.00 perf-sched.wait_time.max.ms.do_task_dead.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64 9.37 ± 2% -3.6 5.80 perf-profile.calltrace.cycles-pp.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter 5.90 ± 4% -2.2 3.66 perf-profile.calltrace.cycles-pp.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg 2.45 ± 10% -1.7 0.76 perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb 47.96 -1.3 46.61 perf-profile.calltrace.cycles-pp.write 6.20 -1.2 4.99 perf-profile.calltrace.cycles-pp.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write 4.58 ± 2% -0.9 3.71 perf-profile.calltrace.cycles-pp.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write 6.97 -0.8 6.13 perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter 6.88 -0.8 6.04 perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg 6.70 ± 2% -0.8 5.90 perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg 3.73 ± 4% -0.7 3.04 perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter 3.60 ± 4% -0.7 2.93 perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg 3.54 ± 4% -0.7 2.89 perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable 2.89 ± 5% -0.6 2.25 perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write 3.40 ± 2% -0.6 2.80 perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg 2.62 ± 7% -0.6 2.06 ± 4% perf-profile.calltrace.cycles-pp.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 2.94 -0.5 2.42 perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write 3.15 ± 4% -0.5 2.67 perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter 3.20 ± 2% -0.5 2.72 perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg 3.12 ± 5% -0.5 2.65 perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg 3.09 ± 4% -0.5 2.62 perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg 3.05 ± 2% -0.4 2.60 perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg 2.97 ± 2% -0.4 2.54 perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic 2.46 -0.4 2.03 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write 2.44 -0.4 2.02 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read 2.84 ± 2% -0.4 2.42 
perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic 1.72 ± 3% -0.4 1.32 perf-profile.calltrace.cycles-pp.skb_set_owner_w.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write 1.90 -0.4 1.51 perf-profile.calltrace.cycles-pp.clear_bhb_loop.write 1.30 -0.4 0.91 perf-profile.calltrace.cycles-pp.__slab_free.kfree.skb_release_data.consume_skb.unix_stream_read_generic 2.28 ± 2% -0.3 1.94 perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb 1.86 -0.3 1.52 perf-profile.calltrace.cycles-pp.clear_bhb_loop.read 3.08 -0.3 2.78 perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kfree.skb_release_data.consume_skb.unix_stream_read_generic 0.61 ± 6% -0.3 0.33 ± 70% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule 3.12 -0.3 2.86 perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic 1.62 ± 6% -0.3 1.36 ± 2% perf-profile.calltrace.cycles-pp.skb_queue_tail.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write 1.28 -0.2 1.05 perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write 2.91 ± 2% -0.2 2.68 perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor 1.70 ± 4% -0.2 1.47 perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key 1.13 -0.2 0.92 perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg 1.24 -0.2 1.04 perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write 0.98 ± 6% -0.2 0.78 ± 8% perf-profile.calltrace.cycles-pp.fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 1.20 ± 6% -0.2 1.00 perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic 1.12 ± 6% -0.2 0.93 perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule 1.16 ± 5% -0.2 0.97 perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_timeout 0.84 ± 5% -0.2 0.66 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter 1.42 ± 4% -0.2 1.24 perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common 1.35 ± 4% -0.2 1.18 perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function 0.78 ± 6% -0.2 0.61 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg 0.71 -0.1 0.59 perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter 0.66 ± 2% -0.1 0.54 perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter 2.15 ± 2% -0.1 2.04 perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter 0.81 ± 4% -0.1 0.71 
perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up 0.93 ± 3% -0.1 0.83 perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic 0.60 ± 6% -0.1 0.51 perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 0.90 ± 3% -0.1 0.81 perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_timeout 2.57 -0.1 2.48 perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags 0.70 -0.1 0.64 perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg 0.75 -0.1 0.69 perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_free_hook.kfree.skb_release_data.consume_skb 0.58 -0.0 0.54 perf-profile.calltrace.cycles-pp.mod_objcg_state.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags 0.73 -0.0 0.69 perf-profile.calltrace.cycles-pp.unix_write_space.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb 0.78 +0.2 0.93 perf-profile.calltrace.cycles-pp.obj_cgroup_charge.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb 37.84 +0.6 38.45 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 36.76 +0.8 37.56 perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 35.40 +1.1 36.46 perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64 49.97 +2.0 52.00 perf-profile.calltrace.cycles-pp.read 44.96 +2.9 47.89 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read 44.66 +3.0 47.64 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 42.34 +3.4 45.72 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 40.65 +3.7 44.36 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 39.50 +3.9 43.41 perf-profile.calltrace.cycles-pp.sock_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe 38.66 +4.1 42.71 perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64 38.11 +4.1 42.26 perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.vfs_read.ksys_read 37.67 +4.2 41.89 perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.vfs_read 18.28 +4.4 22.69 perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write 15.55 ± 2% +4.9 20.48 perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write 15.24 ± 2% +5.0 20.23 perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter 2.08 ± 10% +7.3 9.40 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.get_partial_node.___slab_alloc.__kmalloc_node_track_caller_noprof 2.10 ± 10% +7.3 9.44 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.get_partial_node.___slab_alloc.__kmalloc_node_track_caller_noprof.kmalloc_reserve 2.36 ± 9% +7.6 9.93 ± 2% 
perf-profile.calltrace.cycles-pp.get_partial_node.___slab_alloc.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb 7.51 ± 2% +7.6 15.09 perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg 6.99 ± 2% +7.7 14.64 perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb 2.89 ± 8% +8.0 10.89 ± 2% perf-profile.calltrace.cycles-pp.___slab_alloc.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags 12.34 +10.2 22.53 perf-profile.calltrace.cycles-pp.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter 9.02 ± 3% +10.7 19.71 perf-profile.calltrace.cycles-pp.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg 8.36 ± 3% +10.7 19.08 perf-profile.calltrace.cycles-pp.kfree.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg 3.21 ± 10% +11.2 14.44 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_partials.kfree.skb_release_data 3.30 ± 10% +11.3 14.61 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__put_partials.kfree.skb_release_data.consume_skb 3.44 ± 10% +11.5 14.89 ± 2% perf-profile.calltrace.cycles-pp.__put_partials.kfree.skb_release_data.consume_skb.unix_stream_read_generic 9.43 ± 2% -3.6 5.84 perf-profile.children.cycles-pp.kmem_cache_free 5.98 ± 3% -2.3 3.72 perf-profile.children.cycles-pp.kmem_cache_alloc_node_noprof 48.56 -1.5 47.11 perf-profile.children.cycles-pp.write 6.23 -1.2 5.02 perf-profile.children.cycles-pp.sock_def_readable 6.60 -0.9 5.68 perf-profile.children.cycles-pp.__memcg_slab_free_hook 4.32 ± 2% -0.9 3.40 perf-profile.children.cycles-pp._raw_spin_lock 7.01 -0.8 6.16 perf-profile.children.cycles-pp.unix_stream_read_actor 4.86 ± 2% -0.8 4.02 perf-profile.children.cycles-pp.__wake_up_sync_key 6.91 -0.8 6.08 perf-profile.children.cycles-pp.skb_copy_datagram_iter 6.76 -0.8 5.95 perf-profile.children.cycles-pp.__skb_datagram_iter 3.64 ± 6% -0.8 2.87 ± 5% perf-profile.children.cycles-pp.fdget_pos 3.80 -0.7 3.06 perf-profile.children.cycles-pp.clear_bhb_loop 4.00 ± 3% -0.7 3.35 perf-profile.children.cycles-pp.__wake_up_common 3.87 ± 4% -0.6 3.24 perf-profile.children.cycles-pp.autoremove_wake_function 3.83 ± 4% -0.6 3.20 perf-profile.children.cycles-pp.try_to_wake_up 4.15 ± 3% -0.6 3.53 perf-profile.children.cycles-pp.__schedule 2.47 -0.6 1.86 perf-profile.children.cycles-pp.__slab_free 4.04 ± 3% -0.6 3.49 perf-profile.children.cycles-pp.schedule 3.05 -0.5 2.52 perf-profile.children.cycles-pp.entry_SYSCALL_64 3.00 -0.5 2.49 perf-profile.children.cycles-pp.skb_copy_datagram_from_iter 4.49 -0.5 3.99 perf-profile.children.cycles-pp.__check_object_size 3.24 ± 2% -0.5 2.76 perf-profile.children.cycles-pp.skb_release_head_state 3.11 ± 2% -0.5 2.65 perf-profile.children.cycles-pp.unix_destruct_scm 4.98 -0.4 4.53 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook 3.00 ± 2% -0.4 2.56 perf-profile.children.cycles-pp._copy_to_iter 3.51 ± 4% -0.4 3.08 perf-profile.children.cycles-pp.schedule_timeout 2.88 ± 2% -0.4 2.46 perf-profile.children.cycles-pp.sock_wfree 1.74 ± 3% -0.4 1.33 perf-profile.children.cycles-pp.skb_set_owner_w 0.45 ± 26% -0.4 0.07 ± 10% perf-profile.children.cycles-pp.common_startup_64 0.45 ± 26% -0.4 0.07 ± 10% perf-profile.children.cycles-pp.cpu_startup_entry 0.44 ± 26% -0.4 0.07 ± 10% perf-profile.children.cycles-pp.do_idle 0.44 ± 
26% -0.4 0.07 ± 11% perf-profile.children.cycles-pp.start_secondary 1.82 ± 2% -0.3 1.51 perf-profile.children.cycles-pp.its_return_thunk 0.36 ± 28% -0.3 0.06 ± 8% perf-profile.children.cycles-pp.cpuidle_idle_call 0.34 ± 27% -0.3 0.04 ± 45% perf-profile.children.cycles-pp.cpuidle_enter 0.33 ± 27% -0.3 0.04 ± 71% perf-profile.children.cycles-pp.acpi_idle_do_entry 0.33 ± 27% -0.3 0.04 ± 71% perf-profile.children.cycles-pp.acpi_idle_enter 0.34 ± 27% -0.3 0.04 ± 45% perf-profile.children.cycles-pp.cpuidle_enter_state 0.33 ± 28% -0.3 0.04 ± 71% perf-profile.children.cycles-pp.acpi_safe_halt 0.33 ± 28% -0.3 0.04 ± 71% perf-profile.children.cycles-pp.pv_native_safe_halt 3.18 -0.3 2.90 perf-profile.children.cycles-pp.simple_copy_to_iter 1.94 ± 3% -0.3 1.67 perf-profile.children.cycles-pp.ttwu_do_activate 0.35 ± 16% -0.3 0.08 ± 5% perf-profile.children.cycles-pp.asm_sysvec_call_function_single 1.65 ± 6% -0.3 1.39 ± 2% perf-profile.children.cycles-pp.skb_queue_tail 2.93 -0.2 2.69 perf-profile.children.cycles-pp.check_heap_object 1.38 -0.2 1.14 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 2.66 -0.2 2.43 perf-profile.children.cycles-pp.mod_objcg_state 1.72 ± 3% -0.2 1.50 perf-profile.children.cycles-pp.enqueue_task 1.27 -0.2 1.06 perf-profile.children.cycles-pp._copy_from_iter 1.63 ± 3% -0.2 1.42 perf-profile.children.cycles-pp.enqueue_task_fair 1.30 ± 5% -0.2 1.11 perf-profile.children.cycles-pp.try_to_block_task 1.32 ± 5% -0.2 1.14 perf-profile.children.cycles-pp.dequeue_entities 0.64 ± 12% -0.2 0.46 ± 3% perf-profile.children.cycles-pp.__switch_to 1.05 ± 3% -0.2 0.87 perf-profile.children.cycles-pp.exit_to_user_mode_loop 0.90 -0.2 0.73 perf-profile.children.cycles-pp.fput 1.50 ± 2% -0.2 1.32 perf-profile.children.cycles-pp.__pick_next_task 0.79 ± 7% -0.2 0.62 perf-profile.children.cycles-pp.raw_spin_rq_lock_nested 1.33 ± 4% -0.2 1.16 perf-profile.children.cycles-pp.dequeue_task_fair 0.61 ± 4% -0.2 0.45 perf-profile.children.cycles-pp.select_task_rq 1.46 ± 2% -0.2 1.30 perf-profile.children.cycles-pp.pick_next_task_fair 1.00 ± 3% -0.1 0.86 perf-profile.children.cycles-pp.update_load_avg 0.77 -0.1 0.63 perf-profile.children.cycles-pp.__cond_resched 0.51 ± 3% -0.1 0.37 perf-profile.children.cycles-pp.select_task_rq_fair 0.76 -0.1 0.62 perf-profile.children.cycles-pp.mutex_lock 0.85 ± 4% -0.1 0.72 perf-profile.children.cycles-pp.update_curr 0.65 -0.1 0.52 perf-profile.children.cycles-pp.skb_unlink 0.77 -0.1 0.65 ± 2% perf-profile.children.cycles-pp.__check_heap_object 0.61 -0.1 0.49 perf-profile.children.cycles-pp.__build_skb_around 0.18 ± 7% -0.1 0.06 ± 9% perf-profile.children.cycles-pp.sysvec_call_function_single 0.97 ± 3% -0.1 0.85 perf-profile.children.cycles-pp.enqueue_entity 0.74 ± 4% -0.1 0.63 perf-profile.children.cycles-pp.dequeue_entity 0.46 ± 3% -0.1 0.34 perf-profile.children.cycles-pp.prepare_to_wait 0.61 -0.1 0.50 perf-profile.children.cycles-pp.rw_verify_area 0.62 -0.1 0.51 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 0.88 -0.1 0.78 perf-profile.children.cycles-pp.refill_obj_stock 0.15 ± 6% -0.1 0.05 perf-profile.children.cycles-pp.__sysvec_call_function_single 0.59 ± 3% -0.1 0.49 ± 2% perf-profile.children.cycles-pp.__virt_addr_valid 0.62 ± 3% -0.1 0.51 perf-profile.children.cycles-pp.syscall_return_via_sysret 0.21 ± 8% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.select_idle_sibling 0.46 -0.1 0.37 perf-profile.children.cycles-pp.mutex_unlock 0.55 -0.1 0.46 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.50 ± 2% -0.1 0.41 
perf-profile.children.cycles-pp.scm_recv_unix 0.51 ± 2% -0.1 0.42 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.46 ± 3% -0.1 0.38 perf-profile.children.cycles-pp.pick_task_fair 0.44 -0.1 0.36 perf-profile.children.cycles-pp.x64_sys_call 0.45 ± 2% -0.1 0.37 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.44 ± 2% -0.1 0.36 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt 0.45 ± 2% -0.1 0.38 perf-profile.children.cycles-pp.switch_fpu_return 0.37 ± 2% -0.1 0.30 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues 0.36 ± 3% -0.1 0.29 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler 0.33 ± 3% -0.1 0.27 ± 2% perf-profile.children.cycles-pp.update_process_times 0.12 ± 5% -0.1 0.06 perf-profile.children.cycles-pp.available_idle_cpu 0.14 ± 6% -0.1 0.09 ± 4% perf-profile.children.cycles-pp.__flush_smp_call_function_queue 0.30 ± 2% -0.1 0.24 ± 2% perf-profile.children.cycles-pp.__scm_recv_common 0.46 ± 7% -0.1 0.41 perf-profile.children.cycles-pp.set_next_entity 0.32 ± 3% -0.0 0.27 ± 2% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore 0.29 ± 5% -0.0 0.24 perf-profile.children.cycles-pp.wakeup_preempt 0.25 -0.0 0.20 perf-profile.children.cycles-pp.kmalloc_size_roundup 0.26 ± 3% -0.0 0.22 perf-profile.children.cycles-pp.prepare_task_switch 0.29 ± 3% -0.0 0.24 ± 3% perf-profile.children.cycles-pp.update_cfs_group 0.26 ± 3% -0.0 0.22 perf-profile.children.cycles-pp.rcu_all_qs 0.32 ± 6% -0.0 0.28 perf-profile.children.cycles-pp.__rseq_handle_notify_resume 0.75 -0.0 0.71 perf-profile.children.cycles-pp.unix_write_space 0.19 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.finish_task_switch 0.22 ± 6% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.sched_tick 0.31 ± 4% -0.0 0.26 perf-profile.children.cycles-pp.restore_fpregs_from_fpstate 0.24 -0.0 0.20 perf-profile.children.cycles-pp.security_file_permission 0.16 ± 4% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.unix_maybe_add_creds 0.30 ± 4% -0.0 0.26 perf-profile.children.cycles-pp.__update_load_avg_se 0.18 ± 7% -0.0 0.14 ± 4% perf-profile.children.cycles-pp.task_tick_fair 0.25 ± 5% -0.0 0.21 ± 2% perf-profile.children.cycles-pp.pick_eevdf 0.16 ± 6% -0.0 0.13 perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi 0.21 ± 3% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.check_stack_object 0.21 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.update_rq_clock_task 0.18 ± 3% -0.0 0.15 ± 3% perf-profile.children.cycles-pp.unix_scm_to_skb 0.23 ± 7% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq 0.19 ± 6% -0.0 0.16 ± 2% perf-profile.children.cycles-pp.do_perf_trace_sched_wakeup_template 0.16 ± 3% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.is_vmalloc_addr 0.14 ± 3% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.security_socket_sendmsg 0.18 ± 2% -0.0 0.15 perf-profile.children.cycles-pp.put_pid 0.16 ± 3% -0.0 0.13 perf-profile.children.cycles-pp.put_prev_entity 0.24 ± 2% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.wake_affine 0.15 ± 2% -0.0 0.12 perf-profile.children.cycles-pp.manage_oob 0.22 ± 5% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.rseq_ip_fixup 0.32 ± 2% -0.0 0.30 perf-profile.children.cycles-pp.reweight_entity 0.16 ± 2% -0.0 0.13 ± 5% perf-profile.children.cycles-pp.security_socket_recvmsg 0.13 ± 2% -0.0 0.11 ± 3% perf-profile.children.cycles-pp.security_socket_getpeersec_dgram 0.09 ± 7% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.cpuacct_charge 0.30 ± 4% -0.0 0.28 perf-profile.children.cycles-pp.__enqueue_entity 0.16 ± 3% -0.0 0.13 ± 2% 
perf-profile.children.cycles-pp.update_curr_se 0.11 ± 3% -0.0 0.09 perf-profile.children.cycles-pp.wait_for_unix_gc 0.10 -0.0 0.08 perf-profile.children.cycles-pp.skb_put 0.12 ± 7% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.os_xsave 0.10 ± 3% -0.0 0.08 perf-profile.children.cycles-pp.skb_free_head 0.13 ± 5% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.update_entity_lag 0.15 ± 7% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.update_rq_clock 0.10 ± 8% -0.0 0.08 perf-profile.children.cycles-pp.___perf_sw_event 0.12 ± 6% -0.0 0.10 perf-profile.children.cycles-pp.perf_tp_event 0.09 -0.0 0.08 ± 6% perf-profile.children.cycles-pp.__switch_to_asm 0.08 ± 6% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.kfree_skbmem 0.09 ± 7% -0.0 0.08 perf-profile.children.cycles-pp.rseq_update_cpu_node_id 0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.__x64_sys_write 0.06 ± 6% -0.0 0.05 perf-profile.children.cycles-pp.finish_wait 0.07 -0.0 0.06 perf-profile.children.cycles-pp.__x64_sys_read 0.06 -0.0 0.05 perf-profile.children.cycles-pp.__irq_exit_rcu 0.10 ± 10% +0.0 0.12 ± 3% perf-profile.children.cycles-pp.detach_tasks 0.24 ± 10% +0.0 0.27 ± 2% perf-profile.children.cycles-pp.sched_balance_newidle 0.24 ± 9% +0.0 0.27 ± 2% perf-profile.children.cycles-pp.sched_balance_rq 0.16 ± 3% +0.0 0.19 perf-profile.children.cycles-pp.put_cpu_partial 0.19 +0.0 0.24 ± 2% perf-profile.children.cycles-pp.css_rstat_updated 0.00 +0.1 0.05 perf-profile.children.cycles-pp.__refill_stock 0.00 +0.1 0.05 perf-profile.children.cycles-pp.sysvec_call_function 0.16 ± 2% +0.1 0.21 ± 2% perf-profile.children.cycles-pp.refill_stock 0.14 ± 2% +0.1 0.19 ± 3% perf-profile.children.cycles-pp.try_charge_memcg 0.34 ± 3% +0.1 0.40 ± 4% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 0.00 +0.1 0.07 ± 5% perf-profile.children.cycles-pp.asm_sysvec_call_function 0.32 ± 5% +0.1 0.40 ± 5% perf-profile.children.cycles-pp.__mod_memcg_state 0.42 ± 2% +0.1 0.56 perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages 0.12 ± 11% +0.1 0.26 ± 4% perf-profile.children.cycles-pp.get_any_partial 37.92 +0.6 38.52 perf-profile.children.cycles-pp.vfs_write 36.83 +0.8 37.63 perf-profile.children.cycles-pp.sock_write_iter 35.67 +1.0 36.68 perf-profile.children.cycles-pp.unix_stream_sendmsg 50.55 +1.9 52.47 perf-profile.children.cycles-pp.read 88.56 +2.4 90.94 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 87.95 +2.5 90.43 perf-profile.children.cycles-pp.do_syscall_64 42.42 +3.4 45.79 perf-profile.children.cycles-pp.ksys_read 40.72 +3.7 44.41 perf-profile.children.cycles-pp.vfs_read 39.55 +3.9 43.45 perf-profile.children.cycles-pp.sock_read_iter 38.72 +4.0 42.77 perf-profile.children.cycles-pp.sock_recvmsg 38.16 +4.1 42.29 perf-profile.children.cycles-pp.unix_stream_recvmsg 37.91 +4.2 42.09 perf-profile.children.cycles-pp.unix_stream_read_generic 18.34 +4.4 22.74 perf-profile.children.cycles-pp.sock_alloc_send_pskb 15.61 ± 2% +4.9 20.54 perf-profile.children.cycles-pp.alloc_skb_with_frags 15.35 ± 2% +5.0 20.32 perf-profile.children.cycles-pp.__alloc_skb 4.43 ± 11% +6.1 10.58 ± 2% perf-profile.children.cycles-pp.get_partial_node 5.34 ± 9% +6.3 11.65 ± 2% perf-profile.children.cycles-pp.___slab_alloc 7.59 ± 2% +7.6 15.16 perf-profile.children.cycles-pp.kmalloc_reserve 7.08 ± 2% +7.7 14.73 perf-profile.children.cycles-pp.__kmalloc_node_track_caller_noprof 6.27 ± 11% +9.1 15.34 ± 2% perf-profile.children.cycles-pp.__put_partials 12.41 +10.2 22.59 perf-profile.children.cycles-pp.consume_skb 9.06 ± 3% +10.7 19.74 
perf-profile.children.cycles-pp.skb_release_data 8.42 ± 3% +10.7 19.13 perf-profile.children.cycles-pp.kfree 13.33 ± 8% +14.3 27.59 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave 11.76 ± 9% +14.3 26.04 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 3.88 ± 2% -0.8 3.08 perf-profile.self.cycles-pp.__memcg_slab_free_hook 3.58 ± 6% -0.8 2.82 ± 5% perf-profile.self.cycles-pp.fdget_pos 3.76 -0.7 3.03 perf-profile.self.cycles-pp.clear_bhb_loop 3.19 ± 3% -0.6 2.57 perf-profile.self.cycles-pp._raw_spin_lock 2.41 -0.6 1.82 perf-profile.self.cycles-pp.__slab_free 2.71 ± 4% -0.6 2.12 perf-profile.self.cycles-pp.unix_stream_sendmsg 2.39 ± 2% -0.4 1.95 perf-profile.self.cycles-pp.unix_stream_read_generic 2.96 ± 2% -0.4 2.53 perf-profile.self.cycles-pp._copy_to_iter 1.71 ± 3% -0.4 1.30 perf-profile.self.cycles-pp.skb_set_owner_w 2.40 -0.4 2.02 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook 2.10 ± 3% -0.4 1.73 perf-profile.self.cycles-pp.sock_wfree 2.08 -0.4 1.71 perf-profile.self.cycles-pp.do_syscall_64 1.92 -0.3 1.58 perf-profile.self.cycles-pp.kmem_cache_free 1.58 ± 8% -0.3 1.26 perf-profile.self.cycles-pp.sock_def_readable 2.60 ± 4% -0.3 2.28 perf-profile.self.cycles-pp._raw_spin_lock_irqsave 2.22 -0.3 1.95 perf-profile.self.cycles-pp.mod_objcg_state 1.43 ± 4% -0.2 1.19 ± 3% perf-profile.self.cycles-pp.read 1.41 -0.2 1.18 perf-profile.self.cycles-pp.write 1.34 -0.2 1.10 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 1.16 -0.2 0.95 perf-profile.self.cycles-pp.__alloc_skb 1.24 -0.2 1.03 perf-profile.self.cycles-pp._copy_from_iter 1.34 -0.2 1.14 perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof 1.07 ± 2% -0.2 0.88 perf-profile.self.cycles-pp.sock_write_iter 0.63 ± 12% -0.2 0.46 ± 3% perf-profile.self.cycles-pp.__switch_to 0.94 ± 2% -0.2 0.78 perf-profile.self.cycles-pp.its_return_thunk 0.94 ± 2% -0.2 0.77 perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof 0.85 -0.2 0.69 perf-profile.self.cycles-pp.fput 0.84 -0.2 0.69 perf-profile.self.cycles-pp.sock_read_iter 0.87 ± 2% -0.1 0.72 ± 3% perf-profile.self.cycles-pp.vfs_read 0.83 -0.1 0.70 perf-profile.self.cycles-pp.entry_SYSCALL_64 0.75 ± 3% -0.1 0.61 perf-profile.self.cycles-pp.vfs_write 0.72 ± 2% -0.1 0.61 ± 2% perf-profile.self.cycles-pp.__check_heap_object 0.57 ± 2% -0.1 0.47 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 0.59 -0.1 0.49 perf-profile.self.cycles-pp.__check_object_size 0.56 -0.1 0.46 perf-profile.self.cycles-pp.__build_skb_around 0.61 ± 2% -0.1 0.50 perf-profile.self.cycles-pp.__skb_datagram_iter 2.12 ± 2% -0.1 2.02 perf-profile.self.cycles-pp.check_heap_object 0.84 -0.1 0.74 perf-profile.self.cycles-pp.refill_obj_stock 0.54 ± 3% -0.1 0.45 ± 2% perf-profile.self.cycles-pp.__virt_addr_valid 0.42 -0.1 0.34 perf-profile.self.cycles-pp.mutex_unlock 0.55 ± 3% -0.1 0.46 perf-profile.self.cycles-pp.syscall_return_via_sysret 0.45 -0.1 0.37 perf-profile.self.cycles-pp.mutex_lock 0.47 -0.1 0.38 perf-profile.self.cycles-pp.kfree 0.40 ± 3% -0.1 0.32 perf-profile.self.cycles-pp.__schedule 0.46 -0.1 0.38 perf-profile.self.cycles-pp.unix_write_space 0.96 -0.1 0.89 perf-profile.self.cycles-pp.obj_cgroup_charge 0.41 -0.1 0.34 ± 2% perf-profile.self.cycles-pp.sock_alloc_send_pskb 0.36 ± 2% -0.1 0.30 perf-profile.self.cycles-pp.rw_verify_area 0.38 -0.1 0.32 perf-profile.self.cycles-pp.x64_sys_call 0.36 ± 2% -0.1 0.30 perf-profile.self.cycles-pp.__cond_resched 0.33 ± 2% -0.1 0.27 perf-profile.self.cycles-pp.ksys_read 0.33 -0.1 0.27 perf-profile.self.cycles-pp.ksys_write 0.37 
± 2% -0.1 0.31 perf-profile.self.cycles-pp.sock_recvmsg 0.36 ± 2% -0.1 0.31 perf-profile.self.cycles-pp.update_load_avg 0.33 ± 2% -0.1 0.27 perf-profile.self.cycles-pp.skb_copy_datagram_from_iter 0.28 ± 2% -0.1 0.23 perf-profile.self.cycles-pp.alloc_skb_with_frags 0.11 ± 7% -0.1 0.06 perf-profile.self.cycles-pp.available_idle_cpu 0.26 -0.0 0.21 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack 0.27 -0.0 0.22 ± 3% perf-profile.self.cycles-pp.unix_stream_recvmsg 0.28 ± 3% -0.0 0.24 perf-profile.self.cycles-pp.update_cfs_group 0.28 ± 2% -0.0 0.24 ± 2% perf-profile.self.cycles-pp.kmalloc_reserve 0.30 ± 4% -0.0 0.26 perf-profile.self.cycles-pp.restore_fpregs_from_fpstate 0.27 ± 3% -0.0 0.23 perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore 0.28 ± 6% -0.0 0.24 perf-profile.self.cycles-pp.update_curr 0.24 -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__scm_recv_common 0.20 ± 2% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.kmalloc_size_roundup 0.20 -0.0 0.16 ± 3% perf-profile.self.cycles-pp.security_file_permission 0.28 ± 5% -0.0 0.24 ± 2% perf-profile.self.cycles-pp.__update_load_avg_se 0.19 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.unix_destruct_scm 0.12 ± 4% -0.0 0.09 perf-profile.self.cycles-pp.unix_maybe_add_creds 0.22 ± 7% -0.0 0.18 perf-profile.self.cycles-pp.__update_load_avg_cfs_rq 0.19 ± 2% -0.0 0.16 perf-profile.self.cycles-pp.update_rq_clock_task 0.18 ± 2% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.prepare_task_switch 0.20 ± 2% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.rcu_all_qs 0.14 ± 4% -0.0 0.11 perf-profile.self.cycles-pp.skb_queue_tail 0.17 ± 2% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.enqueue_task_fair 0.17 ± 3% -0.0 0.14 perf-profile.self.cycles-pp.scm_recv_unix 0.16 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.unix_scm_to_skb 0.20 ± 2% -0.0 0.17 perf-profile.self.cycles-pp.pick_eevdf 0.13 ± 2% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.manage_oob 0.12 ± 3% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.simple_copy_to_iter 0.14 ± 2% -0.0 0.11 ± 3% perf-profile.self.cycles-pp.switch_fpu_return 0.17 ± 2% -0.0 0.14 perf-profile.self.cycles-pp.try_to_wake_up 0.16 ± 2% -0.0 0.14 ± 2% perf-profile.self.cycles-pp.check_stack_object 0.15 -0.0 0.12 ± 4% perf-profile.self.cycles-pp.skb_unlink 0.13 ± 2% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.__wake_up_common 0.11 ± 4% -0.0 0.09 perf-profile.self.cycles-pp.finish_task_switch 0.12 ± 3% -0.0 0.10 perf-profile.self.cycles-pp.put_pid 0.13 ± 2% -0.0 0.11 perf-profile.self.cycles-pp.consume_skb 0.13 ± 2% -0.0 0.11 perf-profile.self.cycles-pp.pick_next_task_fair 0.14 ± 2% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.skb_copy_datagram_iter 0.30 ± 2% -0.0 0.28 perf-profile.self.cycles-pp.__enqueue_entity 0.12 ± 3% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.is_vmalloc_addr 0.11 ± 3% -0.0 0.09 perf-profile.self.cycles-pp.security_socket_getpeersec_dgram 0.12 ± 3% -0.0 0.10 perf-profile.self.cycles-pp.pick_task_fair 0.10 -0.0 0.08 perf-profile.self.cycles-pp.security_socket_sendmsg 0.10 -0.0 0.08 perf-profile.self.cycles-pp.select_task_rq 0.08 ± 5% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.cpuacct_charge 0.13 ± 4% -0.0 0.11 perf-profile.self.cycles-pp.dequeue_entity 0.13 -0.0 0.11 ± 5% perf-profile.self.cycles-pp.security_socket_recvmsg 0.11 -0.0 0.09 perf-profile.self.cycles-pp.skb_release_head_state 0.11 -0.0 0.09 ± 4% perf-profile.self.cycles-pp.enqueue_entity 0.12 ± 5% -0.0 0.10 perf-profile.self.cycles-pp.os_xsave 0.14 ± 3% -0.0 0.12 perf-profile.self.cycles-pp.update_curr_se 0.13 ± 5% -0.0 0.11 
perf-profile.self.cycles-pp.exit_to_user_mode_loop 0.14 ± 4% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.task_h_load 0.12 ± 6% -0.0 0.11 ± 3% perf-profile.self.cycles-pp.dequeue_entities 0.09 -0.0 0.07 ± 6% perf-profile.self.cycles-pp.wait_for_unix_gc 0.21 ± 6% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.__dequeue_entity 0.10 ± 5% -0.0 0.08 perf-profile.self.cycles-pp.__get_user_8 0.11 ± 6% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.prepare_to_wait 0.08 -0.0 0.06 ± 7% perf-profile.self.cycles-pp.skb_free_head 0.08 -0.0 0.06 ± 7% perf-profile.self.cycles-pp.unix_stream_read_actor 0.09 ± 4% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.__switch_to_asm 0.12 ± 6% -0.0 0.11 perf-profile.self.cycles-pp.avg_vruntime 0.09 ± 4% -0.0 0.08 perf-profile.self.cycles-pp.place_entity 0.07 ± 5% -0.0 0.06 perf-profile.self.cycles-pp.select_task_rq_fair 0.06 ± 6% -0.0 0.05 perf-profile.self.cycles-pp.___perf_sw_event 0.08 -0.0 0.07 ± 5% perf-profile.self.cycles-pp.skb_put 0.07 -0.0 0.06 perf-profile.self.cycles-pp.propagate_entity_load_avg 0.07 -0.0 0.06 perf-profile.self.cycles-pp.wakeup_preempt 0.06 -0.0 0.05 perf-profile.self.cycles-pp.kfree_skbmem 0.06 -0.0 0.05 perf-profile.self.cycles-pp.select_idle_sibling 0.12 ± 3% +0.0 0.16 ± 4% perf-profile.self.cycles-pp.obj_cgroup_uncharge_pages 0.15 ± 3% +0.0 0.19 perf-profile.self.cycles-pp.put_cpu_partial 0.10 ± 3% +0.0 0.14 ± 2% perf-profile.self.cycles-pp.refill_stock 0.17 +0.0 0.21 perf-profile.self.cycles-pp.css_rstat_updated 0.10 ± 4% +0.0 0.15 ± 3% perf-profile.self.cycles-pp.try_charge_memcg 0.27 ± 4% +0.0 0.31 ± 5% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.25 ± 3% +0.1 0.32 perf-profile.self.cycles-pp.__put_partials 0.46 ± 3% +0.1 0.60 perf-profile.self.cycles-pp.get_partial_node 0.88 +0.2 1.03 perf-profile.self.cycles-pp.___slab_alloc 11.74 ± 9% +14.3 26.02 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath *************************************************************************************************** lkp-srf-2sp3: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory ========================================================================================= cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase: cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-9.4/50%/debian-12-x86_64-20240206.cgz/300s/lkp-srf-2sp3/SCTP_STREAM/netperf commit: 90b83efa67 ("Merge tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next") 2adc2445a5 ("xsk: Fix out of order segment free in __xsk_generic_xmit()") 90b83efa6701656e 2adc2445a5ae93efed1e2e6646a ---------------- --------------------------- %stddev %change %stddev \ | \ 97891 -14.8% 83440 uptime.idle 5.813e+10 -24.7% 4.376e+10 cpuidle..time 1432696 +8071.0% 1.171e+08 cpuidle..usage 2.67 ±129% +13268.8% 356.50 ±106% perf-c2c.DRAM.local 36.50 ± 97% +1.7e+05% 62210 ±106% perf-c2c.HITM.local 57.83 ± 96% +1.1e+05% 62785 ±106% perf-c2c.HITM.total 7646 ± 24% +3575.6% 281040 ±196% numa-meminfo.node0.Shmem 352954 ± 57% +397.3% 1755395 ± 35% numa-meminfo.node1.Active 352954 ± 57% +397.3% 1755395 ± 35% numa-meminfo.node1.Active(anon) 24856 ± 7% +5388.7% 1364277 ± 33% numa-meminfo.node1.Shmem 900437 ± 18% +33130.0% 2.992e+08 numa-numastat.node0.local_node 980894 ± 15% +30412.3% 2.993e+08 numa-numastat.node0.numa_hit 843029 ± 20% +33579.9% 2.839e+08 numa-numastat.node1.local_node 960593 ± 15% +29466.0% 2.84e+08 numa-numastat.node1.numa_hit 99.95 -26.6% 73.35 vmstat.cpu.id 1.43 ± 4% +3253.2% 47.88 
± 2% vmstat.procs.r 3269 +21229.7% 697353 vmstat.system.cs 5726 ± 2% +7837.1% 454486 vmstat.system.in 99.91 -26.7 73.24 mpstat.cpu.all.idle% 0.01 ± 2% +0.3 0.29 mpstat.cpu.all.irq% 0.03 +6.7 6.71 mpstat.cpu.all.soft% 0.03 ± 2% +19.7 19.68 mpstat.cpu.all.sys% 0.03 +0.1 0.08 ± 3% mpstat.cpu.all.usr% 1.00 +15883.3% 159.83 ± 57% mpstat.max_utilization.seconds 2.31 ± 3% +1565.7% 38.48 ± 7% mpstat.max_utilization_pct 827685 +197.1% 2459378 ± 3% meminfo.Active 827685 +197.1% 2459378 ± 3% meminfo.Active(anon) 3564506 +45.3% 5177743 meminfo.Cached 1002604 +162.8% 2634788 ± 3% meminfo.Committed_AS 73396 +27.4% 93496 ± 3% meminfo.Mapped 6584486 +30.6% 8597070 meminfo.Memused 32406 +4978.1% 1645637 ± 5% meminfo.Shmem 7674969 +22.1% 9367878 meminfo.max_used_kB 1475286 ± 37% +10036.1% 1.495e+08 numa-vmstat.node0.nr_unaccepted 1910 ± 24% +3574.4% 70214 ±196% numa-vmstat.node0.nr_writeback_temp 901028 ± 18% +33108.2% 2.992e+08 numa-vmstat.node0.numa_interleave 88219 ± 57% +397.4% 438783 ± 35% numa-vmstat.node1.nr_inactive_anon 1466007 ± 37% +9598.9% 1.422e+08 numa-vmstat.node1.nr_unaccepted 6201 ± 7% +5398.9% 341002 ± 33% numa-vmstat.node1.nr_writeback_temp 88219 ± 57% +397.4% 438783 ± 35% numa-vmstat.node1.nr_zone_active_anon 843731 ± 20% +33552.0% 2.839e+08 numa-vmstat.node1.numa_interleave 4.10 +93440.1% 3835 netperf.ThroughputBoth_Mbps 393.60 +93440.1% 368173 netperf.ThroughputBoth_total_Mbps 4.10 +93440.1% 3835 netperf.Throughput_Mbps 393.60 +93440.1% 368173 netperf.Throughput_total_Mbps 48.00 ± 9% +1.5e+05% 71216 netperf.time.involuntary_context_switches 36976 +109.6% 77489 netperf.time.minor_page_faults 2.00 +89408.3% 1790 netperf.time.percent_of_cpu_this_job_got 2.65 ± 5% +2e+05% 5407 netperf.time.system_time 70144 +75110.0% 52755811 netperf.time.voluntary_context_switches 69331 +76173.9% 52881525 netperf.workload 18597 +20.9% 22487 proc-vmstat.nr_anon_pages 206982 +196.9% 614442 ± 3% proc-vmstat.nr_inactive_anon 891142 +45.2% 1294018 proc-vmstat.nr_mapped 130794 +2.6% 134152 proc-vmstat.nr_slab_reclaimable 2940843 +9808.2% 2.914e+08 proc-vmstat.nr_unaccepted 36043 +2.6% 36994 proc-vmstat.nr_unevictable 8116 +4963.7% 410990 ± 5% proc-vmstat.nr_writeback_temp 206982 +196.9% 614442 ± 3% proc-vmstat.nr_zone_active_anon 2758 ± 12% +466.4% 15626 ± 11% proc-vmstat.numa_hint_faults 3103 ±183% +1123.1% 37957 ± 15% proc-vmstat.numa_hint_faults_local 5180 ± 88% +991.6% 56556 ± 10% proc-vmstat.numa_huge_pte_updates 1746716 +33285.4% 5.831e+08 proc-vmstat.numa_interleave 3103 ±183% +1123.1% 37957 ± 15% proc-vmstat.numa_pages_migrated 22862693 +81149.0% 1.858e+10 proc-vmstat.pgalloc_dma32 962895 +11.6% 1074237 proc-vmstat.pglazyfree 42493 +28.3% 54507 proc-vmstat.pgrefill 22791848 +81398.7% 1.858e+10 proc-vmstat.pgskip_device 199829 +2.3% 204523 proc-vmstat.workingset_nodereclaim 8.77 ± 2% -96.3% 0.32 perf-stat.i.MPKI 77235971 +7812.2% 6.111e+09 perf-stat.i.branch-instructions 1.93 -1.4 0.55 perf-stat.i.branch-miss-rate% 3411141 +880.6% 33448138 perf-stat.i.branch-misses 22.30 ± 2% -21.6 0.71 perf-stat.i.cache-miss-rate% 1665598 ± 2% +511.2% 10179755 perf-stat.i.cache-misses 7250815 +31692.2% 2.305e+09 perf-stat.i.cache-references 3208 +21832.1% 703722 perf-stat.i.context-switches 3.61 +37.5% 4.96 perf-stat.i.cpi 7.967e+08 +19662.7% 1.574e+11 perf-stat.i.cpu-cycles 268.09 +1572.7% 4484 perf-stat.i.cpu-migrations 470.08 +3392.1% 16415 perf-stat.i.cycles-between-cache-misses 3.771e+08 +8315.9% 3.173e+10 perf-stat.i.instructions 0.31 -33.5% 0.21 perf-stat.i.ipc 2789 +12.7% 3145 perf-stat.i.minor-faults 2789 
                +12.7%       3145        perf-stat.i.page-faults
      4.42 ±  2%     -92.7%       0.32        perf-stat.overall.MPKI
      4.42            -3.9         0.55        perf-stat.overall.branch-miss-rate%
     22.97 ±  2%     -22.5         0.44        perf-stat.overall.cache-miss-rate%
      2.11           +134.8%       4.96        perf-stat.overall.cpi
    478.64 ±  2%   +3130.5%      15462        perf-stat.overall.cycles-between-cache-misses
      0.47            -57.4%       0.20        perf-stat.overall.ipc
   1638835            -88.9%     181352        perf-stat.overall.path-length
  76999412          +7810.5%   6.091e+09       perf-stat.ps.branch-instructions
   3401355           +880.2%   33339495        perf-stat.ps.branch-misses
   1659790 ±  2%     +511.5%   10149201        perf-stat.ps.cache-misses
   7226565         +31693.0%   2.298e+09       perf-stat.ps.cache-references
      3197         +21832.0%     701377        perf-stat.ps.context-switches
 7.941e+08         +19659.7%   1.569e+11       perf-stat.ps.cpu-cycles
    267.21          +1572.9%       4470        perf-stat.ps.cpu-migrations
 3.759e+08          +8313.8%   3.163e+10       perf-stat.ps.instructions
      2780            +12.7%       3133        perf-stat.ps.minor-faults
      2780            +12.7%       3133        perf-stat.ps.page-faults
 1.136e+11          +8340.5%    9.59e+12       perf-stat.total.instructions
      3740 ± 12%   +29693.1%    1114329 ± 10%  sched_debug.cfs_rq:/.avg_vruntime.avg
     56542 ± 15%    +2339.5%    1379378 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.max
    369.53 ± 16%    +1.6e+05%    579756 ± 20%  sched_debug.cfs_rq:/.avg_vruntime.min
      6898 ± 20%    +1820.7%     132511 ± 26%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.02 ± 12%    +1118.1%       0.22 ±  6%  sched_debug.cfs_rq:/.h_nr_queued.avg
      0.13 ±  6%     +212.0%       0.40        sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.02 ± 12%    +1116.5%       0.22 ±  5%  sched_debug.cfs_rq:/.h_nr_runnable.avg
      0.13 ±  6%     +211.4%       0.40        sched_debug.cfs_rq:/.h_nr_runnable.stddev
     14.16 ±132%   +48472.6%       6875 ± 21%  sched_debug.cfs_rq:/.left_deadline.avg
      2717 ±132%   +27785.8%     757891 ± 19%  sched_debug.cfs_rq:/.left_deadline.max
    195.63 ±132%   +35354.2%      69359 ± 15%  sched_debug.cfs_rq:/.left_deadline.stddev
     14.11 ±133%   +48627.1%       6874 ± 21%  sched_debug.cfs_rq:/.left_vruntime.avg
      2708 ±133%   +27874.6%     757818 ± 19%  sched_debug.cfs_rq:/.left_vruntime.max
    194.99 ±133%   +35466.9%      69352 ± 15%  sched_debug.cfs_rq:/.left_vruntime.stddev
    618037 ± 36%     -66.7%      205834 ± 48%  sched_debug.cfs_rq:/.load.max
     51379 ± 30%     -57.6%       21767 ± 30%  sched_debug.cfs_rq:/.load.stddev
    687.50 ±  9%     -64.9%      241.44 ± 24%  sched_debug.cfs_rq:/.load_avg.max
     83.99 ±  6%     -49.9%       42.10 ± 29%  sched_debug.cfs_rq:/.load_avg.stddev
      3740 ± 12%   +29693.1%    1114329 ± 10%  sched_debug.cfs_rq:/.min_vruntime.avg
     56542 ± 15%    +2339.5%    1379378 ±  8%  sched_debug.cfs_rq:/.min_vruntime.max
    369.53 ± 16%    +1.6e+05%    579756 ± 20%  sched_debug.cfs_rq:/.min_vruntime.min
      6899 ± 20%    +1820.7%     132511 ± 26%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.02 ± 13%    +1109.9%       0.22 ±  5%  sched_debug.cfs_rq:/.nr_queued.avg
      0.13 ±  7%     +210.9%       0.40 ±  2%  sched_debug.cfs_rq:/.nr_queued.stddev
     14.12 ±133%   +48603.5%       6874 ± 21%  sched_debug.cfs_rq:/.right_vruntime.avg
      2710 ±133%   +27861.0%     757818 ± 19%  sched_debug.cfs_rq:/.right_vruntime.max
    195.09 ±133%   +35449.7%      69352 ± 15%  sched_debug.cfs_rq:/.right_vruntime.stddev
     35.41 ±  6%     +493.0%      209.98 ±  5%  sched_debug.cfs_rq:/.runnable_avg.avg
    685.83 ±  4%      +35.7%      930.48 ±  5%  sched_debug.cfs_rq:/.runnable_avg.max
     83.58 ±  4%     +188.8%      241.42 ±  2%  sched_debug.cfs_rq:/.runnable_avg.stddev
     35.30 ±  6%     +494.6%      209.91 ±  5%  sched_debug.cfs_rq:/.util_avg.avg
    680.17 ±  5%      +36.8%      930.40 ±  5%  sched_debug.cfs_rq:/.util_avg.max
     83.16 ±  4%     +190.3%      241.37 ±  2%  sched_debug.cfs_rq:/.util_avg.stddev
      3.01 ± 38%    +3061.8%       95.09 ±  7%  sched_debug.cfs_rq:/.util_est.avg
    249.27 ± 31%     +131.8%      577.82 ±  2%  sched_debug.cfs_rq:/.util_est.max
     23.40 ± 29%     +671.1%      180.43 ±  2%  sched_debug.cfs_rq:/.util_est.stddev
    968260            -56.6%      420088 ±  9%  sched_debug.cpu.avg_idle.avg
    134682 ± 51%      -64.3%       48084 ±  4%  sched_debug.cpu.avg_idle.min
     94988 ±  6%     +233.8%      317085 ±  8%  sched_debug.cpu.avg_idle.stddev
     10.48 ±  2%      +16.1%       12.18 ±  2%  sched_debug.cpu.clock.stddev
    526.86           +230.8%        1742 ± 13%  sched_debug.cpu.clock_task.stddev
     67.14 ± 12%    +1549.7%        1107 ±  5%  sched_debug.cpu.curr->pid.avg
    652.41 ±  3%     +205.6%        1993        sched_debug.cpu.curr->pid.stddev
      0.00 ±  4%      +15.1%        0.00 ±  5%  sched_debug.cpu.next_balance.stddev
      0.01 ± 20%    +1565.3%        0.22 ±  5%  sched_debug.cpu.nr_running.avg
      0.10 ±  9%     +286.8%        0.40        sched_debug.cpu.nr_running.stddev
      3400 ±  5%   +15458.0%      529068 ±  8%  sched_debug.cpu.nr_switches.avg
     54421 ± 22%    +1031.5%      615757 ±  7%  sched_debug.cpu.nr_switches.max
      1015 ±  7%   +26519.5%      270329 ± 12%  sched_debug.cpu.nr_switches.min
      5199 ± 22%     +721.5%       42712 ±  6%  sched_debug.cpu.nr_switches.stddev
      0.01 ± 17%      +60.3%        0.02 ±  5%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.02 ± 17%      -39.1%        0.01 ± 17%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±  5%      +31.4%        0.01 ±  6%  perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.04 ± 75%      -66.8%        0.01 ± 13%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ±  4%      +30.9%        0.01 ±  4%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.01 ± 88%     +375.6%        0.03 ± 54%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.02 ± 13%      -47.0%        0.01 ± 20%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.01 ±  7%      +38.7%        0.01 ±  9%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.02 ±  4%      -28.6%        0.01 ±  6%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.21 ± 62%      -96.6%        0.01 ±  5%  perf-sched.sch_delay.avg.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
      0.32 ±138%      -97.7%        0.01 ± 10%  perf-sched.sch_delay.avg.ms.schedule_timeout.sctp_wait_for_sndbuf.sctp_sendmsg_to_asoc.sctp_sendmsg
      0.01 ± 26%     +334.9%        0.03 ± 11%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1.05 ±217%      -98.1%        0.02 ±  6%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.03 ± 20%      -54.7%        0.01 ± 17%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.12 ±147%      -85.3%        0.02 ± 11%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ±  9%     +235.9%        0.07 ±112%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.03 ± 41%    +2.7e+05%      68.69 ±222%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 86%    +1267.4%        0.10 ± 65%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.04 ±  6%      -50.8%        0.02 ± 22%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.01 ± 10%     +139.2%        0.03 ± 61%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.03 ± 22%      -48.7%        0.02 ±  3%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
    207.81           +150.3%      520.18 ± 46%  perf-sched.sch_delay.max.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
      1.79 ±118%    +1020.4%       20.01 ± 62%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.16 ±103%      -95.3%        0.01 ±  6%  perf-sched.total_sch_delay.average.ms
    252.56            -99.4%        1.56 ±  2%  perf-sched.total_wait_and_delay.average.ms
     10166         +15987.4%     1635446        perf-sched.total_wait_and_delay.count.ms
      4984            -15.6%        4204 ±  7%  perf-sched.total_wait_and_delay.max.ms
    252.40            -99.4%        1.55 ±  2%  perf-sched.total_wait_time.average.ms
      4984            -15.6%        4204 ±  7%  perf-sched.total_wait_time.max.ms
      7.83            -55.1%        3.52        perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    184.60 ±  4%      +14.9%      212.10        perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.47           -100.0%        0.00        perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.60 ±  4%     -100.0%        0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    199.50            -99.8%        0.43 ±  3%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
    381.71            -99.9%        0.43 ±  3%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.sctp_wait_for_sndbuf.sctp_sendmsg_to_asoc.sctp_sendmsg
    604.00            -77.6%      135.41 ±  4%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    797.02 ±  3%      -18.0%      653.44        perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    217.83 ±  4%      -13.4%      188.67        perf-sched.wait_and_delay.count.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
    126.33           -100.0%        0.00        perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     88.17           -100.0%        0.00        perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      2310         +34913.6%      808931        perf-sched.wait_and_delay.count.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
      1157         +69710.6%      807941        perf-sched.wait_and_delay.count.schedule_timeout.sctp_wait_for_sndbuf.sctp_sendmsg_to_asoc.sctp_sendmsg
      1600           +346.1%        7140 ±  5%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    723.83 ±  2%      +60.2%        1159        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      4984            -79.9%        1000        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     16.98 ±  2%     -100.0%        0.00        perf-sched.wait_and_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.23 ±  6%     -100.0%        0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    209.37           +491.8%        1239 ± 31%  perf-sched.wait_and_delay.max.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
    417.22           +155.0%        1063        perf-sched.wait_and_delay.max.ms.schedule_timeout.sctp_wait_for_sndbuf.sctp_sendmsg_to_asoc.sctp_sendmsg
      4122 ± 10%      -35.0%        2679 ± 17%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      7.81            -55.3%        3.49        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    184.59 ±  4%      +14.9%      212.09        perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.46             +9.9%        0.51        perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    375.09 ± 24%      -99.7%        1.19 ± 48%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.04            +10.5%        4.47 ±  3%  perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.60 ±  4%      +11.2%        0.66 ±  3%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    199.29            -99.8%        0.43 ±  3%  perf-sched.wait_time.avg.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
    381.39            -99.9%        0.42 ±  3%  perf-sched.wait_time.avg.ms.schedule_timeout.sctp_wait_for_sndbuf.sctp_sendmsg_to_asoc.sctp_sendmsg
    604.00            -77.6%      135.38 ±  4%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    795.97 ±  3%      -17.9%      653.42        perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      4984            -79.9%        1000        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2.23 ±  6%      +21.9%        2.71 ±  6%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    209.31           +408.2%        1063        perf-sched.wait_time.max.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
    417.20           +155.0%        1063        perf-sched.wait_time.max.ms.schedule_timeout.sctp_wait_for_sndbuf.sctp_sendmsg_to_asoc.sctp_sendmsg
      4122 ± 10%      -35.0%        2679 ± 17%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 18+ messages in thread
end of thread, other threads: [~2025-07-04 15:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucas1p1.samsung.com>
2025-05-30 10:34 ` [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit() e.kubanski
[not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p4>
2025-05-30 11:56 ` Eryk Kubanski
2025-05-30 16:07 ` Stanislav Fomichev
[not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p1>
2025-06-02  9:27 ` Eryk Kubanski
2025-06-02 15:28 ` Stanislav Fomichev
2025-06-02 16:03 ` Maciej Fijalkowski
[not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p2>
2025-06-02 16:18 ` Eryk Kubanski
2025-06-04 13:50 ` Maciej Fijalkowski
2025-06-04 14:15 ` Eryk Kubanski
2025-06-09 19:41 ` Maciej Fijalkowski
2025-06-10  9:35 ` Eryk Kubanski
[not found] ` <CGME20250530103506eucas1p1e4091678f4157b928ddfa6f6534a0009@eucms1p3>
2025-06-02 15:58 ` Eryk Kubanski
2025-06-10  9:11 ` Re: " Eryk Kubanski
2025-06-11 13:10 ` Maciej Fijalkowski
2025-07-03 23:37 ` Jason Xing
2025-07-04 12:34 ` Maciej Fijalkowski
2025-07-04 15:29 ` Jason Xing
2025-06-04 14:41 ` kernel test robot