From: Simon Horman <horms@kernel.org>
To: Eric Dumazet <edumazet@google.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Willem de Bruijn <willemb@google.com>,
	netdev@vger.kernel.org, eric.dumazet@gmail.com
Subject: Re: [PATCH RFC net-next 4/4] net: allow busy connected flows to switch tx queues
Date: Mon, 13 Oct 2025 10:18:33 +0100
Message-ID: <aOzD6T6dzZNq06Lj@horms.kernel.org>
In-Reply-To: <20251008104612.1824200-5-edumazet@google.com>

On Wed, Oct 08, 2025 at 10:46:11AM +0000, Eric Dumazet wrote:
> This is a followup to commit 726e9e8b94b9 ("tcp: refine
> skb->ooo_okay setting") and to the prior commit in this series
> ("net: control skb->ooo_okay from skb_set_owner_w()").
> 
> skb->ooo_okay might never be set for bulk flows that always
> have at least one skb in a qdisc queue or NIC queue,
> especially if TX completion is delayed because of a stressed cpu.
> 
> These so-called "strange attractors" have caused many performance
> issues; we need to do better.
> 
> We have tried very hard to avoid reordering because TCP did
> not handle it well a decade ago.
> 
> Use the new net.core.txq_reselection_ms sysctl to let
> flows follow XPS and select a more efficient queue.
> 
> After this patch, we no longer have to make sure threads
> are pinned to cpus; they can now be migrated without
> adding much spinlock/qdisc/TX completion pressure.
> 
> The TX completion part was problematic because it added false sharing
> on various socket fields, as well as false sharing and spinlock
> contention in mm layers. Unfortunately, calling skb_orphan() from
> ndo_start_xmit() is not an option.
> 
> Note for later: move sk->sk_tx_queue_mapping closer
> to sk_tx_queue_mapping_jiffies for better cache locality.
> 
> Tested:
> 
> Used a host with 32 TX queues, shared by groups of 8 cores.
> XPS setup:
> 
> echo ff >/sys/class/net/eth1/queue/tx-0/xps_cpus
> echo ff00 >/sys/class/net/eth1/queue/tx-1/xps_cpus
> echo ff0000 >/sys/class/net/eth1/queue/tx-2/xps_cpus
> echo ff000000 >/sys/class/net/eth1/queue/tx-3/xps_cpus
> echo ff,00000000 >/sys/class/net/eth1/queue/tx-4/xps_cpus
> echo ff00,00000000 >/sys/class/net/eth1/queue/tx-5/xps_cpus
> echo ff0000,00000000 >/sys/class/net/eth1/queue/tx-6/xps_cpus
> echo ff000000,00000000 >/sys/class/net/eth1/queue/tx-7/xps_cpus
> ...
> 
> Launched a tcp_stream with 15 threads and 1000 flows, initially affined to cores 0-15:
> 
> taskset -c 0-15 tcp_stream -T15 -F1000 -l1000 -c -H target_host
> 
> Checked that only queues 0 and 1 are used, as instructed by XPS:
> tc -s qdisc show dev eth1|grep backlog|grep -v "backlog 0b 0p"
>  backlog 123489410b 1890p
>  backlog 69809026b 1064p
>  backlog 52401054b 805p
> 
> Then forced each thread onto its own cpu among 1,9,17,25,33,41,49,57,65,73,81,89,97,105,113,121:
> 
> C=1;PID=`pidof tcp_stream`;for P in `ls /proc/$PID/task`; do taskset -pc $C $P; C=$(($C + 8));done
> 
> Set txq_reselection_ms to 1000:
> echo 1000 > /proc/sys/net/core/txq_reselection_ms
> 
> Checked that the flows have migrated nicely:
> 
> tc -s qdisc show dev eth1|grep backlog|grep -v "backlog 0b 0p"
>  backlog 130508314b 1916p
>  backlog 8584380b 126p
>  backlog 8584380b 126p
>  backlog 8379990b 123p
>  backlog 8584380b 126p
>  backlog 8487484b 125p
>  backlog 8584380b 126p
>  backlog 8448120b 124p
>  backlog 8584380b 126p
>  backlog 8720640b 128p
>  backlog 8856900b 130p
>  backlog 8584380b 126p
>  backlog 8652510b 127p
>  backlog 8448120b 124p
>  backlog 8516250b 125p
>  backlog 7834950b 115p
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Very nice :)

> ---
>  include/net/sock.h | 40 +++++++++++++++++++---------------------
>  net/core/dev.c     | 27 +++++++++++++++++++++++++--
>  2 files changed, 44 insertions(+), 23 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 2794bc5c565424491a064049d3d76c3fb7ba1ed8..61f92bb03e00d7167cccfe70da16174f2b40f6de 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -485,6 +485,7 @@ struct sock {
>  	unsigned long		sk_pacing_rate; /* bytes per second */
>  	atomic_t		sk_zckey;
>  	atomic_t		sk_tskey;
> +	unsigned long		sk_tx_queue_mapping_jiffies;

nit: please add sk_tx_queue_mapping_jiffies to the kernel-doc for struct sock

>  	__cacheline_group_end(sock_write_tx);
>  
>  	__cacheline_group_begin(sock_read_tx);

Thread overview: 16+ messages
2025-10-08 10:46 [PATCH RFC net-next 0/4] net: deal with strange attractors tx queues Eric Dumazet
2025-10-08 10:46 ` [PATCH RFC net-next 1/4] net: add SK_WMEM_ALLOC_BIAS constant Eric Dumazet
2025-10-08 14:25   ` Neal Cardwell
2025-10-08 10:46 ` [PATCH RFC net-next 2/4] net: control skb->ooo_okay from skb_set_owner_w() Eric Dumazet
2025-10-08 14:26   ` Neal Cardwell
2025-10-08 10:46 ` [PATCH RFC net-next 3/4] net: add /proc/sys/net/core/txq_reselection_ms control Eric Dumazet
2025-10-08 14:27   ` Neal Cardwell
2025-10-08 15:21   ` Paolo Abeni
2025-10-08 15:25     ` Eric Dumazet
2025-10-08 10:46 ` [PATCH RFC net-next 4/4] net: allow busy connected flows to switch tx queues Eric Dumazet
2025-10-08 13:38   ` Willem de Bruijn
2025-10-08 14:38     ` Eric Dumazet
2025-10-08 15:30   ` Paolo Abeni
2025-10-08 16:30     ` Eric Dumazet
2025-10-13  9:18   ` Simon Horman [this message]
2025-10-13  9:33     ` Eric Dumazet
