* [PATCH net-next] net: optimize enqueue_to_backlog() for the fast path
@ 2025-10-24 9:05 Eric Dumazet
From: Eric Dumazet @ 2025-10-24 9:05 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet, Willem de Bruijn
Add likely() and unlikely() clauses for the common cases:
Device is running.
Queue is not full.
Queue is less than half capacity.
Add max_backlog parameter to skb_flow_limit() to avoid
a second READ_ONCE(net_hotdata.max_backlog).
skb_flow_limit() does not need the backlog_lock protection,
and can be called before we acquire the lock, for even better
resistance to attacks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
---
net/core/dev.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 378c2d010faf251ffd874ebf0cc3dd6968eee447..d32f0b0c03bbd069d3651f5a6b772c8029baf96c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5249,14 +5249,15 @@ void kick_defer_list_purge(unsigned int cpu)
int netdev_flow_limit_table_len __read_mostly = (1 << 12);
#endif
-static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
+static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen,
+ int max_backlog)
{
#ifdef CONFIG_NET_FLOW_LIMIT
- struct sd_flow_limit *fl;
- struct softnet_data *sd;
unsigned int old_flow, new_flow;
+ const struct softnet_data *sd;
+ struct sd_flow_limit *fl;
- if (qlen < (READ_ONCE(net_hotdata.max_backlog) >> 1))
+ if (likely(qlen < (max_backlog >> 1)))
return false;
sd = this_cpu_ptr(&softnet_data);
@@ -5301,19 +5302,19 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
u32 tail;
reason = SKB_DROP_REASON_DEV_READY;
- if (!netif_running(skb->dev))
+ if (unlikely(!netif_running(skb->dev)))
goto bad_dev;
- reason = SKB_DROP_REASON_CPU_BACKLOG;
sd = &per_cpu(softnet_data, cpu);
qlen = skb_queue_len_lockless(&sd->input_pkt_queue);
max_backlog = READ_ONCE(net_hotdata.max_backlog);
- if (unlikely(qlen > max_backlog))
+ if (unlikely(qlen > max_backlog) ||
+ skb_flow_limit(skb, qlen, max_backlog))
goto cpu_backlog_drop;
backlog_lock_irq_save(sd, &flags);
qlen = skb_queue_len(&sd->input_pkt_queue);
- if (qlen <= max_backlog && !skb_flow_limit(skb, qlen)) {
+ if (likely(qlen <= max_backlog)) {
if (!qlen) {
/* Schedule NAPI for backlog device. We can use
* non atomic operation as we own the queue lock.
@@ -5334,6 +5335,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
backlog_unlock_irq_restore(sd, &flags);
cpu_backlog_drop:
+ reason = SKB_DROP_REASON_CPU_BACKLOG;
numa_drop_add(&sd->drop_counters, 1);
bad_dev:
dev_core_stats_rx_dropped_inc(skb->dev);
--
2.51.1.821.gb6fe4d2222-goog
* Re: [PATCH net-next] net: optimize enqueue_to_backlog() for the fast path
From: Willem de Bruijn @ 2025-10-24 14:03 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet, Willem de Bruijn
Eric Dumazet wrote:
> Add likely() and unlikely() clauses for the common cases:
>
> Device is running.
> Queue is not full.
> Queue is less than half capacity.
>
> Add max_backlog parameter to skb_flow_limit() to avoid
> a second READ_ONCE(net_hotdata.max_backlog).
>
> skb_flow_limit() does not need the backlog_lock protection,
> and can be called before we acquire the lock, for even better
> resistance to attacks.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> ---
> net/core/dev.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 378c2d010faf251ffd874ebf0cc3dd6968eee447..d32f0b0c03bbd069d3651f5a6b772c8029baf96c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5249,14 +5249,15 @@ void kick_defer_list_purge(unsigned int cpu)
> int netdev_flow_limit_table_len __read_mostly = (1 << 12);
> #endif
>
> -static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
> +static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen,
> + int max_backlog)
> {
> #ifdef CONFIG_NET_FLOW_LIMIT
> - struct sd_flow_limit *fl;
> - struct softnet_data *sd;
> unsigned int old_flow, new_flow;
> + const struct softnet_data *sd;
> + struct sd_flow_limit *fl;
>
> - if (qlen < (READ_ONCE(net_hotdata.max_backlog) >> 1))
> + if (likely(qlen < (max_backlog >> 1)))
> return false;
>
> sd = this_cpu_ptr(&softnet_data);
I assume sd is warm here. Otherwise we could even move skb_flow_limit
behind a static_branch, seeing how rarely it is likely to be used.
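(Purely as a sketch of that idea, not something this patch needs: a key
enabled from the flow_limit sysctl handler once a per-cpu table is
actually installed, so the call disappears from the fast path otherwise.
The key name below is made up.)

	static DEFINE_STATIC_KEY_FALSE(flow_limit_in_use);

	/* sysctl handler, when a per-cpu flow_limit table is installed */
	static_branch_enable(&flow_limit_in_use);

	/* enqueue_to_backlog(): skip the call entirely in the common case */
	if (unlikely(qlen > max_backlog) ||
	    (static_branch_unlikely(&flow_limit_in_use) &&
	     skb_flow_limit(skb, qlen, max_backlog)))
		goto cpu_backlog_drop;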
> @@ -5301,19 +5302,19 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> u32 tail;
>
> reason = SKB_DROP_REASON_DEV_READY;
> - if (!netif_running(skb->dev))
> + if (unlikely(!netif_running(skb->dev)))
> goto bad_dev;
Isn't unlikely usually predicted for branches without an else?
And that is ignoring both FDO and actual branch prediction hardware
improving on the simple compiler heuristic.
No immediate concerns. Just want to avoid setting a precedent for others
to sprinkle code with likely/unlikely with abandon, as is sometimes
seen.
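(For reference, these annotations are just the usual compiler hints from
include/linux/compiler.h; on gcc/clang they mostly steer basic-block
layout so the likely path falls through, rather than necessarily
emitting explicit predict prefixes:)

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)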
>
> - reason = SKB_DROP_REASON_CPU_BACKLOG;
> sd = &per_cpu(softnet_data, cpu);
>
> qlen = skb_queue_len_lockless(&sd->input_pkt_queue);
> max_backlog = READ_ONCE(net_hotdata.max_backlog);
> - if (unlikely(qlen > max_backlog))
> + if (unlikely(qlen > max_backlog) ||
> + skb_flow_limit(skb, qlen, max_backlog))
> goto cpu_backlog_drop;
> backlog_lock_irq_save(sd, &flags);
> qlen = skb_queue_len(&sd->input_pkt_queue);
> - if (qlen <= max_backlog && !skb_flow_limit(skb, qlen)) {
> + if (likely(qlen <= max_backlog)) {
> if (!qlen) {
> /* Schedule NAPI for backlog device. We can use
> * non atomic operation as we own the queue lock.
> @@ -5334,6 +5335,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> backlog_unlock_irq_restore(sd, &flags);
>
> cpu_backlog_drop:
> + reason = SKB_DROP_REASON_CPU_BACKLOG;
> numa_drop_add(&sd->drop_counters, 1);
> bad_dev:
> dev_core_stats_rx_dropped_inc(skb->dev);
> --
> 2.51.1.821.gb6fe4d2222-goog
>
* Re: [PATCH net-next] net: optimize enqueue_to_backlog() for the fast path
From: Eric Dumazet @ 2025-10-24 14:30 UTC (permalink / raw)
To: Willem de Bruijn
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
Kuniyuki Iwashima, netdev, eric.dumazet, Willem de Bruijn
On Fri, Oct 24, 2025 at 7:03 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Eric Dumazet wrote:
> > Add likely() and unlikely() clauses for the common cases:
> >
> > Device is running.
> > Queue is not full.
> > Queue is less than half capacity.
> >
> > Add max_backlog parameter to skb_flow_limit() to avoid
> > a second READ_ONCE(net_hotdata.max_backlog).
> >
> > skb_flow_limit() does not need the backlog_lock protection,
> > and can be called before we acquire the lock, for even better
> > resistance to attacks.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Willem de Bruijn <willemb@google.com>
> > ---
> > net/core/dev.c | 18 ++++++++++--------
> > 1 file changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 378c2d010faf251ffd874ebf0cc3dd6968eee447..d32f0b0c03bbd069d3651f5a6b772c8029baf96c 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5249,14 +5249,15 @@ void kick_defer_list_purge(unsigned int cpu)
> > int netdev_flow_limit_table_len __read_mostly = (1 << 12);
> > #endif
> >
> > -static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
> > +static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen,
> > + int max_backlog)
> > {
> > #ifdef CONFIG_NET_FLOW_LIMIT
> > - struct sd_flow_limit *fl;
> > - struct softnet_data *sd;
> > unsigned int old_flow, new_flow;
> > + const struct softnet_data *sd;
> > + struct sd_flow_limit *fl;
> >
> > - if (qlen < (READ_ONCE(net_hotdata.max_backlog) >> 1))
> > + if (likely(qlen < (max_backlog >> 1)))
> > return false;
> >
> > sd = this_cpu_ptr(&softnet_data);
>
> I assume sd is warm here. Otherwise we could even move skb_flow_limit
> behind a static_branch, seeing how rarely it is likely to be used.
this_cpu_ptr(&ANY_VAR) only loads the very hot this_cpu_off. In modern
kernels this is

DEFINE_PER_CPU_CACHE_HOT(unsigned long, this_cpu_off);

The rest is encoded in the offsets used in the code.
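(Conceptually, ignoring the arch-specific details, the per-cpu pointer
is just the variable's address plus that per-cpu offset, i.e. roughly:)

	/* conceptual sketch only, not the literal x86 implementation */
	sd = (struct softnet_data *)((unsigned long)&softnet_data +
				     raw_cpu_read(this_cpu_off));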
>
> > @@ -5301,19 +5302,19 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> > u32 tail;
> >
> > reason = SKB_DROP_REASON_DEV_READY;
> > - if (!netif_running(skb->dev))
> > + if (unlikely(!netif_running(skb->dev)))
> > goto bad_dev;
>
> Isn't unlikely usually predicted for branches without an else?
I am not sure this is a hardcoded rule that all compilers will stick with.
Do you have a reference?
>
> And that is ignoring both FDO and actual branch prediction hardware
> improving on the simple compiler heuristic.
Let's not assume FDO is always used, and close the gap.
This will allow us to iterate faster.
FDO brings its own class of problems...
>
> No immediate concerns. Just want to avoid setting a precedent for others
> to sprinkle code with likely/unlikely with abandon, as is sometimes
> seen.
Sure.
I have not included a change to the apparently _very_ expensive

	if (!__test_and_set_bit(NAPI_STATE_SCHED,
				&sd->backlog.state))

which compiles to

	btsq   $0x0,0x160(%r13)

I tried to test the bit first, then set it only if needed, but got no
improvement, for some reason.
(This was after the other patch making sure to group the dirtied
fields in a single cache line.)
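(Roughly, the variant tried was of this shape — sketch only, the idea
being to only dirty the cache line when the bit actually needs to flip:)

	/* test first, only then do the (non-atomic) set */
	if (!test_bit(NAPI_STATE_SCHED, &sd->backlog.state) &&
	    !__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
		napi_schedule_rps(sd);	/* the existing call at that site */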
* Re: [PATCH net-next] net: optimize enqueue_to_backlog() for the fast path
From: Willem de Bruijn @ 2025-10-24 14:48 UTC (permalink / raw)
To: Eric Dumazet, Willem de Bruijn
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
Kuniyuki Iwashima, netdev, eric.dumazet, Willem de Bruijn
Eric Dumazet wrote:
> On Fri, Oct 24, 2025 at 7:03 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Eric Dumazet wrote:
> > > Add likely() and unlikely() clauses for the common cases:
> > >
> > > Device is running.
> > > Queue is not full.
> > > Queue is less than half capacity.
> > >
> > > Add max_backlog parameter to skb_flow_limit() to avoid
> > > a second READ_ONCE(net_hotdata.max_backlog).
> > >
> > > skb_flow_limit() does not need the backlog_lock protection,
> > > and can be called before we acquire the lock, for even better
> > > resistance to attacks.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
> > > ---
> > > net/core/dev.c | 18 ++++++++++--------
> > > 1 file changed, 10 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 378c2d010faf251ffd874ebf0cc3dd6968eee447..d32f0b0c03bbd069d3651f5a6b772c8029baf96c 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -5249,14 +5249,15 @@ void kick_defer_list_purge(unsigned int cpu)
> > > int netdev_flow_limit_table_len __read_mostly = (1 << 12);
> > > #endif
> > >
> > > -static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
> > > +static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen,
> > > + int max_backlog)
> > > {
> > > #ifdef CONFIG_NET_FLOW_LIMIT
> > > - struct sd_flow_limit *fl;
> > > - struct softnet_data *sd;
> > > unsigned int old_flow, new_flow;
> > > + const struct softnet_data *sd;
> > > + struct sd_flow_limit *fl;
> > >
> > > - if (qlen < (READ_ONCE(net_hotdata.max_backlog) >> 1))
> > > + if (likely(qlen < (max_backlog >> 1)))
> > > return false;
> > >
> > > sd = this_cpu_ptr(&softnet_data);
> >
> > I assume sd is warm here. Otherwise we could even move skb_flow_limit
> > behind a static_branch, seeing how rarely it is likely to be used.
>
> this_cpu_ptr(&ANY_VAR) only loads the very hot this_cpu_off. In modern
> kernels this is
>
> DEFINE_PER_CPU_CACHE_HOT(unsigned long, this_cpu_off);
>
> The rest is encoded in the offsets used in the code.
>
> >
> > > @@ -5301,19 +5302,19 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> > > u32 tail;
> > >
> > > reason = SKB_DROP_REASON_DEV_READY;
> > > - if (!netif_running(skb->dev))
> > > + if (unlikely(!netif_running(skb->dev)))
> > > goto bad_dev;
> >
> > Isn't unlikely usually predicted for branches without an else?
>
> I am not sure this is a hardcoded rule that all compilers will stick with.
> Do you have a reference?
Actually I was thinking of CPU branch prediction when there is no prior data.

According to the Intel® 64 and IA-32 Architectures
Optimization Reference Manual, Aug 2023, 3.4.1.2 Static Prediction:

  Branches that do not have a history in the BTB (see Section 3.4.1)
  are predicted using a static prediction algorithm:
  - Predict forward conditional branches to be NOT taken.
  [..]

But online threads mention that, even on x86_64, the actual prediction
behavior differs between microarchitecture generations, as does the
handling of explicit prediction hints.
And that's only Intel x86_64. So not a universal guide, perhaps.
> >
> > And that is ignoring both FDO and actual branch prediction hardware
> > improving on the simple compiler heuristic.
>
> Let's not assume FDO is always used, and close the gap.
> This will allow us to iterate faster.
> FDO brings its own class of problems...
>
> >
> > No immediate concerns. Just want to avoid setting a precedent for others
> > to sprinkle code with likely/unlikely with abandon, as is sometimes
> > seen.
>
> Sure.
>
> I have not included a change to the apparently _very_ expensive
>
> 	if (!__test_and_set_bit(NAPI_STATE_SCHED,
> 				&sd->backlog.state))
>
> which compiles to
>
> 	btsq   $0x0,0x160(%r13)
>
> I tried to test the bit first, then set it only if needed, but got no
> improvement, for some reason.
> (This was after the other patch making sure to group the dirtied
> fields in a single cache line.)
* Re: [PATCH net-next] net: optimize enqueue_to_backlog() for the fast path
From: Kuniyuki Iwashima @ 2025-10-27 17:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
netdev, eric.dumazet, Willem de Bruijn
On Fri, Oct 24, 2025 at 2:05 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Add likely() and unlikely() clauses for the common cases:
>
> Device is running.
> Queue is not full.
> Queue is less than half capacity.
>
> Add max_backlog parameter to skb_flow_limit() to avoid
> a second READ_ONCE(net_hotdata.max_backlog).
>
> skb_flow_limit() does not need the backlog_lock protection,
> and can be called before we acquire the lock, for even better
> resistance to attacks.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
* Re: [PATCH net-next] net: optimize enqueue_to_backlog() for the fast path
From: patchwork-bot+netdevbpf @ 2025-10-29 0:50 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, kuba, pabeni, horms, kuniyu, netdev, eric.dumazet, willemb
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Fri, 24 Oct 2025 09:05:17 +0000 you wrote:
> Add likely() and unlikely() clauses for the common cases:
>
> Device is running.
> Queue is not full.
> Queue is less than half capacity.
>
> Add max_backlog parameter to skb_flow_limit() to avoid
> a second READ_ONCE(net_hotdata.max_backlog).
>
> [...]
Here is the summary with links:
- [net-next] net: optimize enqueue_to_backlog() for the fast path
https://git.kernel.org/netdev/net-next/c/a086e9860ce6
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html