All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Xin Zhao <jackzxcui1989@163.com>,
	 willemdebruijn.kernel@gmail.com,  edumazet@google.com,
	 ferenc@fejes.dev
Cc: davem@davemloft.net,  kuba@kernel.org,  pabeni@redhat.com,
	 horms@kernel.org,  netdev@vger.kernel.org,
	 linux-kernel@vger.kernel.org,  Xin Zhao <jackzxcui1989@163.com>
Subject: Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
Date: Wed, 20 Aug 2025 07:15:13 -0400	[thread overview]
Message-ID: <willemdebruijn.kernel.15a62b0950330@gmail.com> (raw)
In-Reply-To: <20250820092925.2115372-1-jackzxcui1989@163.com>

Xin Zhao wrote:
> In a system with high real-time requirements, the timeout mechanism of
> ordinary timers with jiffies granularity is insufficient to meet the
> demands for real-time performance. Meanwhile, the optimization of CPU
> usage with af_packet is quite significant. Use hrtimer instead of timer
> to help compensate for the shortcomings in real-time performance.
> In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
> enough, with fluctuations reaching over 8ms (on a system with HZ=250).
> This is unacceptable in some high real-time systems that require timely
> processing of network packets. By replacing it with hrtimer, if a timeout
> of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
> 3 ms.
> 
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
> 
> ---
> Changes in v6:
> - Use hrtimer_is_queued instead to check whether it is within the callback function.
>   So do not need to add 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer
>   as suggested by Willem de Bruijn;
> - Do not need local_irq_save and local_irq_restore to protect the race of the timer
>   callback running in softirq context or the open_block from tpacket_rcv in process
>   context
>   as suggested by Willem de Bruijn;
> 
> Changes in v5:
> - Remove the unnecessary comments at the top of the _prb_refresh_rx_retire_blk_timer,
>   branch is self-explanatory enough
>   as suggested by Willem de Bruijn;
> - Indentation of _prb_refresh_rx_retire_blk_timer, align with first argument on
>   previous line
>   as suggested by Willem de Bruijn;
> - Do not call hrtimer_start within the hrtimer callback
>   as suggested by Willem de Bruijn
>   So add 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer to indicate
>   whether it is within the callback function. Use hrtimer_forward_now instead of
>   hrtimer_start when it is in the callback function and is doing prb_open_block.
> - Link to v5: https://lore.kernel.org/all/20250819091447.1199980-1-jackzxcui1989@163.com/
> 
> Changes in v4:
> - Add 'bool start' to distinguish whether the call to _prb_refresh_rx_retire_blk_timer
>   is for prb_open_block. When it is for prb_open_block, execute hrtimer_start to
>   (re)start the hrtimer; otherwise, use hrtimer_forward_now to set the expiration
>   time as it is more commonly used compared to hrtimer_set_expires.
>   as suggested by Willem de Bruijn;
> - Delete the comments to explain why hrtimer_set_expires(not hrtimer_forward_now)
>   is used, as we do not use hrtimer_set_expires any more;
> - Link to v4: https://lore.kernel.org/all/20250818050233.155344-1-jackzxcui1989@163.com/
> 
> Changes in v3:
> - return HRTIMER_NORESTART when pkc->delete_blk_timer is true
>   as suggested by Willem de Bruijn;
> - Drop the retire_blk_tov field of tpacket_kbdq_core, add interval_ktime instead
>   as suggested by Willem de Bruijn;
> - Add comments to explain why hrtimer_set_expires(not hrtimer_forward_now) is used in
>   _prb_refresh_rx_retire_blk_timer
>   as suggested by Willem de Bruijn;
> - Link to v3: https://lore.kernel.org/all/20250816170130.3969354-1-jackzxcui1989@163.com/
> 
> Changes in v2:
> - Drop the tov_in_msecs field of tpacket_kbdq_core added by the patch
>   as suggested by Willem de Bruijn;
> - Link to v2: https://lore.kernel.org/all/20250815044141.1374446-1-jackzxcui1989@163.com/
> 
> Changes in v1:
> - Do not add another config for the current changes
>   as suggested by Eric Dumazet;
> - Mention the beneficial cases 'HZ=100 or HZ=250' in the changelog
>   as suggested by Eric Dumazet;
> - Add some performance details to the changelog
>   as suggested by Ferenc Fejes;
> - Delete the 'pkc->tov_in_msecs == 0' bounds check which is not necessary
>   as suggested by Willem de Bruijn;
> - Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
>   as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
> - Just return HRTIMER_RESTART directly as all cases return the same value
>   as suggested by Willem de Bruijn;
> - Link to v1: https://lore.kernel.org/all/20250813165201.1492779-1-jackzxcui1989@163.com/
> - Link to v0: https://lore.kernel.org/all/20250806055210.1530081-1-jackzxcui1989@163.com/
> ---
>  net/packet/af_packet.c | 40 +++++++++++++++++++++++-----------------
>  net/packet/diag.c      |  2 +-
>  net/packet/internal.h  |  5 ++---
>  3 files changed, 26 insertions(+), 21 deletions(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index a7017d7f0..9b13939a6 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -203,8 +203,8 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
>  static int prb_queue_frozen(struct tpacket_kbdq_core *);
>  static void prb_open_block(struct tpacket_kbdq_core *,
>  		struct tpacket_block_desc *);
> -static void prb_retire_rx_blk_timer_expired(struct timer_list *);
> -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
> +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
> +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *, bool);
>  static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
>  static void prb_clear_rxhash(struct tpacket_kbdq_core *,
>  		struct tpacket3_hdr *);
> @@ -581,7 +581,7 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
>  
>  static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
>  {
> -	timer_delete_sync(&pkc->retire_blk_timer);
> +	hrtimer_cancel(&pkc->retire_blk_timer);
>  }
>  
>  static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
> @@ -603,9 +603,8 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
>  	struct tpacket_kbdq_core *pkc;
>  
>  	pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> -	timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> -		    0);
> -	pkc->retire_blk_timer.expires = jiffies;
> +	hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> +		      CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
>  }
>  
>  static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> @@ -672,11 +671,10 @@ static void init_prb_bdqc(struct packet_sock *po,
>  	p1->last_kactive_blk_num = 0;
>  	po->stats.stats3.tp_freeze_q_cnt = 0;
>  	if (req_u->req3.tp_retire_blk_tov)
> -		p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> +		p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
>  	else
> -		p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> -						req_u->req3.tp_block_size);
> -	p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> +		p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
> +						req_u->req3.tp_block_size));
>  	p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
>  	rwlock_init(&p1->blk_fill_in_prog_lock);
>  
> @@ -689,10 +687,14 @@ static void init_prb_bdqc(struct packet_sock *po,
>  /*  Do NOT update the last_blk_num first.
>   *  Assumes sk_buff_head lock is held.
>   */
> -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
> +					     bool start)
>  {
> -	mod_timer(&pkc->retire_blk_timer,
> -			jiffies + pkc->tov_in_jiffies);
> +	if (start && !hrtimer_is_queued(&pkc->retire_blk_timer))
> +		hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> +			      HRTIMER_MODE_REL_SOFT);
> +	else
> +		hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);

Is the hrtimer still queued when prb_retire_rx_blk_timer_expired
fires? Based on the existence of hrtimer_forward_now, I assume so. But
have not checked yet. If so, hrtimer_is_queued alone suffices to
detect the other callstack from tpacket_rcv where hrtimer_start is
needed. No need for bool start?

>  	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
>  }

  reply	other threads:[~2025-08-20 11:15 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-20  9:29 [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
2025-08-20 11:15 ` Willem de Bruijn [this message]
2025-08-20 13:50 ` [syzbot ci] " syzbot ci
  -- strict thread matches above, loose matches on Subject: below --
2025-08-21  8:53 [PATCH net-next v6] " Xin Zhao
2025-08-21 14:32 ` Willem de Bruijn
2025-08-21 15:30 Xin Zhao
2025-08-22  6:37 ` Willem de Bruijn
2025-08-22 10:16 Xin Zhao
2025-08-25  5:06 Xin Zhao
2025-08-25 16:20 ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=willemdebruijn.kernel.15a62b0950330@gmail.com \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=ferenc@fejes.dev \
    --cc=horms@kernel.org \
    --cc=jackzxcui1989@163.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.