* [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Xin Zhao @ 2025-08-20 9:29 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao
In a system with high real-time requirements, the jiffies-granularity timeout
of ordinary timers is not precise enough to meet real-time demands, while the
CPU-usage optimization that af_packet provides is quite significant. Use an
hrtimer instead of a regular timer for the block-retire operation to
compensate for the lack of timing precision.

On a system with HZ=100 or HZ=250, the update of TP_STATUS_USER is not timely
enough, with fluctuations exceeding 8ms (measured on a system with HZ=250).
This is unacceptable for high real-time systems that must process network
packets promptly. With an hrtimer and a retire timeout of 2ms, the
TP_STATUS_USER update can be kept within 3ms.
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
---
Changes in v6:
- Use hrtimer_is_queued to check whether we are within the callback function, so the
  'bool callback' parameter to _prb_refresh_rx_retire_blk_timer is no longer needed,
  as suggested by Willem de Bruijn;
- Drop local_irq_save/local_irq_restore: the race between the timer callback running
  in softirq context and prb_open_block called from tpacket_rcv in process context
  does not need that protection,
  as suggested by Willem de Bruijn;
Changes in v5:
- Remove the unnecessary comment at the top of _prb_refresh_rx_retire_blk_timer;
  the branch is self-explanatory,
  as suggested by Willem de Bruijn;
- Fix the indentation of _prb_refresh_rx_retire_blk_timer: align with the first
  argument on the previous line,
  as suggested by Willem de Bruijn;
- Do not call hrtimer_start within the hrtimer callback,
  as suggested by Willem de Bruijn.
  Add a 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer to indicate
  whether we are within the callback function, and use hrtimer_forward_now instead
  of hrtimer_start when prb_open_block is done from within the callback.
- Link to v5: https://lore.kernel.org/all/20250819091447.1199980-1-jackzxcui1989@163.com/
Changes in v4:
- Add 'bool start' to distinguish whether the call to _prb_refresh_rx_retire_blk_timer
  is for prb_open_block. When it is, call hrtimer_start to (re)start the hrtimer;
  otherwise, use hrtimer_forward_now to set the expiration time, as it is more commonly
  used than hrtimer_set_expires,
  as suggested by Willem de Bruijn;
- Delete the comment explaining why hrtimer_set_expires (not hrtimer_forward_now)
  is used, as hrtimer_set_expires is no longer used;
- Link to v4: https://lore.kernel.org/all/20250818050233.155344-1-jackzxcui1989@163.com/
Changes in v3:
- return HRTIMER_NORESTART when pkc->delete_blk_timer is true
as suggested by Willem de Bruijn;
- Drop the retire_blk_tov field of tpacket_kbdq_core, add interval_ktime instead
as suggested by Willem de Bruijn;
- Add a comment to explain why hrtimer_set_expires (not hrtimer_forward_now) is used in
  _prb_refresh_rx_retire_blk_timer,
as suggested by Willem de Bruijn;
- Link to v3: https://lore.kernel.org/all/20250816170130.3969354-1-jackzxcui1989@163.com/
Changes in v2:
- Drop the tov_in_msecs field of tpacket_kbdq_core added by the patch
as suggested by Willem de Bruijn;
- Link to v2: https://lore.kernel.org/all/20250815044141.1374446-1-jackzxcui1989@163.com/
Changes in v1:
- Do not add another config for the current changes
as suggested by Eric Dumazet;
- Mention the beneficial cases 'HZ=100 or HZ=250' in the changelog
as suggested by Eric Dumazet;
- Add some performance details to the changelog
as suggested by Ferenc Fejes;
- Delete the 'pkc->tov_in_msecs == 0' bounds check which is not necessary
as suggested by Willem de Bruijn;
- Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
- Just return HRTIMER_RESTART directly as all cases return the same value
as suggested by Willem de Bruijn;
- Link to v1: https://lore.kernel.org/all/20250813165201.1492779-1-jackzxcui1989@163.com/
- Link to v0: https://lore.kernel.org/all/20250806055210.1530081-1-jackzxcui1989@163.com/
---
net/packet/af_packet.c | 40 +++++++++++++++++++++++-----------------
net/packet/diag.c | 2 +-
net/packet/internal.h | 5 ++---
3 files changed, 26 insertions(+), 21 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a7017d7f0..9b13939a6 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -203,8 +203,8 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
static int prb_queue_frozen(struct tpacket_kbdq_core *);
static void prb_open_block(struct tpacket_kbdq_core *,
struct tpacket_block_desc *);
-static void prb_retire_rx_blk_timer_expired(struct timer_list *);
-static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
+static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *, bool);
static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
static void prb_clear_rxhash(struct tpacket_kbdq_core *,
struct tpacket3_hdr *);
@@ -581,7 +581,7 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
- timer_delete_sync(&pkc->retire_blk_timer);
+ hrtimer_cancel(&pkc->retire_blk_timer);
}
static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
@@ -603,9 +603,8 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
struct tpacket_kbdq_core *pkc;
pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
- timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
- 0);
- pkc->retire_blk_timer.expires = jiffies;
+ hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
+ CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
}
static int prb_calc_retire_blk_tmo(struct packet_sock *po,
@@ -672,11 +671,10 @@ static void init_prb_bdqc(struct packet_sock *po,
p1->last_kactive_blk_num = 0;
po->stats.stats3.tp_freeze_q_cnt = 0;
if (req_u->req3.tp_retire_blk_tov)
- p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
+ p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
else
- p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
- req_u->req3.tp_block_size);
- p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
+ p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
+ req_u->req3.tp_block_size));
p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
rwlock_init(&p1->blk_fill_in_prog_lock);
@@ -689,10 +687,14 @@ static void init_prb_bdqc(struct packet_sock *po,
/* Do NOT update the last_blk_num first.
* Assumes sk_buff_head lock is held.
*/
-static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
+static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
+ bool start)
{
- mod_timer(&pkc->retire_blk_timer,
- jiffies + pkc->tov_in_jiffies);
+ if (start && !hrtimer_is_queued(&pkc->retire_blk_timer))
+ hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
+ HRTIMER_MODE_REL_SOFT);
+ else
+ hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
pkc->last_kactive_blk_num = pkc->kactive_blk_num;
}
@@ -719,8 +721,9 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
* prb_calc_retire_blk_tmo() calculates the tmo.
*
*/
-static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
{
+ enum hrtimer_restart ret = HRTIMER_RESTART;
struct packet_sock *po =
timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
@@ -732,8 +735,10 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
frozen = prb_queue_frozen(pkc);
pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
- if (unlikely(pkc->delete_blk_timer))
+ if (unlikely(pkc->delete_blk_timer)) {
+ ret = HRTIMER_NORESTART;
goto out;
+ }
/* We only need to plug the race when the block is partially filled.
* tpacket_rcv:
@@ -786,10 +791,11 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
}
refresh_timer:
- _prb_refresh_rx_retire_blk_timer(pkc);
+ _prb_refresh_rx_retire_blk_timer(pkc, false);
out:
spin_unlock(&po->sk.sk_receive_queue.lock);
+ return ret;
}
static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
@@ -921,7 +927,7 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
prb_thaw_queue(pkc1);
- _prb_refresh_rx_retire_blk_timer(pkc1);
+ _prb_refresh_rx_retire_blk_timer(pkc1, true);
smp_wmb();
}
diff --git a/net/packet/diag.c b/net/packet/diag.c
index 6ce1dcc28..c8f43e0c1 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -83,7 +83,7 @@ static int pdiag_put_ring(struct packet_ring_buffer *ring, int ver, int nl_type,
pdr.pdr_frame_nr = ring->frame_max + 1;
if (ver > TPACKET_V2) {
- pdr.pdr_retire_tmo = ring->prb_bdqc.retire_blk_tov;
+ pdr.pdr_retire_tmo = ktime_to_ms(ring->prb_bdqc.interval_ktime);
pdr.pdr_sizeof_priv = ring->prb_bdqc.blk_sizeof_priv;
pdr.pdr_features = ring->prb_bdqc.feature_req_word;
} else {
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 1e743d031..19d4f0b73 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -45,12 +45,11 @@ struct tpacket_kbdq_core {
/* Default is set to 8ms */
#define DEFAULT_PRB_RETIRE_TOV (8)
- unsigned short retire_blk_tov;
+ ktime_t interval_ktime;
unsigned short version;
- unsigned long tov_in_jiffies;
/* timer to retire an outstanding block */
- struct timer_list retire_blk_timer;
+ struct hrtimer retire_blk_timer;
};
struct pgv {
--
2.34.1
* Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Willem de Bruijn @ 2025-08-20 11:15 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao
Xin Zhao wrote:
> [snip]
> @@ -689,10 +687,14 @@ static void init_prb_bdqc(struct packet_sock *po,
> /* Do NOT update the last_blk_num first.
> * Assumes sk_buff_head lock is held.
> */
> -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
> + bool start)
> {
> - mod_timer(&pkc->retire_blk_timer,
> - jiffies + pkc->tov_in_jiffies);
> + if (start && !hrtimer_is_queued(&pkc->retire_blk_timer))
> + hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> + HRTIMER_MODE_REL_SOFT);
> + else
> + hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
Is the hrtimer still queued when prb_retire_rx_blk_timer_expired
fires? Based on the existence of hrtimer_forward_now, I assume so. But
have not checked yet. If so, hrtimer_is_queued alone suffices to
detect the other callstack from tpacket_rcv where hrtimer_start is
needed. No need for bool start?
> pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> }
* Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Xin Zhao @ 2025-08-21 8:53 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Wed, 2025-08-20 at 19:15 +0800, Willem wrote:
> > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
> > + bool start)
> > {
> > - mod_timer(&pkc->retire_blk_timer,
> > - jiffies + pkc->tov_in_jiffies);
> > + if (start && !hrtimer_is_queued(&pkc->retire_blk_timer))
> > + hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> > + HRTIMER_MODE_REL_SOFT);
> > + else
> > + hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
>
> Is the hrtimer still queued when prb_retire_rx_blk_timer_expired
> fires? Based on the existence of hrtimer_forward_now, I assume so. But
> have not checked yet. If so, hrtimer_is_queued alone suffices to
> detect the other callstack from tpacket_rcv where hrtimer_start is
> needed. No need for bool start?
>
> > pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> > }
Since the CI tests reported the previously mentioned WARN_ON in hrtimer_forward_now, I think we
should re-evaluate the implementation of _prb_refresh_rx_retire_blk_timer, the function
responsible for setting the hrtimer timeout, and establish the principles it must follow. After
careful consideration and a detailed review of the hrtimer implementation, I have identified the
following two principles:
1. It is essential to ensure that calls to hrtimer_forward_now or hrtimer_set_expires
occur strictly within the hrtimer's callback.
2. It is critical to ensure that no calls to hrtimer_forward_now or hrtimer_set_expires
are made while the hrtimer is enqueued.
Regarding these two principles, three additional notes:
1. For principle 1: if hrtimer_forward_now or hrtimer_set_expires is called outside the hrtimer's
callback without the timer subsequently being enqueued, the hrtimer timeout will never fire as
expected (this is easy to reproduce with a small kernel module and a short timeout such as 2ms).
2. Both principles aim to prevent hrtimer_forward_now or hrtimer_set_expires from modifying the
expiry while the hrtimer is already enqueued, which would corrupt the hrtimer's red-black tree
(the WARN_ON-on-enqueued check in the non-inlined hrtimer_forward, which hrtimer_forward_now
wraps, exists precisely to catch this). Note also that, besides the hrtimer_start family of
functions called outside the callback, __run_hrtimer itself enqueues the hrtimer when the
callback returns HRTIMER_RESTART.
3. Principle 2 is needed in addition to principle 1 because setting the expiry with
hrtimer_forward_now inside the callback is not protected by the hrtimer_cpu_base lock: if
hrtimer_start is performed outside the callback while the callback is concurrently updating the
expiry, the hrtimer's red-black tree can again become corrupted.
The occurrence of the WARN_ON enqueue error in the CI test indicates that
hrtimer_forward_now was executed in a scenario outside the hrtimer's callback while
the hrtimer was in a queued state. A possible sequence that could cause this issue is
as follows:
cpu0 (softirq context, hrtimer timeout)              cpu1 (process context, needs prb_open_block)

hrtimer_run_softirq
  __hrtimer_run_queues
    __run_hrtimer
      _prb_refresh_rx_retire_blk_timer
        spin_lock(&po->sk.sk_receive_queue.lock);
        hrtimer_is_queued(&pkc->retire_blk_timer) == false
        hrtimer_forward_now
        spin_unlock(&po->sk.sk_receive_queue.lock)   tpacket_rcv
      enqueue_hrtimer                                  spin_lock(&sk->sk_receive_queue.lock);
                                                       packet_current_rx_frame
                                                         __packet_lookup_frame_in_block
                                                           prb_open_block
                                                             _prb_refresh_rx_retire_blk_timer
                                                               hrtimer_is_queued(&pkc->retire_blk_timer) == true
                                                               hrtimer_forward_now
                                                                 WARN_ON
In summary, the key issue now is to find a mechanism to ensure that the hrtimer's start
or enqueue, as well as the timeout settings for the hrtimer, cannot be executed
concurrently. I have thought of two methods to address this issue (method 1 will make the
code appear much simpler, while method 2 will make the code more complex):
Method 1:
Do not call hrtimer_forward_now to set the timeout within the hrtimer's callback; instead,
only call the hrtimer_start function to perform the hrtimer's enqueue. This approach is
viable because, in the current version, inside __run_hrtimer, the state of the timer is
checked under the protection of cpu_base->lock. If the timer is already enqueued, it will
not be enqueued again. By doing this, all hrtimer enqueues will be protected under both
sk_receive_queue.lock and cpu_base->lock, eliminating concerns about the timeout being
concurrently modified during enqueueing, which could lead to chaos in the hrtimer's
red-black tree.
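For illustration only, method 1 could look roughly like the sketch below (untested; it reuses
the v6 names such as interval_ktime and HRTIMER_MODE_REL_SOFT):

/* Rough sketch of method 1, not a tested patch: every (re)arm goes
 * through hrtimer_start while sk_receive_queue.lock is held, and the
 * already-enqueued check done in __run_hrtimer under cpu_base->lock
 * prevents a double enqueue when this is called from the callback.
 */
static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
	hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
		      HRTIMER_MODE_REL_SOFT);
	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
}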
Method 2:
This method strictly ensures that hrtimer_start is never called within the hrtimer's callback,
but doing so requires a lot of additional logic. We would need a 'callback' parameter to
guarantee that hrtimer_forward_now is only executed inside the callback and hrtimer_start only
outside it. The WARN_ON seen in the CI test shows that relying on hrtimer_is_queued does not
cover all scenarios: for the check to be precise it would have to be done under cpu_base->lock,
and an hrtimer_callback_running check would be equally imprecise. We would have to know which
CPU the hrtimer is running on and send an IPI so that hrtimer_forward_now is executed under
local_irq_save on that CPU to satisfy principle 2, because acquiring cpu_base->lock within the
af_packet logic is inappropriate; disabling IRQs is the only way to guarantee that
hrtimer_forward_now runs while the hrtimer cannot be enqueued.
Since strictly ensuring that hrtimer_start is never called within the hrtimer's callback leads
to a lot of extra logic, and since I have argued above that calling hrtimer_start from within
the callback is permissible (for the hrtimer module the relevant lock is cpu_base->lock, which
belongs to the clock base the hrtimer resides on and does not follow smp_processor_id(), so it
makes no difference whether hrtimer_start is executed by the CPU running the callback or by
another CPU), I prefer method 1 above to resolve this issue. If there are no concerns, I will
reflect this in PATCH v7.
Thanks
Xin Zhao
* Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Willem de Bruijn @ 2025-08-21 14:32 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
Xin Zhao wrote:
> [snip]
Thanks for the analysis.
Using hrtimer_start from within the callback that returns
HRTIMER_RESTART does not sound in line with the intention of the API
to me.
I think we should just adjust and restart from within the callback and
hrtimer_start from tpacket_rcv iff the timer is not yet queued.
Since all these modifications are made while the receive queue lock is
held I don't immediately see why we would need additional mutual
exclusion beyond that.
* Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Xin Zhao @ 2025-08-21 15:30 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Thu, 2025-08-21 at 22:32 +0800, Willem wrote:
> Thanks for the analysis.
>
> Using hrtimer_start from within the callback that returns
> HRTIMER_RESTART does not sound in line with the intention of the API
> to me.
>
> I think we should just adjust and restart from within the callback and
> hrtimer_start from tpacket_rcv iff the timer is not yet queued.
>
> Since all these modifications are made while the receive queue lock is
> held I don't immediately see why we would need additional mutual
> exclusion beyond that.
The hrtimer callback is called by __run_hrtimer. If we only use hrtimer_forward_now in the
callback, the timer is not re-armed inside the callback itself; it is enqueued by the hrtimer
core after the callback returns. So the enqueue is not protected by sk_receive_queue.lock.
Consider the following timing sequence:
time   cpu0 (softirq context, hrtimer timeout)            cpu1 (process context, need prb_open_block)

  0    hrtimer_run_softirq
  1      __hrtimer_run_queues
  2        __run_hrtimer
  3          prb_retire_rx_blk_timer_expired
  4            spin_lock(&po->sk.sk_receive_queue.lock);
  5            _prb_refresh_rx_retire_blk_timer
  6              hrtimer_forward_now
  7            spin_unlock(&po->sk.sk_receive_queue.lock)
  8          raw_spin_lock_irq(&cpu_base->lock);           tpacket_rcv
  9          enqueue_hrtimer                                  spin_lock(&sk->sk_receive_queue.lock);
 10                                                           packet_current_rx_frame
 11                                                             __packet_lookup_frame_in_block
 12          finish enqueue_hrtimer                              prb_open_block
 13                                                                _prb_refresh_rx_retire_blk_timer
 14                                                                  hrtimer_is_queued(&pkc->retire_blk_timer) == true
 15                                                                  hrtimer_forward_now
 16                                                                    WARN_ON
On cpu0 in the timing sequence above, enqueue_hrtimer is not protected by sk_receive_queue.lock,
while the hrtimer_forward_now on cpu1 is not protected by raw_spin_lock_irq(&cpu_base->lock).
A WARN_ON can therefore trigger if we rely only on 'hrtimer_is_queued(&pkc->retire_blk_timer) ==
true' to decide whether to call hrtimer_forward_now.
Thanks
Xin Zhao
* Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Willem de Bruijn @ 2025-08-22 6:37 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
Xin Zhao wrote:
> On Thu, 2025-08-21 at 22:32 +0800, Willem wrote:
>
> > Thanks for the analysis.
> >
> > Using hrtimer_start from within the callback that returns
> > HRTIMER_RESTART does not sound in line with the intention of the API
> > to me.
> >
> > I think we should just adjust and restart from within the callback and
> > hrtimer_start from tpacket_rcv iff the timer is not yet queued.
> >
> > Since all these modifications are made while the receive queue lock is
> > held I don't immediately see why we would need additional mutual
> > exclusion beyond that.
>
>
> The hrtimer callback is called by __run_hrtimer. If we only use hrtimer_forward_now in the
> callback, the timer is not re-armed inside the callback itself; it is enqueued by the hrtimer
> core after the callback returns. So the enqueue is not protected by sk_receive_queue.lock.
I see.
> Consider the following timing sequence:
> [timing sequence snipped]
>
> On cpu0 in the timing sequence above, enqueue_hrtimer is not protected by sk_receive_queue.lock,
> while the hrtimer_forward_now is not protected by raw_spin_lock_irq(&cpu_base->lock).
>
> It will cause WARN_ON if we only use 'hrtimer_is_queued(&pkc->retire_blk_timer) == true' to check
> whether to call hrtimer_forward_now.
One way around this may be to keep the is_timer_queued state inside
tpacket_kbdq_core protected by a relevant lock, like the receive queue
lock. Similar to pkc->delete_blk_timer.
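For concreteness, that could look something like the sketch below (untested; the
blk_timer_scheduled field and the helper name are made up here):

/* Sketch only: blk_timer_scheduled would be a new tpacket_kbdq_core
 * field, read and written only with sk_receive_queue.lock held, and
 * cleared by the callback when it returns HRTIMER_NORESTART.
 */
static void prb_arm_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
	/* caller holds sk_receive_queue.lock */
	if (pkc->blk_timer_scheduled)
		return;

	pkc->blk_timer_scheduled = true;
	hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
		      HRTIMER_MODE_REL_SOFT);
}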
Admittedly I have not given this much thought yet. Am traveling for a
few days, have limited time.
* Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Xin Zhao @ 2025-08-22 10:16 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Fri, 2025-08-22 at 14:37 +0800, Willem wrote:
> > The hrtimer callback is called by __run_hrtimer. If we only use hrtimer_forward_now in the
> > callback, the timer is not re-armed inside the callback itself; it is enqueued by the hrtimer
> > core after the callback returns. So the enqueue is not protected by sk_receive_queue.lock.
>
> I see.
> > Consider the following timing sequence:
> > [timing sequence snipped]
> > On cpu0 in the timing sequence above, enqueue_hrtimer is not protected by sk_receive_queue.lock,
> > while the hrtimer_forward_now is not protected by raw_spin_lock_irq(&cpu_base->lock).
> >
> > It will cause WARN_ON if we only use 'hrtimer_is_queued(&pkc->retire_blk_timer) == true' to check
> > whether to call hrtimer_forward_now.
>
> One way around this may be to keep the is_timer_queued state inside
> tpacket_kbdq_core protected by a relevant lock, like the receive queue
> lock. Similar to pkc->delete_blk_timer.
>
> Admittedly I have not given this much thought yet. Am traveling for a
> few days, have limited time.
Thank you for replying during your break. I have since thought of a way to keep the hrtimer
enqueue properly ordered without adding new locks or using local_irq_save. I will reflect this
in version 7.
Additionally, we still need the 'bool callback' parameter to determine whether we are inside the
hrtimer's callback, while the 'bool start' parameter is unnecessary.
Why keep the callback parameter?
As mentioned earlier, the enqueue happens after the hrtimer callback returns, and that enqueue
is not performed by the af_packet code. The timer's state can therefore flip back to enqueued at
an unpredictable time, which gives the hrtimer_is_queued check in _prb_refresh_rx_retire_blk_timer
no ordering guarantee.
As previously discussed, we must determine precisely, inside _prb_refresh_rx_retire_blk_timer,
whether we are in the hrtimer callback. Neither hrtimer_is_queued nor hrtimer_callback_running
is accurate enough for that. Passing a 'callback' parameter, by contrast, is reliable, requires
minimal code changes, and makes the intent easier for future readers to understand. Therefore I
will also add this 'bool callback' parameter in version 7.
Thanks
Xin Zhao
* Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Xin Zhao @ 2025-08-25 5:06 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Mon, 2025-08-25 at 2:08 +0800, Willem wrote:
> This is getting more complex than needed.
>
> Essentially the lifecycle is that packet_set_ring calls hrtimer_setup
> and hrtimer_del_sync.
>
> Inbetween, while the ring is configured, the timer is either
>
> - scheduled from tpacket_rcv and !is_scheduled
> -> call hrtimer_start
> - scheduled from tpacket_rcv and is_scheduled
> -> call hrtimer_set_expires
We cannot use hrtimer_set_expires/hrtimer_forward_now when an hrtimer is
already enqueued.
The WARN_ON(timer->state & HRTIMER_STATE_ENQUEUED) in hrtimer_forward already
makes this clear. The same WARN_ON is presumably not added to
hrtimer_set_expires because it is an inline function and the check would
increase code size.
The implementation of perf_mux_hrtimer_restart likewise checks whether the
hrtimer is active before restarting it:
static int perf_mux_hrtimer_restart(struct perf_cpu_pmu_context *cpc)
{
	struct hrtimer *timer = &cpc->hrtimer;
	unsigned long flags;

	raw_spin_lock_irqsave(&cpc->hrtimer_lock, flags);
	if (!cpc->hrtimer_active) {
		cpc->hrtimer_active = 1;
		hrtimer_forward_now(timer, cpc->hrtimer_interval);
		hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED_HARD);
	}
	raw_spin_unlock_irqrestore(&cpc->hrtimer_lock, flags);

	return 0;
}
Therefore, according to the overall design of hrtimer, once an hrtimer is
active it is not allowed to set its timeout outside of the hrtimer callback,
nor is it allowed to restart the hrtimer.
So there are two ways to update the hrtimer timeout:
1. Update the expiry time in the callback.
2. Call hrtimer_cancel and then call hrtimer_start.
You suggested that we not call hrtimer_start inside the callback; would you
accept calling hrtimer_cancel first and then hrtimer_start to restart the
timer? This approach also requires attention, as hrtimer_cancel blocks until a
running callback has finished, so it is essential to ensure that it is not
called within the hrtimer callback; otherwise, it could lead to a deadlock.
> - rescheduled from the timer callback
> -> call hrtimer_set_expires and return HRTIMER_RESTART
>
> The only complication is that the is_scheduled check can race with the
> HRTIMER_RESTART restart, as that happens outside the sk_receive_queue
> critical section.
>
> One option that I suggested before is to convert pkc->delete_blk_timer
> to pkc->blk_timer_scheduled to record whether the timer is scheduled
> without relying on hrtimer_is_queued. Set it on first open_block and
> clear it from the callback when returning HR_NORESTART.
Do you agree with adding a 'callback' parameter to distinguish scheduling from
tpacket_rcv from rescheduling inside the callback? I really couldn't think of
a better solution.
So a possible solution may be:
1. Keep the 'callback' parameter to determine strictly whether we are within the callback.
2. Within the callback, use hrtimer_set_expires to update the timeout (the hrtimer core will
   enqueue the hrtimer when the callback returns).
3. Outside the callback, call hrtimer_cancel + hrtimer_start to restart the timer.
4. To avoid the enqueue from step 2 and the hrtimer_start from step 3 happening at the same
   time, which could start the hrtimer twice in a very short period, the logic should be:
   if (hrtimer_cancel(...))
           hrtimer_start(...);
   The hrtimer_cancel check also prevents the hrtimer callback from being triggered once more
   right after packet_set_ring has called prb_del_retire_blk_timer. The hrtimer should be in an
   active state from when prb_setup_retire_blk_timer is called until prb_del_retire_blk_timer
   is called.
Thanks
Xin Zhao
* Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
From: Willem de Bruijn @ 2025-08-25 16:20 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
Xin Zhao wrote:
> On Mon, 2025-08-25 at 2:08 +0800, Willem wrote:
>
> > This is getting more complex than needed.
> >
> > Essentially the lifecycle is that packet_set_ring calls hrtimer_setup
> > and hrtimer_del_sync.
> >
> > Inbetween, while the ring is configured, the timer is either
> >
> > - scheduled from tpacket_rcv and !is_scheduled
> > -> call hrtimer_start
> > - scheduled from tpacket_rcv and is_scheduled
> > -> call hrtimer_set_expires
>
> We cannot use hrtimer_set_expires/hrtimer_forward_now when a hrtimer is
> already enqueued.
Perhaps we need to simplify and stop trying to adjust the timer from
tpacket_rcv once scheduled. Let the callback handle that.
> [snip]
> So two ways to update the hrtimer timeout:
> 1. update expire time in the callback
> 2. Call the hrtimer_cancel and then call hrtimer_start
1 seems preferable. The intent of the API.
> According to your suggestion, we don't call hrtimer_start inside the
> callback, would you accept calling hrtimer_cancel first and then calling
> hrtimer_start in the callback? However, this approach also requires
> attention, as hrtimer_cancel will block until the callback is running,
> so it is essential to ensure that it is not called within the hrtimer
> callback; otherwise, it could lead to a deadlock.
>
>
> > - rescheduled from the timer callback
> > -> call hrtimer_set_expires and return HRTIMER_RESTART
> >
> > The only complication is that the is_scheduled check can race with the
> > HRTIMER_RESTART restart, as that happens outside the sk_receive_queue
> > critical section.
> >
> > One option that I suggested before is to convert pkc->delete_blk_timer
> > to pkc->blk_timer_scheduled to record whether the timer is scheduled
> > without relying on hrtimer_is_queued. Set it on first open_block and
> > clear it from the callback when returning HR_NORESTART.
>
> Do you agree with adding a callback variable to distinguish between
> scheduled from tpacket_rcv and scheduled from the callback? I really
> couldn't think of a better solution.
Yes, no objections to that if necessary.
>
>
> So, a possible solution may be?
> 1. Continue to keep the callback parameter to strictly ensure whether it
> is within the callback.
> 2. Use hrtimer_set_expires within the callback to update the timeout (the
> hrtimer module will enqueue the hrtimer when callback return)
> 3. If it is not in callback, call hrtimer_cancel + hrtimer_start to restart
> the timer.
Instead, I would use an is_scheduled param, as in my previous reply, and
simply skip trying to schedule if already scheduled.
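Concretely, that might look something like this rough, untested sketch (blk_timer_scheduled is
the assumed new field, only touched under sk_receive_queue.lock; the callback clears it when it
returns HRTIMER_NORESTART):

static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
					     bool from_callback)
{
	if (from_callback) {
		/* the timer is not enqueued while its callback runs, so
		 * adjusting the expiry here is safe; the caller then
		 * returns HRTIMER_RESTART
		 */
		hrtimer_forward_now(&pkc->retire_blk_timer,
				    pkc->interval_ktime);
	} else if (!pkc->blk_timer_scheduled) {
		pkc->blk_timer_scheduled = true;
		hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
			      HRTIMER_MODE_REL_SOFT);
	}
	/* else: already scheduled, skip rescheduling from tpacket_rcv */
	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
}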