All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v10 0/2] net: af_packet: optimize retire operation
@ 2025-08-31 10:08 Xin Zhao
  2025-08-31 10:08 ` [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field Xin Zhao
  2025-08-31 10:08 ` [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
  0 siblings, 2 replies; 11+ messages in thread
From: Xin Zhao @ 2025-08-31 10:08 UTC (permalink / raw)
  To: willemdebruijn.kernel, edumazet, ferenc
  Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao

In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.

---
Changes in v10:
- kactive_blk_num (K) is incremented on block close. last_kactive_blk_num (L)
  is set to match K on block open and each timer. So the only time that they
  differ is if a block is closed in tpacket_rcv and no new block could be
  opened. So the origin check L==K in timer callback only skip the case 'no
  new block to open'. If we remove L==K check, it will make prb_curr_blk_in_use
  check earlier, which will not cause any side effect
  as suggested by Willem de Bruijn.
- Submit a precursor patch that removes last_kactive_blk_num
  as suggested by Willem de Bruijn.

Changes in v9:
- Remove the function prb_setup_retire_blk_timer and move hrtimer setup and start
  logic into function init_prb_bdqc
  as suggested by Willem de Bruijn.
- Always update last_kactive_blk_num before hrtimer callback return as the origin
  logic does, as suggested by Willem de Bruijn.
  In tpacket_rcv, it may call prb_close_block but do not call prb_open_block in
  prb_dispatch_next_block, leading to inconsistency between last_kactive_blk_num
  and kactive_blk_num. In hrtimer callback, we should update last_kactive_blk_num
  in this case.
- Remove 'refresh_timer:' label which is not needed while I change goto logic to
  if-else implementation.
- Link to v9: https://lore.kernel.org/netdev/20250828155127.3076551-1-jackzxcui1989@163.com/

Changes in v8:
- Delete delete_blk_timer field, as suggested by Willem de Bruijn,
  hrtimer_cancel will check and wait until the timer callback return and ensure
  enter enter callback again;
- Simplify the logic related to setting timeout, as suggestd by Willem de Bruijn.
  Currently timer callback just restarts itself unconditionally, so delete the
 'out:' label, do not forward hrtimer in prb_open_block, call hrtimer_forward_now
  directly and always return HRTIMER_RESTART. The only special case is when
  prb_open_block is called from tpacket_rcv. That would set the timeout further
  into the future than the already queued timer. An earlier timeout is not
  problematic. No need to add complexity to avoid that.
- Link to v8: https://lore.kernel.org/all/20250827150131.2193485-1-jackzxcui1989@163.com/

Changes in v7:
- Only update the hrtimer expire time within the hrtimer callback.
  When the callback return, without sk_buff_head lock protection, __run_hrtimer will
  enqueue the timer if return HRTIMER_RESTART. Setting the hrtimer expires while
  enqueuing a timer may cause chaos in the hrtimer red-black tree.
  The setting expire time is monotonic, so if we do not update the expire time to the
  retire_blk_timer when it is not in callback, it will not cause problem if we skip
  the timeout event and update it when find out that expire_ktime is bigger than the
  expire time of retire_blk_timer.
- Use hrtimer_set_expires instead of hrtimer_forward_now.
  The end time for retiring each block is not fixed because when network packets are
  received quickly, blocks are retired rapidly, and the new block retire time needs
  to be recalculated. However, hrtimer_forward_now increments the previous timeout
  by an interval, which is not correct.
- The expire time is monotonic, so if we do not update the expire time to the
  retire_blk_timer when it is not in callback, it will not cause problem if we skip
  the timeout event and update it when find out that expire_ktime is bigger than the
  expire time of retire_blk_timer.
- Adding the 'bool callback' parameter back is intended to more accurately determine
  whether we are inside the hrtimer callback when executing
  _prb_refresh_rx_retire_blk_timer. This ensures that we only update the hrtimer's
  timeout value within the hrtimer callback.
- Link to v7: https://lore.kernel.org/all/20250822132051.266787-1-jackzxcui1989@163.com/

Changes in v6:
- Use hrtimer_is_queued instead to check whether it is within the callback function.
  So do not need to add 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer
  as suggested by Willem de Bruijn;
- Do not need local_irq_save and local_irq_restore to protect the race of the timer
  callback running in softirq context or the open_block from tpacket_rcv in process
  context
  as suggested by Willem de Bruijn;
- Link to v6: https://lore.kernel.org/all/20250820092925.2115372-1-jackzxcui1989@163.com/

Changes in v5:
- Remove the unnecessary comments at the top of the _prb_refresh_rx_retire_blk_timer,
  branch is self-explanatory enough
  as suggested by Willem de Bruijn;
- Indentation of _prb_refresh_rx_retire_blk_timer, align with first argument on
  previous line
  as suggested by Willem de Bruijn;
- Do not call hrtimer_start within the hrtimer callback
  as suggested by Willem de Bruijn
  So add 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer to indicate
  whether it is within the callback function. Use hrtimer_forward_now instead of
  hrtimer_start when it is in the callback function and is doing prb_open_block.
- Link to v5: https://lore.kernel.org/all/20250819091447.1199980-1-jackzxcui1989@163.com/

Changes in v4:
- Add 'bool start' to distinguish whether the call to _prb_refresh_rx_retire_blk_timer
  is for prb_open_block. When it is for prb_open_block, execute hrtimer_start to
  (re)start the hrtimer; otherwise, use hrtimer_forward_now to set the expiration
  time as it is more commonly used compared to hrtimer_set_expires.
  as suggested by Willem de Bruijn;
- Delete the comments to explain why hrtimer_set_expires(not hrtimer_forward_now)
  is used, as we do not use hrtimer_set_expires any more;
- Link to v4: https://lore.kernel.org/all/20250818050233.155344-1-jackzxcui1989@163.com/

Changes in v3:
- return HRTIMER_NORESTART when pkc->delete_blk_timer is true
  as suggested by Willem de Bruijn;
- Drop the retire_blk_tov field of tpacket_kbdq_core, add interval_ktime instead
  as suggested by Willem de Bruijn;
- Add comments to explain why hrtimer_set_expires(not hrtimer_forward_now) is used in
  _prb_refresh_rx_retire_blk_timer
  as suggested by Willem de Bruijn;
- Link to v3: https://lore.kernel.org/all/20250816170130.3969354-1-jackzxcui1989@163.com/

Changes in v2:
- Drop the tov_in_msecs field of tpacket_kbdq_core added by the patch
  as suggested by Willem de Bruijn;
- Link to v2: https://lore.kernel.org/all/20250815044141.1374446-1-jackzxcui1989@163.com/

Changes in v1:
- Do not add another config for the current changes
  as suggested by Eric Dumazet;
- Mention the beneficial cases 'HZ=100 or HZ=250' in the changelog
  as suggested by Eric Dumazet;
- Add some performance details to the changelog
  as suggested by Ferenc Fejes;
- Delete the 'pkc->tov_in_msecs == 0' bounds check which is not necessary
  as suggested by Willem de Bruijn;
- Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
  as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
- Just return HRTIMER_RESTART directly as all cases return the same value
  as suggested by Willem de Bruijn;
- Link to v1: https://lore.kernel.org/all/20250813165201.1492779-1-jackzxcui1989@163.com/
- Link to v0: https://lore.kernel.org/all/20250806055210.1530081-1-jackzxcui1989@163.com/

Xin Zhao (2):
  net: af_packet: remove last_kactive_blk_num field
  net: af_packet: Use hrtimer to do the retire operation

 net/packet/af_packet.c | 107 +++++++++++------------------------------
 net/packet/diag.c      |   2 +-
 net/packet/internal.h  |  12 +----
 3 files changed, 32 insertions(+), 89 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field
  2025-08-31 10:08 [PATCH net-next v10 0/2] net: af_packet: optimize retire operation Xin Zhao
@ 2025-08-31 10:08 ` Xin Zhao
  2025-09-01  1:18   ` Willem de Bruijn
  2025-09-02 14:04   ` Jason Xing
  2025-08-31 10:08 ` [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
  1 sibling, 2 replies; 11+ messages in thread
From: Xin Zhao @ 2025-08-31 10:08 UTC (permalink / raw)
  To: willemdebruijn.kernel, edumazet, ferenc
  Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao

kactive_blk_num (K) is incremented on block close. last_kactive_blk_num (L)
is set to match K on block open and each timer. So the only time that they
differ is if a block is closed in tpacket_rcv and no new block could be
opened.
So the origin check L==K in timer callback only skip the case 'no new block
to open'. If we remove L==K check, it will make prb_curr_blk_in_use check
earlier, which will not cause any side effect.

Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
---
 net/packet/af_packet.c | 60 ++++++++++++++++++++----------------------
 net/packet/internal.h  |  6 -----
 2 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a7017d7f0..d4eb4a4fe 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -669,7 +669,6 @@ static void init_prb_bdqc(struct packet_sock *po,
 	p1->knum_blocks	= req_u->req3.tp_block_nr;
 	p1->hdrlen = po->tp_hdrlen;
 	p1->version = po->tp_version;
-	p1->last_kactive_blk_num = 0;
 	po->stats.stats3.tp_freeze_q_cnt = 0;
 	if (req_u->req3.tp_retire_blk_tov)
 		p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
@@ -693,7 +692,6 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
 {
 	mod_timer(&pkc->retire_blk_timer,
 			jiffies + pkc->tov_in_jiffies);
-	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
 }
 
 /*
@@ -750,38 +748,36 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
 		write_unlock(&pkc->blk_fill_in_prog_lock);
 	}
 
-	if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
-		if (!frozen) {
-			if (!BLOCK_NUM_PKTS(pbd)) {
-				/* An empty block. Just refresh the timer. */
-				goto refresh_timer;
-			}
-			prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
-			if (!prb_dispatch_next_block(pkc, po))
-				goto refresh_timer;
-			else
-				goto out;
+	if (!frozen) {
+		if (!BLOCK_NUM_PKTS(pbd)) {
+			/* An empty block. Just refresh the timer. */
+			goto refresh_timer;
+		}
+		prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
+		if (!prb_dispatch_next_block(pkc, po))
+			goto refresh_timer;
+		else
+			goto out;
+	} else {
+		/* Case 1. Queue was frozen because user-space was
+		 * lagging behind.
+		 */
+		if (prb_curr_blk_in_use(pbd)) {
+			/*
+			 * Ok, user-space is still behind.
+			 * So just refresh the timer.
+			 */
+			goto refresh_timer;
 		} else {
-			/* Case 1. Queue was frozen because user-space was
-			 *	   lagging behind.
+			/* Case 2. queue was frozen,user-space caught up,
+			 * now the link went idle && the timer fired.
+			 * We don't have a block to close.So we open this
+			 * block and restart the timer.
+			 * opening a block thaws the queue,restarts timer
+			 * Thawing/timer-refresh is a side effect.
 			 */
-			if (prb_curr_blk_in_use(pbd)) {
-				/*
-				 * Ok, user-space is still behind.
-				 * So just refresh the timer.
-				 */
-				goto refresh_timer;
-			} else {
-			       /* Case 2. queue was frozen,user-space caught up,
-				* now the link went idle && the timer fired.
-				* We don't have a block to close.So we open this
-				* block and restart the timer.
-				* opening a block thaws the queue,restarts timer
-				* Thawing/timer-refresh is a side effect.
-				*/
-				prb_open_block(pkc, pbd);
-				goto out;
-			}
+			prb_open_block(pkc, pbd);
+			goto out;
 		}
 	}
 
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 1e743d031..d367b9f93 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -24,12 +24,6 @@ struct tpacket_kbdq_core {
 	unsigned short	kactive_blk_num;
 	unsigned short	blk_sizeof_priv;
 
-	/* last_kactive_blk_num:
-	 * trick to see if user-space has caught up
-	 * in order to avoid refreshing timer when every single pkt arrives.
-	 */
-	unsigned short	last_kactive_blk_num;
-
 	char		*pkblk_start;
 	char		*pkblk_end;
 	int		kblk_size;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation
  2025-08-31 10:08 [PATCH net-next v10 0/2] net: af_packet: optimize retire operation Xin Zhao
  2025-08-31 10:08 ` [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field Xin Zhao
@ 2025-08-31 10:08 ` Xin Zhao
  2025-09-01  1:21   ` Willem de Bruijn
  2025-09-02 15:43   ` Jason Xing
  1 sibling, 2 replies; 11+ messages in thread
From: Xin Zhao @ 2025-08-31 10:08 UTC (permalink / raw)
  To: willemdebruijn.kernel, edumazet, ferenc
  Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao

In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.

Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
---
Changes in v8:
- Simplify the logic related to setting timeout.

Changes in v7:
- Only update the hrtimer expire time within the hrtimer callback.

Changes in v1:
- Do not add another config for the current changes.

---
 net/packet/af_packet.c | 79 +++++++++---------------------------------
 net/packet/diag.c      |  2 +-
 net/packet/internal.h  |  6 ++--
 3 files changed, 20 insertions(+), 67 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d4eb4a4fe..3e3bb4216 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -203,8 +203,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
 static int prb_queue_frozen(struct tpacket_kbdq_core *);
 static void prb_open_block(struct tpacket_kbdq_core *,
 		struct tpacket_block_desc *);
-static void prb_retire_rx_blk_timer_expired(struct timer_list *);
-static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
 static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
 static void prb_clear_rxhash(struct tpacket_kbdq_core *,
 		struct tpacket3_hdr *);
@@ -579,33 +578,13 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
 	return proto;
 }
 
-static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
-{
-	timer_delete_sync(&pkc->retire_blk_timer);
-}
-
 static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
 		struct sk_buff_head *rb_queue)
 {
 	struct tpacket_kbdq_core *pkc;
 
 	pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
-
-	spin_lock_bh(&rb_queue->lock);
-	pkc->delete_blk_timer = 1;
-	spin_unlock_bh(&rb_queue->lock);
-
-	prb_del_retire_blk_timer(pkc);
-}
-
-static void prb_setup_retire_blk_timer(struct packet_sock *po)
-{
-	struct tpacket_kbdq_core *pkc;
-
-	pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
-	timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
-		    0);
-	pkc->retire_blk_timer.expires = jiffies;
+	hrtimer_cancel(&pkc->retire_blk_timer);
 }
 
 static int prb_calc_retire_blk_tmo(struct packet_sock *po,
@@ -671,29 +650,22 @@ static void init_prb_bdqc(struct packet_sock *po,
 	p1->version = po->tp_version;
 	po->stats.stats3.tp_freeze_q_cnt = 0;
 	if (req_u->req3.tp_retire_blk_tov)
-		p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
+		p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
 	else
-		p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
-						req_u->req3.tp_block_size);
-	p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
+		p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
+						req_u->req3.tp_block_size));
 	p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
 	rwlock_init(&p1->blk_fill_in_prog_lock);
 
 	p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
 	prb_init_ft_ops(p1, req_u);
-	prb_setup_retire_blk_timer(po);
+	hrtimer_setup(&p1->retire_blk_timer, prb_retire_rx_blk_timer_expired,
+		      CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
+	hrtimer_start(&p1->retire_blk_timer, p1->interval_ktime,
+		      HRTIMER_MODE_REL_SOFT);
 	prb_open_block(p1, pbd);
 }
 
-/*  Do NOT update the last_blk_num first.
- *  Assumes sk_buff_head lock is held.
- */
-static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
-{
-	mod_timer(&pkc->retire_blk_timer,
-			jiffies + pkc->tov_in_jiffies);
-}
-
 /*
  * Timer logic:
  * 1) We refresh the timer only when we open a block.
@@ -717,7 +689,7 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
  * prb_calc_retire_blk_tmo() calculates the tmo.
  *
  */
-static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
 {
 	struct packet_sock *po =
 		timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
@@ -730,9 +702,6 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
 	frozen = prb_queue_frozen(pkc);
 	pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
 
-	if (unlikely(pkc->delete_blk_timer))
-		goto out;
-
 	/* We only need to plug the race when the block is partially filled.
 	 * tpacket_rcv:
 	 *		lock(); increment BLOCK_NUM_PKTS; unlock()
@@ -749,26 +718,16 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
 	}
 
 	if (!frozen) {
-		if (!BLOCK_NUM_PKTS(pbd)) {
-			/* An empty block. Just refresh the timer. */
-			goto refresh_timer;
+		if (BLOCK_NUM_PKTS(pbd)) {
+			/* Not an empty block. Need retire the block. */
+			prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
+			prb_dispatch_next_block(pkc, po);
 		}
-		prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
-		if (!prb_dispatch_next_block(pkc, po))
-			goto refresh_timer;
-		else
-			goto out;
 	} else {
 		/* Case 1. Queue was frozen because user-space was
 		 * lagging behind.
 		 */
-		if (prb_curr_blk_in_use(pbd)) {
-			/*
-			 * Ok, user-space is still behind.
-			 * So just refresh the timer.
-			 */
-			goto refresh_timer;
-		} else {
+		if (!prb_curr_blk_in_use(pbd)) {
 			/* Case 2. queue was frozen,user-space caught up,
 			 * now the link went idle && the timer fired.
 			 * We don't have a block to close.So we open this
@@ -777,15 +736,12 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
 			 * Thawing/timer-refresh is a side effect.
 			 */
 			prb_open_block(pkc, pbd);
-			goto out;
 		}
 	}
 
-refresh_timer:
-	_prb_refresh_rx_retire_blk_timer(pkc);
-
-out:
+	hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
 	spin_unlock(&po->sk.sk_receive_queue.lock);
+	return HRTIMER_RESTART;
 }
 
 static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
@@ -917,7 +873,6 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
 	pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
 
 	prb_thaw_queue(pkc1);
-	_prb_refresh_rx_retire_blk_timer(pkc1);
 
 	smp_wmb();
 }
diff --git a/net/packet/diag.c b/net/packet/diag.c
index 6ce1dcc28..c8f43e0c1 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -83,7 +83,7 @@ static int pdiag_put_ring(struct packet_ring_buffer *ring, int ver, int nl_type,
 	pdr.pdr_frame_nr = ring->frame_max + 1;
 
 	if (ver > TPACKET_V2) {
-		pdr.pdr_retire_tmo = ring->prb_bdqc.retire_blk_tov;
+		pdr.pdr_retire_tmo = ktime_to_ms(ring->prb_bdqc.interval_ktime);
 		pdr.pdr_sizeof_priv = ring->prb_bdqc.blk_sizeof_priv;
 		pdr.pdr_features = ring->prb_bdqc.feature_req_word;
 	} else {
diff --git a/net/packet/internal.h b/net/packet/internal.h
index d367b9f93..f8cfd9213 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -20,7 +20,6 @@ struct tpacket_kbdq_core {
 	unsigned int	feature_req_word;
 	unsigned int	hdrlen;
 	unsigned char	reset_pending_on_curr_blk;
-	unsigned char   delete_blk_timer;
 	unsigned short	kactive_blk_num;
 	unsigned short	blk_sizeof_priv;
 
@@ -39,12 +38,11 @@ struct tpacket_kbdq_core {
 	/* Default is set to 8ms */
 #define DEFAULT_PRB_RETIRE_TOV	(8)
 
-	unsigned short  retire_blk_tov;
+	ktime_t		interval_ktime;
 	unsigned short  version;
-	unsigned long	tov_in_jiffies;
 
 	/* timer to retire an outstanding block */
-	struct timer_list retire_blk_timer;
+	struct hrtimer  retire_blk_timer;
 };
 
 struct pgv {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field
  2025-08-31 10:08 ` [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field Xin Zhao
@ 2025-09-01  1:18   ` Willem de Bruijn
  2025-09-02 14:04   ` Jason Xing
  1 sibling, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-09-01  1:18 UTC (permalink / raw)
  To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
  Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao

Xin Zhao wrote:
> kactive_blk_num (K) is incremented on block close. last_kactive_blk_num (L)
> is set to match K on block open and each timer. So the only time that they
> differ is if a block is closed in tpacket_rcv and no new block could be
> opened.
> So the origin check L==K in timer callback only skip the case 'no new block
> to open'. If we remove L==K check, it will make prb_curr_blk_in_use check
> earlier, which will not cause any side effect.
> 
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation
  2025-08-31 10:08 ` [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
@ 2025-09-01  1:21   ` Willem de Bruijn
  2025-09-02 15:43   ` Jason Xing
  1 sibling, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-09-01  1:21 UTC (permalink / raw)
  To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
  Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao

Xin Zhao wrote:
> In a system with high real-time requirements, the timeout mechanism of
> ordinary timers with jiffies granularity is insufficient to meet the
> demands for real-time performance. Meanwhile, the optimization of CPU
> usage with af_packet is quite significant. Use hrtimer instead of timer
> to help compensate for the shortcomings in real-time performance.
> In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
> enough, with fluctuations reaching over 8ms (on a system with HZ=250).
> This is unacceptable in some high real-time systems that require timely
> processing of network packets. By replacing it with hrtimer, if a timeout
> of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
> 3 ms.
> 
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>

Tiny style point that is probably not worth respinning for.

Otherwise

Reviewed-by: Willem de Bruijn <willemb@google.com>



> ---
> Changes in v8:
> - Simplify the logic related to setting timeout.
> 
> Changes in v7:
> - Only update the hrtimer expire time within the hrtimer callback.
> 
> Changes in v1:
> - Do not add another config for the current changes.
> 
> ---
>  net/packet/af_packet.c | 79 +++++++++---------------------------------
>  net/packet/diag.c      |  2 +-
>  net/packet/internal.h  |  6 ++--
>  3 files changed, 20 insertions(+), 67 deletions(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d4eb4a4fe..3e3bb4216 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -203,8 +203,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
>  static int prb_queue_frozen(struct tpacket_kbdq_core *);
>  static void prb_open_block(struct tpacket_kbdq_core *,
>  		struct tpacket_block_desc *);
> -static void prb_retire_rx_blk_timer_expired(struct timer_list *);
> -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
> +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
>  static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
>  static void prb_clear_rxhash(struct tpacket_kbdq_core *,
>  		struct tpacket3_hdr *);
> @@ -579,33 +578,13 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
>  	return proto;
>  }
>  
> -static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> -{
> -	timer_delete_sync(&pkc->retire_blk_timer);
> -}
> -
>  static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
>  		struct sk_buff_head *rb_queue)
>  {
>  	struct tpacket_kbdq_core *pkc;
>  
>  	pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> -
> -	spin_lock_bh(&rb_queue->lock);
> -	pkc->delete_blk_timer = 1;
> -	spin_unlock_bh(&rb_queue->lock);
> -
> -	prb_del_retire_blk_timer(pkc);
> -}
> -
> -static void prb_setup_retire_blk_timer(struct packet_sock *po)
> -{
> -	struct tpacket_kbdq_core *pkc;
> -
> -	pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> -	timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> -		    0);
> -	pkc->retire_blk_timer.expires = jiffies;
> +	hrtimer_cancel(&pkc->retire_blk_timer);
>  }
>  
>  static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> @@ -671,29 +650,22 @@ static void init_prb_bdqc(struct packet_sock *po,
>  	p1->version = po->tp_version;
>  	po->stats.stats3.tp_freeze_q_cnt = 0;
>  	if (req_u->req3.tp_retire_blk_tov)
> -		p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> +		p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
>  	else
> -		p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> -						req_u->req3.tp_block_size);
> -	p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> +		p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
> +						req_u->req3.tp_block_size));

req_u is not aligned with the line above.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field
  2025-08-31 10:08 ` [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field Xin Zhao
  2025-09-01  1:18   ` Willem de Bruijn
@ 2025-09-02 14:04   ` Jason Xing
  1 sibling, 0 replies; 11+ messages in thread
From: Jason Xing @ 2025-09-02 14:04 UTC (permalink / raw)
  To: Xin Zhao
  Cc: willemdebruijn.kernel, edumazet, ferenc, davem, kuba, pabeni,
	horms, netdev, linux-kernel

On Sun, Aug 31, 2025 at 6:10 PM Xin Zhao <jackzxcui1989@163.com> wrote:
>
> kactive_blk_num (K) is incremented on block close. last_kactive_blk_num (L)
> is set to match K on block open and each timer. So the only time that they
> differ is if a block is closed in tpacket_rcv and no new block could be
> opened.
> So the origin check L==K in timer callback only skip the case 'no new block
> to open'. If we remove L==K check, it will make prb_curr_blk_in_use check
> earlier, which will not cause any side effect.

I believe the above commit message needs to be revised:
1) the above sentence (starting from 'if we remove L....') means
nothing because your modification doesn't change the behaviour when
the queue is not frozen.
2) lack of proofs/reasons on why exposing the prb_open_block() logic doesn't
cause side effects. It's the key proof that shows to future readers to
make sure this patch will not bring trouble.

>
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>

It was suggested by Willem, so please add:
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>

So far, it looks good to me as well:
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

And I will finish reviewing the other patch by tomorrow :)

Thanks,
Jason



> ---
>  net/packet/af_packet.c | 60 ++++++++++++++++++++----------------------
>  net/packet/internal.h  |  6 -----
>  2 files changed, 28 insertions(+), 38 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index a7017d7f0..d4eb4a4fe 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -669,7 +669,6 @@ static void init_prb_bdqc(struct packet_sock *po,
>         p1->knum_blocks = req_u->req3.tp_block_nr;
>         p1->hdrlen = po->tp_hdrlen;
>         p1->version = po->tp_version;
> -       p1->last_kactive_blk_num = 0;
>         po->stats.stats3.tp_freeze_q_cnt = 0;
>         if (req_u->req3.tp_retire_blk_tov)
>                 p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> @@ -693,7 +692,6 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
>  {
>         mod_timer(&pkc->retire_blk_timer,
>                         jiffies + pkc->tov_in_jiffies);
> -       pkc->last_kactive_blk_num = pkc->kactive_blk_num;
>  }
>
>  /*
> @@ -750,38 +748,36 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
>                 write_unlock(&pkc->blk_fill_in_prog_lock);
>         }
>
> -       if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
> -               if (!frozen) {
> -                       if (!BLOCK_NUM_PKTS(pbd)) {
> -                               /* An empty block. Just refresh the timer. */
> -                               goto refresh_timer;
> -                       }
> -                       prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> -                       if (!prb_dispatch_next_block(pkc, po))
> -                               goto refresh_timer;
> -                       else
> -                               goto out;
> +       if (!frozen) {
> +               if (!BLOCK_NUM_PKTS(pbd)) {
> +                       /* An empty block. Just refresh the timer. */
> +                       goto refresh_timer;
> +               }
> +               prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> +               if (!prb_dispatch_next_block(pkc, po))
> +                       goto refresh_timer;
> +               else
> +                       goto out;
> +       } else {
> +               /* Case 1. Queue was frozen because user-space was
> +                * lagging behind.
> +                */
> +               if (prb_curr_blk_in_use(pbd)) {
> +                       /*
> +                        * Ok, user-space is still behind.
> +                        * So just refresh the timer.
> +                        */
> +                       goto refresh_timer;
>                 } else {
> -                       /* Case 1. Queue was frozen because user-space was
> -                        *         lagging behind.
> +                       /* Case 2. queue was frozen,user-space caught up,
> +                        * now the link went idle && the timer fired.
> +                        * We don't have a block to close.So we open this
> +                        * block and restart the timer.
> +                        * opening a block thaws the queue,restarts timer
> +                        * Thawing/timer-refresh is a side effect.
>                          */
> -                       if (prb_curr_blk_in_use(pbd)) {
> -                               /*
> -                                * Ok, user-space is still behind.
> -                                * So just refresh the timer.
> -                                */
> -                               goto refresh_timer;
> -                       } else {
> -                              /* Case 2. queue was frozen,user-space caught up,
> -                               * now the link went idle && the timer fired.
> -                               * We don't have a block to close.So we open this
> -                               * block and restart the timer.
> -                               * opening a block thaws the queue,restarts timer
> -                               * Thawing/timer-refresh is a side effect.
> -                               */
> -                               prb_open_block(pkc, pbd);
> -                               goto out;
> -                       }
> +                       prb_open_block(pkc, pbd);
> +                       goto out;
>                 }
>         }
>
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index 1e743d031..d367b9f93 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -24,12 +24,6 @@ struct tpacket_kbdq_core {
>         unsigned short  kactive_blk_num;
>         unsigned short  blk_sizeof_priv;
>
> -       /* last_kactive_blk_num:
> -        * trick to see if user-space has caught up
> -        * in order to avoid refreshing timer when every single pkt arrives.
> -        */
> -       unsigned short  last_kactive_blk_num;
> -
>         char            *pkblk_start;
>         char            *pkblk_end;
>         int             kblk_size;
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation
  2025-08-31 10:08 ` [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
  2025-09-01  1:21   ` Willem de Bruijn
@ 2025-09-02 15:43   ` Jason Xing
  2025-09-02 16:42     ` Jason Xing
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Xing @ 2025-09-02 15:43 UTC (permalink / raw)
  To: Xin Zhao
  Cc: willemdebruijn.kernel, edumazet, ferenc, davem, kuba, pabeni,
	horms, netdev, linux-kernel

On Sun, Aug 31, 2025 at 6:09 PM Xin Zhao <jackzxcui1989@163.com> wrote:
>
> In a system with high real-time requirements, the timeout mechanism of
> ordinary timers with jiffies granularity is insufficient to meet the
> demands for real-time performance. Meanwhile, the optimization of CPU
> usage with af_packet is quite significant. Use hrtimer instead of timer
> to help compensate for the shortcomings in real-time performance.
> In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
> enough, with fluctuations reaching over 8ms (on a system with HZ=250).
> This is unacceptable in some high real-time systems that require timely
> processing of network packets. By replacing it with hrtimer, if a timeout
> of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
> 3 ms.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
> ---
> Changes in v8:
> - Simplify the logic related to setting timeout.
>
> Changes in v7:
> - Only update the hrtimer expire time within the hrtimer callback.
>
> Changes in v1:
> - Do not add another config for the current changes.
>
> ---
>  net/packet/af_packet.c | 79 +++++++++---------------------------------
>  net/packet/diag.c      |  2 +-
>  net/packet/internal.h  |  6 ++--
>  3 files changed, 20 insertions(+), 67 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d4eb4a4fe..3e3bb4216 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -203,8 +203,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
>  static int prb_queue_frozen(struct tpacket_kbdq_core *);
>  static void prb_open_block(struct tpacket_kbdq_core *,
>                 struct tpacket_block_desc *);
> -static void prb_retire_rx_blk_timer_expired(struct timer_list *);
> -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
> +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
>  static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
>  static void prb_clear_rxhash(struct tpacket_kbdq_core *,
>                 struct tpacket3_hdr *);
> @@ -579,33 +578,13 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
>         return proto;
>  }
>
> -static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> -{
> -       timer_delete_sync(&pkc->retire_blk_timer);
> -}
> -
>  static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
>                 struct sk_buff_head *rb_queue)
>  {
>         struct tpacket_kbdq_core *pkc;
>
>         pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> -
> -       spin_lock_bh(&rb_queue->lock);
> -       pkc->delete_blk_timer = 1;
> -       spin_unlock_bh(&rb_queue->lock);
> -
> -       prb_del_retire_blk_timer(pkc);
> -}
> -
> -static void prb_setup_retire_blk_timer(struct packet_sock *po)
> -{
> -       struct tpacket_kbdq_core *pkc;
> -
> -       pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> -       timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> -                   0);
> -       pkc->retire_blk_timer.expires = jiffies;
> +       hrtimer_cancel(&pkc->retire_blk_timer);
>  }
>
>  static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> @@ -671,29 +650,22 @@ static void init_prb_bdqc(struct packet_sock *po,
>         p1->version = po->tp_version;
>         po->stats.stats3.tp_freeze_q_cnt = 0;
>         if (req_u->req3.tp_retire_blk_tov)
> -               p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> +               p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
>         else
> -               p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> -                                               req_u->req3.tp_block_size);
> -       p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> +               p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
> +                                               req_u->req3.tp_block_size));
>         p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
>         rwlock_init(&p1->blk_fill_in_prog_lock);
>
>         p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
>         prb_init_ft_ops(p1, req_u);
> -       prb_setup_retire_blk_timer(po);
> +       hrtimer_setup(&p1->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> +                     CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> +       hrtimer_start(&p1->retire_blk_timer, p1->interval_ktime,
> +                     HRTIMER_MODE_REL_SOFT);

You expect to see it start at the setsockopt phase? Even if it's far
from the real use of recv at the moment.

>         prb_open_block(p1, pbd);
>  }
>
> -/*  Do NOT update the last_blk_num first.
> - *  Assumes sk_buff_head lock is held.
> - */
> -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> -{
> -       mod_timer(&pkc->retire_blk_timer,
> -                       jiffies + pkc->tov_in_jiffies);
> -}
> -
>  /*
>   * Timer logic:
>   * 1) We refresh the timer only when we open a block.
> @@ -717,7 +689,7 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
>   * prb_calc_retire_blk_tmo() calculates the tmo.
>   *
>   */
> -static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
>  {
>         struct packet_sock *po =
>                 timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
> @@ -730,9 +702,6 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
>         frozen = prb_queue_frozen(pkc);
>         pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
>
> -       if (unlikely(pkc->delete_blk_timer))
> -               goto out;
> -
>         /* We only need to plug the race when the block is partially filled.
>          * tpacket_rcv:
>          *              lock(); increment BLOCK_NUM_PKTS; unlock()
> @@ -749,26 +718,16 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
>         }
>
>         if (!frozen) {
> -               if (!BLOCK_NUM_PKTS(pbd)) {
> -                       /* An empty block. Just refresh the timer. */
> -                       goto refresh_timer;
> +               if (BLOCK_NUM_PKTS(pbd)) {
> +                       /* Not an empty block. Need retire the block. */
> +                       prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> +                       prb_dispatch_next_block(pkc, po);
>                 }
> -               prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> -               if (!prb_dispatch_next_block(pkc, po))
> -                       goto refresh_timer;
> -               else
> -                       goto out;
>         } else {
>                 /* Case 1. Queue was frozen because user-space was
>                  * lagging behind.
>                  */
> -               if (prb_curr_blk_in_use(pbd)) {
> -                       /*
> -                        * Ok, user-space is still behind.
> -                        * So just refresh the timer.
> -                        */
> -                       goto refresh_timer;
> -               } else {
> +               if (!prb_curr_blk_in_use(pbd)) {
>                         /* Case 2. queue was frozen,user-space caught up,
>                          * now the link went idle && the timer fired.
>                          * We don't have a block to close.So we open this
> @@ -777,15 +736,12 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
>                          * Thawing/timer-refresh is a side effect.
>                          */
>                         prb_open_block(pkc, pbd);
> -                       goto out;
>                 }
>         }
>
> -refresh_timer:
> -       _prb_refresh_rx_retire_blk_timer(pkc);
> -
> -out:
> +       hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
>         spin_unlock(&po->sk.sk_receive_queue.lock);
> +       return HRTIMER_RESTART;
>  }
>
>  static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
> @@ -917,7 +873,6 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
>         pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
>
>         prb_thaw_queue(pkc1);
> -       _prb_refresh_rx_retire_blk_timer(pkc1);

Could you say more on why you remove this here and only reset/update
the expiry time in the timer handler? Probably I missed something
appearing in the previous long discussion.

>
>         smp_wmb();
>  }
> diff --git a/net/packet/diag.c b/net/packet/diag.c
> index 6ce1dcc28..c8f43e0c1 100644
> --- a/net/packet/diag.c
> +++ b/net/packet/diag.c
> @@ -83,7 +83,7 @@ static int pdiag_put_ring(struct packet_ring_buffer *ring, int ver, int nl_type,
>         pdr.pdr_frame_nr = ring->frame_max + 1;
>
>         if (ver > TPACKET_V2) {
> -               pdr.pdr_retire_tmo = ring->prb_bdqc.retire_blk_tov;
> +               pdr.pdr_retire_tmo = ktime_to_ms(ring->prb_bdqc.interval_ktime);
>                 pdr.pdr_sizeof_priv = ring->prb_bdqc.blk_sizeof_priv;
>                 pdr.pdr_features = ring->prb_bdqc.feature_req_word;
>         } else {
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index d367b9f93..f8cfd9213 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -20,7 +20,6 @@ struct tpacket_kbdq_core {
>         unsigned int    feature_req_word;
>         unsigned int    hdrlen;
>         unsigned char   reset_pending_on_curr_blk;
> -       unsigned char   delete_blk_timer;
>         unsigned short  kactive_blk_num;
>         unsigned short  blk_sizeof_priv;
>
> @@ -39,12 +38,11 @@ struct tpacket_kbdq_core {
>         /* Default is set to 8ms */
>  #define DEFAULT_PRB_RETIRE_TOV (8)
>
> -       unsigned short  retire_blk_tov;
> +       ktime_t         interval_ktime;
>         unsigned short  version;
> -       unsigned long   tov_in_jiffies;
>
>         /* timer to retire an outstanding block */
> -       struct timer_list retire_blk_timer;
> +       struct hrtimer  retire_blk_timer;
>  };

The whole structure needs a new organization?

Before:
        /* size: 152, cachelines: 3, members: 22 */
        /* sum members: 144, holes: 2, sum holes: 8 */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 24 bytes */
After:
        /* size: 176, cachelines: 3, members: 19 */
        /* sum members: 163, holes: 4, sum holes: 13 */
        /* paddings: 1, sum paddings: 4 */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 6 */
        /* last cacheline: 48 bytes */

Thanks,
Jason

>
>  struct pgv {
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation
  2025-09-02 15:43   ` Jason Xing
@ 2025-09-02 16:42     ` Jason Xing
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Xing @ 2025-09-02 16:42 UTC (permalink / raw)
  To: Xin Zhao
  Cc: willemdebruijn.kernel, edumazet, ferenc, davem, kuba, pabeni,
	horms, netdev, linux-kernel

On Tue, Sep 2, 2025 at 11:43 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sun, Aug 31, 2025 at 6:09 PM Xin Zhao <jackzxcui1989@163.com> wrote:
> >
> > In a system with high real-time requirements, the timeout mechanism of
> > ordinary timers with jiffies granularity is insufficient to meet the
> > demands for real-time performance. Meanwhile, the optimization of CPU
> > usage with af_packet is quite significant. Use hrtimer instead of timer
> > to help compensate for the shortcomings in real-time performance.
> > In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
> > enough, with fluctuations reaching over 8ms (on a system with HZ=250).
> > This is unacceptable in some high real-time systems that require timely
> > processing of network packets. By replacing it with hrtimer, if a timeout
> > of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
> > 3 ms.
> >
> > Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
> > ---
> > Changes in v8:
> > - Simplify the logic related to setting timeout.
> >
> > Changes in v7:
> > - Only update the hrtimer expire time within the hrtimer callback.
> >
> > Changes in v1:
> > - Do not add another config for the current changes.
> >
> > ---
> >  net/packet/af_packet.c | 79 +++++++++---------------------------------
> >  net/packet/diag.c      |  2 +-
> >  net/packet/internal.h  |  6 ++--
> >  3 files changed, 20 insertions(+), 67 deletions(-)
> >
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index d4eb4a4fe..3e3bb4216 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -203,8 +203,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
> >  static int prb_queue_frozen(struct tpacket_kbdq_core *);
> >  static void prb_open_block(struct tpacket_kbdq_core *,
> >                 struct tpacket_block_desc *);
> > -static void prb_retire_rx_blk_timer_expired(struct timer_list *);
> > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
> > +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
> >  static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
> >  static void prb_clear_rxhash(struct tpacket_kbdq_core *,
> >                 struct tpacket3_hdr *);
> > @@ -579,33 +578,13 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
> >         return proto;
> >  }
> >
> > -static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > -{
> > -       timer_delete_sync(&pkc->retire_blk_timer);
> > -}
> > -
> >  static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
> >                 struct sk_buff_head *rb_queue)
> >  {
> >         struct tpacket_kbdq_core *pkc;
> >
> >         pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> > -
> > -       spin_lock_bh(&rb_queue->lock);
> > -       pkc->delete_blk_timer = 1;

One more review from my side is that as to the removal of
delete_blk_timer, I'm afraid it deserves a clarification in the commit
message.

> > -       spin_unlock_bh(&rb_queue->lock);
> > -
> > -       prb_del_retire_blk_timer(pkc);
> > -}
> > -
> > -static void prb_setup_retire_blk_timer(struct packet_sock *po)
> > -{
> > -       struct tpacket_kbdq_core *pkc;
> > -
> > -       pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> > -       timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > -                   0);
> > -       pkc->retire_blk_timer.expires = jiffies;
> > +       hrtimer_cancel(&pkc->retire_blk_timer);
> >  }
> >
> >  static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> > @@ -671,29 +650,22 @@ static void init_prb_bdqc(struct packet_sock *po,
> >         p1->version = po->tp_version;
> >         po->stats.stats3.tp_freeze_q_cnt = 0;
> >         if (req_u->req3.tp_retire_blk_tov)
> > -               p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> > +               p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
> >         else
> > -               p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> > -                                               req_u->req3.tp_block_size);
> > -       p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> > +               p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
> > +                                               req_u->req3.tp_block_size));
> >         p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
> >         rwlock_init(&p1->blk_fill_in_prog_lock);
> >
> >         p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
> >         prb_init_ft_ops(p1, req_u);
> > -       prb_setup_retire_blk_timer(po);
> > +       hrtimer_setup(&p1->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > +                     CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> > +       hrtimer_start(&p1->retire_blk_timer, p1->interval_ktime,
> > +                     HRTIMER_MODE_REL_SOFT);
>
> You expect to see it start at the setsockopt phase? Even if it's far
> from the real use of recv at the moment.
>
> >         prb_open_block(p1, pbd);
> >  }
> >
> > -/*  Do NOT update the last_blk_num first.
> > - *  Assumes sk_buff_head lock is held.
> > - */
> > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > -{
> > -       mod_timer(&pkc->retire_blk_timer,
> > -                       jiffies + pkc->tov_in_jiffies);
> > -}
> > -
> >  /*
> >   * Timer logic:
> >   * 1) We refresh the timer only when we open a block.
> > @@ -717,7 +689,7 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> >   * prb_calc_retire_blk_tmo() calculates the tmo.
> >   *
> >   */
> > -static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> > +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
> >  {
> >         struct packet_sock *po =
> >                 timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
> > @@ -730,9 +702,6 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> >         frozen = prb_queue_frozen(pkc);
> >         pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
> >
> > -       if (unlikely(pkc->delete_blk_timer))
> > -               goto out;
> > -
> >         /* We only need to plug the race when the block is partially filled.
> >          * tpacket_rcv:
> >          *              lock(); increment BLOCK_NUM_PKTS; unlock()
> > @@ -749,26 +718,16 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> >         }
> >
> >         if (!frozen) {
> > -               if (!BLOCK_NUM_PKTS(pbd)) {
> > -                       /* An empty block. Just refresh the timer. */
> > -                       goto refresh_timer;
> > +               if (BLOCK_NUM_PKTS(pbd)) {
> > +                       /* Not an empty block. Need retire the block. */
> > +                       prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> > +                       prb_dispatch_next_block(pkc, po);
> >                 }
> > -               prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> > -               if (!prb_dispatch_next_block(pkc, po))
> > -                       goto refresh_timer;
> > -               else
> > -                       goto out;
> >         } else {
> >                 /* Case 1. Queue was frozen because user-space was
> >                  * lagging behind.
> >                  */
> > -               if (prb_curr_blk_in_use(pbd)) {
> > -                       /*
> > -                        * Ok, user-space is still behind.
> > -                        * So just refresh the timer.
> > -                        */
> > -                       goto refresh_timer;
> > -               } else {
> > +               if (!prb_curr_blk_in_use(pbd)) {
> >                         /* Case 2. queue was frozen,user-space caught up,
> >                          * now the link went idle && the timer fired.
> >                          * We don't have a block to close.So we open this
> > @@ -777,15 +736,12 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> >                          * Thawing/timer-refresh is a side effect.
> >                          */
> >                         prb_open_block(pkc, pbd);
> > -                       goto out;
> >                 }
> >         }
> >
> > -refresh_timer:
> > -       _prb_refresh_rx_retire_blk_timer(pkc);
> > -
> > -out:
> > +       hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
> >         spin_unlock(&po->sk.sk_receive_queue.lock);
> > +       return HRTIMER_RESTART;
> >  }
> >
> >  static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
> > @@ -917,7 +873,6 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
> >         pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
> >
> >         prb_thaw_queue(pkc1);
> > -       _prb_refresh_rx_retire_blk_timer(pkc1);
>
> Could you say more on why you remove this here and only reset/update
> the expiry time in the timer handler? Probably I missed something
> appearing in the previous long discussion.

I gradually understand your thought behind this modification. You're
trying to move the timer operation out of prb_open_block() and then
spread the timer operation into each caller.

You probably miss the following call trace:
packet_current_rx_frame() -> __packet_lookup_frame_in_block() ->
prb_open_block() -> _prb_refresh_rx_retire_blk_timer()
?

May I ask why bother introducing so many changes like this instead of
leaving it as-is?

Thanks,
Jason

>
> >
> >         smp_wmb();
> >  }
> > diff --git a/net/packet/diag.c b/net/packet/diag.c
> > index 6ce1dcc28..c8f43e0c1 100644
> > --- a/net/packet/diag.c
> > +++ b/net/packet/diag.c
> > @@ -83,7 +83,7 @@ static int pdiag_put_ring(struct packet_ring_buffer *ring, int ver, int nl_type,
> >         pdr.pdr_frame_nr = ring->frame_max + 1;
> >
> >         if (ver > TPACKET_V2) {
> > -               pdr.pdr_retire_tmo = ring->prb_bdqc.retire_blk_tov;
> > +               pdr.pdr_retire_tmo = ktime_to_ms(ring->prb_bdqc.interval_ktime);
> >                 pdr.pdr_sizeof_priv = ring->prb_bdqc.blk_sizeof_priv;
> >                 pdr.pdr_features = ring->prb_bdqc.feature_req_word;
> >         } else {
> > diff --git a/net/packet/internal.h b/net/packet/internal.h
> > index d367b9f93..f8cfd9213 100644
> > --- a/net/packet/internal.h
> > +++ b/net/packet/internal.h
> > @@ -20,7 +20,6 @@ struct tpacket_kbdq_core {
> >         unsigned int    feature_req_word;
> >         unsigned int    hdrlen;
> >         unsigned char   reset_pending_on_curr_blk;
> > -       unsigned char   delete_blk_timer;
> >         unsigned short  kactive_blk_num;
> >         unsigned short  blk_sizeof_priv;
> >
> > @@ -39,12 +38,11 @@ struct tpacket_kbdq_core {
> >         /* Default is set to 8ms */
> >  #define DEFAULT_PRB_RETIRE_TOV (8)
> >
> > -       unsigned short  retire_blk_tov;
> > +       ktime_t         interval_ktime;
> >         unsigned short  version;
> > -       unsigned long   tov_in_jiffies;
> >
> >         /* timer to retire an outstanding block */
> > -       struct timer_list retire_blk_timer;
> > +       struct hrtimer  retire_blk_timer;
> >  };
>
> The whole structure needs a new organization?
>
> Before:
>         /* size: 152, cachelines: 3, members: 22 */
>         /* sum members: 144, holes: 2, sum holes: 8 */
>         /* paddings: 1, sum paddings: 4 */
>         /* last cacheline: 24 bytes */
> After:
>         /* size: 176, cachelines: 3, members: 19 */
>         /* sum members: 163, holes: 4, sum holes: 13 */
>         /* paddings: 1, sum paddings: 4 */
>         /* forced alignments: 1, forced holes: 1, sum forced holes: 6 */
>         /* last cacheline: 48 bytes */
>
> Thanks,
> Jason
>
> >
> >  struct pgv {
> > --
> > 2.34.1
> >
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field
@ 2025-09-03 14:56 Xin Zhao
  2025-09-04  2:09 ` Jason Xing
  0 siblings, 1 reply; 11+ messages in thread
From: Xin Zhao @ 2025-09-03 14:56 UTC (permalink / raw)
  To: kerneljasonxing, willemdebruijn.kernel, edumazet, ferenc
  Cc: davem, kuba, pabeni, horms, netdev, linux-kernel

On Tue, Sep 2, 2025 at 22:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote:

> > kactive_blk_num (K) is incremented on block close. last_kactive_blk_num (L)
> > is set to match K on block open and each timer. So the only time that they
> > differ is if a block is closed in tpacket_rcv and no new block could be
> > opened.
> > So the origin check L==K in timer callback only skip the case 'no new block
> > to open'. If we remove L==K check, it will make prb_curr_blk_in_use check
> > earlier, which will not cause any side effect.
> 
> I believe the above commit message needs to be revised:
> 1) the above sentence (starting from 'if we remove L....') means
> nothing because your modification doesn't change the behaviour when
> the queue is not frozen.
> 2) lack of proofs/reasons on why exposing the prb_open_block() logic doesn't
> cause side effects. It's the key proof that shows to future readers to
> make sure this patch will not bring trouble.

This diff file may not clearly demonstrate the changes made to the
prb_retire_rx_blk_timer_expired function. We have simply removed the check for
pkc->last_kactive_blk_num == pkc->kactive_blk_num; the other logic remains unchanged.
Therefore, we should only need to explain why removing the check for
pkc->last_kactive_blk_num == pkc->kactive_blk_num will not cause any negative impacts.

I will clarify in the commit that our change to prb_retire_rx_blk_timer_expired only
involves removing the check for pkc->last_kactive_blk_num == pkc->kactive_blk_num,
to ensure that everyone understands this point, as it may not be clearly visible from
the diff file.

The commit may be changed as follow:

kactive_blk_num (K) is only incremented on block close.
In timer callback prb_retire_rx_blk_timer_expired, except delete_blk_timer
is true, last_kactive_blk_num (L) is set to match kactive_blk_num (K) in
all cases. L is also set to match K in prb_open_block.
The only case K not equal to L is when scheduled by tpacket_rcv
and K is just incremented on block close but no new block could be opened,
so that it does not call prb_open_block in prb_dispatch_next_block.
This patch modifies the prb_retire_rx_blk_timer_expired function by simply 
removing the check for L == K. Why can we remove the check for L == K in
timer callback?
In prb_freeze_queue, reset_pending_on_curr_blk (R) is set to 1. If R == 1,
prb_queue_frozen return 1. In prb_retire_rx_blk_timer_expired,
frozen = prb_queue_frozen(pkc); so frozen is 1 when R == 1.

Consider the following case:
(before applying this patch)
cpu0                                  cpu1
tpacket_rcv
  ...
    prb_dispatch_next_block
      prb_freeze_queue (R = 1)
                                      prb_retire_rx_blk_timer_expired
                                        L != K
                                          _prb_refresh_rx_retire_blk_timer
                                            refresh timer
                                            set L = K
(after applying this patch)
cpu0                                  cpu1
tpacket_rcv
  ...
    prb_dispatch_next_block
      prb_freeze_queue (R = 1)
                                      prb_retire_rx_blk_timer_expired
                                        !forzen is 0
                                          check prb_curr_blk_in_use
                                            if true
                                              same as (before apply)
                                            if false
                                              prb_open_block
Before applying this patch, prb_retire_rx_blk_timer_expired will do nothing
but refresh timer and set L = K in the case above. After applying this
patch, it will check prb_curr_blk_in_use and call prb_open_block if
user-space caught up.


Please check if the above description is appropriate. I will change the
description as above in PATCH v11.


> 
> >
> > Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
> 
> It was suggested by Willem, so please add:
> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>

Okay, I will add it in the PATCH v11.

> 
> So far, it looks good to me as well:
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> 
> And I will finish reviewing the other patch by tomorrow :)


Thanks
Xin Zhao


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field
  2025-09-03 14:56 [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field Xin Zhao
@ 2025-09-04  2:09 ` Jason Xing
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Xing @ 2025-09-04  2:09 UTC (permalink / raw)
  To: Xin Zhao
  Cc: willemdebruijn.kernel, edumazet, ferenc, davem, kuba, pabeni,
	horms, netdev, linux-kernel

On Wed, Sep 3, 2025 at 10:58 PM Xin Zhao <jackzxcui1989@163.com> wrote:
>
> On Tue, Sep 2, 2025 at 22:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> > > kactive_blk_num (K) is incremented on block close. last_kactive_blk_num (L)
> > > is set to match K on block open and each timer. So the only time that they
> > > differ is if a block is closed in tpacket_rcv and no new block could be
> > > opened.
> > > So the origin check L==K in timer callback only skip the case 'no new block
> > > to open'. If we remove L==K check, it will make prb_curr_blk_in_use check
> > > earlier, which will not cause any side effect.
> >
> > I believe the above commit message needs to be revised:
> > 1) the above sentence (starting from 'if we remove L....') means
> > nothing because your modification doesn't change the behaviour when
> > the queue is not frozen.
> > 2) lack of proofs/reasons on why exposing the prb_open_block() logic doesn't
> > cause side effects. It's the key proof that shows to future readers to
> > make sure this patch will not bring trouble.
>
> This diff file may not clearly demonstrate the changes made to the
> prb_retire_rx_blk_timer_expired function. We have simply removed the check for
> pkc->last_kactive_blk_num == pkc->kactive_blk_num; the other logic remains unchanged.
> Therefore, we should only need to explain why removing the check for
> pkc->last_kactive_blk_num == pkc->kactive_blk_num will not cause any negative impacts.
>
> I will clarify in the commit that our change to prb_retire_rx_blk_timer_expired only
> involves removing the check for pkc->last_kactive_blk_num == pkc->kactive_blk_num,
> to ensure that everyone understands this point, as it may not be clearly visible from
> the diff file.
>
> The commit may be changed as follow:
>
> kactive_blk_num (K) is only incremented on block close.
> In timer callback prb_retire_rx_blk_timer_expired, except delete_blk_timer
> is true, last_kactive_blk_num (L) is set to match kactive_blk_num (K) in
> all cases. L is also set to match K in prb_open_block.
> The only case K not equal to L is when scheduled by tpacket_rcv
> and K is just incremented on block close but no new block could be opened,
> so that it does not call prb_open_block in prb_dispatch_next_block.
> This patch modifies the prb_retire_rx_blk_timer_expired function by simply
> removing the check for L == K. Why can we remove the check for L == K in
> timer callback?
> In prb_freeze_queue, reset_pending_on_curr_blk (R) is set to 1. If R == 1,
> prb_queue_frozen return 1. In prb_retire_rx_blk_timer_expired,
> frozen = prb_queue_frozen(pkc); so frozen is 1 when R == 1.
>
> Consider the following case:
> (before applying this patch)
> cpu0                                  cpu1
> tpacket_rcv
>   ...
>     prb_dispatch_next_block
>       prb_freeze_queue (R = 1)
>                                       prb_retire_rx_blk_timer_expired
>                                         L != K
>                                           _prb_refresh_rx_retire_blk_timer
>                                             refresh timer
>                                             set L = K

I do not think the above can happen because:
1) tpacket_rcv() owns the sk_receive_queue.lock and then calls
packet_current_rx_frame()->__packet_lookup_frame_in_block()->prb_dispatch_next_block()
2) the timer prb_retire_rx_blk_timer_expired() also needs to acquire
the same lock first.

> (after applying this patch)
> cpu0                                  cpu1
> tpacket_rcv
>   ...
>     prb_dispatch_next_block
>       prb_freeze_queue (R = 1)
>                                       prb_retire_rx_blk_timer_expired
>                                         !forzen is 0
>                                           check prb_curr_blk_in_use
>                                             if true
>                                               same as (before apply)
>                                             if false
>                                               prb_open_block
> Before applying this patch, prb_retire_rx_blk_timer_expired will do nothing
> but refresh timer and set L = K in the case above. After applying this
> patch, it will check prb_curr_blk_in_use and call prb_open_block if
> user-space caught up.

The major difference after this patch is that even if L != K we would
call prb_open_block(). So I think the key point is that this patch
provides another checkpoint to thaw the might-be-frozen block in any
case. It doesn't have any effect because
__packet_lookup_frame_in_block() has the same logic and does it again
without this patch when detecting the ring is frozen. The patch only
advances checking the status of the ring.

Thanks,
Jason

>
>
> Please check if the above description is appropriate. I will change the
> description as above in PATCH v11.
>
>
> >
> > >
> > > Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
> >
> > It was suggested by Willem, so please add:
> > Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
>
> Okay, I will add it in the PATCH v11.
>
> >
> > So far, it looks good to me as well:
> > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> >
> > And I will finish reviewing the other patch by tomorrow :)
>
>
> Thanks
> Xin Zhao
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field
@ 2025-09-04  2:35 Xin Zhao
  0 siblings, 0 replies; 11+ messages in thread
From: Xin Zhao @ 2025-09-04  2:35 UTC (permalink / raw)
  To: kerneljasonxing, willemdebruijn.kernel, edumazet, ferenc
  Cc: davem, kuba, pabeni, horms, netdev, linux-kernel

On Thu, Sep 4, 2025 at 10:09 +0800 Jason Xing <kerneljasonxing@gmail.com> wrote:

> > Consider the following case:
> > (before applying this patch)
> > cpu0                                  cpu1
> > tpacket_rcv
> >   ...
> >     prb_dispatch_next_block
> >       prb_freeze_queue (R = 1)
> >                                       prb_retire_rx_blk_timer_expired
> >                                         L != K
> >                                           _prb_refresh_rx_retire_blk_timer
> >                                             refresh timer
> >                                             set L = K
> 
> I do not think the above can happen because:
> 1) tpacket_rcv() owns the sk_receive_queue.lock and then calls
> packet_current_rx_frame()->__packet_lookup_frame_in_block()->prb_dispatch_next_block()
> 2) the timer prb_retire_rx_blk_timer_expired() also needs to acquire
> the same lock first.
> 
> > (after applying this patch)
> > cpu0                                  cpu1
> > tpacket_rcv
> >   ...
> >     prb_dispatch_next_block
> >       prb_freeze_queue (R = 1)
> >                                       prb_retire_rx_blk_timer_expired
> >                                         !forzen is 0
> >                                           check prb_curr_blk_in_use
> >                                             if true
> >                                               same as (before apply)
> >                                             if false
> >                                               prb_open_block
> > Before applying this patch, prb_retire_rx_blk_timer_expired will do nothing
> > but refresh timer and set L = K in the case above. After applying this
> > patch, it will check prb_curr_blk_in_use and call prb_open_block if
> > user-space caught up.
> 
> The major difference after this patch is that even if L != K we would
> call prb_open_block(). So I think the key point is that this patch
> provides another checkpoint to thaw the might-be-frozen block in any
> case. It doesn't have any effect because
> __packet_lookup_frame_in_block() has the same logic and does it again
> without this patch when detecting the ring is frozen. The patch only
> advances checking the status of the ring.


In the prb_dispatch_next_block function, after executing prb_freeze_queue, it
directly returns without executing prb_open_block. As a result, tpacket_rcv
completes and exits the lock, and then callback executes while (L != K).
Perhaps my diagram did not convey this clearly. I think it might be better to
use your description above to replace the flowchart representation.


Thanks
Xin Zhao


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-09-04  2:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-31 10:08 [PATCH net-next v10 0/2] net: af_packet: optimize retire operation Xin Zhao
2025-08-31 10:08 ` [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field Xin Zhao
2025-09-01  1:18   ` Willem de Bruijn
2025-09-02 14:04   ` Jason Xing
2025-08-31 10:08 ` [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
2025-09-01  1:21   ` Willem de Bruijn
2025-09-02 15:43   ` Jason Xing
2025-09-02 16:42     ` Jason Xing
  -- strict thread matches above, loose matches on Subject: below --
2025-09-03 14:56 [PATCH net-next v10 1/2] net: af_packet: remove last_kactive_blk_num field Xin Zhao
2025-09-04  2:09 ` Jason Xing
2025-09-04  2:35 Xin Zhao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.