* [PATCH net-next v7] net: af_packet: Use hrtimer to do the retire operation
@ 2025-08-22 13:20 Xin Zhao
2025-08-24 18:08 ` Willem de Bruijn
0 siblings, 1 reply; 6+ messages in thread
From: Xin Zhao @ 2025-08-22 13:20 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao
In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
---
Changes in v7:
- Only update the hrtimer expire time within the hrtimer callback.
When the callback return, without sk_buff_head lock protection, __run_hrtimer will
enqueue the timer if return HRTIMER_RESTART. Setting the hrtimer expires while
enqueuing a timer may cause chaos in the hrtimer red-black tree.
The setting expire time is monotonic, so if we do not update the expire time to the
retire_blk_timer when it is not in callback, it will not cause problem if we skip
the timeout event and update it when find out that expire_ktime is bigger than the
expire time of retire_blk_timer.
- Use hrtimer_set_expires here instead of hrtimer_forward_now.
The end time for retiring each block is not fixed because when network packets are
received quickly, blocks are retired rapidly, and the new block retire time needs
to be recalculated. However, hrtimer_forward_now increments the previous timeout
by an interval, which is not correct.
- The expire time is monotonic, so if we do not update the expire time to the
retire_blk_timer when it is not in callback, it will not cause problem if we skip
the timeout event and update it when find out that expire_ktime is bigger than the
expire time of retire_blk_timer.
- Adding the 'bool callback' parameter back is intended to more accurately determine
whether we are inside the hrtimer callback when executing
_prb_refresh_rx_retire_blk_timer. This ensures that we only update the hrtimer's
timeout value within the hrtimer callback.
Changes in v6:
- Use hrtimer_is_queued instead to check whether it is within the callback function.
So do not need to add 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer
as suggested by Willem de Bruijn;
- Do not need local_irq_save and local_irq_restore to protect the race of the timer
callback running in softirq context or the open_block from tpacket_rcv in process
context
as suggested by Willem de Bruijn;
- Link to v6: https://lore.kernel.org/all/20250820092925.2115372-1-jackzxcui1989@163.com/
Changes in v5:
- Remove the unnecessary comments at the top of the _prb_refresh_rx_retire_blk_timer,
branch is self-explanatory enough
as suggested by Willem de Bruijn;
- Indentation of _prb_refresh_rx_retire_blk_timer, align with first argument on
previous line
as suggested by Willem de Bruijn;
- Do not call hrtimer_start within the hrtimer callback
as suggested by Willem de Bruijn
So add 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer to indicate
whether it is within the callback function. Use hrtimer_forward_now instead of
hrtimer_start when it is in the callback function and is doing prb_open_block.
- Link to v5: https://lore.kernel.org/all/20250819091447.1199980-1-jackzxcui1989@163.com/
Changes in v4:
- Add 'bool start' to distinguish whether the call to _prb_refresh_rx_retire_blk_timer
is for prb_open_block. When it is for prb_open_block, execute hrtimer_start to
(re)start the hrtimer; otherwise, use hrtimer_forward_now to set the expiration
time as it is more commonly used compared to hrtimer_set_expires.
as suggested by Willem de Bruijn;
- Delete the comments to explain why hrtimer_set_expires(not hrtimer_forward_now)
is used, as we do not use hrtimer_set_expires any more;
- Link to v4: https://lore.kernel.org/all/20250818050233.155344-1-jackzxcui1989@163.com/
Changes in v3:
- return HRTIMER_NORESTART when pkc->delete_blk_timer is true
as suggested by Willem de Bruijn;
- Drop the retire_blk_tov field of tpacket_kbdq_core, add interval_ktime instead
as suggested by Willem de Bruijn;
- Add comments to explain why hrtimer_set_expires(not hrtimer_forward_now) is used in
_prb_refresh_rx_retire_blk_timer
as suggested by Willem de Bruijn;
- Link to v3: https://lore.kernel.org/all/20250816170130.3969354-1-jackzxcui1989@163.com/
Changes in v2:
- Drop the tov_in_msecs field of tpacket_kbdq_core added by the patch
as suggested by Willem de Bruijn;
- Link to v2: https://lore.kernel.org/all/20250815044141.1374446-1-jackzxcui1989@163.com/
Changes in v1:
- Do not add another config for the current changes
as suggested by Eric Dumazet;
- Mention the beneficial cases 'HZ=100 or HZ=250' in the changelog
as suggested by Eric Dumazet;
- Add some performance details to the changelog
as suggested by Ferenc Fejes;
- Delete the 'pkc->tov_in_msecs == 0' bounds check which is not necessary
as suggested by Willem de Bruijn;
- Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
- Just return HRTIMER_RESTART directly as all cases return the same value
as suggested by Willem de Bruijn;
- Link to v1: https://lore.kernel.org/all/20250813165201.1492779-1-jackzxcui1989@163.com/
- Link to v0: https://lore.kernel.org/all/20250806055210.1530081-1-jackzxcui1989@163.com/
---
net/packet/af_packet.c | 93 +++++++++++++++++++++++++++++-------------
net/packet/diag.c | 2 +-
net/packet/internal.h | 6 +--
3 files changed, 69 insertions(+), 32 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a7017d7f0..654ae65ea 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -197,14 +197,14 @@ static void *packet_previous_frame(struct packet_sock *po,
static void packet_increment_head(struct packet_ring_buffer *buff);
static int prb_curr_blk_in_use(struct tpacket_block_desc *);
static void *prb_dispatch_next_block(struct tpacket_kbdq_core *,
- struct packet_sock *);
+ struct packet_sock *, bool);
static void prb_retire_current_block(struct tpacket_kbdq_core *,
struct packet_sock *, unsigned int status);
static int prb_queue_frozen(struct tpacket_kbdq_core *);
static void prb_open_block(struct tpacket_kbdq_core *,
- struct tpacket_block_desc *);
-static void prb_retire_rx_blk_timer_expired(struct timer_list *);
-static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
+ struct tpacket_block_desc *, bool);
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
+static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *, bool);
static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
static void prb_clear_rxhash(struct tpacket_kbdq_core *,
struct tpacket3_hdr *);
@@ -581,7 +581,7 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
- timer_delete_sync(&pkc->retire_blk_timer);
+ hrtimer_cancel(&pkc->retire_blk_timer);
}
static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
@@ -603,9 +603,10 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
struct tpacket_kbdq_core *pkc;
pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
- timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
- 0);
- pkc->retire_blk_timer.expires = jiffies;
+ hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
+ CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
+ hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
+ HRTIMER_MODE_REL_SOFT);
}
static int prb_calc_retire_blk_tmo(struct packet_sock *po,
@@ -672,27 +673,52 @@ static void init_prb_bdqc(struct packet_sock *po,
p1->last_kactive_blk_num = 0;
po->stats.stats3.tp_freeze_q_cnt = 0;
if (req_u->req3.tp_retire_blk_tov)
- p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
+ p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
else
- p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
- req_u->req3.tp_block_size);
- p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
+ p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
+ req_u->req3.tp_block_size));
p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
rwlock_init(&p1->blk_fill_in_prog_lock);
p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
prb_init_ft_ops(p1, req_u);
prb_setup_retire_blk_timer(po);
- prb_open_block(p1, pbd);
+ prb_open_block(p1, pbd, false);
}
/* Do NOT update the last_blk_num first.
* Assumes sk_buff_head lock is held.
*/
-static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
-{
- mod_timer(&pkc->retire_blk_timer,
- jiffies + pkc->tov_in_jiffies);
+static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
+ bool callback)
+{
+ if (likely(ktime_to_ns(pkc->expire_ktime))) {
+ pkc->expire_ktime = ktime_add_safe(ktime_get(), pkc->interval_ktime);
+ if (callback) {
+ /* Three tips:
+ * 1) Only update the hrtimer expire time within the callback.
+ * When the callback return, without sk_buff_head lock protection,
+ * __run_hrtimer will enqueue the timer if return HRTIMER_RESTART.
+ * Setting the hrtimer expires while enqueuing a timer may cause
+ * chaos in the hrtimer red-black tree.
+ * 2) Use hrtimer_set_expires here instead of hrtimer_forward_now,
+ * because the end time for retiring each block is not fixed
+ * because when network packets are received quickly, blocks are
+ * retired rapidly, and the new block retire time needs to be
+ * recalculated. However, hrtimer_forward_now increments the
+ * previous timeout by an interval, which is not correct.
+ * 3)The expire time is monotonic, so if we do not update the
+ * expire time to the retire_blk_timer when it is not in callback,
+ * it will not cause problem if we skip the timeout event and
+ * update it when find out that expire_ktime is bigger than the
+ * expire time of retire_blk_timer.
+ */
+ hrtimer_set_expires(&pkc->retire_blk_timer, pkc->expire_ktime);
+ }
+ } else {
+ /* Just after prb_setup_retire_blk_timer. */
+ pkc->expire_ktime = hrtimer_get_expires(&pkc->retire_blk_timer);
+ }
pkc->last_kactive_blk_num = pkc->kactive_blk_num;
}
@@ -719,8 +745,9 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
* prb_calc_retire_blk_tmo() calculates the tmo.
*
*/
-static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
{
+ enum hrtimer_restart ret = HRTIMER_RESTART;
struct packet_sock *po =
timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
@@ -732,8 +759,17 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
frozen = prb_queue_frozen(pkc);
pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
- if (unlikely(pkc->delete_blk_timer))
+ if (unlikely(pkc->delete_blk_timer)) {
+ ret = HRTIMER_NORESTART;
goto out;
+ }
+
+ /* See comments in _prb_refresh_rx_retire_blk_timer. */
+ if (ktime_compare(pkc->expire_ktime,
+ hrtimer_get_expires(&pkc->retire_blk_timer)) > 0) {
+ hrtimer_set_expires(&pkc->retire_blk_timer, pkc->expire_ktime);
+ goto out;
+ }
/* We only need to plug the race when the block is partially filled.
* tpacket_rcv:
@@ -757,7 +793,7 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
goto refresh_timer;
}
prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
- if (!prb_dispatch_next_block(pkc, po))
+ if (!prb_dispatch_next_block(pkc, po, true))
goto refresh_timer;
else
goto out;
@@ -779,17 +815,18 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
* opening a block thaws the queue,restarts timer
* Thawing/timer-refresh is a side effect.
*/
- prb_open_block(pkc, pbd);
+ prb_open_block(pkc, pbd, true);
goto out;
}
}
}
refresh_timer:
- _prb_refresh_rx_retire_blk_timer(pkc);
+ _prb_refresh_rx_retire_blk_timer(pkc, true);
out:
spin_unlock(&po->sk.sk_receive_queue.lock);
+ return ret;
}
static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
@@ -890,7 +927,7 @@ static void prb_thaw_queue(struct tpacket_kbdq_core *pkc)
*
*/
static void prb_open_block(struct tpacket_kbdq_core *pkc1,
- struct tpacket_block_desc *pbd1)
+ struct tpacket_block_desc *pbd1, bool callback)
{
struct timespec64 ts;
struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1;
@@ -921,7 +958,7 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
prb_thaw_queue(pkc1);
- _prb_refresh_rx_retire_blk_timer(pkc1);
+ _prb_refresh_rx_retire_blk_timer(pkc1, callback);
smp_wmb();
}
@@ -965,7 +1002,7 @@ static void prb_freeze_queue(struct tpacket_kbdq_core *pkc,
* So, caller must check the return value.
*/
static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc,
- struct packet_sock *po)
+ struct packet_sock *po, bool callback)
{
struct tpacket_block_desc *pbd;
@@ -985,7 +1022,7 @@ static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc,
* open this block and return the offset where the first packet
* needs to get stored.
*/
- prb_open_block(pkc, pbd);
+ prb_open_block(pkc, pbd, callback);
return (void *)pkc->nxt_offset;
}
@@ -1124,7 +1161,7 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
* opening a block also thaws the queue.
* Thawing is a side effect.
*/
- prb_open_block(pkc, pbd);
+ prb_open_block(pkc, pbd, false);
}
}
@@ -1143,7 +1180,7 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
prb_retire_current_block(pkc, po, 0);
/* Now, try to dispatch the next block */
- curr = (char *)prb_dispatch_next_block(pkc, po);
+ curr = (char *)prb_dispatch_next_block(pkc, po, false);
if (curr) {
pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
prb_fill_curr_block(curr, pkc, pbd, len);
diff --git a/net/packet/diag.c b/net/packet/diag.c
index 6ce1dcc28..c8f43e0c1 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -83,7 +83,7 @@ static int pdiag_put_ring(struct packet_ring_buffer *ring, int ver, int nl_type,
pdr.pdr_frame_nr = ring->frame_max + 1;
if (ver > TPACKET_V2) {
- pdr.pdr_retire_tmo = ring->prb_bdqc.retire_blk_tov;
+ pdr.pdr_retire_tmo = ktime_to_ms(ring->prb_bdqc.interval_ktime);
pdr.pdr_sizeof_priv = ring->prb_bdqc.blk_sizeof_priv;
pdr.pdr_features = ring->prb_bdqc.feature_req_word;
} else {
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 1e743d031..f2df563c7 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -45,12 +45,12 @@ struct tpacket_kbdq_core {
/* Default is set to 8ms */
#define DEFAULT_PRB_RETIRE_TOV (8)
- unsigned short retire_blk_tov;
+ ktime_t interval_ktime;
unsigned short version;
- unsigned long tov_in_jiffies;
/* timer to retire an outstanding block */
- struct timer_list retire_blk_timer;
+ struct hrtimer retire_blk_timer;
+ ktime_t expire_ktime;
};
struct pgv {
--
2.34.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net-next v7] net: af_packet: Use hrtimer to do the retire operation
2025-08-22 13:20 [PATCH net-next v7] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
@ 2025-08-24 18:08 ` Willem de Bruijn
0 siblings, 0 replies; 6+ messages in thread
From: Willem de Bruijn @ 2025-08-24 18:08 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao
Xin Zhao wrote:
> In a system with high real-time requirements, the timeout mechanism of
> ordinary timers with jiffies granularity is insufficient to meet the
> demands for real-time performance. Meanwhile, the optimization of CPU
> usage with af_packet is quite significant. Use hrtimer instead of timer
> to help compensate for the shortcomings in real-time performance.
> In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
> enough, with fluctuations reaching over 8ms (on a system with HZ=250).
> This is unacceptable in some high real-time systems that require timely
> processing of network packets. By replacing it with hrtimer, if a timeout
> of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
> 3 ms.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
>
> ---
> Changes in v7:
> - Only update the hrtimer expire time within the hrtimer callback.
> When the callback return, without sk_buff_head lock protection, __run_hrtimer will
> enqueue the timer if return HRTIMER_RESTART. Setting the hrtimer expires while
> enqueuing a timer may cause chaos in the hrtimer red-black tree.
> The setting expire time is monotonic, so if we do not update the expire time to the
> retire_blk_timer when it is not in callback, it will not cause problem if we skip
> the timeout event and update it when find out that expire_ktime is bigger than the
> expire time of retire_blk_timer.
> - Use hrtimer_set_expires here instead of hrtimer_forward_now.
> The end time for retiring each block is not fixed because when network packets are
> received quickly, blocks are retired rapidly, and the new block retire time needs
> to be recalculated. However, hrtimer_forward_now increments the previous timeout
> by an interval, which is not correct.
> - The expire time is monotonic, so if we do not update the expire time to the
> retire_blk_timer when it is not in callback, it will not cause problem if we skip
> the timeout event and update it when find out that expire_ktime is bigger than the
> expire time of retire_blk_timer.
> - Adding the 'bool callback' parameter back is intended to more accurately determine
> whether we are inside the hrtimer callback when executing
> _prb_refresh_rx_retire_blk_timer. This ensures that we only update the hrtimer's
> timeout value within the hrtimer callback.
This is getting more complex than needed.
Essentially the lifecycle is that packet_set_ring calls hrtimer_setup
and hrtimer_del_sync.
Inbetween, while the ring is configured, the timer is either
- scheduled from tpacket_rcv and !is_scheduled
-> call hrtimer_start
- scheduled from tpacket_rcv and is_scheduled
-> call hrtimer_set_expires
- rescheduled from the timer callback
-> call hrtimer_set_expires and return HRTIMER_RESTART
The only complication is that the is_scheduled check can race with the
HRTIMER_RESTART restart, as that happens outside the sk_receive_queue
critical section.
One option that I suggested before is to convert pkc->delete_blk_timer
to pkc->blk_timer_scheduled to record whether the timer is scheduled
without relying on hrtimer_is_queued. Set it on first open_block and
clear it from the callback when returning HR_NORESTART.
> Changes in v6:
> - Use hrtimer_is_queued instead to check whether it is within the callback function.
> So do not need to add 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer
> as suggested by Willem de Bruijn;
> - Do not need local_irq_save and local_irq_restore to protect the race of the timer
> callback running in softirq context or the open_block from tpacket_rcv in process
> context
> as suggested by Willem de Bruijn;
> - Link to v6: https://lore.kernel.org/all/20250820092925.2115372-1-jackzxcui1989@163.com/
>
> Changes in v5:
> - Remove the unnecessary comments at the top of the _prb_refresh_rx_retire_blk_timer,
> branch is self-explanatory enough
> as suggested by Willem de Bruijn;
> - Indentation of _prb_refresh_rx_retire_blk_timer, align with first argument on
> previous line
> as suggested by Willem de Bruijn;
> - Do not call hrtimer_start within the hrtimer callback
> as suggested by Willem de Bruijn
> So add 'bool callback' parameter to _prb_refresh_rx_retire_blk_timer to indicate
> whether it is within the callback function. Use hrtimer_forward_now instead of
> hrtimer_start when it is in the callback function and is doing prb_open_block.
> - Link to v5: https://lore.kernel.org/all/20250819091447.1199980-1-jackzxcui1989@163.com/
>
> Changes in v4:
> - Add 'bool start' to distinguish whether the call to _prb_refresh_rx_retire_blk_timer
> is for prb_open_block. When it is for prb_open_block, execute hrtimer_start to
> (re)start the hrtimer; otherwise, use hrtimer_forward_now to set the expiration
> time as it is more commonly used compared to hrtimer_set_expires.
> as suggested by Willem de Bruijn;
> - Delete the comments to explain why hrtimer_set_expires(not hrtimer_forward_now)
> is used, as we do not use hrtimer_set_expires any more;
> - Link to v4: https://lore.kernel.org/all/20250818050233.155344-1-jackzxcui1989@163.com/
>
> Changes in v3:
> - return HRTIMER_NORESTART when pkc->delete_blk_timer is true
> as suggested by Willem de Bruijn;
> - Drop the retire_blk_tov field of tpacket_kbdq_core, add interval_ktime instead
> as suggested by Willem de Bruijn;
> - Add comments to explain why hrtimer_set_expires(not hrtimer_forward_now) is used in
> _prb_refresh_rx_retire_blk_timer
> as suggested by Willem de Bruijn;
> - Link to v3: https://lore.kernel.org/all/20250816170130.3969354-1-jackzxcui1989@163.com/
>
> Changes in v2:
> - Drop the tov_in_msecs field of tpacket_kbdq_core added by the patch
> as suggested by Willem de Bruijn;
> - Link to v2: https://lore.kernel.org/all/20250815044141.1374446-1-jackzxcui1989@163.com/
>
> Changes in v1:
> - Do not add another config for the current changes
> as suggested by Eric Dumazet;
> - Mention the beneficial cases 'HZ=100 or HZ=250' in the changelog
> as suggested by Eric Dumazet;
> - Add some performance details to the changelog
> as suggested by Ferenc Fejes;
> - Delete the 'pkc->tov_in_msecs == 0' bounds check which is not necessary
> as suggested by Willem de Bruijn;
> - Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
> as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
> - Just return HRTIMER_RESTART directly as all cases return the same value
> as suggested by Willem de Bruijn;
> - Link to v1: https://lore.kernel.org/all/20250813165201.1492779-1-jackzxcui1989@163.com/
> - Link to v0: https://lore.kernel.org/all/20250806055210.1530081-1-jackzxcui1989@163.com/
> ---
> net/packet/af_packet.c | 93 +++++++++++++++++++++++++++++-------------
> net/packet/diag.c | 2 +-
> net/packet/internal.h | 6 +--
> 3 files changed, 69 insertions(+), 32 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index a7017d7f0..654ae65ea 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -197,14 +197,14 @@ static void *packet_previous_frame(struct packet_sock *po,
> static void packet_increment_head(struct packet_ring_buffer *buff);
> static int prb_curr_blk_in_use(struct tpacket_block_desc *);
> static void *prb_dispatch_next_block(struct tpacket_kbdq_core *,
> - struct packet_sock *);
> + struct packet_sock *, bool);
> static void prb_retire_current_block(struct tpacket_kbdq_core *,
> struct packet_sock *, unsigned int status);
> static int prb_queue_frozen(struct tpacket_kbdq_core *);
> static void prb_open_block(struct tpacket_kbdq_core *,
> - struct tpacket_block_desc *);
> -static void prb_retire_rx_blk_timer_expired(struct timer_list *);
> -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
> + struct tpacket_block_desc *, bool);
> +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
> +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *, bool);
> static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
> static void prb_clear_rxhash(struct tpacket_kbdq_core *,
> struct tpacket3_hdr *);
> @@ -581,7 +581,7 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
>
> static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> {
> - timer_delete_sync(&pkc->retire_blk_timer);
> + hrtimer_cancel(&pkc->retire_blk_timer);
> }
>
> static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
> @@ -603,9 +603,10 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
> struct tpacket_kbdq_core *pkc;
>
> pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> - timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> - 0);
> - pkc->retire_blk_timer.expires = jiffies;
> + hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> + CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> + hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> + HRTIMER_MODE_REL_SOFT);
> }
>
> static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> @@ -672,27 +673,52 @@ static void init_prb_bdqc(struct packet_sock *po,
> p1->last_kactive_blk_num = 0;
> po->stats.stats3.tp_freeze_q_cnt = 0;
> if (req_u->req3.tp_retire_blk_tov)
> - p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> + p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
> else
> - p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> - req_u->req3.tp_block_size);
> - p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> + p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
> + req_u->req3.tp_block_size));
> p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
> rwlock_init(&p1->blk_fill_in_prog_lock);
>
> p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
> prb_init_ft_ops(p1, req_u);
> prb_setup_retire_blk_timer(po);
> - prb_open_block(p1, pbd);
> + prb_open_block(p1, pbd, false);
> }
>
> /* Do NOT update the last_blk_num first.
> * Assumes sk_buff_head lock is held.
> */
> -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> -{
> - mod_timer(&pkc->retire_blk_timer,
> - jiffies + pkc->tov_in_jiffies);
> +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
> + bool callback)
> +{
> + if (likely(ktime_to_ns(pkc->expire_ktime))) {
> + pkc->expire_ktime = ktime_add_safe(ktime_get(), pkc->interval_ktime);
> + if (callback) {
> + /* Three tips:
> + * 1) Only update the hrtimer expire time within the callback.
> + * When the callback return, without sk_buff_head lock protection,
> + * __run_hrtimer will enqueue the timer if return HRTIMER_RESTART.
> + * Setting the hrtimer expires while enqueuing a timer may cause
> + * chaos in the hrtimer red-black tree.
> + * 2) Use hrtimer_set_expires here instead of hrtimer_forward_now,
> + * because the end time for retiring each block is not fixed
> + * because when network packets are received quickly, blocks are
> + * retired rapidly, and the new block retire time needs to be
> + * recalculated. However, hrtimer_forward_now increments the
> + * previous timeout by an interval, which is not correct.
> + * 3)The expire time is monotonic, so if we do not update the
> + * expire time to the retire_blk_timer when it is not in callback,
> + * it will not cause problem if we skip the timeout event and
> + * update it when find out that expire_ktime is bigger than the
> + * expire time of retire_blk_timer.
> + */
> + hrtimer_set_expires(&pkc->retire_blk_timer, pkc->expire_ktime);
> + }
> + } else {
> + /* Just after prb_setup_retire_blk_timer. */
> + pkc->expire_ktime = hrtimer_get_expires(&pkc->retire_blk_timer);
> + }
> pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> }
>
> @@ -719,8 +745,9 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> * prb_calc_retire_blk_tmo() calculates the tmo.
> *
> */
> -static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
> {
> + enum hrtimer_restart ret = HRTIMER_RESTART;
> struct packet_sock *po =
> timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
> struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> @@ -732,8 +759,17 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> frozen = prb_queue_frozen(pkc);
> pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
>
> - if (unlikely(pkc->delete_blk_timer))
> + if (unlikely(pkc->delete_blk_timer)) {
> + ret = HRTIMER_NORESTART;
> goto out;
> + }
> +
> + /* See comments in _prb_refresh_rx_retire_blk_timer. */
> + if (ktime_compare(pkc->expire_ktime,
> + hrtimer_get_expires(&pkc->retire_blk_timer)) > 0) {
> + hrtimer_set_expires(&pkc->retire_blk_timer, pkc->expire_ktime);
> + goto out;
> + }
>
> /* We only need to plug the race when the block is partially filled.
> * tpacket_rcv:
> @@ -757,7 +793,7 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> goto refresh_timer;
> }
> prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> - if (!prb_dispatch_next_block(pkc, po))
> + if (!prb_dispatch_next_block(pkc, po, true))
> goto refresh_timer;
> else
> goto out;
> @@ -779,17 +815,18 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> * opening a block thaws the queue,restarts timer
> * Thawing/timer-refresh is a side effect.
> */
> - prb_open_block(pkc, pbd);
> + prb_open_block(pkc, pbd, true);
> goto out;
> }
> }
> }
>
> refresh_timer:
> - _prb_refresh_rx_retire_blk_timer(pkc);
> + _prb_refresh_rx_retire_blk_timer(pkc, true);
>
> out:
> spin_unlock(&po->sk.sk_receive_queue.lock);
> + return ret;
> }
>
> static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
> @@ -890,7 +927,7 @@ static void prb_thaw_queue(struct tpacket_kbdq_core *pkc)
> *
> */
> static void prb_open_block(struct tpacket_kbdq_core *pkc1,
> - struct tpacket_block_desc *pbd1)
> + struct tpacket_block_desc *pbd1, bool callback)
> {
> struct timespec64 ts;
> struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1;
> @@ -921,7 +958,7 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
> pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
>
> prb_thaw_queue(pkc1);
> - _prb_refresh_rx_retire_blk_timer(pkc1);
> + _prb_refresh_rx_retire_blk_timer(pkc1, callback);
>
> smp_wmb();
> }
> @@ -965,7 +1002,7 @@ static void prb_freeze_queue(struct tpacket_kbdq_core *pkc,
> * So, caller must check the return value.
> */
> static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc,
> - struct packet_sock *po)
> + struct packet_sock *po, bool callback)
> {
> struct tpacket_block_desc *pbd;
>
> @@ -985,7 +1022,7 @@ static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc,
> * open this block and return the offset where the first packet
> * needs to get stored.
> */
> - prb_open_block(pkc, pbd);
> + prb_open_block(pkc, pbd, callback);
> return (void *)pkc->nxt_offset;
> }
>
> @@ -1124,7 +1161,7 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
> * opening a block also thaws the queue.
> * Thawing is a side effect.
> */
> - prb_open_block(pkc, pbd);
> + prb_open_block(pkc, pbd, false);
> }
> }
>
> @@ -1143,7 +1180,7 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
> prb_retire_current_block(pkc, po, 0);
>
> /* Now, try to dispatch the next block */
> - curr = (char *)prb_dispatch_next_block(pkc, po);
> + curr = (char *)prb_dispatch_next_block(pkc, po, false);
> if (curr) {
> pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
> prb_fill_curr_block(curr, pkc, pbd, len);
> diff --git a/net/packet/diag.c b/net/packet/diag.c
> index 6ce1dcc28..c8f43e0c1 100644
> --- a/net/packet/diag.c
> +++ b/net/packet/diag.c
> @@ -83,7 +83,7 @@ static int pdiag_put_ring(struct packet_ring_buffer *ring, int ver, int nl_type,
> pdr.pdr_frame_nr = ring->frame_max + 1;
>
> if (ver > TPACKET_V2) {
> - pdr.pdr_retire_tmo = ring->prb_bdqc.retire_blk_tov;
> + pdr.pdr_retire_tmo = ktime_to_ms(ring->prb_bdqc.interval_ktime);
> pdr.pdr_sizeof_priv = ring->prb_bdqc.blk_sizeof_priv;
> pdr.pdr_features = ring->prb_bdqc.feature_req_word;
> } else {
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index 1e743d031..f2df563c7 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -45,12 +45,12 @@ struct tpacket_kbdq_core {
> /* Default is set to 8ms */
> #define DEFAULT_PRB_RETIRE_TOV (8)
>
> - unsigned short retire_blk_tov;
> + ktime_t interval_ktime;
> unsigned short version;
> - unsigned long tov_in_jiffies;
>
> /* timer to retire an outstanding block */
> - struct timer_list retire_blk_timer;
> + struct hrtimer retire_blk_timer;
> + ktime_t expire_ktime;
> };
>
> struct pgv {
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next v7] net: af_packet: Use hrtimer to do the retire operation
@ 2025-08-26 3:03 Xin Zhao
2025-08-26 12:54 ` Willem de Bruijn
0 siblings, 1 reply; 6+ messages in thread
From: Xin Zhao @ 2025-08-26 3:03 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Tue, 2025-08-25 at 0:20 +0800, Willem wrote:
> > We cannot use hrtimer_set_expires/hrtimer_forward_now when a hrtimer is
> > already enqueued.
> > The WARN_ON(timer->state & HRTIMER_STATE_ENQUEUED) in hrtimer_forward
> > already clearly indicates this point. The reason for not adding this
> > WARN_ON in hrtimer_set_expires is that hrtimer_set_expires is an inline
> > function, wory about increase code size.
> > The implementation of perf_mux_hrtimer_restart actually checks whether
> > the hrtimer is active when restarting the hrtimer.
>
> Perhaps we need to simplify and stop trying to adjust the timer from
> tpacket_rcv once scheduled. Let the callback handle that.
>
Okay, I would also like to modify the timeout only within the callback,
so I think PATCH v7 might be a better solution. Additionally, in terms of
performance, it should be more efficient than frequently calling
hrtimer_cancel/hrtimer_start functions to change the timeout outside the
callback.
Why do I add the pkc->expire_ktime in PATCH v7?
For example 8ms retire timeout.
T means the time callback/tpacket_rcv call _prb_refresh_rx_retire_blk_timer.
T1 means time T plus 1ms, T2 means time T plus 2ms...
timeline: past -----------> -----------> -----------> future
callback: T T8
tpacket_rcv: T7
Considering the situation in the above diagram, at time T7, the tpacket_rcv
function processes the network and finds that a new block needs to be opened,
which requires setting a timeout of T7 + 8ms which is T15ms. However, we
cannot directly set the timeout within tpacket_rcv, so we use a variable
expire_ktime to record this value. At time T8, in the hrtimer callback, we
check that expire_ktime which is T15 is greater than the current timeout of
the hrtimer, which is T8. Therefore, we simply return from the hrtimer
callback at T8, the next execution time of the hrtimer callback will be T15.
This achieves the same effect as executing hrtimer_start in tpacket_rcv
using a "one shot" approach.
> > Do you agree with adding a callback variable to distinguish between
> > scheduled from tpacket_rcv and scheduled from the callback? I really
> > couldn't think of a better solution.
>
> Yes, no objections to that if necessary.
So it seems that the logic of 'adding a callback variable to distinguish' in
PATCH v7 is OK?
> > So, a possible solution may be?
> > 1. Continue to keep the callback parameter to strictly ensure whether it
> > is within the callback.
> > 2. Use hrtimer_set_expires within the callback to update the timeout (the
> > hrtimer module will enqueue the hrtimer when callback return)
> > 3. If it is not in callback, call hrtimer_cancel + hrtimer_start to restart
> > the timer.
>
> Instead, I would use an in_scheduled param, as in my previous reply and
> simply skip trying to schedule if already scheduled.
I understand that the additional in_scheduled variable is meant to prevent
multiple calls to hrtimer_start. However, based on the current logic
implementation, the only scenario that would cancel the hrtimer is after calling
prb_shutdown_retire_blk_timer. Therefore, once we have called hrtimer_start in
prb_setup_retire_blk_timer, we don't need to worry about the hrtimer stopping,
and we don't need to execute hrtimer_start again or check if the hrtimer is in
an active state. We can simply update the timeout in the callback.
Additionally, we don't need to worry about the situation where packet_set_ring
is entered twice, leading to multiple calls to hrtimer_start, because there is
a check for pg_vec before executing init_prb_bdqc in packet_set_ring. If pg_vec
is non-zero, it will go to the out label.
So is PATCH v7 good to go? Besides I think that ktime_after should be used
instead of ktime_compare, I haven't noticed any other areas in PATCH v7 that
need modification. What do you think?
Thanks
Xin Zhao
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next v7] net: af_packet: Use hrtimer to do the retire operation
2025-08-26 3:03 Xin Zhao
@ 2025-08-26 12:54 ` Willem de Bruijn
0 siblings, 0 replies; 6+ messages in thread
From: Willem de Bruijn @ 2025-08-26 12:54 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
Xin Zhao wrote:
> On Tue, 2025-08-25 at 0:20 +0800, Willem wrote:
>
> > > We cannot use hrtimer_set_expires/hrtimer_forward_now when a hrtimer is
> > > already enqueued.
> > > The WARN_ON(timer->state & HRTIMER_STATE_ENQUEUED) in hrtimer_forward
> > > already clearly indicates this point. The reason for not adding this
> > > WARN_ON in hrtimer_set_expires is that hrtimer_set_expires is an inline
> > > function, wory about increase code size.
> > > The implementation of perf_mux_hrtimer_restart actually checks whether
> > > the hrtimer is active when restarting the hrtimer.
> >
> > Perhaps we need to simplify and stop trying to adjust the timer from
> > tpacket_rcv once scheduled. Let the callback handle that.
> >
>
> Okay, I would also like to modify the timeout only within the callback,
> so I think PATCH v7 might be a better solution. Additionally, in terms of
> performance, it should be more efficient than frequently calling
> hrtimer_cancel/hrtimer_start functions to change the timeout outside the
> callback.
>
> Why do I add the pkc->expire_ktime in PATCH v7?
>
> For example 8ms retire timeout.
> T means the time callback/tpacket_rcv call _prb_refresh_rx_retire_blk_timer.
> T1 means time T plus 1ms, T2 means time T plus 2ms...
>
> timeline: past -----------> -----------> -----------> future
> callback: T T8
> tpacket_rcv: T7
>
> Considering the situation in the above diagram, at time T7, the tpacket_rcv
> function processes the network and finds that a new block needs to be opened,
> which requires setting a timeout of T7 + 8ms which is T15ms. However, we
> cannot directly set the timeout within tpacket_rcv, so we use a variable
> expire_ktime to record this value. At time T8, in the hrtimer callback, we
> check that expire_ktime which is T15 is greater than the current timeout of
> the hrtimer, which is T8. Therefore, we simply return from the hrtimer
> callback at T8, the next execution time of the hrtimer callback will be T15.
> This achieves the same effect as executing hrtimer_start in tpacket_rcv
> using a "one shot" approach.
>
>
> > > Do you agree with adding a callback variable to distinguish between
> > > scheduled from tpacket_rcv and scheduled from the callback? I really
> > > couldn't think of a better solution.
> >
> > Yes, no objections to that if necessary.
>
> So it seems that the logic of 'adding a callback variable to distinguish' in
> PATCH v7 is OK?
>
>
> > > So, a possible solution may be?
> > > 1. Continue to keep the callback parameter to strictly ensure whether it
> > > is within the callback.
> > > 2. Use hrtimer_set_expires within the callback to update the timeout (the
> > > hrtimer module will enqueue the hrtimer when callback return)
> > > 3. If it is not in callback, call hrtimer_cancel + hrtimer_start to restart
> > > the timer.
> >
> > Instead, I would use an in_scheduled param, as in my previous reply and
> > simply skip trying to schedule if already scheduled.
>
> I understand that the additional in_scheduled variable is meant to prevent
> multiple calls to hrtimer_start. However, based on the current logic
> implementation, the only scenario that would cancel the hrtimer is after calling
> prb_shutdown_retire_blk_timer. Therefore, once we have called hrtimer_start in
> prb_setup_retire_blk_timer, we don't need to worry about the hrtimer stopping,
> and we don't need to execute hrtimer_start again or check if the hrtimer is in
> an active state. We can simply update the timeout in the callback.
The hrtimer is also canceled when the callback returns
HRTIMER_NORESTART.
> Additionally, we don't need to worry about the situation where packet_set_ring
> is entered twice, leading to multiple calls to hrtimer_start, because there is
> a check for pg_vec before executing init_prb_bdqc in packet_set_ring. If pg_vec
> is non-zero, it will go to the out label.
>
> So is PATCH v7 good to go? Besides I think that ktime_after should be used
> instead of ktime_compare, I haven't noticed any other areas in PATCH v7 that
> need modification. What do you think?
>
>
> Thanks
> Xin Zhao
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next v7] net: af_packet: Use hrtimer to do the retire operation
@ 2025-08-26 14:53 Xin Zhao
2025-08-26 16:05 ` Willem de Bruijn
0 siblings, 1 reply; 6+ messages in thread
From: Xin Zhao @ 2025-08-26 14:53 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Tue, 2025-08-25 at 20:54 +0800, Willem wrote:
> > I understand that the additional in_scheduled variable is meant to prevent
> > multiple calls to hrtimer_start. However, based on the current logic
> > implementation, the only scenario that would cancel the hrtimer is after calling
> > prb_shutdown_retire_blk_timer. Therefore, once we have called hrtimer_start in
> > prb_setup_retire_blk_timer, we don't need to worry about the hrtimer stopping,
> > and we don't need to execute hrtimer_start again or check if the hrtimer is in
> > an active state. We can simply update the timeout in the callback.
>
> The hrtimer is also canceled when the callback returns
> HRTIMER_NORESTART.
In prb_retire_rx_blk_timer_expired function, the only way to return HRTIMER_NORESTART
is that the pkc->delete_blk_timer is NOT 0.
The delete_blk_timer is only set to 1 in prb_shutdown_retire_blk_timer which is called
by packet_set_ring.
In my understanding, once packet_set_ring is called and prb_shutdown_retire_blk_timer
is executed, the only way to make this af_packet work again is to call packet_set_ring
again to execute prb_setup_retire_blk_timer. At that point, hrtimer_start will be
called again. Therefore, I feel that there is no need to perform the check in
_prb_refresh_rx_retire_blk_timer. Only let prb_setup_retire_blk_timer to hrtimer_start,
is that right?
Thanks
Xin Zhao
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next v7] net: af_packet: Use hrtimer to do the retire operation
2025-08-26 14:53 Xin Zhao
@ 2025-08-26 16:05 ` Willem de Bruijn
0 siblings, 0 replies; 6+ messages in thread
From: Willem de Bruijn @ 2025-08-26 16:05 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
Xin Zhao wrote:
> On Tue, 2025-08-25 at 20:54 +0800, Willem wrote:
>
> > > I understand that the additional in_scheduled variable is meant to prevent
> > > multiple calls to hrtimer_start. However, based on the current logic
> > > implementation, the only scenario that would cancel the hrtimer is after calling
> > > prb_shutdown_retire_blk_timer. Therefore, once we have called hrtimer_start in
> > > prb_setup_retire_blk_timer, we don't need to worry about the hrtimer stopping,
> > > and we don't need to execute hrtimer_start again or check if the hrtimer is in
> > > an active state. We can simply update the timeout in the callback.
> >
> > The hrtimer is also canceled when the callback returns
> > HRTIMER_NORESTART.
>
> In prb_retire_rx_blk_timer_expired function, the only way to return HRTIMER_NORESTART
> is that the pkc->delete_blk_timer is NOT 0.
> The delete_blk_timer is only set to 1 in prb_shutdown_retire_blk_timer which is called
> by packet_set_ring.
> In my understanding, once packet_set_ring is called and prb_shutdown_retire_blk_timer
> is executed, the only way to make this af_packet work again is to call packet_set_ring
> again to execute prb_setup_retire_blk_timer. At that point, hrtimer_start will be
> called again. Therefore, I feel that there is no need to perform the check in
> _prb_refresh_rx_retire_blk_timer. Only let prb_setup_retire_blk_timer to hrtimer_start,
> is that right?
Good point.
Let's clean up the control flow a bit more to make that more clear.
For one, no need for delete_blk_timer. hrtimer_cancel will cancel the
timer if it is queued. And the callback spends the vast majority of
its time after the check. So the odds of delete_blk_timer having any
effect is minimal.
And if the callback just restarts itself unconditionally, no need for
the special refresh_timer and out labels. Or the somewhat complex
calling flow between _prb_refresh_rx_retire_blk_timer, prb_open_block
and prb_retire_rx_blk_timer_expired. They all just schedule the next
timer the fixed computed jiffies/ms from now. The only special case
is when prb_open_block is called from tpacket_rcv. That would set
the timeout further into the future than the already queued timer.
I don't think that an earlier timeout is problematic. No need to
add complexity to avoid that.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-08-26 16:05 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22 13:20 [PATCH net-next v7] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
2025-08-24 18:08 ` Willem de Bruijn
-- strict thread matches above, loose matches on Subject: below --
2025-08-26 3:03 Xin Zhao
2025-08-26 12:54 ` Willem de Bruijn
2025-08-26 14:53 Xin Zhao
2025-08-26 16:05 ` Willem de Bruijn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).