* [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
@ 2025-08-16 2:48 Xin Zhao
2025-08-16 9:34 ` Willem de Bruijn
2025-08-17 13:28 ` Willem de Bruijn
0 siblings, 2 replies; 10+ messages in thread
From: Xin Zhao @ 2025-08-16 2:48 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao
In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
---
Changes in v3:
- return HRTIMER_NORESTART when pkc->delete_blk_timer is true
as suggested by Willem de Bruijn;
Changes in v2:
- Drop the tov_in_msecs field of tpacket_kbdq_core added by the patch
as suggested by Willem de Bruijn;
- Link to v2: https://lore.kernel.org/all/20250815044141.1374446-1-jackzxcui1989@163.com/
Changes in v1:
- Do not add another config for the current changes
as suggested by Eric Dumazet;
- Mention the beneficial cases 'HZ=100 or HZ=250' in the changelog
as suggested by Eric Dumazet;
- Add some performance details to the changelog
as suggested by Ferenc Fejes;
- Delete the 'pkc->tov_in_msecs == 0' bounds check which is not necessary
as suggested by Willem de Bruijn;
- Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
- Just return HRTIMER_RESTART directly as all cases return the same value
as suggested by Willem de Bruijn;
- Link to v1: https://lore.kernel.org/all/20250813165201.1492779-1-jackzxcui1989@163.com/
- Link to v0: https://lore.kernel.org/all/20250806055210.1530081-1-jackzxcui1989@163.com/
---
---
net/packet/af_packet.c | 24 ++++++++++++++----------
net/packet/internal.h | 3 +--
2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a7017d7f0..763b0c968 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -203,7 +203,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
static int prb_queue_frozen(struct tpacket_kbdq_core *);
static void prb_open_block(struct tpacket_kbdq_core *,
struct tpacket_block_desc *);
-static void prb_retire_rx_blk_timer_expired(struct timer_list *);
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
static void prb_clear_rxhash(struct tpacket_kbdq_core *,
@@ -581,7 +581,7 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
- timer_delete_sync(&pkc->retire_blk_timer);
+ hrtimer_cancel(&pkc->retire_blk_timer);
}
static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
@@ -603,9 +603,10 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
struct tpacket_kbdq_core *pkc;
pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
- timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
- 0);
- pkc->retire_blk_timer.expires = jiffies;
+ hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
+ CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
+ hrtimer_start(&pkc->retire_blk_timer, ms_to_ktime(pkc->retire_blk_tov),
+ HRTIMER_MODE_REL_SOFT);
}
static int prb_calc_retire_blk_tmo(struct packet_sock *po,
@@ -676,7 +677,6 @@ static void init_prb_bdqc(struct packet_sock *po,
else
p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
req_u->req3.tp_block_size);
- p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
rwlock_init(&p1->blk_fill_in_prog_lock);
@@ -691,8 +691,8 @@ static void init_prb_bdqc(struct packet_sock *po,
*/
static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
- mod_timer(&pkc->retire_blk_timer,
- jiffies + pkc->tov_in_jiffies);
+ hrtimer_set_expires(&pkc->retire_blk_timer,
+ ktime_add(ktime_get(), ms_to_ktime(pkc->retire_blk_tov)));
pkc->last_kactive_blk_num = pkc->kactive_blk_num;
}
@@ -719,8 +719,9 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
* prb_calc_retire_blk_tmo() calculates the tmo.
*
*/
-static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
{
+ enum hrtimer_restart ret = HRTIMER_RESTART;
struct packet_sock *po =
timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
@@ -732,8 +733,10 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
frozen = prb_queue_frozen(pkc);
pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
- if (unlikely(pkc->delete_blk_timer))
+ if (unlikely(pkc->delete_blk_timer)) {
+ ret = HRTIMER_NORESTART;
goto out;
+ }
/* We only need to plug the race when the block is partially filled.
* tpacket_rcv:
@@ -790,6 +793,7 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
out:
spin_unlock(&po->sk.sk_receive_queue.lock);
+ return ret;
}
static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 1e743d031..9812feb3d 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -47,10 +47,9 @@ struct tpacket_kbdq_core {
unsigned short retire_blk_tov;
unsigned short version;
- unsigned long tov_in_jiffies;
/* timer to retire an outstanding block */
- struct timer_list retire_blk_timer;
+ struct hrtimer retire_blk_timer;
};
struct pgv {
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
2025-08-16 2:48 [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
@ 2025-08-16 9:34 ` Willem de Bruijn
2025-08-17 13:28 ` Willem de Bruijn
1 sibling, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-08-16 9:34 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao
Xin Zhao wrote:
> In a system with high real-time requirements, the timeout mechanism of
> ordinary timers with jiffies granularity is insufficient to meet the
> demands for real-time performance. Meanwhile, the optimization of CPU
> usage with af_packet is quite significant. Use hrtimer instead of timer
> to help compensate for the shortcomings in real-time performance.
> In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
> enough, with fluctuations reaching over 8ms (on a system with HZ=250).
> This is unacceptable in some high real-time systems that require timely
> processing of network packets. By replacing it with hrtimer, if a timeout
> of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
> 3 ms.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
Discussion in v2 is still ongoing. This will have to be respun based
on that.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
@ 2025-08-16 17:01 Xin Zhao
0 siblings, 0 replies; 10+ messages in thread
From: Xin Zhao @ 2025-08-16 17:01 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao
In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
---
Changes in v3:
- return HRTIMER_NORESTART when pkc->delete_blk_timer is true
as suggested by Willem de Bruijn;
- Drop the retire_blk_tov field of tpacket_kbdq_core, add interval_ktime instead
as suggested by Willem de Bruijn;
- Add comments to explain why hrtimer_set_expires(not hrtimer_forward_now) is used in
_prb_refresh_rx_retire_blk_timer
as suggested by Willem de Bruijn;
Changes in v2:
- Drop the tov_in_msecs field of tpacket_kbdq_core added by the patch
as suggested by Willem de Bruijn;
- Link to v2: https://lore.kernel.org/all/20250815044141.1374446-1-jackzxcui1989@163.com/
Changes in v1:
- Do not add another config for the current changes
as suggested by Eric Dumazet;
- Mention the beneficial cases 'HZ=100 or HZ=250' in the changelog
as suggested by Eric Dumazet;
- Add some performance details to the changelog
as suggested by Ferenc Fejes;
- Delete the 'pkc->tov_in_msecs == 0' bounds check which is not necessary
as suggested by Willem de Bruijn;
- Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
- Just return HRTIMER_RESTART directly as all cases return the same value
as suggested by Willem de Bruijn;
- Link to v1: https://lore.kernel.org/all/20250813165201.1492779-1-jackzxcui1989@163.com/
- Link to v0: https://lore.kernel.org/all/20250806055210.1530081-1-jackzxcui1989@163.com/
---
---
net/packet/af_packet.c | 39 ++++++++++++++++++++++++++-------------
net/packet/diag.c | 2 +-
net/packet/internal.h | 5 ++---
3 files changed, 29 insertions(+), 17 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a7017d7f0..11b3d2cfa 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -203,7 +203,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
static int prb_queue_frozen(struct tpacket_kbdq_core *);
static void prb_open_block(struct tpacket_kbdq_core *,
struct tpacket_block_desc *);
-static void prb_retire_rx_blk_timer_expired(struct timer_list *);
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
static void prb_clear_rxhash(struct tpacket_kbdq_core *,
@@ -581,7 +581,7 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
- timer_delete_sync(&pkc->retire_blk_timer);
+ hrtimer_cancel(&pkc->retire_blk_timer);
}
static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
@@ -603,9 +603,10 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
struct tpacket_kbdq_core *pkc;
pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
- timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
- 0);
- pkc->retire_blk_timer.expires = jiffies;
+ hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
+ CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
+ hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
+ HRTIMER_MODE_REL_SOFT);
}
static int prb_calc_retire_blk_tmo(struct packet_sock *po,
@@ -672,11 +673,10 @@ static void init_prb_bdqc(struct packet_sock *po,
p1->last_kactive_blk_num = 0;
po->stats.stats3.tp_freeze_q_cnt = 0;
if (req_u->req3.tp_retire_blk_tov)
- p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
+ p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
else
- p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
- req_u->req3.tp_block_size);
- p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
+ p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
+ req_u->req3.tp_block_size));
p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
rwlock_init(&p1->blk_fill_in_prog_lock);
@@ -691,8 +691,17 @@ static void init_prb_bdqc(struct packet_sock *po,
*/
static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
- mod_timer(&pkc->retire_blk_timer,
- jiffies + pkc->tov_in_jiffies);
+ /* We cannot use hrtimer_forward_now here because the function
+ * _prb_refresh_rx_retire_blk_timer can be called not only when
+ * the retire timer expires, but also when the kernel logic for
+ * receiving network packets detects that a network packet has
+ * filled up a block and calls prb_open_block to use the next
+ * block. This can lead to a WARN_ON being triggered in
+ * hrtimer_forward_now when it checks if the timer has already
+ * been enqueued.
+ */
+ hrtimer_set_expires(&pkc->retire_blk_timer,
+ ktime_add(ktime_get(), pkc->interval_ktime));
pkc->last_kactive_blk_num = pkc->kactive_blk_num;
}
@@ -719,8 +728,9 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
* prb_calc_retire_blk_tmo() calculates the tmo.
*
*/
-static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
+static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
{
+ enum hrtimer_restart ret = HRTIMER_RESTART;
struct packet_sock *po =
timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
@@ -732,8 +742,10 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
frozen = prb_queue_frozen(pkc);
pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
- if (unlikely(pkc->delete_blk_timer))
+ if (unlikely(pkc->delete_blk_timer)) {
+ ret = HRTIMER_NORESTART;
goto out;
+ }
/* We only need to plug the race when the block is partially filled.
* tpacket_rcv:
@@ -790,6 +802,7 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
out:
spin_unlock(&po->sk.sk_receive_queue.lock);
+ return ret;
}
static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
diff --git a/net/packet/diag.c b/net/packet/diag.c
index 6ce1dcc28..c8f43e0c1 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -83,7 +83,7 @@ static int pdiag_put_ring(struct packet_ring_buffer *ring, int ver, int nl_type,
pdr.pdr_frame_nr = ring->frame_max + 1;
if (ver > TPACKET_V2) {
- pdr.pdr_retire_tmo = ring->prb_bdqc.retire_blk_tov;
+ pdr.pdr_retire_tmo = ktime_to_ms(ring->prb_bdqc.interval_ktime);
pdr.pdr_sizeof_priv = ring->prb_bdqc.blk_sizeof_priv;
pdr.pdr_features = ring->prb_bdqc.feature_req_word;
} else {
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 1e743d031..19d4f0b73 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -45,12 +45,11 @@ struct tpacket_kbdq_core {
/* Default is set to 8ms */
#define DEFAULT_PRB_RETIRE_TOV (8)
- unsigned short retire_blk_tov;
+ ktime_t interval_ktime;
unsigned short version;
- unsigned long tov_in_jiffies;
/* timer to retire an outstanding block */
- struct timer_list retire_blk_timer;
+ struct hrtimer retire_blk_timer;
};
struct pgv {
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
2025-08-16 2:48 [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
2025-08-16 9:34 ` Willem de Bruijn
@ 2025-08-17 13:28 ` Willem de Bruijn
1 sibling, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-08-17 13:28 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel, Xin Zhao
Xin Zhao wrote:
> In a system with high real-time requirements, the timeout mechanism of
> ordinary timers with jiffies granularity is insufficient to meet the
> demands for real-time performance. Meanwhile, the optimization of CPU
> usage with af_packet is quite significant. Use hrtimer instead of timer
> to help compensate for the shortcomings in real-time performance.
> In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
> enough, with fluctuations reaching over 8ms (on a system with HZ=250).
> This is unacceptable in some high real-time systems that require timely
> processing of network packets. By replacing it with hrtimer, if a timeout
> of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
> 3 ms.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
> ---
> Changes in v3:
> - return HRTIMER_NORESTART when pkc->delete_blk_timer is true
> as suggested by Willem de Bruijn;
>
> Changes in v2:
> - Drop the tov_in_msecs field of tpacket_kbdq_core added by the patch
> as suggested by Willem de Bruijn;
> - Link to v2: https://lore.kernel.org/all/20250815044141.1374446-1-jackzxcui1989@163.com/
>
> Changes in v1:
> - Do not add another config for the current changes
> as suggested by Eric Dumazet;
> - Mention the beneficial cases 'HZ=100 or HZ=250' in the changelog
> as suggested by Eric Dumazet;
> - Add some performance details to the changelog
> as suggested by Ferenc Fejes;
> - Delete the 'pkc->tov_in_msecs == 0' bounds check which is not necessary
> as suggested by Willem de Bruijn;
> - Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
> as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
> - Just return HRTIMER_RESTART directly as all cases return the same value
> as suggested by Willem de Bruijn;
> - Link to v1: https://lore.kernel.org/all/20250813165201.1492779-1-jackzxcui1989@163.com/
> - Link to v0: https://lore.kernel.org/all/20250806055210.1530081-1-jackzxcui1989@163.com/
>
> ---
> ---
> net/packet/af_packet.c | 24 ++++++++++++++----------
> net/packet/internal.h | 3 +--
> 2 files changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index a7017d7f0..763b0c968 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -203,7 +203,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
> static int prb_queue_frozen(struct tpacket_kbdq_core *);
> static void prb_open_block(struct tpacket_kbdq_core *,
> struct tpacket_block_desc *);
> -static void prb_retire_rx_blk_timer_expired(struct timer_list *);
> +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *);
> static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
> static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
> static void prb_clear_rxhash(struct tpacket_kbdq_core *,
> @@ -581,7 +581,7 @@ static __be16 vlan_get_protocol_dgram(const struct sk_buff *skb)
>
> static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> {
> - timer_delete_sync(&pkc->retire_blk_timer);
> + hrtimer_cancel(&pkc->retire_blk_timer);
> }
>
> static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
> @@ -603,9 +603,10 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
> struct tpacket_kbdq_core *pkc;
>
> pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> - timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> - 0);
> - pkc->retire_blk_timer.expires = jiffies;
> + hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> + CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> + hrtimer_start(&pkc->retire_blk_timer, ms_to_ktime(pkc->retire_blk_tov),
> + HRTIMER_MODE_REL_SOFT);
> }
>
> static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> @@ -676,7 +677,6 @@ static void init_prb_bdqc(struct packet_sock *po,
> else
> p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> req_u->req3.tp_block_size);
> - p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
> rwlock_init(&p1->blk_fill_in_prog_lock);
>
> @@ -691,8 +691,8 @@ static void init_prb_bdqc(struct packet_sock *po,
> */
> static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> {
> - mod_timer(&pkc->retire_blk_timer,
> - jiffies + pkc->tov_in_jiffies);
> + hrtimer_set_expires(&pkc->retire_blk_timer,
> + ktime_add(ktime_get(), ms_to_ktime(pkc->retire_blk_tov)));
Here we cannot use hrtimer_add_expires for the same reason you gave in
the second version of the patch:
> Additionally, I think we cannot avoid using ktime_get, as the retire
> timeout for each block is not fixed. When there are a lot of network packets,
> a block can retire quickly, and if we do not re-fetch the time, the timeout
> duration may be set incorrectly.
Is that right?
Otherwise patch LGTM.
> pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> }
>
> @@ -719,8 +719,9 @@ static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> * prb_calc_retire_blk_tmo() calculates the tmo.
> *
> */
> -static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> +static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
> {
> + enum hrtimer_restart ret = HRTIMER_RESTART;
> struct packet_sock *po =
> timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
> struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> @@ -732,8 +733,10 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
> frozen = prb_queue_frozen(pkc);
> pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
>
> - if (unlikely(pkc->delete_blk_timer))
> + if (unlikely(pkc->delete_blk_timer)) {
> + ret = HRTIMER_NORESTART;
> goto out;
> + }
>
> /* We only need to plug the race when the block is partially filled.
> * tpacket_rcv:
> @@ -790,6 +793,7 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
>
> out:
> spin_unlock(&po->sk.sk_receive_queue.lock);
> + return ret;
> }
>
> static void prb_flush_block(struct tpacket_kbdq_core *pkc1,
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index 1e743d031..9812feb3d 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -47,10 +47,9 @@ struct tpacket_kbdq_core {
>
> unsigned short retire_blk_tov;
> unsigned short version;
> - unsigned long tov_in_jiffies;
>
> /* timer to retire an outstanding block */
> - struct timer_list retire_blk_timer;
> + struct hrtimer retire_blk_timer;
> };
>
> struct pgv {
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
@ 2025-08-17 14:47 Xin Zhao
2025-08-17 15:55 ` Willem de Bruijn
0 siblings, 1 reply; 10+ messages in thread
From: Xin Zhao @ 2025-08-17 14:47 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Sun, 2025-08-17 at 21:28 +0800, Willem wrote:
> Here we cannot use hrtimer_add_expires for the same reason you gave in
> the second version of the patch:
>
> > Additionally, I think we cannot avoid using ktime_get, as the retire
> > timeout for each block is not fixed. When there are a lot of network packets,
> > a block can retire quickly, and if we do not re-fetch the time, the timeout
> > duration may be set incorrectly.
>
> Is that right?
>
> Otherwise patch LGTM.
I'll think about whether there's a better way to implement the logic.
Additionally, regarding the previous email where you mentioned replacing retire_blk_tov
with the interval_ktime field, do we still need to make that change?
I noticed you didn't respond to my latest patch that replaces retire_blk_tov with
interval_ktime, and I'm wondering if we should make that change.
So we remain the retire_blk_tov field?
Thanks
Xin Zhao
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
2025-08-17 14:47 Xin Zhao
@ 2025-08-17 15:55 ` Willem de Bruijn
0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-08-17 15:55 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
Xin Zhao wrote:
> On Sun, 2025-08-17 at 21:28 +0800, Willem wrote:
>
> > Here we cannot use hrtimer_add_expires for the same reason you gave in
> > the second version of the patch:
> >
> > > Additionally, I think we cannot avoid using ktime_get, as the retire
> > > timeout for each block is not fixed. When there are a lot of network packets,
> > > a block can retire quickly, and if we do not re-fetch the time, the timeout
> > > duration may be set incorrectly.
> >
> > Is that right?
> >
> > Otherwise patch LGTM.
>
>
> I'll think about whether there's a better way to implement the logic.
>
> Additionally, regarding the previous email where you mentioned replacing retire_blk_tov
> with the interval_ktime field, do we still need to make that change?
> I noticed you didn't respond to my latest patch that replaces retire_blk_tov with
> interval_ktime, and I'm wondering if we should make that change.
> So we remain the retire_blk_tov field?
Sorry, this response was intended to v4. Yes, let's keep that change.
If hrtimer_add_expires cannot be used, then that patch is good as is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
@ 2025-08-18 5:38 Xin Zhao
2025-08-18 7:21 ` Willem de Bruijn
0 siblings, 1 reply; 10+ messages in thread
From: Xin Zhao @ 2025-08-18 5:38 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Sun, 2025-08-17 at 21:28 +0800, Willem wrote:
> Here we cannot use hrtimer_add_expires for the same reason you gave in
> the second version of the patch:
>
> > Additionally, I think we cannot avoid using ktime_get, as the retire
> > timeout for each block is not fixed. When there are a lot of network packets,
> > a block can retire quickly, and if we do not re-fetch the time, the timeout
> > duration may be set incorrectly.
>
> Is that right?
>
> Otherwise patch LGTM.
Dear Willem,
I have adjusted the logic in the recently sent v4 version by adding a boolean variable start
to distinguish whether it is the case of prb_open_block. If it is prb_open_block, I use
hrtimer_start to (re)start the timer; otherwise, I use hrtimer_set_expires to update the
expiration time. Additionally, I have added comments explaining this branch selection before
the _prb_refresh_rx_retire_blk_timer function.
I apologize for sending three PATCH v4 emails in a row. In the first email, I forgot to include
the link to v3. In the second email, there were no blank lines between v4 and v3.
Therefore, you can just refer to the latest v4 version in the third PATCH v4 email.
Thanks
Xin Zhao
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
@ 2025-08-18 7:13 Xin Zhao
2025-08-18 9:21 ` Willem de Bruijn
0 siblings, 1 reply; 10+ messages in thread
From: Xin Zhao @ 2025-08-18 7:13 UTC (permalink / raw)
To: willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
On Sun, 2025-08-17 at 21:28 +0800, Willem wrote:
> Here we cannot use hrtimer_add_expires for the same reason you gave in
> the second version of the patch:
>
> > Additionally, I think we cannot avoid using ktime_get, as the retire
> > timeout for each block is not fixed. When there are a lot of network packets,
> > a block can retire quickly, and if we do not re-fetch the time, the timeout
> > duration may be set incorrectly.
>
> Is that right?
>
> Otherwise patch LGTM.
Dear Willem,
While reviewing the code, I suddenly realized that previously I used
hrtimer_set_expires instead of hrtimer_forward_now to resolve the situation when
handling the retire timer timeout while run into prb_open_block simultaneously.
However, since there is now a distinction with the bool start variable in PATCH v4,
it seems that we no longer need to use hrtimer_set_expires and can directly use
hrtimer_forward_now instead. Therefore, I plan to make this change immediately and
resend PATCH v4. Please take a look at it then.
Thanks
Xin Zhao
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
2025-08-18 5:38 Xin Zhao
@ 2025-08-18 7:21 ` Willem de Bruijn
0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-08-18 7:21 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
Xin Zhao wrote:
> On Sun, 2025-08-17 at 21:28 +0800, Willem wrote:
>
> > Here we cannot use hrtimer_add_expires for the same reason you gave in
> > the second version of the patch:
> >
> > > Additionally, I think we cannot avoid using ktime_get, as the retire
> > > timeout for each block is not fixed. When there are a lot of network packets,
> > > a block can retire quickly, and if we do not re-fetch the time, the timeout
> > > duration may be set incorrectly.
> >
> > Is that right?
> >
> > Otherwise patch LGTM.
>
>
> Dear Willem,
>
> I have adjusted the logic in the recently sent v4 version by adding a boolean variable start
> to distinguish whether it is the case of prb_open_block. If it is prb_open_block, I use
> hrtimer_start to (re)start the timer; otherwise, I use hrtimer_set_expires to update the
> expiration time. Additionally, I have added comments explaining this branch selection before
> the _prb_refresh_rx_retire_blk_timer function.
>
> I apologize for sending three PATCH v4 emails in a row. In the first email, I forgot to include
> the link to v3. In the second email, there were no blank lines between v4 and v3.
> Therefore, you can just refer to the latest v4 version in the third PATCH v4 email.
For the future: do not resend a patch within 24 hours.
And do not resend a patch with the same number. Again, follow the
documentation I pointed to before.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation
2025-08-18 7:13 Xin Zhao
@ 2025-08-18 9:21 ` Willem de Bruijn
0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-08-18 9:21 UTC (permalink / raw)
To: Xin Zhao, willemdebruijn.kernel, edumazet, ferenc
Cc: davem, kuba, pabeni, horms, netdev, linux-kernel
Xin Zhao wrote:
> On Sun, 2025-08-17 at 21:28 +0800, Willem wrote:
>
> > Here we cannot use hrtimer_add_expires for the same reason you gave in
> > the second version of the patch:
> >
> > > Additionally, I think we cannot avoid using ktime_get, as the retire
> > > timeout for each block is not fixed. When there are a lot of network packets,
> > > a block can retire quickly, and if we do not re-fetch the time, the timeout
> > > duration may be set incorrectly.
> >
> > Is that right?
> >
> > Otherwise patch LGTM.
>
>
> Dear Willem,
>
> While reviewing the code, I suddenly realized that previously I used
> hrtimer_set_expires instead of hrtimer_forward_now to resolve the situation when
> handling the retire timer timeout while run into prb_open_block simultaneously.
> However, since there is now a distinction with the bool start variable in PATCH v4,
> it seems that we no longer need to use hrtimer_set_expires and can directly use
> hrtimer_forward_now instead. Therefore, I plan to make this change immediately and
> resend PATCH v4. Please take a look at it then.
Having a conversation in one thread that is not concluded yet and
already starting another thread makes back and forth communication
a bit difficult.
I'll take a look, but just send a v5 after 24 hrs.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-08-18 9:22 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-16 2:48 [PATCH net-next v3] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
2025-08-16 9:34 ` Willem de Bruijn
2025-08-17 13:28 ` Willem de Bruijn
-- strict thread matches above, loose matches on Subject: below --
2025-08-16 17:01 Xin Zhao
2025-08-17 14:47 Xin Zhao
2025-08-17 15:55 ` Willem de Bruijn
2025-08-18 5:38 Xin Zhao
2025-08-18 7:21 ` Willem de Bruijn
2025-08-18 7:13 Xin Zhao
2025-08-18 9:21 ` Willem de Bruijn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).