From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Xin Zhao <jackzxcui1989@163.com>,
willemdebruijn.kernel@gmail.com, edumazet@google.com,
ferenc@fejes.dev
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
horms@kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v8] net: af_packet: Use hrtimer to do the retire operation
Date: Thu, 28 Aug 2025 11:30:06 -0400 [thread overview]
Message-ID: <willemdebruijn.kernel.33a7282981e76@gmail.com> (raw)
In-Reply-To: <20250828063140.2747329-1-jackzxcui1989@163.com>
Xin Zhao wrote:
> On Thu, 2025-08-28 at 5:53 +0800, Willem wrote:
>
> > > Changes in v8:
> > > - Delete delete_blk_timer field, as suggested by Willem de Bruijn,
> > > hrtimer_cancel will check and wait until the timer callback return and ensure
> > > enter enter callback again;
> > > - Simplify the logic related to setting timeout, as suggestd by Willem de Bruijn.
> > > Currently timer callback just restarts itself unconditionally, so delete the
> > > 'out:' label, do not forward hrtimer in prb_open_block, call hrtimer_forward_now
> > > directly and always return HRTIMER_RESTART. The only special case is when
> > > prb_open_block is called from tpacket_rcv. That would set the timeout further
> > > into the future than the already queued timer. An earlier timeout is not
> > > problematic. No need to add complexity to avoid that.
> >
> > This simplifies the timer logic tremendously. I like this direction a lot.
>
> Thanks. :)
>
> >
> > > static void prb_setup_retire_blk_timer(struct packet_sock *po)
> > > @@ -603,9 +592,10 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
> > > struct tpacket_kbdq_core *pkc;
> > >
> > > pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> > > - timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > > - 0);
> > > - pkc->retire_blk_timer.expires = jiffies;
> > > + hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > > + CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> > > + hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> > > + HRTIMER_MODE_REL_SOFT);
> >
> > Since this is only called from init_prb_bdqc, we can further remove
> > this whole function and move the two hrtimer calls to the parent.
>
> Okay, I will move hrtimer_setup and hrtimer_start into init_prb_bdqc in PATCH v9.
>
> I do not move the prb_shutdown_retire_blk_timer into packet_set_ring either, because
> in packet_set_ring, there is no existing pointer for tpacket_kbdq_core. If move the
> logic of prb_shutdown_retire_blk_timer into packet_set_ring, we need to add the
> tpacket_kbdq_core pointer conversion logic in packet_set_ring, I think it is not
> necessary.
Agreed.
>
> > > }
> > >
> > > static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> > > @@ -672,11 +662,10 @@ static void init_prb_bdqc(struct packet_sock *po,
> > > p1->last_kactive_blk_num = 0;
> > > po->stats.stats3.tp_freeze_q_cnt = 0;
> > > if (req_u->req3.tp_retire_blk_tov)
> > > - p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> > > + p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
> > > else
> > > - p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> > > - req_u->req3.tp_block_size);
> > > - p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> > > + p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
> > > + req_u->req3.tp_block_size));
> > > p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
> > > rwlock_init(&p1->blk_fill_in_prog_lock);
> > >
> > > @@ -686,16 +675,6 @@ static void init_prb_bdqc(struct packet_sock *po,
> > > prb_open_block(p1, pbd);
> > > }
> > >
> > > -/* Do NOT update the last_blk_num first.
> > > - * Assumes sk_buff_head lock is held.
> > > - */
> > > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > > -{
> > > - mod_timer(&pkc->retire_blk_timer,
> > > - jiffies + pkc->tov_in_jiffies);
> > > - pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> >
> > last_kactive_blk_num is now only updated on prb_open_block. It still
> > needs to be updated on each timer callback? To see whether the active
> > block did not change since the last callback.
>
> Since prb_open_block is executed once during the initialization, after initialization,
> both last_kactive_blk_num and kactive_blk_num have the same value, which is 0. After
> initialization, if the value of kactive_blk_num remains unchanged, it is meaningless
> to assign the value of kactive_blk_num to last_kactive_blk_num. I searched through
> all the places in the code that can modify kactive_blk_num, and found that there is
> only one place, which is in prb_close_block. This means that after executing
> prb_close_block, we need to update last_kactive_blk_num at the corresponding places
> where it should be updated. Since I did not modify this logic under the tpacket_rcv
> scenario, I only need to check the logic in the hrtimer callback.
>
> Upon inspection, I did find an issue. When tpacket_rcv calls __packet_lookup_frame_in_block,
> it subsequently calls prb_retire_current_block, which in turn calls prb_close_block,
> resulting in an update to kactive_blk_num. After executing prb_retire_current_block,
> function __packet_lookup_frame_in_block calls prb_dispatch_next_block, it may not
> execute prb_open_block. If prb_open_block is not executed, this will lead to an
> inconsistency between last_kactive_blk_num and kactive_blk_num. At this point, the
> hrtimer callback will check whether pkc->last_kactive_blk_num == pkc->kactive_blk_num,
> which will evaluate to false, thus causing the current logic to differ from the original
> logic. However, at this time, it is still necessary to update last_kactive_blk_num.
>
> On the other hand, I also carefully checked the original implementation of
> prb_retire_rx_blk_timer_expired and found that in the original hrtimer callback,
> last_kactive_blk_num is updated in all cases. Therefore, I need to perform this update
> before exiting the sk_receive_queue lock.
>
> In addition, in PATCH v9, I will also remove the refresh_timer label and change the only
> instance where goto is used, to an if-else implementation, so that the 'refresh_timer:'
> label is no longer needed.
>
> The new implementation of prb_retire_rx_blk_timer_expired:
>
> static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
> {
> struct packet_sock *po =
> timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
> struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> unsigned int frozen;
> struct tpacket_block_desc *pbd;
>
> spin_lock(&po->sk.sk_receive_queue.lock);
>
> frozen = prb_queue_frozen(pkc);
> pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
>
> /* We only need to plug the race when the block is partially filled.
> * tpacket_rcv:
> * lock(); increment BLOCK_NUM_PKTS; unlock()
> * copy_bits() is in progress ...
> * timer fires on other cpu:
> * we can't retire the current block because copy_bits
> * is in progress.
> *
> */
> if (BLOCK_NUM_PKTS(pbd)) {
> /* Waiting for skb_copy_bits to finish... */
> write_lock(&pkc->blk_fill_in_prog_lock);
> write_unlock(&pkc->blk_fill_in_prog_lock);
> }
>
> if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
> if (!frozen) {
> if (BLOCK_NUM_PKTS(pbd)) {
> /* Not an empty block. Need retire the block. */
> prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> prb_dispatch_next_block(pkc, po);
> }
> } else {
> /* Case 1. Queue was frozen because user-space was
> * lagging behind.
> */
> if (!prb_curr_blk_in_use(pbd)) {
> /* Case 2. queue was frozen,user-space caught up,
> * now the link went idle && the timer fired.
> * We don't have a block to close.So we open this
> * block and restart the timer.
> * opening a block thaws the queue,restarts timer
> * Thawing/timer-refresh is a side effect.
> */
> prb_open_block(pkc, pbd);
> }
> }
> }
>
> pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
> spin_unlock(&po->sk.sk_receive_queue.lock);
> return HRTIMER_RESTART;
> }
Ack.
next prev parent reply other threads:[~2025-08-28 15:30 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-28 6:31 [PATCH net-next v8] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
2025-08-28 15:30 ` Willem de Bruijn [this message]
-- strict thread matches above, loose matches on Subject: below --
2025-08-27 15:01 Xin Zhao
2025-08-27 21:53 ` Willem de Bruijn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=willemdebruijn.kernel.33a7282981e76@gmail.com \
--to=willemdebruijn.kernel@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=ferenc@fejes.dev \
--cc=horms@kernel.org \
--cc=jackzxcui1989@163.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).