From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Xin Zhao <jackzxcui1989@163.com>,
willemdebruijn.kernel@gmail.com, edumazet@google.com,
ferenc@fejes.dev
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
horms@kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
Date: Thu, 21 Aug 2025 10:32:16 -0400 [thread overview]
Message-ID: <willemdebruijn.kernel.970c1e7dc19d@gmail.com> (raw)
In-Reply-To: <20250821085318.3022527-1-jackzxcui1989@163.com>
Xin Zhao wrote:
> On Wed, 2025-08-20 at 19:15 +0800, Willem wrote:
>
> > > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > > +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
> > > + bool start)
> > > {
> > > - mod_timer(&pkc->retire_blk_timer,
> > > - jiffies + pkc->tov_in_jiffies);
> > > + if (start && !hrtimer_is_queued(&pkc->retire_blk_timer))
> > > + hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> > > + HRTIMER_MODE_REL_SOFT);
> > > + else
> > > + hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
> >
> > Is the hrtimer still queued when prb_retire_rx_blk_timer_expired
> > fires? Based on the existence of hrtimer_forward_now, I assume so. But
> > have not checked yet. If so, hrtimer_is_queued alone suffices to
> > detect the other callstack from tpacket_rcv where hrtimer_start is
> > needed. No need for bool start?
> >
> > > pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> > > }
>
>
> Since the CI tests have reported the previously mentioned WARN_ON situation within
> hrtimer_forward_now, I believe we should reevaluate the implementation of the
> _prb_refresh_rx_retire_blk_timer function, which is responsible for setting the
> hrtimer timeout, and establish the principles it should adhere to. After careful
> consideration and a detailed review of the hrtimer implementation, I have identified
> the following two principles:
>
> 1. It is essential to ensure that calls to hrtimer_forward_now or hrtimer_set_expires
> occur strictly within the hrtimer's callback.
> 2. It is critical to ensure that no calls to hrtimer_forward_now or hrtimer_set_expires
> are made while the hrtimer is enqueued.
>
>
> Regarding these two principles, I would like to add three points:
> 1. In principle 1, if hrtimer_forward_now or hrtimer_set_expires is called outside of
> the hrtimer's callback without triggering the timer's enqueue, it will lead to the
> hrtimer timeout not being triggered as expected (this issue is obvious and can be
> reproduced by writing a kernel object and setting a short timeout, such as 2ms).
> 2. Both principles above are aimed at preventing scenarios where hrtimer_forward_now
> or hrtimer_set_expires modify the timeout while the hrtimer is already enqueued, which
> could lead to disarray in the hrtimer's red-black tree (after all, the WARN_ON check
> for enqueue in the non-inlined hrtimer_forward_now interface exists to prevent such
> situations). It is also important to note that apart from executing the hrtimer_start
> series of functions outside the hrtimer callback, the __run_hrtimer function, upon
> returning HRTIMER_RESTART after executing the hrtimer callback, will also enqueue the
> hrtimer.
> 3. The reason for principle 2, in addition to principle 1, is that when setting the
> timeout using hrtimer_forward_now in the hrtimer's callback, there is no protection
> provided by the lock for hrtimer_cpu_base, meaning that if an hrtimer_start action is
> performed outside the hrtimer's callback while simultaneously updating the timeout
> within the callback, it could cause disarray in the hrtimer's red-black tree.
>
> The occurrence of the WARN_ON enqueue error in the CI test indicates that
> hrtimer_forward_now was executed in a scenario outside the hrtimer's callback while
> the hrtimer was in a queued state. A possible sequence that could cause this issue is
> as follows:
> cpu0 (softirq context, hrtimer timeout) cpu1 (process context, need prb_open_block)
> hrtimer_run_softirq
> __hrtimer_run_queues
> __run_hrtimer
> _prb_refresh_rx_retire_blk_timer
> spin_lock(&po->sk.sk_receive_queue.lock);
> hrtimer_is_queued(&pkc->retire_blk_timer) == false
> hrtimer_forward_now
> spin_unlock(&po->sk.sk_receive_queue.lock) tpacket_rcv
> enqueue_hrtimer spin_lock(&sk->sk_receive_queue.lock);
> packet_current_rx_frame
> __packet_lookup_frame_in_block
> prb_open_block
> _prb_refresh_rx_retire_blk_timer
> hrtimer_is_queued(&pkc->retire_blk_timer) == true
> hrtimer_forward_now
> WARN_ON
>
> In summary, the key issue now is to find a mechanism to ensure that the hrtimer's start
> or enqueue, as well as the timeout settings for the hrtimer, cannot be executed
> concurrently. I have thought of two methods to address this issue (method 1 will make the
> code appear much simpler, while method 2 will make the code more complex):
>
> Method 1:
> Do not call hrtimer_forward_now to set the timeout within the hrtimer's callback; instead,
> only call the hrtimer_start function to perform the hrtimer's enqueue. This approach is
> viable because, in the current version, inside __run_hrtimer, the state of the timer is
> checked under the protection of cpu_base->lock. If the timer is already enqueued, it will
> not be enqueued again. By doing this, all hrtimer enqueues will be protected under both
> sk_receive_queue.lock and cpu_base->lock, eliminating concerns about the timeout being
> concurrently modified during enqueueing, which could lead to chaos in the hrtimer's
> red-black tree.
>
> Method 2:
> This method primarily aims to strictly ensure that hrtimer_start is not called within the
> hrtimer's callback. However, doing so would require a lot of additional logic:
> We would need to add a callback parameter to strictly ensure that hrtimer_forward_now is
> executed within the callback and hrtimer_start is executed outside the callback. The
> occurrence of the WARN_ON in the CI test indicates that the method of using
> "hrtimer_is_queued to make the judgment" does not cover all scenarios. The fundamental
> reason for this is that the hrtimer_is_queued check must be precise, which requires
> protection from cpu_base->lock. Similarly, using hrtimer_callback_running check would not
> achieve precise judgment either. It is necessary to know on which CPU the hrtimer is running
> and send an IPI to execute hrtimer_forward_now using local_irq_save on that CPU to satisfy
> the aforementioned principle 2), as it is inappropriate to acquire the cpu_base->lock within
> the af_packet logic; the only way to ensure that the hrtimer_forward_now operation is
> executed without enqueueing the hrtimer is by disabling IRQs.
>
> Since strictly ensuring that hrtimer_start is not called within the hrtimer's callback leads
> to a lot of extra logic, and logically, I have also demonstrated that it is permissible to
> call hrtimer_start within the hrtimer's callback (for the hrtimer module, the lock is
> cpu_base->lock, which is associated with the clock base where the hrtimer resides, and does
> not follow smp_processor_id()), it does not matter whether hrtimer_start is executed by the
> CPU executing the hrtimer callback or by another CPU; both scenarios are the same for the
> hrtimer module. TTherefore, I prefer to use the aforementioned method 1) to resolve this
> issue. If there are no concerns, I will reflect this in PATCH v7.
Thanks for the analysis.
Using hrtimer_start from within the callback that returns
HRTIMER_RESTART does not sound in line with the intention of the API
to me.
I think we should just adjust and restart from within the callback and
hrtimer_start from tpacket_rcv iff the timer is not yet queued.
Since all these modifications are made while the receive queue lock is
held I don't immediately see why we would need additional mutual
exclusion beyond that.
next prev parent reply other threads:[~2025-08-21 14:32 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-21 8:53 [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
2025-08-21 14:32 ` Willem de Bruijn [this message]
-- strict thread matches above, loose matches on Subject: below --
2025-08-25 5:06 Xin Zhao
2025-08-25 16:20 ` Willem de Bruijn
2025-08-22 10:16 Xin Zhao
2025-08-21 15:30 Xin Zhao
2025-08-22 6:37 ` Willem de Bruijn
2025-08-20 9:29 Xin Zhao
2025-08-20 11:15 ` Willem de Bruijn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=willemdebruijn.kernel.970c1e7dc19d@gmail.com \
--to=willemdebruijn.kernel@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=ferenc@fejes.dev \
--cc=horms@kernel.org \
--cc=jackzxcui1989@163.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.