linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Xin Zhao <jackzxcui1989@163.com>,
	 willemdebruijn.kernel@gmail.com,  edumazet@google.com,
	 ferenc@fejes.dev
Cc: davem@davemloft.net,  kuba@kernel.org,  pabeni@redhat.com,
	 horms@kernel.org,  netdev@vger.kernel.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation
Date: Thu, 21 Aug 2025 10:32:16 -0400	[thread overview]
Message-ID: <willemdebruijn.kernel.970c1e7dc19d@gmail.com> (raw)
In-Reply-To: <20250821085318.3022527-1-jackzxcui1989@163.com>

Xin Zhao wrote:
> On Wed, 2025-08-20 at 19:15 +0800, Willem wrote:
> 
> > > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > > +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
> > > +					     bool start)
> > >  {
> > > -	mod_timer(&pkc->retire_blk_timer,
> > > -			jiffies + pkc->tov_in_jiffies);
> > > +	if (start && !hrtimer_is_queued(&pkc->retire_blk_timer))
> > > +		hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> > > +			      HRTIMER_MODE_REL_SOFT);
> > > +	else
> > > +		hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
> > 
> > Is the hrtimer still queued when prb_retire_rx_blk_timer_expired
> > fires? Based on the existence of hrtimer_forward_now, I assume so. But
> > have not checked yet. If so, hrtimer_is_queued alone suffices to
> > detect the other callstack from tpacket_rcv where hrtimer_start is
> > needed. No need for bool start?
> > 
> > >  	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> > >  }
> 
> 
> Since the CI tests have reported the previously mentioned WARN_ON situation within
> hrtimer_forward_now, I believe we should reevaluate the implementation of the
> _prb_refresh_rx_retire_blk_timer function, which is responsible for setting the
> hrtimer timeout, and establish the principles it should adhere to. After careful
> consideration and a detailed review of the hrtimer implementation, I have identified
> the following two principles:
> 
> 1. It is essential to ensure that calls to hrtimer_forward_now or hrtimer_set_expires
> occur strictly within the hrtimer's callback.
> 2. It is critical to ensure that no calls to hrtimer_forward_now or hrtimer_set_expires
> are made while the hrtimer is enqueued.
> 
> 
> Regarding these two principles, I would like to add three points:
> 1. In principle 1, if hrtimer_forward_now or hrtimer_set_expires is called outside of
> the hrtimer's callback without triggering the timer's enqueue, it will lead to the
> hrtimer timeout not being triggered as expected (this issue is obvious and can be
> reproduced by writing a kernel object and setting a short timeout, such as 2ms).
> 2. Both principles above are aimed at preventing scenarios where hrtimer_forward_now
> or hrtimer_set_expires modify the timeout while the hrtimer is already enqueued, which
> could lead to disarray in the hrtimer's red-black tree (after all, the WARN_ON check
> for enqueue in the non-inlined hrtimer_forward_now interface exists to prevent such
> situations). It is also important to note that apart from executing the hrtimer_start
> series of functions outside the hrtimer callback, the __run_hrtimer function, upon
> returning HRTIMER_RESTART after executing the hrtimer callback, will also enqueue the
> hrtimer.
> 3. The reason for principle 2, in addition to principle 1, is that when setting the
> timeout using hrtimer_forward_now in the hrtimer's callback, there is no protection
> provided by the lock for hrtimer_cpu_base, meaning that if an hrtimer_start action is
> performed outside the hrtimer's callback while simultaneously updating the timeout
> within the callback, it could cause disarray in the hrtimer's red-black tree.
> 
> The occurrence of the WARN_ON enqueue error in the CI test indicates that
> hrtimer_forward_now was executed in a scenario outside the hrtimer's callback while
> the hrtimer was in a queued state. A possible sequence that could cause this issue is
> as follows:
> cpu0 (softirq context, hrtimer timeout)                             cpu1 (process context, need prb_open_block)
> hrtimer_run_softirq
>   __hrtimer_run_queues
>     __run_hrtimer
>       _prb_refresh_rx_retire_blk_timer
>         spin_lock(&po->sk.sk_receive_queue.lock);
>         hrtimer_is_queued(&pkc->retire_blk_timer) == false
>         hrtimer_forward_now
>         spin_unlock(&po->sk.sk_receive_queue.lock)                 tpacket_rcv
>       enqueue_hrtimer                                                spin_lock(&sk->sk_receive_queue.lock);
>                                                                      packet_current_rx_frame
>                                                                        __packet_lookup_frame_in_block
>                                                                          prb_open_block
>                                                                            _prb_refresh_rx_retire_blk_timer
>                                                                              hrtimer_is_queued(&pkc->retire_blk_timer) == true
>                                                                              hrtimer_forward_now
>                                                                              WARN_ON
> 
> In summary, the key issue now is to find a mechanism to ensure that the hrtimer's start
> or enqueue, as well as the timeout settings for the hrtimer, cannot be executed
> concurrently. I have thought of two methods to address this issue (method 1 will make the
> code appear much simpler, while method 2 will make the code more complex):
> 
> Method 1:
> Do not call hrtimer_forward_now to set the timeout within the hrtimer's callback; instead,
> only call the hrtimer_start function to perform the hrtimer's enqueue. This approach is
> viable because, in the current version, inside __run_hrtimer, the state of the timer is
> checked under the protection of cpu_base->lock. If the timer is already enqueued, it will
> not be enqueued again. By doing this, all hrtimer enqueues will be protected under both
> sk_receive_queue.lock and cpu_base->lock, eliminating concerns about the timeout being
> concurrently modified during enqueueing, which could lead to chaos in the hrtimer's
> red-black tree.
> 
> Method 2:
> This method primarily aims to strictly ensure that hrtimer_start is not called within the
> hrtimer's callback. However, doing so would require a lot of additional logic:
> We would need to add a callback parameter to strictly ensure that hrtimer_forward_now is
> executed within the callback and hrtimer_start is executed outside the callback. The
> occurrence of the WARN_ON in the CI test indicates that the method of using
> "hrtimer_is_queued to make the judgment" does not cover all scenarios. The fundamental
> reason for this is that the hrtimer_is_queued check must be precise, which requires
> protection from cpu_base->lock. Similarly, using hrtimer_callback_running check would not
> achieve precise judgment either. It is necessary to know on which CPU the hrtimer is running
> and send an IPI to execute hrtimer_forward_now using local_irq_save on that CPU to satisfy
> the aforementioned principle 2), as it is inappropriate to acquire the cpu_base->lock within
> the af_packet logic; the only way to ensure that the hrtimer_forward_now operation is
> executed without enqueueing the hrtimer is by disabling IRQs.
> 
> Since strictly ensuring that hrtimer_start is not called within the hrtimer's callback leads
> to a lot of extra logic, and logically, I have also demonstrated that it is permissible to
> call hrtimer_start within the hrtimer's callback (for the hrtimer module, the lock is
> cpu_base->lock, which is associated with the clock base where the hrtimer resides, and does
> not follow smp_processor_id()), it does not matter whether hrtimer_start is executed by the
> CPU executing the hrtimer callback or by another CPU; both scenarios are the same for the
> hrtimer module. TTherefore, I prefer to use the aforementioned method 1) to resolve this
> issue. If there are no concerns, I will reflect this in PATCH v7.

Thanks for the analysis.

Using hrtimer_start from within the callback that returns
HRTIMER_RESTART does not sound in line with the intention of the API
to me.

I think we should just adjust and restart from within the callback and
hrtimer_start from tpacket_rcv iff the timer is not yet queued.

Since all these modifications are made while the receive queue lock is
held I don't immediately see why we would need additional mutual
exclusion beyond that.


  reply	other threads:[~2025-08-21 14:32 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-21  8:53 [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire operation Xin Zhao
2025-08-21 14:32 ` Willem de Bruijn [this message]
  -- strict thread matches above, loose matches on Subject: below --
2025-08-25  5:06 Xin Zhao
2025-08-25 16:20 ` Willem de Bruijn
2025-08-22 10:16 Xin Zhao
2025-08-21 15:30 Xin Zhao
2025-08-22  6:37 ` Willem de Bruijn
2025-08-20  9:29 Xin Zhao
2025-08-20 11:15 ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=willemdebruijn.kernel.970c1e7dc19d@gmail.com \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=ferenc@fejes.dev \
    --cc=horms@kernel.org \
    --cc=jackzxcui1989@163.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).