linux-arm-kernel.lists.infradead.org archive mirror
From: Ankur Arora <ankur.a.arora@oracle.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
	bpf@vger.kernel.org, arnd@arndb.de, catalin.marinas@arm.com,
	will@kernel.org, peterz@infradead.org, akpm@linux-foundation.org,
	mark.rutland@arm.com, harisokn@amazon.com, cl@gentwo.org,
	ast@kernel.org, daniel.lezcano@linaro.org, memxor@gmail.com,
	zhenglifeng1@huawei.com, xueshuai@linux.alibaba.com,
	joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com
Subject: Re: [RESEND PATCH v7 7/7] cpuidle/poll_state: Poll via smp_cond_load_relaxed_timeout()
Date: Wed, 05 Nov 2025 00:30:43 -0800
Message-ID: <87y0ok98zw.fsf@oracle.com>
In-Reply-To: <CAJZ5v0izSBR0_DeH5HVnSLFGRfV9WoSzbu9Mh5yvvuyrvw7fLg@mail.gmail.com>


Rafael J. Wysocki <rafael@kernel.org> writes:

> On Wed, Oct 29, 2025 at 10:01 PM Ankur Arora <ankur.a.arora@oracle.com> wrote:
>>
>>
>> Rafael J. Wysocki <rafael@kernel.org> writes:
>>
>> > On Wed, Oct 29, 2025 at 8:13 PM Ankur Arora <ankur.a.arora@oracle.com> wrote:
>> >>
>> >>
>> >> Rafael J. Wysocki <rafael@kernel.org> writes:
>> >>
>> >> > On Wed, Oct 29, 2025 at 5:42 AM Ankur Arora <ankur.a.arora@oracle.com> wrote:
>> >> >>
>> >> >>
>> >> >> Rafael J. Wysocki <rafael@kernel.org> writes:
>> >> >>
>> >> >> > On Tue, Oct 28, 2025 at 6:32 AM Ankur Arora <ankur.a.arora@oracle.com> wrote:
>> >> >> >>
>> >> >> >> The inner loop in poll_idle() polls over the thread_info flags,
>> >> >> >> waiting to see if the thread has TIF_NEED_RESCHED set. The loop
>> >> >> >> exits once the condition is met, or if the poll time limit has
>> >> >> >> been exceeded.
>> >> >> >>
>> >> >> >> To minimize the number of instructions executed in each iteration,
>> >> >> >> the time check is done only intermittently (once every
>> >> >> >> POLL_IDLE_RELAX_COUNT iterations). In addition, each loop iteration
>> >> >> >> executes cpu_relax() which on certain platforms provides a hint to
>> >> >> >> the pipeline that the loop busy-waits, allowing the processor to
>> >> >> >> reduce power consumption.
>> >> >> >>
>> >> >> >> This is close to what smp_cond_load_relaxed_timeout() provides. So,
>> >> >> >> restructure the loop and fold the loop condition and the timeout check
>> >> >> >> in smp_cond_load_relaxed_timeout().
>> >> >> >
>> >> >> > Well, it is close, but is it close enough?
>> >> >>
>> >> >> I guess that's the question.
>> >> >>
>> >> >> >> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> >> >> >> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
>> >> >> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> >> >> >> ---
>> >> >> >>  drivers/cpuidle/poll_state.c | 29 ++++++++---------------------
>> >> >> >>  1 file changed, 8 insertions(+), 21 deletions(-)
>> >> >> >>
>> >> >> >> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> >> >> >> index 9b6d90a72601..dc7f4b424fec 100644
>> >> >> >> --- a/drivers/cpuidle/poll_state.c
>> >> >> >> +++ b/drivers/cpuidle/poll_state.c
>> >> >> >> @@ -8,35 +8,22 @@
>> >> >> >>  #include <linux/sched/clock.h>
>> >> >> >>  #include <linux/sched/idle.h>
>> >> >> >>
>> >> >> >> -#define POLL_IDLE_RELAX_COUNT  200
>> >> >> >> -
>> >> >> >>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
>> >> >> >>                                struct cpuidle_driver *drv, int index)
>> >> >> >>  {
>> >> >> >> -       u64 time_start;
>> >> >> >> -
>> >> >> >> -       time_start = local_clock_noinstr();
>> >> >> >> +       u64 time_end;
>> >> >> >> +       u32 flags = 0;
>> >> >> >>
>> >> >> >>         dev->poll_time_limit = false;
>> >> >> >>
>> >> >> >> +       time_end = local_clock_noinstr() + cpuidle_poll_time(drv, dev);
>> >> >> >
>> >> >> > Is there any particular reason for doing this unconditionally?  If
>> >> >> > not, then it looks like an arbitrary unrelated change to me.
>> >> >>
>> >> >> Agreed. Will fix.
>> >> >>
>> >> >> >> +
>> >> >> >>         raw_local_irq_enable();
>> >> >> >>         if (!current_set_polling_and_test()) {
>> >> >> >> -               unsigned int loop_count = 0;
>> >> >> >> -               u64 limit;
>> >> >> >> -
>> >> >> >> -               limit = cpuidle_poll_time(drv, dev);
>> >> >> >> -
>> >> >> >> -               while (!need_resched()) {
>> >> >> >> -                       cpu_relax();
>> >> >> >> -                       if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> >> >> >> -                               continue;
>> >> >> >> -
>> >> >> >> -                       loop_count = 0;
>> >> >> >> -                       if (local_clock_noinstr() - time_start > limit) {
>> >> >> >> -                               dev->poll_time_limit = true;
>> >> >> >> -                               break;
>> >> >> >> -                       }
>> >> >> >> -               }
>> >> >> >> +               flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
>> >> >> >> +                                                     (VAL & _TIF_NEED_RESCHED),
>> >> >> >> +                                                     (local_clock_noinstr() >= time_end));
>> >> >> >
>> >> >> > So my understanding of this is that it reduces duplication with some
>> >> >> > other places doing similar things.  Fair enough.
>> >> >> >
>> >> >> > However, since there is "timeout" in the name, I'd expect it to take
>> >> >> > the timeout as an argument.
>> >> >>
>> >> >> The early versions did have a timeout but that complicated the
>> >> >> implementation significantly. And the current users, poll_idle() and
>> >> >> rqspinlock, don't need a precise timeout.
>> >> >>
>> >> >> smp_cond_load_relaxed_timed(), smp_cond_load_relaxed_timecheck()?
>> >> >>
>> >> >> The problem with all suffixes I can think of is that it makes the
>> >> >> interface itself nonobvious.
>> >> >>
>> >> >> Possibly something with the sense of bail out might work.
>> >> >
>> >> > It basically has two conditions, one of which is checked in every step
>> >> > of the internal loop and the other one is checked every
>> >> > SMP_TIMEOUT_POLL_COUNT steps of it.  That isn't particularly
>> >> > straightforward IMV.
>> >>
>> >> Right. And that's similar to what poll_idle() does.
>> >
>> > My point is that the macro in its current form is not particularly
>> > straightforward.
>> >
>> > The code in poll_idle() does what it needs to do.
>> >
>> >> > Honestly, I prefer the existing code.  It is much easier to follow and
>> >> > I don't see why the new code would be better.  Sorry.
>> >>
>> >> I don't think there's any problem with the current code. However, I'd like
>> >> to add support for poll_idle() on arm64 (and maybe other platforms) where
>> >> instead of spinning in a cpu_relax() loop, you wait on a cacheline.
>> >
>> > Well, there is MWAIT on x86, but it is not used here.  It just takes
>> > too much time to wake up from.  There are "fast" variants of that too,
>> > but they have been designed with user space in mind, so somewhat
>> > cumbersome for kernel use.
>> >
>> >> And that's what using something like smp_cond_load_relaxed_timeout()
>> >> would enable.
>> >>
>> >> Something like the series here:
>> >>   https://lore.kernel.org/lkml/87wmaljd81.fsf@oracle.com/
>> >>
>> >> (Sorry, should have mentioned this in the commit message.)
>> >
>> > I'm not sure how you can combine that with a proper timeout.
>>
>> Would taking the timeout as a separate argument work?
>>
>>   flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
>>                                          (VAL & _TIF_NEED_RESCHED),
>>                                          local_clock_noinstr(), time_end);
>>
>> Or are you thinking of something along different lines from the
>> smp_cond_load kind of interface?
>
> I would like it to be something along the lines of
>
> arch_busy_wait_for_need_resched(time_limit);
> dev->poll_time_limit = !need_resched();
>
> and I don't care much about how exactly this is done in the arch code,
> so long as it does what it says.

This looks great. I think it could just be:

  tif_need_resched_wait(time_limit);

And, given that this is tied in with scheduling contexts, this interface
should be able to use local_clock()/sched_clock().
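
As a rough illustration only, the generic (spin-wait) fallback for such an
interface might look like the user-space sketch below. Everything in it is
an assumption made for the example: ti_flags stands in for
current_thread_info()->flags, clock_ns() for local_clock(), the flag bit
position is arbitrary, and returning a bool is just a convenience for the
sketch. It is not the proposed kernel implementation, and an arch such as
arm64 would presumably wait on the cacheline instead of spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
#define TIF_NEED_RESCHED_BIT	(1u << 1)	/* arbitrary bit for the sketch */
#define POLL_RELAX_COUNT	200		/* mirrors POLL_IDLE_RELAX_COUNT */

/* Stand-in for current_thread_info()->flags. */
static _Atomic uint32_t ti_flags;

/* Stand-in for local_clock(): monotonic time in nanoseconds. */
static uint64_t clock_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until the need-resched bit is observed or time_limit_ns has
 * elapsed. The flag is checked on every iteration; the clock is read
 * only once every POLL_RELAX_COUNT iterations to keep the loop cheap.
 * Returns true if the flag was seen, false on timeout.
 */
static bool tif_need_resched_wait(uint64_t time_limit_ns)
{
	uint64_t time_end = clock_ns() + time_limit_ns;
	unsigned int loop_count = 0;

	for (;;) {
		if (atomic_load_explicit(&ti_flags, memory_order_relaxed) &
		    TIF_NEED_RESCHED_BIT)
			return true;

		/* The kernel loop would execute cpu_relax() here. */

		if (++loop_count < POLL_RELAX_COUNT)
			continue;

		loop_count = 0;
		if (clock_ns() >= time_end)
			return false;
	}
}
```

With something of this shape, poll_idle() would only need to call the
helper and then set dev->poll_time_limit = !need_resched(), as suggested
above.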

--
ankur

