From: Aaron Lu <ziqianlu@bytedance.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>,
Florian Bezdeka <florian.bezdeka@siemens.com>,
Valentin Schneider <vschneid@redhat.com>,
Ben Segall <bsegall@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Xi Wang <xii@google.com>,
linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Chengming Zhou <chengming.zhou@linux.dev>,
Chuyi Zhou <zhouchuyi@bytedance.com>,
"Sebastian Andrzej Siewior," <bigeasy@linutronix.de>
Subject: Re: [RFC PATCH v2 0/7] Defer throttle when task exits to user
Date: Tue, 22 Apr 2025 10:10:54 +0800
Message-ID: <20250421120648.GA3357499@bytedance>
In-Reply-To: <e65a32af-271b-4de6-937a-1a1049bbf511@amd.com>
Hi Prateek,
On Tue, Apr 15, 2025 at 09:19:20PM +0530, K Prateek Nayak wrote:
> Hello Jan,
>
> Sorry for the noise.
>
> On 4/15/2025 4:46 PM, K Prateek Nayak wrote:
> > Hello Jan,
> >
> > On 4/15/2025 3:51 PM, Jan Kiszka wrote:
> > > > Is this in line with what you are seeing?
> > > >
> > >
> > > Yes, and if you wait a bit longer for the second reporting round, you
> > > should get more task backtraces as well.
> >
> > So looking at the backtrace [1], Aaron's patch should help with the
> > stalls you are seeing.
> >
> > A timerfd that queues an hrtimer also uses ep_poll_callback() to wake
> > up the epoll waiter. That hrtimer is queued ahead of the bandwidth
> > timer and its callback requires the read lock, but the writer has now
> > tried to grab the lock, pushing readers onto the slowpath. If
> > epoll-stall-writer is now throttled, it needs ktimers to replenish its
> > bandwidth, which cannot happen until ktimers grabs the read lock first.
> >
> > # epoll-stall-writer
>
> So I got confused between "epoll-stall" and "epoll-stall-writer" here.
> It turns out the actual series of events (based on traces, and hopefully
> correct this time) is slightly longer. The correct series of events
> is:
>
> # epoll-stall-writer
>
> anon_pipe_write()
> __wake_up_common()
> ep_poll_callback() {
> read_lock_irq(&ep->lock) /* Read lock acquired here */
I was confused by this function's name at first: I had assumed IRQs
were disabled here, but then realized that under PREEMPT_RT,
read_lock_irq() doesn't disable IRQs...
> __wake_up_common()
> ep_autoremove_wake_function()
> try_to_wake_up() /* Wakes up "epoll-stall" */
> preempt_schedule()
> ...
>
> # "epoll-stall-writer" has run out of bandwidth, needs replenish to run
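Luckily, in this "only throttle on ret2user" model, epoll-stall-writer
does not need a replenish to run again (and then unblock the others):
its throttle is deferred until it returns to user space, so it can
finish ep_poll_callback() and drop the read lock first.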
> # sched_switch: "epoll-stall-writer" => "epoll-stall"
>
> ... /* Resumes from epoll_wait() */
> epoll_wait() => 1 /* Write to FIFO */
> read() /* Reads one byte of data */
> epoll_wait()
> write_lock_irq() /* Tries to grab write lock; "epoll-stall-writer" still has read lock */
> schedule_rtlock() /* Sleeps, but puts subsequent readers on the slowpath */
> ...
>
> # sched_switch: "epoll-stall" => "swapper"
> # CPU is idle
>
> ...
>
> # Timer interrupt schedules ktimers
> # sched_switch: "swapper" => "ktimers"
>
> hrtimer_run_softirq()
> timerfd_tmrproc()
> __wake_up_common()
> ep_poll_callback() {
> read_lock_irq(&ep->lock) /* Blocks since we are in rwlock slowpath */
> schedule_rtlock()
> ...
>
> # sched_switch: "ktimers" => "swapper"
> # Bandwidth replenish never happens
> # Stall
>
> From a second look at the trace, this should be the right series of
> events, since "epoll-stall-writer" (under bandwidth control) seems
> to have been cut off while doing the wakeup and hasn't run
> again.
>
> Sorry for the noise.
>
Thanks for the analysis.
I'm testing this reproducer with this series and haven't noticed any
issues yet; I'll report back if anything unexpected happens.
Best wishes,
Aaron