Re: Question on task_blocks_on_rt_mutex()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	peterz@infradead.org, mingo@redhat.com, will@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: Question on task_blocks_on_rt_mutex()
Date: Sat, 5 Sep 2020 17:45:02 -0400	[thread overview]
Message-ID: <20200905214502.GA2631534@google.com> (raw)
In-Reply-To: <20200905212406.GA2074270@google.com>

On Sat, Sep 05, 2020 at 05:24:06PM -0400, Joel Fernandes wrote:
> Hi Paul,
> 
> On Thu, Sep 03, 2020 at 01:06:39PM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 02, 2020 at 08:54:10AM -0700, Paul E. McKenney wrote:
> > > On Tue, Sep 01, 2020 at 06:51:28PM -0700, Davidlohr Bueso wrote:
> > > > On Tue, 01 Sep 2020, Paul E. McKenney wrote:
> > > > 
> > > > > And it appears that a default-niced CPU-bound SCHED_OTHER process is
> > > > > not preempted by a newly awakened MAX_NICE SCHED_OTHER process.  OK,
> > > > > OK, I never waited for more than 10 minutes, but on my 2.2GHz that is
> > > > > close enough to a hang for most people.
> > > > > 
> > > > > Which means that the patch below prevents the hangs.  And maybe does
> > > > > other things as well, firing rcutorture up on it to check.
> > > > > 
> > > > > But is this indefinite delay expected behavior?
> > > > > 
> > > > > This reproduces for me on current mainline as follows:
> > > > > 
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock --duration 3 --configs LOCK05
> > > > > 
> > > > > This hangs within a minute of boot on my setup.  Here "hangs" is defined
> > > > > as stopping the per-15-second console output of:
> > > > > 	Writes:  Total: 569906696 Max/Min: 81495031/63736508   Fail: 0
[...]
> > ------------------------------------------------------------------------
> > 
> > commit d93a64389f4d544ded241d0ba30b2586497f5dc0
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Tue Sep 1 16:58:41 2020 -0700
> > 
> >     torture: Periodically pause in stutter_wait()
> >     
> >     Running locktorture scenario LOCK05 results in hangs:
> >     
> >     tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock --duration 3 --configs LOCK05
> >     
> >     The lock_torture_writer() kthreads set themselves to MAX_NICE while
> >     running SCHED_OTHER.  Other locktorture kthreads run at default niceness,
> >     also SCHED_OTHER.  This results in these other locktorture kthreads
> >     indefinitely preempting the lock_torture_writer() kthreads.  Note that
> 
> In the past I have seen issues with niceness and CFS. Those issues were
> related to tick granularity, if the scheduler tick is too coarse, then
> scheduler may allow a low priority task to run for a bit longer. But this
> also means that higher priority tasks will take even longer to catch up to
> the vruntime of the lower priority ones. IIRC, this can run into several
> seconds.
> 
> Not fully sure if that's what you're seeing. If you drop the niceness by some
> amount, does the issue go away or get better?
> 
> >     the cond_resched() in the stutter_wait() function's loop is ineffective
> >     because this scenario is built with CONFIG_PREEMPT=y.
> >     
> >     It is not clear that such indefinite preemption is supposed to happen, but
> >     in the meantime this commit prevents kthreads running in stutter_wait()
> >     from being completely CPU-bound, thus allowing the other threads to get
> >     some CPU in a timely fashion.  This commit also uses hrtimers to provide
> >     very short sleeps to avoid degrading the sudden-on testing that stutter
> >     is supposed to provide.
> 
> There is a CFS tracepoint called sched:sched_stat_runtime. That could be
> enabled to see what happens to the vruntime values on the wakeup of the lower
> prio task.
> 
> I'm also seeing the LOCK05 failure, I see that some writer threads are in
> TASK_UNINTERRUPTIBLE state shown by hung task detector on LOCK05. So these
> writers didn't wake up for over 2 minutes to begin with:
> 
> [  246.797326] task:lock_torture_wr state:D stack:14696 pid:   72 ppid:     2 flags:0x00004000
> [  246.798826] Call Trace:
> [  246.799282]  __schedule+0x414/0x6a0
> [  246.799917]  schedule+0x41/0xe0
> [  246.800510]  __rt_mutex_slowlock+0x49/0xd0
> [  246.801259]  rt_mutex_slowlock+0xca/0x1e0
> [  246.801994]  ? lock_torture_reader+0x110/0x110
> [  246.802799]  torture_rtmutex_lock+0xc/0x10
> [  246.803545]  lock_torture_writer+0x72/0x150
> [  246.804322]  kthread+0x120/0x160
> [  246.804911]  ? kthread_park+0x80/0x80
> [  246.805581]  ret_from_fork+0x22/0x30
> [  246.806237] INFO: task lock_torture_wr:73 blocked for more than 122 seconds.
> [  246.807505]       Not tainted 5.9.0-rc1+ #26
> [  246.808287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  246.809690] task:lock_torture_wr state:D stack:14696 pid:   73 ppid:     2 flags:0x00004000
> [  246.811208] Call Trace:
> [  246.811657]  __schedule+0x414/0x6a0
> [  246.812306]  schedule+0x41/0xe0
> [  246.812881]  __rt_mutex_slowlock+0x49/0xd0
> [  246.813636]  rt_mutex_slowlock+0xca/0x1e0
> [  246.814371]  ? lock_torture_reader+0x110/0x110
> [  246.815182]  torture_rtmutex_lock+0xc/0x10
> [  246.815923]  lock_torture_writer+0x72/0x150
> [  246.816692]  kthread+0x120/0x160
> [  246.817287]  ? kthread_park+0x80/0x80
> [  246.817952]  ret_from_fork+0x22/0x30
> 
> Could this just be a side effect of the issue you are seeing?  (A writer
> acquired a lock but never got CPU to release it, which inturn caused lock
> acquirers to block in D-state indefinitely).

It appears to me the reason could be because the higher priority task is RT:

sched_switch: prev_comm=lock_torture_wr prev_pid=74 prev_prio=139 prev_state=R+ ==> next_comm=lock_torture_wr next_pid=70 next_prio=49

After this, only pid=70 runs till the hungtasks detector screams.

Could this because the writer calls cur_ops->task_boost(); which sets pid=70
to RT?  As long as RT task runs, it will block the CFS task without giving it CPU.

thanks,

 - Joel

next prev parent reply	other threads:[~2020-09-05 21:45 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-31 22:49 Question on task_blocks_on_rt_mutex() Paul E. McKenney
2020-08-31 23:21 ` Paul E. McKenney
2020-09-01 17:49   ` Paul E. McKenney
2020-09-01 23:58     ` Paul E. McKenney
2020-09-02  1:51       ` Davidlohr Bueso
2020-09-02 15:54         ` Paul E. McKenney
2020-09-03 20:06           ` Paul E. McKenney
2020-09-04 17:24             ` Davidlohr Bueso
2020-09-04 19:56               ` Paul E. McKenney
2020-09-05 21:24             ` Joel Fernandes
2020-09-05 21:45               ` Joel Fernandes [this message]
2020-09-06  4:14                 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200905214502.GA2631534@google.com \
    --to=joel@joelfernandes.org \
    --cc=dave@stgolabs.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.