public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Sargun Dhillon <sargun@sargun.me>,
	Xie XiuQi <xiexiuqi@huawei.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	xiezhipeng1@huawei.com, huawei.libin@huawei.com,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages
Date: Thu, 27 Dec 2018 17:15:24 -0800	[thread overview]
Message-ID: <20181228011524.GF2509588@devbig004.ftw2.facebook.com> (raw)
In-Reply-To: <CAHk-=wja7WN1sDz5_E7hf8c47CmX=yH8xc3h0zmmF8Y_fSR71A@mail.gmail.com>

Happy holidays, everyone.

(cc'ing Rik, who has been looking at the scheduler code a lot lately)

On Thu, Dec 27, 2018 at 10:15:17AM -0800, Linus Torvalds wrote:
> [ goes off and looks ]
> 
> Oh. unthrottle_cfs_rq -> enqueue_entity -> list_add_leaf_cfs_rq()
> doesn't actually seem to hold the rq lock at all. It's just called
> under a rcu read lock.

I'm pretty sure enqueue_entity() *has* to be called with rq lock.
unthrottle_cfs_rq() is called from tg_set_cfs_bandwidth(),
distribute_cfs_runtime() and unthrottle_offline_cfs_rqs.  The first
two grabs the rq_lock just around the calls and the last one has a
lockdep assert on the rq_lock.  What am I missing?

> So it all seems to depend on that "on_list" flag for exclusion. Which
> seems fundamentally racy, since it's not protected by a lock.

The only place on_list is accessed without holding rq_lock is
unregister_fair_sched_group().  It's a minor optimization on a
relatively cold path (group destruction), so if it's racy there, I
think we can take out that optimization.  I'd be surprised if anyone
notices that.

That said, I don't think it's broken.  False positive on on_list is
fine and I can't see how a false negative would happen given that the
only event which can set it is the sched entity getting scheduled and
there's no way the removal path can't race against that transition.

> But that still makes me go "how come is this only noticed 18 months
> after the fact"?

Unless I'm totally confused, which is definitely possible, I don't
think there's a race condition and the only bug is the
tmp_alone_branch pointer getting dangled, which maybe doesn't happen
all that much?

Thanks.

-- 
tejun

  parent reply	other threads:[~2018-12-28  1:15 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-27  3:04 [PATCH] sched: fix infinity loop in update_blocked_averages Xie XiuQi
2018-12-27  9:21 ` Vincent Guittot
2018-12-27 10:21   ` Vincent Guittot
2018-12-27 10:23     ` Vincent Guittot
2018-12-27 16:39       ` Sargun Dhillon
2018-12-27 17:01         ` Vincent Guittot
2018-12-27 18:15           ` Linus Torvalds
2018-12-27 21:08             ` Sargun Dhillon
2018-12-27 21:46               ` Linus Torvalds
2018-12-28  1:15             ` Tejun Heo [this message]
2018-12-28  1:36               ` Linus Torvalds
2018-12-28  1:53                 ` Tejun Heo
2018-12-28  2:02                   ` Tejun Heo
2018-12-28  2:30                     ` Xie XiuQi
2018-12-28  5:38                     ` Sargun Dhillon
2018-12-28  9:30                     ` Vincent Guittot
2018-12-28 14:26                       ` Sargun Dhillon
2018-12-28 16:54                       ` Tejun Heo
2018-12-28 17:25                         ` Vincent Guittot
2018-12-28 17:46                           ` Tejun Heo
2018-12-28 18:04                             ` Vincent Guittot
2018-12-28 10:25                     ` Xiezhipeng (EulerOS)
2018-12-30 12:04   ` Ingo Molnar
2018-12-30 12:31     ` [PATCH] sched: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c Ingo Molnar
2018-12-30 12:36     ` [PATCH] sched: fix infinity loop in update_blocked_averages Vincent Guittot
2018-12-30 12:54       ` Ingo Molnar
2018-12-30 13:00 ` [tip:sched/urgent] sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c tip-bot for Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181228011524.GF2509588@devbig004.ftw2.facebook.com \
    --to=tj@kernel.org \
    --cc=dmitry.adamushko@gmail.com \
    --cc=huawei.libin@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=sargun@sargun.me \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=xiexiuqi@huawei.com \
    --cc=xiezhipeng1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox