From: Rik van Riel <riel@redhat.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Avi Kiviti <avi@redhat.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Mike Galbraith <efault@gmx.de>,
Chris Wright <chrisw@sous-sol.org>,
ttracy@redhat.com, dshaks@redhat.com, "Nakajima,
Jun" <jun.nakajima@intel.com>, Paul Turner <pjt@google.com>
Subject: Re: [RFC -v6 PATCH 2/8] sched: limit the scope of clear_buddies
Date: Mon, 24 Jan 2011 13:04:39 -0500 [thread overview]
Message-ID: <4D3DBF37.2060401@redhat.com> (raw)
In-Reply-To: <1295891861.28776.448.camel@laptop>
On 01/24/2011 12:57 PM, Peter Zijlstra wrote:
> On Thu, 2011-01-20 at 16:33 -0500, Rik van Riel wrote:
>> The clear_buddies function does not seem to play well with the concept
>> of hierarchical runqueues. In the following tree, task groups are
>> represented by 'G', tasks by 'T', next by 'n' and last by 'l'.
>>
>> (nl)
>> / \
>> G(nl) G
>> / \ \
>> T(l) T(n) T
>>
>> This situation can arise when a task is woken up T(n), and the previously
>> running task T(l) is marked last.
>>
>> When clear_buddies is called from either T(l) or T(n), the next and last
>> buddies of the group G(nl) will be cleared. This is not the desired
>> result, since we would like to be able to find the other type of buddy
>> in many cases.
>>
>> This especially a worry when implementing yield_task_fair through the
>> buddy system.
>>
>> The fix is simple: only clear the buddy type that the task itself
>> is indicated to be. As an added bonus, we stop walking up the tree
>> when the buddy has already been cleared or pointed elsewhere.
>>
>> Signed-off-by: Rik van Riel<riel@redhat.coM>
>> ---
>> kernel/sched_fair.c | 30 +++++++++++++++++++++++-------
>> 1 files changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> index f4ee445..0321473 100644
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -784,19 +784,35 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>> __enqueue_entity(cfs_rq, se);
>> }
>>
>> -static void __clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> +static void __clear_buddies_last(struct sched_entity *se)
>> {
>> - if (!se || cfs_rq->last == se)
>> - cfs_rq->last = NULL;
>> + for_each_sched_entity(se) {
>> + struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> + if (cfs_rq->last == se)
>> + cfs_rq->last = NULL;
>> + else
>> + break;
>> + }
>> +}
>>
>> - if (!se || cfs_rq->next == se)
>> - cfs_rq->next = NULL;
>> +static void __clear_buddies_next(struct sched_entity *se)
>> +{
>> + for_each_sched_entity(se) {
>> + struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> + if (cfs_rq->next == se)
>> + cfs_rq->next = NULL;
>> + else
>> + break;
>> + }
>> }
>>
>> static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> {
>> - for_each_sched_entity(se)
>> - __clear_buddies(cfs_rq_of(se), se);
>> + if (cfs_rq->last == se)
>> + __clear_buddies_last(se);
>> +
>> + if (cfs_rq->next == se)
>> + __clear_buddies_next(se);
>> }
>>
>
> Right, I think this sorta matches with something the Google guys talked
> about, they wanted to change pick_next_task() no always start from the
> top but only go up one level when the current level ran out.
>
> It looks ok, just sad that we can now have two hierarchy traversals (and
> 3 with the next patch).
On the other hand, I don't think we'll actually _do_ the
hierarchy traversal most of the time, since pick_next_entity
calls clear_buddies, every step of the way down the tree.
A hierarchy traversal will only be done if a task already
has one type of buddy set, and then gets another type of
buddy set, before it is rescheduled.
Eg. a task can have ->last set and then call yield, causing
the ->yield buddy to get pointed at itself. When doing that,
it will walk up the tree, clearing ->last.
I suspect that with this patch, we'll end up doing less
tree traversal than before.
next prev parent reply other threads:[~2011-01-24 18:04 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-20 21:31 [RFC -v6 PATCH 0/8] directed yield for Pause Loop Exiting Rik van Riel
2011-01-20 21:32 ` [RFC -v6 PATCH 1/8] sched: check the right ->nr_running in yield_task_fair Rik van Riel
2011-01-20 21:33 ` [RFC -v6 PATCH 2/8] sched: limit the scope of clear_buddies Rik van Riel
2011-01-24 17:57 ` Peter Zijlstra
2011-01-24 18:04 ` Rik van Riel [this message]
2011-01-20 21:33 ` [RFC -v6 PATCH 3/8] sched: use a buddy to implement yield_task_fair Rik van Riel
2011-01-24 18:04 ` Peter Zijlstra
2011-01-24 18:16 ` Rik van Riel
2011-01-20 21:34 ` [RFC -v6 PATCH 4/8] sched: Add yield_to(task, preempt) functionality Rik van Riel
2011-01-24 18:12 ` Peter Zijlstra
2011-01-24 18:19 ` Rik van Riel
2011-01-20 21:36 ` [RFC -v6 PATCH 6/8] export pid symbols needed for kvm_vcpu_on_spin Rik van Riel
2011-01-20 21:36 ` [RFC -v6 PATCH 7/8] kvm: keep track of which task is running a KVM vcpu Rik van Riel
2011-01-26 13:01 ` Avi Kivity
2011-01-26 15:20 ` Rik van Riel
2011-01-20 21:37 ` [RFC -v6 PATCH 5/8] sched: drop superfluous tests from yield_to Rik van Riel
2011-01-20 21:38 ` [RFC -v6 PATCH 8/8] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D3DBF37.2060401@redhat.com \
--to=riel@redhat.com \
--cc=a.p.zijlstra@chello.nl \
--cc=avi@redhat.com \
--cc=chrisw@sous-sol.org \
--cc=dshaks@redhat.com \
--cc=efault@gmx.de \
--cc=jun.nakajima@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pjt@google.com \
--cc=ttracy@redhat.com \
--cc=vatsa@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).