From: Rik van Riel <riel@redhat.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Avi Kiviti <avi@redhat.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Mike Galbraith <efault@gmx.de>,
Chris Wright <chrisw@sous-sol.org>,
ttracy@redhat.com, dshaks@redhat.com, "Nakajima,
Jun" <jun.nakajima@intel.com>, Paul Turner <pjt@google.com>
Subject: Re: [RFC -v6 PATCH 2/8] sched: limit the scope of clear_buddies
Date: Mon, 24 Jan 2011 13:04:39 -0500 [thread overview]
Message-ID: <4D3DBF37.2060401@redhat.com> (raw)
In-Reply-To: <1295891861.28776.448.camel@laptop>
On 01/24/2011 12:57 PM, Peter Zijlstra wrote:
> On Thu, 2011-01-20 at 16:33 -0500, Rik van Riel wrote:
>> The clear_buddies function does not seem to play well with the concept
>> of hierarchical runqueues. In the following tree, task groups are
>> represented by 'G', tasks by 'T', next by 'n' and last by 'l'.
>>
>> (nl)
>> / \
>> G(nl) G
>> / \ \
>> T(l) T(n) T
>>
>> This situation can arise when a task is woken up T(n), and the previously
>> running task T(l) is marked last.
>>
>> When clear_buddies is called from either T(l) or T(n), the next and last
>> buddies of the group G(nl) will be cleared. This is not the desired
>> result, since we would like to be able to find the other type of buddy
>> in many cases.
>>
>> This especially a worry when implementing yield_task_fair through the
>> buddy system.
>>
>> The fix is simple: only clear the buddy type that the task itself
>> is indicated to be. As an added bonus, we stop walking up the tree
>> when the buddy has already been cleared or pointed elsewhere.
>>
>> Signed-off-by: Rik van Riel<riel@redhat.coM>
>> ---
>> kernel/sched_fair.c | 30 +++++++++++++++++++++++-------
>> 1 files changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> index f4ee445..0321473 100644
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -784,19 +784,35 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>> __enqueue_entity(cfs_rq, se);
>> }
>>
>> -static void __clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> +static void __clear_buddies_last(struct sched_entity *se)
>> {
>> - if (!se || cfs_rq->last == se)
>> - cfs_rq->last = NULL;
>> + for_each_sched_entity(se) {
>> + struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> + if (cfs_rq->last == se)
>> + cfs_rq->last = NULL;
>> + else
>> + break;
>> + }
>> +}
>>
>> - if (!se || cfs_rq->next == se)
>> - cfs_rq->next = NULL;
>> +static void __clear_buddies_next(struct sched_entity *se)
>> +{
>> + for_each_sched_entity(se) {
>> + struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> + if (cfs_rq->next == se)
>> + cfs_rq->next = NULL;
>> + else
>> + break;
>> + }
>> }
>>
>> static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> {
>> - for_each_sched_entity(se)
>> - __clear_buddies(cfs_rq_of(se), se);
>> + if (cfs_rq->last == se)
>> + __clear_buddies_last(se);
>> +
>> + if (cfs_rq->next == se)
>> + __clear_buddies_next(se);
>> }
>>
>
> Right, I think this sorta matches with something the Google guys talked
> about, they wanted to change pick_next_task() no always start from the
> top but only go up one level when the current level ran out.
>
> It looks ok, just sad that we can now have two hierarchy traversals (and
> 3 with the next patch).
On the other hand, I don't think we'll actually _do_ the
hierarchy traversal most of the time, since pick_next_entity
calls clear_buddies, every step of the way down the tree.
A hierarchy traversal will only be done if a task already
has one type of buddy set, and then gets another type of
buddy set, before it is rescheduled.
Eg. a task can have ->last set and then call yield, causing
the ->yield buddy to get pointed at itself. When doing that,
it will walk up the tree, clearing ->last.
I suspect that with this patch, we'll end up doing less
tree traversal than before.
next prev parent reply other threads:[~2011-01-24 18:05 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-20 21:31 [RFC -v6 PATCH 0/8] directed yield for Pause Loop Exiting Rik van Riel
2011-01-20 21:32 ` [RFC -v6 PATCH 1/8] sched: check the right ->nr_running in yield_task_fair Rik van Riel
2011-01-20 21:33 ` [RFC -v6 PATCH 2/8] sched: limit the scope of clear_buddies Rik van Riel
2011-01-24 17:57 ` Peter Zijlstra
2011-01-24 18:04 ` Rik van Riel [this message]
2011-01-20 21:33 ` [RFC -v6 PATCH 3/8] sched: use a buddy to implement yield_task_fair Rik van Riel
2011-01-24 18:04 ` Peter Zijlstra
2011-01-24 18:16 ` Rik van Riel
2011-01-20 21:34 ` [RFC -v6 PATCH 4/8] sched: Add yield_to(task, preempt) functionality Rik van Riel
2011-01-24 18:12 ` Peter Zijlstra
2011-01-24 18:19 ` Rik van Riel
2011-01-20 21:36 ` [RFC -v6 PATCH 6/8] export pid symbols needed for kvm_vcpu_on_spin Rik van Riel
2011-01-20 21:36 ` [RFC -v6 PATCH 7/8] kvm: keep track of which task is running a KVM vcpu Rik van Riel
2011-01-26 13:01 ` Avi Kivity
2011-01-26 15:20 ` Rik van Riel
2011-01-20 21:37 ` [RFC -v6 PATCH 5/8] sched: drop superfluous tests from yield_to Rik van Riel
2011-01-20 21:38 ` [RFC -v6 PATCH 8/8] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D3DBF37.2060401@redhat.com \
--to=riel@redhat.com \
--cc=a.p.zijlstra@chello.nl \
--cc=avi@redhat.com \
--cc=chrisw@sous-sol.org \
--cc=dshaks@redhat.com \
--cc=efault@gmx.de \
--cc=jun.nakajima@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pjt@google.com \
--cc=ttracy@redhat.com \
--cc=vatsa@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.