kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rik van Riel <riel@redhat.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Avi Kiviti <avi@redhat.com>,
	Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
	Mike Galbraith <efault@gmx.de>,
	Chris Wright <chrisw@sous-sol.org>,
	ttracy@redhat.com, dshaks@redhat.com, "Nakajima,
	Jun" <jun.nakajima@intel.com>, Paul Turner <pjt@google.com>
Subject: Re: [RFC -v6 PATCH 2/8] sched: limit the scope of clear_buddies
Date: Mon, 24 Jan 2011 13:04:39 -0500	[thread overview]
Message-ID: <4D3DBF37.2060401@redhat.com> (raw)
In-Reply-To: <1295891861.28776.448.camel@laptop>

On 01/24/2011 12:57 PM, Peter Zijlstra wrote:
> On Thu, 2011-01-20 at 16:33 -0500, Rik van Riel wrote:
>> The clear_buddies function does not seem to play well with the concept
>> of hierarchical runqueues.  In the following tree, task groups are
>> represented by 'G', tasks by 'T', next by 'n' and last by 'l'.
>>
>>       (nl)
>>      /    \
>>     G(nl)  G
>>     / \     \
>>   T(l) T(n)  T
>>
>> This situation can arise when a task is woken up T(n), and the previously
>> running task T(l) is marked last.
>>
>> When clear_buddies is called from either T(l) or T(n), the next and last
>> buddies of the group G(nl) will be cleared.  This is not the desired
>> result, since we would like to be able to find the other type of buddy
>> in many cases.
>>
>> This especially a worry when implementing yield_task_fair through the
>> buddy system.
>>
>> The fix is simple: only clear the buddy type that the task itself
>> is indicated to be.  As an added bonus, we stop walking up the tree
>> when the buddy has already been cleared or pointed elsewhere.
>>
>> Signed-off-by: Rik van Riel<riel@redhat.coM>
>> ---
>>   kernel/sched_fair.c |   30 +++++++++++++++++++++++-------
>>   1 files changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> index f4ee445..0321473 100644
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -784,19 +784,35 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>   		__enqueue_entity(cfs_rq, se);
>>   }
>>
>> -static void __clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> +static void __clear_buddies_last(struct sched_entity *se)
>>   {
>> -	if (!se || cfs_rq->last == se)
>> -		cfs_rq->last = NULL;
>> +	for_each_sched_entity(se) {
>> +		struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> +		if (cfs_rq->last == se)
>> +			cfs_rq->last = NULL;
>> +		else
>> +			break;
>> +	}
>> +}
>>
>> -	if (!se || cfs_rq->next == se)
>> -		cfs_rq->next = NULL;
>> +static void __clear_buddies_next(struct sched_entity *se)
>> +{
>> +	for_each_sched_entity(se) {
>> +		struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> +		if (cfs_rq->next == se)
>> +			cfs_rq->next = NULL;
>> +		else
>> +			break;
>> +	}
>>   }
>>
>>   static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>   {
>> -	for_each_sched_entity(se)
>> -		__clear_buddies(cfs_rq_of(se), se);
>> +	if (cfs_rq->last == se)
>> +		__clear_buddies_last(se);
>> +
>> +	if (cfs_rq->next == se)
>> +		__clear_buddies_next(se);
>>   }
>>
>
> Right, I think this sorta matches with something the Google guys talked
> about, they wanted to change pick_next_task() no always start from the
> top but only go up one level when the current level ran out.
>
> It looks ok, just sad that we can now have two hierarchy traversals (and
> 3 with the next patch).

On the other hand, I don't think we'll actually _do_ the
hierarchy traversal most of the time, since pick_next_entity
calls clear_buddies, every step of the way down the tree.

A hierarchy traversal will only be done if a task already
has one type of buddy set, and then gets another type of
buddy set, before it is rescheduled.

Eg. a task can have ->last set and then call yield, causing
the ->yield buddy to get pointed at itself.  When doing that,
it will walk up the tree, clearing ->last.

I suspect that with this patch, we'll end up doing less
tree traversal than before.

  reply	other threads:[~2011-01-24 18:04 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-20 21:31 [RFC -v6 PATCH 0/8] directed yield for Pause Loop Exiting Rik van Riel
2011-01-20 21:32 ` [RFC -v6 PATCH 1/8] sched: check the right ->nr_running in yield_task_fair Rik van Riel
2011-01-20 21:33 ` [RFC -v6 PATCH 2/8] sched: limit the scope of clear_buddies Rik van Riel
2011-01-24 17:57   ` Peter Zijlstra
2011-01-24 18:04     ` Rik van Riel [this message]
2011-01-20 21:33 ` [RFC -v6 PATCH 3/8] sched: use a buddy to implement yield_task_fair Rik van Riel
2011-01-24 18:04   ` Peter Zijlstra
2011-01-24 18:16     ` Rik van Riel
2011-01-20 21:34 ` [RFC -v6 PATCH 4/8] sched: Add yield_to(task, preempt) functionality Rik van Riel
2011-01-24 18:12   ` Peter Zijlstra
2011-01-24 18:19     ` Rik van Riel
2011-01-20 21:36 ` [RFC -v6 PATCH 6/8] export pid symbols needed for kvm_vcpu_on_spin Rik van Riel
2011-01-20 21:36 ` [RFC -v6 PATCH 7/8] kvm: keep track of which task is running a KVM vcpu Rik van Riel
2011-01-26 13:01   ` Avi Kivity
2011-01-26 15:20     ` Rik van Riel
2011-01-20 21:37 ` [RFC -v6 PATCH 5/8] sched: drop superfluous tests from yield_to Rik van Riel
2011-01-20 21:38 ` [RFC -v6 PATCH 8/8] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D3DBF37.2060401@redhat.com \
    --to=riel@redhat.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=avi@redhat.com \
    --cc=chrisw@sous-sol.org \
    --cc=dshaks@redhat.com \
    --cc=efault@gmx.de \
    --cc=jun.nakajima@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pjt@google.com \
    --cc=ttracy@redhat.com \
    --cc=vatsa@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).