Message-ID: <4AB496ED.7010800@redhat.com>
Date: Sat, 19 Sep 2009 11:31:41 +0300
From: Avi Kivity
To: Peter Zijlstra
CC: Ingo Molnar, Mark Langsdorf, Mike Galbraith, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Prevent immediate process rescheduling
References: <200909181449.12311.mark.langsdorf@amd.com> <20090918195455.GC11726@elte.hu> <1253304203.10538.64.camel@laptop>
In-Reply-To: <1253304203.10538.64.camel@laptop>

On 09/18/2009 11:03 PM, Peter Zijlstra wrote:
> On Fri, 2009-09-18 at 21:54 +0200, Ingo Molnar wrote:
>
>>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>>> index 652e8bd..4fad08f 100644
>>> --- a/kernel/sched_fair.c
>>> +++ b/kernel/sched_fair.c
>>> @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>  static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
>>>  {
>>>  	struct rb_node *left = cfs_rq->rb_leftmost;
>>> +	struct sched_entity *se, *curr;
>>>
>>>  	if (!left)
>>>  		return NULL;
>>>
>>> -	return rb_entry(left, struct sched_entity, run_node);
>>> +	se = rb_entry(left, struct sched_entity, run_node);
>>> +	curr = &current->se;
>>> +
>>> +	/*
>>> +	 * 
Don't select the entity that just tried to schedule away
>>> +	 * if there's another entity available.
>>> +	 */
>>> +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
>>> +		struct rb_node *next_node = rb_next(&curr->run_node);
>>> +		if (next_node)
>>> +			se = rb_entry(next_node, struct sched_entity, run_node);
>>> +	}
>>> +
>>> +	return se;
>>>  }
>
> I really hate this change, though; it doesn't seem right not to pick the
> same task again if it's runnable. Bad for cache footprint.
>
> The scenario is quite common for stuff like:
>
> 	CPU0				CPU1
>
> 	set_task_state(TASK_INTERRUPTIBLE)
>
> 	if (cond)
> 		goto out;
> 					<--- ttwu()
> 	schedule();

I agree; yielding should be explicitly requested. Also, on a heavily
overcommitted box an undirected yield might take quite a long time to
find the thread that's holding the lock. I think a yield_to() will be a
lot better:

- we can pick one of the vcpus belonging to the same guest, to improve
  the probability that the lock actually gets released
- we avoid an issue where the other vcpus are on different runqueues
  (in which case the current patch does nothing)
- we can fix the accounting by donating vruntime from current to the
  yielded-to vcpu

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.