Date: Wed, 11 Jan 2017 13:39:26 +0100
From: Luca Abeni
To: Juri Lelli
Cc: Daniel Bristot de Oliveira, linux-kernel@vger.kernel.org,
	Peter Zijlstra, Ingo Molnar, Claudio Scordino,
	Steven Rostedt, Tommaso Cucinotta
Subject: Re: [RFC v4 0/6] CPU reclaiming for SCHED_DEADLINE
Message-ID: <20170111133926.7ec0a5b0@luca>
In-Reply-To: <20170111121951.GI10415@e106622-lin>
References: <1483097591-3871-1-git-send-email-lucabe72@gmail.com>
	<6c4ab4ec-7164-fafe-5efc-990f3cf31269@redhat.com>
	<20170104131755.573651ca@sweethome>
	<20170111121951.GI10415@e106622-lin>

Hi Juri,
(I reply from my new email address)

On Wed, 11 Jan 2017 12:19:51 +0000
Juri Lelli wrote:
[...]
> > > For example, with my taskset, with a hypothetical perfect balance
> > > of the whole runqueue, one possible scenario is:
> > >
> > >	CPU      0  1  2  3
> > >	# TASKS  3  3  3  2
> > >
> > > In this case, CPUs 0, 1 and 2 are at 100% of local utilization.
> > > Thus, the current tasks on these CPUs will have their runtime
> > > decreased by GRUB. Meanwhile, the lucky tasks on CPU 3 will use
> > > additional time that they "globally" do not have - because the
> > > system, globally, has a load higher than 66.6...% of the local
> > > runqueue. In practice, part of the time taken from the tasks on
> > > CPUs [0-2] is being used by the tasks on CPU 3, until the next
> > > migration of any task, which will change which tasks are the
> > > lucky ones... but without any guarantee that every task will be
> > > the lucky one on every activation - and this causes the problem.
> > >
> > > Does it make sense?
> >
> > Yes; but my impression is that gEDF will migrate tasks so that the
> > distribution of the reclaimed CPU bandwidth is almost uniform...
> > Instead, you saw huge differences in the utilisations (and I do not
> > think that "compressing" the utilisations from 100% to 95% can
> > decrease the utilisation of a task from 33% to 25% / 26%... :)
>
> I tried to replicate Daniel's experiment, but I don't see such a
> skewed allocation. The tasks get a reasonably uniform bandwidth and
> the trace looks fairly good as well (all processes get to run on the
> different processors at some time).

With some effort, I replicated the issue noticed by Daniel... I think
it also depends on the CPU speed (and on good or bad luck :), but the
"unfair" CPU allocation can actually happen.
I am working on a fix (based on the m-grub modifications proposed at
last April's SAC - in my original patchset, I over-simplified the
algorithm).

> > I suspect there is something more going on here (might be some bug
> > in one of my patches). I am trying to better understand what
> > happened.
>
> However, playing with this a bit further, I found out one thing that
> looks counter-intuitive (at least to me :).
>
> Simplifying Daniel's example, let's say that we have one 10/30 task
> running on a CPU with a 500/1000 global limit. Applying the
> grub_reclaim() formula we have:
>
>	delta_exec = delta * (0.5 + 0.333) = delta * 0.833
>
> which in practice means that 1ms of real delta (at HZ=1000)
> corresponds to 0.833ms of virtual delta. Considering this, a 10ms
> (over 30ms) reservation gets "extended" to ~12ms (over 30ms), that
> is to say the task consumes 0.4 of the CPU's bandwidth. top seems to
> back what I'm saying, but am I still talking nonsense? :)

You are right; my "Do not reclaim the whole CPU bandwidth" patch is an
approximation... I hoped that this approximation would be more precise
than it actually is. I used the "Uact + unreclaimable utilization"
form to avoid divisions in grub_reclaim(), but the equation should
really be "Uact / reclaimable utilization"... So, in your example it is

	delta * 0.3333 / 0.5 = delta * 0.6666

which results in 15ms over 30ms, as expected.
I'll fix that patch for the next submission.
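Just to double-check the arithmetic outside the kernel, here is a
quick userspace sketch of the two forms (the Q20 scaling mimics the
patchset; to_ratio() is a simplified stand-in for the kernel helper,
without div64_u64() or RUNTIME_INF handling, and all the constants
come from your example):

#include <stdio.h>

#define BW_SHIFT 20
#define BW_UNIT  (1ULL << BW_SHIFT)

/* runtime / period, scaled to Q20 (argument order as in the kernel) */
static unsigned long long to_ratio(unsigned long long period,
				   unsigned long long runtime)
{
	return runtime * BW_UNIT / period;
}

int main(void)
{
	unsigned long long delta = 1000000;               /* 1ms, in ns */
	unsigned long long running_bw = to_ratio(30, 10); /* Uact = 10/30 */
	unsigned long long reclaimable = to_ratio(1000, 500);
	unsigned long long unreclaimable = BW_UNIT - reclaimable;

	/* approximate form: delta * (Uact + unreclaimable) */
	unsigned long long approx =
		(delta * (running_bw + unreclaimable)) >> BW_SHIFT;

	/* exact form: delta * Uact / reclaimable */
	unsigned long long exact = delta * running_bw / reclaimable;

	printf("approx: %llu\n", approx); /* 833333 -> 12ms over 30ms */
	printf("exact:  %llu\n", exact);  /* 666666 -> 15ms over 30ms */
	return 0;
}

This prints 833333 vs 666666 (virtual ns per 1ms of real time), which
matches the 12ms vs 15ms figures above.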
> I was expecting that the task could consume 0.5 worth of bandwidth
> with the given global limit. Is the current behaviour intended?
>
> If we want to change this behaviour, maybe something like the
> following might work?
>
>	delta_exec = (delta * to_ratio((1ULL << 20) - rq->dl.non_deadline_bw,
>				       rq->dl.running_bw)) >> 20

My current patch does

	(delta * rq->dl.running_bw * rq->dl.deadline_bw_inv) >> 20 >> 8;

where rq->dl.deadline_bw_inv has been set to

	to_ratio(global_rt_runtime(), global_rt_period()) >> 12;

This seems to work fine, and should introduce less overhead than
calling to_ratio() in grub_reclaim().
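In case it helps, the same userspace check as above, extended with
the precomputed inverse (again just a sketch with simplified helpers,
not the actual patch):

#include <stdio.h>

#define BW_SHIFT 20
#define BW_UNIT  (1ULL << BW_SHIFT)

static unsigned long long to_ratio(unsigned long long period,
				   unsigned long long runtime)
{
	return runtime * BW_UNIT / period;
}

int main(void)
{
	unsigned long long delta = 1000000;               /* 1ms, in ns */
	unsigned long long running_bw = to_ratio(30, 10); /* Uact = 10/30 */

	/*
	 * Precomputed once, when the global limit changes: the inverse
	 * of 500/1000 is 2.0, i.e. 512 in Q8 (Q20 shifted right by 12).
	 */
	unsigned long long deadline_bw_inv = to_ratio(500, 1000) >> 12;

	/* per-update path: multiplies and shifts only, no division */
	unsigned long long fast =
		(delta * running_bw * deadline_bw_inv) >> 20 >> 8;

	/* reference: exact division by the reclaimable bandwidth */
	unsigned long long exact =
		delta * running_bw / to_ratio(1000, 500);

	printf("fast:  %llu\n", fast);  /* 666666 */
	printf("exact: %llu\n", exact); /* 666666 */
	return 0;
}

Both print 666666 here because the inverse of 0.5 is exact in Q8; in
general the shift only drops the low 12 bits of the Q20 inverse.

Thanks,
Luca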