Date: Thu, 4 Feb 2016 18:32:59 +0000
From: Juri Lelli
To: Steven Rostedt
Cc: Peter Zijlstra, Ingo Molnar, LKML, Clark Williams, John Kacur,
 Daniel Bristot de Oliveira, Juri Lelli
Subject: Re: [BUG] Corrupted SCHED_DEADLINE bandwidth with cpusets
Message-ID: <20160204183259.GF29586@e106622-lin>
References: <20160203135550.5f95ecb2@gandalf.local.home>
 <20160204095448.GE12132@e106622-lin>
 <20160204120412.GA29586@e106622-lin>
 <20160204122745.GC29586@e106622-lin>
 <20160204163049.GE29586@e106622-lin>
 <20160204123103.058642ed@gandalf.local.home>
In-Reply-To: <20160204123103.058642ed@gandalf.local.home>
User-Agent: Mutt/1.5.21 (2010-09-15)

On 04/02/16 12:31, Steven Rostedt wrote:
> On Thu, 4 Feb 2016 16:30:49 +0000
> Juri Lelli wrote:
> 
> > I've actually changed this approach a bit, and things seem better
> > here. Could you please give this a try? (You can also fetch the
> > same branch.)
> 
> It appears to fix the one issue I pointed out, but it doesn't fix the
> issue with cpusets.
> 
> # burn&
> # TASK=$!
> # schedtool -E -t 2000000:20000000 $TASK
> # grep dl /proc/sched_debug
> dl_rq[0]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 104857
> dl_rq[1]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 104857
> dl_rq[2]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 104857
> dl_rq[3]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 104857
> dl_rq[4]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 104857
> dl_rq[5]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 104857
> dl_rq[6]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 104857
> dl_rq[7]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 104857
> 
> # mkdir /sys/fs/cgroup/cpuset/my_cpuset
> # echo 1 > /sys/fs/cgroup/cpuset/my_cpuset/cpuset.cpus
> # grep dl /proc/sched_debug
> dl_rq[0]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 209714
> dl_rq[1]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 209714
> dl_rq[2]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 209714
> dl_rq[3]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 209714
> dl_rq[4]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 209714
> dl_rq[5]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 209714
> dl_rq[6]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 209714
> dl_rq[7]:
>   .dl_nr_running   : 0
>   .dl_bw->bw       : 996147
>   .dl_bw->total_bw : 209714
> 
> It appears to add double the bandwidth.
> 

Mmm.. IIUC, that's because we don't destroy any root_domain in this
case, since sched_load_balance of the parent is still set; so the
task's bandwidth gets added to the existing root_domain again. I could
fix that with some flag indicating when we actually destroy
root_domain(s), but I fear it would make this solution even uglier
than it already is :/. More thinking required.

Thanks for testing.

Best,

- Juri
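
P.S.: FWIW, the numbers in your transcript are consistent with plain
double accounting. Assuming the usual to_ratio() scaling by 2^BW_SHIFT
(with BW_SHIFT == 20, so 2^20 == 1048576):

  996147 ~= 0.95 * 1048576  (default global limit, sched_rt_runtime_us
                             / sched_rt_period_us = 950000 / 1000000)
  104857 ~= 0.10 * 1048576  (the task's 2000000 / 20000000 bandwidth)
  209714  = 2 * 104857      (the same task accounted twice)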
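
Also, in case it's useful for poking at the destroy path: IIRC
root_domains only get torn down and rebuilt once load balancing is
switched off at the root level, so something like the sequence below
(a rough, untested sketch; same v1 cpuset mount point as in your
transcript) should leave my_cpuset with its own root_domain:

  # create a child cpuset on CPU 1 (mems must be set as well)
  mkdir /sys/fs/cgroup/cpuset/my_cpuset
  echo 1 > /sys/fs/cgroup/cpuset/my_cpuset/cpuset.cpus
  echo 0 > /sys/fs/cgroup/cpuset/my_cpuset/cpuset.mems
  # stop balancing across the whole root cpuset, so the child
  # becomes its own sched domain / root_domain
  echo 0 > /sys/fs/cgroup/cpuset/cpuset.sched_load_balance
  # total_bw for CPU 1 should now be tracked separately
  grep dl /proc/sched_debug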