From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752171AbcBKMtO (ORCPT );
        Thu, 11 Feb 2016 07:49:14 -0500
Received: from foss.arm.com ([217.140.101.70]:53862 "EHLO foss.arm.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751017AbcBKMtN (ORCPT );
        Thu, 11 Feb 2016 07:49:13 -0500
Date: Thu, 11 Feb 2016 12:49:59 +0000
From: Juri Lelli
To: luca abeni
Cc: Steven Rostedt , linux-kernel@vger.kernel.org,
        peterz@infradead.org, mingo@redhat.com,
        vincent.guittot@linaro.org, wanpeng.li@hotmail.com
Subject: Re: [PATCH 1/2] sched/deadline: add per rq tracking of admitted bandwidth
Message-ID: <20160211124959.GO11415@e106622-lin>
References: <1454935531-7541-1-git-send-email-juri.lelli@arm.com>
        <1454935531-7541-2-git-send-email-juri.lelli@arm.com>
        <20160210113258.GX11415@e106622-lin>
        <20160210093702.10c655be@gandalf.local.home>
        <20160210162748.GI11415@e106622-lin>
        <20160211121257.GL11415@e106622-lin>
        <20160211132254.1a369fe9@utopia>
        <20160211122754.GN11415@e106622-lin>
        <20160211134018.6b15fd68@utopia>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160211134018.6b15fd68@utopia>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/02/16 13:40, Luca Abeni wrote:
> On Thu, 11 Feb 2016 12:27:54 +0000
> Juri Lelli wrote:
>
> > On 11/02/16 13:22, Luca Abeni wrote:
> > > Hi Juri,
> > >
> > > On Thu, 11 Feb 2016 12:12:57 +0000
> > > Juri Lelli wrote:
> > > [...]
> > > > I think we still have (at least) two problems:
> > > >
> > > >  - select_task_rq_dl, if we select a different target
> > > >  - select_task_rq might make use of select_fallback_rq, if
> > > >    cpus_allowed changed after the task went to sleep
> > > >
> > > > Second case is what creates the problem here, as we don't update
> > > > task_rq(p) and fallback_cpu ac_bw. I was thinking we might do so,
> > > > maybe adding fallback_cpu in task_struct, from
> > > > migrate_task_rq_dl() (it has to be added yes), but I fear that we
> > > > should hold both rq locks :/.
> > > >
> > > > Luca, did you already face this problem (if I got it right) and
> > > > thought of a way to fix it? I'll go back and stare a bit more at
> > > > those paths.
> > > In my patch I took care of the first case (modifying
> > > select_task_rq_dl() to move the utilization from the "old rq" to the
> > > "new rq"), but I never managed to trigger select_fallback_rq() in my
> > > tests, so I overlooked that case.
> > >
> >
> > Right, I was thinking to do the same. And you did that after grabbing
> > both locks, right?
>
> Not sure if I did everything correctly, but my code in
> select_task_rq_dl() currently looks like this (you can obviously
> ignore the "migrate_active" and "*_running_bw()" parts, and focus on
> the "*_rq_bw()" stuff):
> [...]
>         if (rq != cpu_rq(cpu)) {
>                 int migrate_active;
>
>                 raw_spin_lock(&rq->lock);
>                 migrate_active = hrtimer_active(&p->dl.inactive_timer);
>                 if (migrate_active) {
>                         hrtimer_try_to_cancel(&p->dl.inactive_timer);
>                         sub_running_bw(&p->dl, &rq->dl);
>                 }
>                 sub_rq_bw(&p->dl, &rq->dl);
>                 raw_spin_unlock(&rq->lock);
>                 rq = cpu_rq(cpu);

Can't something happen here? My problem is that I use per-rq bw tracking
to save/restore root_domain state. So, I fear that a root_domain update
can happen while we are in the middle of moving bw from one cpu to
another.

>                 raw_spin_lock(&rq->lock);
>                 add_rq_bw(&p->dl, &rq->dl);
>                 if (migrate_active)
>                         add_running_bw(&p->dl, &rq->dl);
>                 raw_spin_unlock(&rq->lock);
>         }
> [...]
>
> lockdep is not screaming, and I am not able to trigger any race
> condition or strange behaviour (I am currently at more than 24h of
> continuous stress-testing, but maybe my testcase is not so good in
> finding races here :)
>

Thanks for sharing what you have!
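
To make the window above concrete: between raw_spin_unlock() on the old
rq and raw_spin_lock() on the new one, the task's bandwidth is accounted
on neither rq, so anything summing per-rq bandwidth at that point would
see a total that is too low. What follows is only a minimal userspace
sketch of the "hold both locks" alternative, not kernel code and not
what either patch does. It uses pthread mutexes in place of rq->lock,
and every name in it (struct model_rq, model_double_lock(),
model_move_bw(), model_total_bw()) is hypothetical. Locks are taken in
address order, which is one common way to avoid an ABBA deadlock between
two concurrent movers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a runqueue: a lock plus admitted bandwidth. */
struct model_rq {
	pthread_mutex_t lock;
	uint64_t ac_bw;
};

/* Take both locks in address order so concurrent callers cannot deadlock. */
static void model_double_lock(struct model_rq *a, struct model_rq *b)
{
	if (a == b) {
		pthread_mutex_lock(&a->lock);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		struct model_rq *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);
}

/* Release both locks; the unlock order does not matter for correctness. */
static void model_double_unlock(struct model_rq *a, struct model_rq *b)
{
	pthread_mutex_unlock(&a->lock);
	if (a != b)
		pthread_mutex_unlock(&b->lock);
}

/*
 * Move bw from src to dst with both locks held, so a reader that also
 * takes both locks never observes the intermediate state in which bw
 * has left src but has not yet reached dst.
 */
static void model_move_bw(struct model_rq *src, struct model_rq *dst,
			  uint64_t bw)
{
	model_double_lock(src, dst);
	assert(src->ac_bw >= bw);	/* guard against underflow */
	src->ac_bw -= bw;
	dst->ac_bw += bw;
	model_double_unlock(src, dst);
}

/* Sum the bandwidth of two rqs, as a save/restore path might do. */
static uint64_t model_total_bw(struct model_rq *a, struct model_rq *b)
{
	uint64_t total;

	model_double_lock(a, b);
	total = a->ac_bw;
	if (a != b)
		total += b->ac_bw;
	model_double_unlock(a, b);
	return total;
}
```

With two rqs holding 100 and 50 units, model_move_bw(&a, &b, 30) leaves
70 and 80, and model_total_bw() returns 150 both before and after the
move. Whether taking both rq locks is acceptable in the real wakeup path
is a separate question that this sketch does not answer.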