Date: Tue, 17 Dec 2024 09:52:43 +0100
From: Peter Zijlstra
To: John Stultz
Cc: LKML, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
    Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
    Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long,
    Boqun Feng, "Paul E. McKenney",
    Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
    Daniel Lezcano, kernel-team@android.com
Subject: Re: [RFC][PATCH v14 5/7] sched: Add an initial sketch of the find_proxy_task() function
Message-ID: <20241217085243.GI35539@noisy.programming.kicks-ass.net>
References: <20241125195204.2374458-1-jstultz@google.com>
 <20241125195204.2374458-6-jstultz@google.com>
 <20241214000559.GC17501@noisy.programming.kicks-ass.net>

On Mon, Dec 16, 2024 at 09:42:31PM -0800, John Stultz wrote:
> On Fri, Dec 13, 2024 at 4:06 PM Peter Zijlstra wrote:
> >
> > On Mon, Nov 25, 2024 at 11:51:59AM -0800, John Stultz wrote:
> >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index f8714050b6d0d..b492506d33415 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -5052,6 +5052,34 @@ static void do_balance_callbacks(struct rq *rq, struct balance_callback *head)
> > >  	}
> > >  }
> > >
> > > +/*
> > > + * Only called from __schedule context
> > > + *
> > > + * There are some cases where we are going to re-do the action
> > > + * that added the balance callbacks. We may not be in a state
> > > + * where we can run them, so just zap them so they can be
> > > + * properly re-added on the next time around. This is similar
> > > + * handling to running the callbacks, except we just don't call
> > > + * them.
> > > + */
> >
> > Which specific callbacks are these? sched_core_balance()?
> >
> > In general, shooting down all callbacks like this makes me feel somewhat
> > uncomfortable.
>
> So, if we originally picked an RT task, I believe it would set up the
> push_rt_tasks callback, but if it got migrated and we needed to pick
> again, we'd end up tripping on
> `SCHED_WARN_ON(rq->balance_callback && rq->balance_callback !=
> &balance_push_callback);`
>
> For a while I tried to unpin and run the balance callbacks before
> calling pick_again if find_proxy_task() failed, but that was running
> into troubles with tasks getting unintentionally added to the rt
> pushable list (this was back in ~feb, so my memory is a little fuzzy).
>
> So that's when I figured zapping the callbacks would be best, with the
> idea being that we are starting selection over, so we effectively have
> to undo any of the state that was set by pick_next_task() before
> calling it again.
>
> Let me know if you have concerns with this, or suggestions for other approaches.

For now, let's stick a coherent comment on, explaining exactly which
callbacks and why.

> > > +/*
> > > + * Initial simple proxy that just returns the task if it's waking
> > > + * or deactivates the blocked task so we can pick something that
> > > + * isn't blocked.
> > > + */
> > > +static struct task_struct *
> > > +find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
> > > +{
> > > +	struct task_struct *p = donor;
> > > +	struct mutex *mutex;
> > > +
> > > +	mutex = p->blocked_on;
> > > +	/* Something changed in the chain, so pick again */
> > > +	if (!mutex)
> > > +		return NULL;
> > > +	/*
> > > +	 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
> > > +	 * and ensure @owner sticks around.
> > > +	 */
> > > +	raw_spin_lock(&mutex->wait_lock);
> > > +	raw_spin_lock(&p->blocked_lock);
> >
> > I'm still wondering what this blocked_lock does, that previous patch had
> > it mirror wait_mutex too, so far I don't see the point.
>
> Yeah, early on in the series it's maybe not as useful, but as we start
> dealing with sleeping owner enqueuing, it's doing more:
> https://github.com/johnstultz-work/linux-dev/commit/d594ca8df88645aa3b2b9daa105664893818bdb7
>
> But it is possible it is more of a crutch for me to keep straight the
> locking rules, as it's simpler to keep in my head. :)
> Happy to think a bit more on whether it can be folded together with another lock.

I'm a big believer in only introducing state when we actually need it --
and I don't believe we actually need blocked_lock until we go SMP.

Anyway, I have since figured out the why of blocked_lock again; but yeah,
comments, because I'm sure to forget it again at some point.
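
For illustration, here is a rough sketch of the sort of comment and helper
being discussed above. The helper name zap_balance_callbacks() and the exact
comment wording are assumptions made for the example, not taken verbatim from
the posted series:

    /*
     * Sketch only: called from the __schedule() pick-again path when
     * find_proxy_task() tells us to redo the pick. pick_next_task() may
     * already have queued balance callbacks (e.g. push_rt_tasks() from the
     * RT class, or sched_core_balance()), but we may not be in a state
     * where it is safe to run them, and the pick that queued them is being
     * discarded anyway. So just unlink them; the next pick will re-add
     * whatever still applies.
     */
    static void zap_balance_callbacks(struct rq *rq)
    {
            struct balance_callback *head, *next;
            bool push = false;

            lockdep_assert_rq_held(rq);

            head = rq->balance_callback;
            while (head) {
                    /* balance_push_callback is special; don't lose it */
                    if (head == &balance_push_callback)
                            push = true;
                    next = head->next;
                    head->next = NULL;
                    head = next;
            }
            rq->balance_callback = push ? &balance_push_callback : NULL;
    }

And, in the same spirit, a possible comment above the lock acquisition in
find_proxy_task(); the wording is only a guess at the rationale covered in
this thread:

            /*
             * Lock order: mutex->wait_lock nests outside p->blocked_lock.
             * Taking wait_lock holds off a concurrent mutex_unlock(), so
             * the mutex owner cannot go away under us; blocked_lock
             * serializes updates to p->blocked_on (and carries more weight
             * later in the series, once sleeping-owner enqueueing is added).
             */
            raw_spin_lock(&mutex->wait_lock);
            raw_spin_lock(&p->blocked_lock);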