Date: Thu, 4 Sep 2025 11:43:27 -1000
From: Tejun Heo
To: Peter Zijlstra
Cc: Andrea Righi, Ingo Molnar, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Joel Fernandes, David Vernet, Changwoo Min,
    Shuah Khan, sched-ext@lists.linux.dev, bpf@vger.kernel.org,
    linux-kernel@vger.kernel.org, Luigi De Matteis
Subject: Re: [PATCH 07/16] sched_ext: Add a DL server for sched_ext tasks
References: <20250903095008.162049-1-arighi@nvidia.com>
    <20250903095008.162049-8-arighi@nvidia.com>
    <20250903200822.GO4067720@noisy.programming.kicks-ass.net>
    <20250903205646.GR4067720@noisy.programming.kicks-ass.net>
    <20250904202858.GN4068168@noisy.programming.kicks-ass.net>
In-Reply-To: <20250904202858.GN4068168@noisy.programming.kicks-ass.net>
X-Mailing-List: bpf@vger.kernel.org

Hello, Peter.

On Thu, Sep 04, 2025 at 10:28:58PM +0200, Peter Zijlstra wrote:
...
> RUNNABLE:
>  1) hold both source and target rq->lock.
...
> Now, assuming you have a locking order like:
>
>  p->pi_lock
>    rq->lock
>      dsq->lock
>
> When you do something like:
>
>   __schedule()
>     raw_spin_lock(rq->lock);
>     next = pick_next_task() -> pick_task_scx()
>       raw_spin_lock(dsq->lock);
>
> Then you are, in effect, in the RUNNABLE 1) case above. You hold both
> locks. Nothing is going to move your task around while you hold that
> dsq->lock. That task is on the dsq, anybody else wanting to also do
> anything with that task, will have to first take dsq->lock.
>
> Therefore, at this point, it is perfectly fine to do:
>
>   set_task_cpu(cpu_of(rq)); // move task here
>
> There is no actual concurrency.
> The only thing there is is
> set_task_cpu() complaining you're not following the rules -- but you
> are, it just doesn't know -- and we can fix that.

I can't convince myself this is safe. For example, when task_rq_lock()
returns, it should guarantee that the rq the task is currently associated
with is locked and that the task can't go anywhere. However, as
task_rq_lock() isn't interlocked with the dsq lock, this won't hold true.
I think this will subtly break multiple things, e.g. the assumptions that
task_call_func() makes in its comment wouldn't hold anymore,
task_sched_runtime()'s test of task_on_rq_queued() would be racy, and so
on.

i.e. operations protected by a deq/enq pair would be fine, but anything
protected only by task_rq_lock()/task_rq_unlock() would become racy,
right?

Thanks.

-- 
tejun
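
To make the concern above concrete, here is a minimal single-threaded
userspace sketch that replays one interleaving. It is only a model under
stated assumptions: the model_* types and helpers, the OBSERVER/PICKER
actors and replay_interleaving() are all hypothetical names that mimic the
shape of rq->lock, dsq->lock and task_rq_lock(). None of it is kernel
code, and real spinlocks would wait where this model simply refuses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical userspace model. These types and helpers only mimic the
 * shape of rq->lock, dsq->lock and task_rq_lock(); they are not kernel code.
 */
#define MODEL_NR_CPUS 2
enum { NOBODY = 0, OBSERVER = 1, PICKER = 2 };

struct model_lock { int owner; };
struct model_task { int cpu; };

static struct model_lock rq_lock[MODEL_NR_CPUS];
static struct model_lock dsq_lock;

static bool model_trylock(struct model_lock *l, int who)
{
	if (l->owner != NOBODY)
		return false;
	l->owner = who;
	return true;
}

static void model_unlock(struct model_lock *l, int who)
{
	assert(l->owner == who);
	l->owner = NOBODY;
}

/* What a task_rq_lock() caller relies on: the task's current rq is locked by it. */
static bool observer_invariant_holds(const struct model_task *p)
{
	return rq_lock[p->cpu].owner == OBSERVER;
}

/*
 * Pick path: takes the destination rq lock and the dsq lock, then moves the
 * task. With take_src_rq_lock set it must also get the source rq lock first,
 * which models the quoted "hold both source and target rq->lock" rule.
 */
static bool model_pick_and_move(struct model_task *p, int dst, bool take_src_rq_lock)
{
	int src = p->cpu;
	bool moved = false;

	if (take_src_rq_lock && !model_trylock(&rq_lock[src], PICKER))
		return false;	/* a real spinlock would wait here instead */
	if (model_trylock(&rq_lock[dst], PICKER)) {
		if (model_trylock(&dsq_lock, PICKER)) {
			p->cpu = dst;	/* stands in for set_task_cpu(cpu_of(rq)) */
			moved = true;
			model_unlock(&dsq_lock, PICKER);
		}
		model_unlock(&rq_lock[dst], PICKER);
	}
	if (take_src_rq_lock)
		model_unlock(&rq_lock[src], PICKER);
	return moved;
}

/* Replays one interleaving; returns whether the observer's invariant survived. */
bool replay_interleaving(bool take_src_rq_lock)
{
	struct model_task p = { .cpu = 0 };
	bool locked, ok;

	memset(rq_lock, 0, sizeof(rq_lock));
	memset(&dsq_lock, 0, sizeof(dsq_lock));

	/* Observer: models task_rq_lock(p) returning with rq 0 locked. */
	locked = model_trylock(&rq_lock[0], OBSERVER);
	assert(locked && observer_invariant_holds(&p));

	/* Picker on CPU 1 runs while the observer still holds rq 0's lock. */
	model_pick_and_move(&p, 1, take_src_rq_lock);

	ok = observer_invariant_holds(&p);
	model_unlock(&rq_lock[0], OBSERVER);
	return ok;
}
```

In this model, replay_interleaving(false) reports the invariant as broken:
the picker never touches the lock the observer holds, so the task moves
underneath it. replay_interleaving(true) keeps the invariant because the
picker cannot get the source rq lock. This only illustrates the shape of
the argument; it says nothing about which kernel paths are affected.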