Date: Wed, 8 Jan 2025 09:50:26 +0100
From: Peter Zijlstra
To: Changwoo Min
Cc: tj@kernel.org, void@manifault.com, arighi@nvidia.com, mingo@redhat.com,
	changwoo@igalia.com, kernel-dev@igalia.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 2/6] sched_ext: Implement scx_bpf_now()
Message-ID: <20250108085026.GC23315@noisy.programming.kicks-ass.net>
References: <20241230095625.114363-1-changwoo@igalia.com>
	<20241230095625.114363-3-changwoo@igalia.com>
In-Reply-To: <20241230095625.114363-3-changwoo@igalia.com>

> +__bpf_kfunc u64 scx_bpf_now(void)
> +{
> +	struct rq *rq;
> +	u64 clock;
> +
> +	preempt_disable();
> +
> +	rq = this_rq();
> +	if (READ_ONCE(rq->scx.flags) & SCX_RQ_CLK_VALID) {
> +		/*
> +		 * If the rq clock is valid, use the cached rq clock.
> +		 *
> +		 * Note that scx_bpf_now() is re-entrant between a process
> +		 * context and an interrupt context (e.g., timer interrupt).
> +		 * However, we don't need to consider the race between them
> +		 * because such race is not observable from a caller.
> +		 */
> +		clock = READ_ONCE(rq->scx.clock);
> +	} else {
> +		/*
> +		 * Otherwise, return a fresh rq clock.
> +		 *
> +		 * The rq clock is updated outside of the rq lock.
> +		 * In this case, keep the updated rq clock invalid so the next
> +		 * kfunc call outside the rq lock gets a fresh rq clock.
> +		 */
> +		clock = sched_clock_cpu(cpu_of(rq));
> +	}
> +
> +	preempt_enable();
> +
> +	return clock;
> +}

> +static inline void scx_rq_clock_update(struct rq *rq, u64 clock)
> +{
> +	if (!scx_enabled())
> +		return;
> +	WRITE_ONCE(rq->scx.clock, clock);
> +	WRITE_ONCE(rq->scx.flags, rq->scx.flags | SCX_RQ_CLK_VALID);
> +}

AFAICT it is possible to be used like:

	CPU0				CPU1

	lock(rq1->lock);
	...
	scx_rq_clock_update(...);	scx_bpf_now();
	...
	unlock(rq->lock);

Which then enables the following ordering problem:

	CPU0					CPU1

	WRITE_ONCE(rq->scx.clock, clock);
						if (rq->scx.flags & VALID)
	WRITE_ONCE(rq->scx.flags, VALID);
						  return rq->scx.clock;

Where it then becomes possible to observe VALID before clock is
written.

That is, I rather think you need:

> +static inline void scx_rq_clock_update(struct rq *rq, u64 clock)
> +{
> +	if (!scx_enabled())
> +		return;
> +	WRITE_ONCE(rq->scx.clock, clock);
> +	smp_store_release(&rq->scx.flags, rq->scx.flags | SCX_RQ_CLK_VALID);
> +}

and:

	if (smp_load_acquire(&rq->scx.flags) & SCX_RQ_CLK_VALID) {
> +		/*
> +		 * If the rq clock is valid, use the cached rq clock.
> +		 *
> +		 * Note that scx_bpf_now() is re-entrant between a process
> +		 * context and an interrupt context (e.g., timer interrupt).
> +		 * However, we don't need to consider the race between them
> +		 * because such race is not observable from a caller.
> +		 */
> +		clock = READ_ONCE(rq->scx.clock);

Such that if you observe VALID, you must then also observe the clock.