From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 7 May 2026 10:57:21 +0100
From: David Laight
To: Ankur Arora
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
 bpf@vger.kernel.org, arnd@arndb.de, catalin.marinas@arm.com,
 will@kernel.org, peterz@infradead.org, akpm@linux-foundation.org,
 mark.rutland@arm.com, harisokn@amazon.com, cl@gentwo.org, ast@kernel.org,
 rafael@kernel.org, daniel.lezcano@linaro.org, memxor@gmail.com,
 zhenglifeng1@huawei.com, xueshuai@linux.alibaba.com, rdunlap@infradead.org,
 joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, ashok.bhat@arm.com
Subject: Re: [PATCH v11 01/14] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
Message-ID: <20260507105721.66ba1e45@pumpkin>
In-Reply-To: <87o6isl0nl.fsf@oracle.com>
References: <20260408122538.3610871-1-ankur.a.arora@oracle.com>
 <20260408122538.3610871-2-ankur.a.arora@oracle.com>
 <874iklm1uy.fsf@oracle.com>
 <20260506095836.216d9cc5@pumpkin>
 <87o6isl0nl.fsf@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Wed, 06 May 2026 13:54:06 -0700
Ankur Arora wrote:

> David Laight writes:
>
> > On Wed, 06 May 2026 00:30:29 -0700
> > Ankur Arora wrote:
> >
> >> Ankur Arora writes:
> >>
> >> > Add smp_cond_load_relaxed_timeout(), which extends
> >> > smp_cond_load_relaxed() to allow waiting for a duration.
> >> >
> >> > We loop around waiting for the condition variable to change while
> >> > periodically doing a time-check. The loop uses cpu_poll_relax() to slow
> >> > down the busy-wait, which, unless overridden by the architecture
> >> > code, amounts to a cpu_relax().
> >> >
> >> > Note that there are two ways for the time-check to fail: the timeout
> >> > case or @time_expr_ns returning an invalid value (negative or zero).
> >> > The second failure mode allows for clocks attached to the clock-domain
> >> > of @cond_expr -- which might cease to operate meaningfully once some
> >> > state internal to @cond_expr has changed -- to fail.
> >> >
> >> > Evaluation of @time_expr_ns: in the fastpath we want to keep the
> >> > performance close to smp_cond_load_relaxed(). So defer evaluation
> >> > of the potentially costly @time_expr_ns to the slowpath.
> >> >
> >> > This also means that there will always be some hardware-dependent
> >> > duration that has passed in cpu_poll_relax() iterations at the time
> >> > of first evaluation. Additionally, cpu_poll_relax() is not guaranteed
> >> > to return at a timeout boundary. In sum, expect timeout overshoot when
> >> > we exit due to expiration of the timeout.
> >> >
> >> > The number of spin iterations before the time-check,
> >> > SMP_TIMEOUT_POLL_COUNT, is chosen to be 200 by default. With a
> >> > cpu_poll_relax() iteration taking ~20-30 cycles (measured on a
> >> > variety of x86 platforms), we expect a time-check every ~4000-6000
> >> > cycles.
> >> >
> >> > The outer limit of the overshoot is double that when working with the
> >> > parameters above. This might be higher or lower depending on the
> >> > implementation of cpu_poll_relax() across architectures.
> >> >
> >> > Lastly, the config option ARCH_HAS_CPU_RELAX indicates availability of
> >> > a cpu_poll_relax() that is cheaper than polling. This might be relevant
> >> > for cases with a long timeout.
> >> >
> >> > Cc: Arnd Bergmann
> >> > Cc: Will Deacon
> >> > Cc: Catalin Marinas
> >> > Cc: Peter Zijlstra
> >> > Cc: linux-arch@vger.kernel.org
> >> > Reviewed-by: Catalin Marinas
> >> > Signed-off-by: Ankur Arora
> >> > ---
> >> > Notes:
> >> >  - add a comment mentioning that smp_cond_load_relaxed_timeout() might
> >> >    be using architectural primitives that don't support MMIO.
> >> >    (David Laight, Catalin Marinas)
> >> >
> >> >  include/asm-generic/barrier.h | 69 +++++++++++++++++++++++++++++++++++
> >> >  1 file changed, 69 insertions(+)
> >> >
> >> > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> >> > index d4f581c1e21d..e5a6a1c04649 100644
> >> > --- a/include/asm-generic/barrier.h
> >> > +++ b/include/asm-generic/barrier.h
> >> > @@ -273,6 +273,75 @@ do { \
> >> >  })
> >> >  #endif
> >> >
> >> > +/*
> >> > + * Number of times we iterate in the loop before doing the time check.
> >> > + * Note that the iteration count assumes that the loop condition is
> >> > + * relatively cheap.
> >> > + */
> >> > +#ifndef SMP_TIMEOUT_POLL_COUNT
> >> > +#define SMP_TIMEOUT_POLL_COUNT	200
> >> > +#endif
> >> > +
> >> > +/*
> >> > + * Platforms with ARCH_HAS_CPU_RELAX have a cpu_poll_relax() implementation
> >> > + * that is expected to be cheaper (lower power) than pure polling.
> >> > + */
> >> > +#ifndef cpu_poll_relax
> >> > +#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
> >> > +#endif
> >> > +
> >> > +/**
> >> > + * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
> >> > + * guarantees until a timeout expires.
> >> > + * @ptr: pointer to the variable to wait on.
> >> > + * @cond_expr: boolean expression to wait for.
> >> > + * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
> >> > + * on failure, returns a negative value.
> >> > + * @timeout_ns: timeout value in ns
> >> > + * Both of the above are assumed to be compatible with s64; the signed
> >> > + * value is used to handle the failure case in @time_expr_ns.
> >> > + *
> >> > + * Equivalent to using READ_ONCE() on the condition variable.
> >> > + *
> >> > + * Callers that expect to wait for prolonged durations might want
> >> > + * to take into account the availability of ARCH_HAS_CPU_RELAX.
> >> > + *
> >> > + * Note that @ptr is expected to point to a memory address. Using this
> >> > + * interface with MMIO will be slower (since SMP_TIMEOUT_POLL_COUNT is
> >> > + * tuned for memory) and might also break in interesting architecture
> >> > + * dependent ways.
> >> > + */
> >> > +#ifndef smp_cond_load_relaxed_timeout
> >> > +#define smp_cond_load_relaxed_timeout(ptr, cond_expr,		\
> >> > +				      time_expr_ns, timeout_ns)	\
> >> > +({								\
> >> > +	typeof(ptr) __PTR = (ptr);				\

	auto __PTR = ptr;

> >> > +	__unqual_scalar_typeof(*ptr) VAL;			\

It can't matter if integer promotions happen before assigning to VAL.
So something like:
	auto VAL = 1 ? 0 : *__PTR + 0;
will generate a suitable writable variable.
(The '+ 0' is needed because some versions of gcc incorrectly
propagate 'const'.)

> >> > +	u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT;		\
> >> > +	s64 __timeout = (s64)timeout_ns;			\

The (s64) cast can only hide errors.

> >> > +	s64 __time_now, __time_end = 0;				\
> >> > +								\
> >> > +	for (;;) {						\
> >> > +		VAL = READ_ONCE(*__PTR);			\
> >> > +		if (cond_expr)					\
> >> > +			break;					\
> >> > +		cpu_poll_relax(__PTR, VAL, (u64)__timeout);	\

That doesn't look right, __timeout is relative; if the underlying code
uses the timeout then the code delays for 200 * timeout_ns before even
looking at the actual time.

If you want to spin then you may not even want the cpu_relax() at all.
(Or with a very short timeout as in the version below.)

> >> > +		if (++__n < __spin)				\
> >> > +			continue;				\
> >> > +		__time_now = (s64)(time_expr_ns);		\

Another cast that can only hide bugs.
> >> > +		if (unlikely(__time_end == 0))			\
> >> > +			__time_end = __time_now + __timeout;	\
> >> > +		__timeout = __time_end - __time_now;		\
> >> > +		if (__time_now <= 0 || __timeout <= 0) {	\
> >> > +			VAL = READ_ONCE(*__PTR);		\
> >> > +			break;					\
> >> > +		}						\
> >> > +		__n = 0;					\

Resetting the spin count doesn't look right at all.
In principle the code will delay for 200 * __timeout.
Possibly the earlier check should be:
	if (__n < __spin) {
		__n++;
		continue;
	}
(Or just don't worry that the code will spin again after 4M loops.)

The problem you have is that if cpu_poll_relax() ignores the timeout
you probably want to spin 'for a bit' in code that only accesses local
data (in particular avoiding evaluating cond_expr or time_expr_ns).

> >> > +	}							\
> >> > +	(typeof(*ptr))VAL;					\

That cast is pointless; the return value will be subject to 'integer
promotion' and converted to an rvalue - which removes any
const/volatile qualifiers.

> >> > +})
> >> > +#endif
> >> > +
> >>
> >> A cluster of issues that got flagged by sashiko was around timeout_ns
> >> being specified as s64 and a bunch of potential edge cases around
> >> that.
> >>
> >> These were mostly caused by an implicit assumption in the code that
> >> the timeout specified by the caller is generally reasonable. So, way
> >> below S64_MAX, not 0 etc.
>
> > There are plenty of ways kernel code can break things.
> > Provided this code doesn't itself overwrite anything (rather than
> > just loop forever or return immediately etc) I'd be tempted to
> > just document the valid range rather than slow everything down
> > with the extra tests.
>
> I don't disagree. In this case, however, it's somewhat borderline.
>
> On the pro side, we get rid of some of the implicit type conversions
> and assumptions around those.
>
> On the negative, it adds an extra modulo operation in the slow path.
> And, the for loop is structured a little differently from the usual
> version.
>
> On balance, I think this is a good change if only because it makes
> the types a little more explicit.
>
> Ankur
>
> > 	David
> >
> >>
> >> I think this is worth cleaning up a bit. The change is mostly around
> >> introducing a u32 __itertime and explicitly computing the waiting time.
> >> And adding a check to ensure that we start with a valid value.
> >>
> >> This does make the implementation a little more involved. So just wanted
> >> to see if people have any opinions on this?
> >>
> >> +#ifndef smp_cond_load_relaxed_timeout
> >> +#define smp_cond_load_relaxed_timeout(ptr, cond_expr,		\
> >> +				      time_expr_ns, timeout_ns)	\
> >> +({								\
> >> +	typeof(ptr) __PTR = (ptr);				\
> >> +	__unqual_scalar_typeof(*(ptr)) VAL;			\
> >> +	u32 __count = 0, __spin = SMP_TIMEOUT_POLL_COUNT;	\
> >> +	s64 __timeout = (s64)(timeout_ns);			\
> >> +	s64 __time_now, __time_end = 0;				\
> >> +	u32 __maybe_unused __itertime;				\
> >> +								\
> >> +	for (__itertime = NSEC_PER_USEC;			\

Ok, so that limits the initial 'spin' to 200 usecs.
That gets added to any caller-specified timeout.

> >> +	     VAL = READ_ONCE(*__PTR), __timeout > 0; ) {	\

Broken indentation.
I'd change it back to a for (;;) loop.
If __timeout <= 0 then the code goes through the 'timer expired' path
(below) on the first iteration.
So the extra check is just bloat.
> >> +		if (cond_expr)					\
> >> +			break;					\
> >> +		cpu_poll_relax(__PTR, VAL, __itertime);		\
> >> +		if (++__count < __spin)				\
> >> +			continue;				\
> >> +		__time_now = (s64)(time_expr_ns);		\
> >> +		if (unlikely(__time_end == 0))			\
> >> +			__time_end = __time_now + __timeout;	\
> >> +		__timeout = __time_end - __time_now;		\
> >> +		if (__time_now <= 0 || __timeout <= 0) {	\
> >> +			VAL = READ_ONCE(*__PTR);		\
> >> +			break;					\
> >> +		}						\

How about:
	if (unlikely(__time_end == 0)) {
		if (__time_now <= 0)
			goto timed_out;
		__time_end = __time_now + __timeout;
	} else {
		if (__time_now >= __time_end) {
timed_out:
			VAL = READ_ONCE(*__PTR);
			break;
		}
		__timeout = __time_end - __time_now;
	}

> >> +		__itertime = __timeout % NSEC_PER_MSEC +	\
> >> +			     NSEC_PER_USEC;			\

That seems to just be putting a bound on the timeout.
So the '% NSEC_PER_MSEC' could be '& ((1u << 20) - 1)' replacing an
expensive signed divide with a cheap mask.

But overall this is a lot of code to inline.
It has to be possible to get it down to something like:
	struct info info = { .tmo_ns = timeout_ns };
	for (;;) {
		VAL = READ_ONCE(PTR);
		if (cond_expr)
			break;
		if (_smp_cond_load_relaxed_timeout(&info, PTR, VAL))
			break;
	}
	VAL;
(yes, I know it isn't that simple because the arm 'relax' code has a
re-read in it so needs to know the size.)

	David

> >> +		__count = 0;					\
> >> +	}							\
> >> +	(typeof(*(ptr)))VAL;					\
> >> +})
> >> +#endif
> >>
> >> Thanks
> >>
> >> --
> >> ankur
> >>
> >
>
> --
> ankur
>