From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-yb1-f182.google.com (mail-yb1-f182.google.com [209.85.219.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C33413E03B for ; Tue, 23 Apr 2024 18:00:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713895260; cv=none; b=SWrpVDKPNJUSZASktbplW2Ze+3F6gd85ZFYOodf3i9PiX1oAao1+kkSAJcOpIJfA/LEBT1b1+3XLvEFrgJWT/+zp7vnlXYEcAU6DqXwNVzAWF+Y59NLQJRjh6JGnVbiDeXKvDV8SilOiaX+FQ1qeFT6RswyIOdaIuy0TNLdZhqA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713895260; c=relaxed/simple; bh=6UPBrc50kudzP0E3j3aFN5KLRCoiqbDMmCOUC2z9cDQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=dmrxOR20ArnzXsLaUZTm3aKWE/Wi8vRtOQCDrkFMh+F0WGw2+zWdpAXLHUeHhmHW5v7RoksgNPFPCTF32+SqLpBLNrYpXWSyFUfwKEdeyNULGtbj3AhIAVQ1kTfcgc/Hopyltr9PWSPo4569QM0Aku/lkBxdx/c+pCef7FBKw5Y= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com; spf=pass smtp.mailfrom=rivosinc.com; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b=EPeUfwYT; arc=none smtp.client-ip=209.85.219.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=rivosinc.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b="EPeUfwYT" Received: by mail-yb1-f182.google.com with SMTP id 3f1490d57ef6-de55708c616so817538276.1 for ; Tue, 23 Apr 2024 11:00:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1713895257; x=1714500057; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=ruXmLN/UvQwTr3oZGWnl8Qpluv6+VMewHhW300M9b8o=; b=EPeUfwYTCfJSrX7QLr1HKuVMooK11CKTwchwCD6kramPWjubq91nkc84vtuDJpWkio TibP3x5t8V5PnHTPoovoF7rEa9akulvdzQbYPPTBHBW+ssCEhHOOwoWjtEoc7aMZDJkm E5wiIdK4FboNsIYi2bN1r6k5k2pVhXeHcXdIzD4xnn3tiBY4ui9tb3pu0bjHipQXR7ah KdpgJjYeRC20KOEX/V2pVGO+9ohEYQYqPXjYKb/HIJS4Ixv5fUosMb/qG1UvzK8YSpKo a4c4wgP4dJqeeaH9a7v6hTHpUotEDdUPHODL/bOYYERg3ihG6HPTfqC5yIgs9LC9yUZy SGjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713895257; x=1714500057; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=ruXmLN/UvQwTr3oZGWnl8Qpluv6+VMewHhW300M9b8o=; b=jbAEoUPbA60FOgnxdmcJwNuSYuTH8gSNTG+wS5lCUiFZD7JkS674S5nZN/iXRLPfpW hNvqebC8RtV21U67FLkOrRolfS5G3cAoP/4jzThgubpLMK2ULiXAYJK4o76qiKKaI5CR LMLrP3AggRugzF1i3hf39rJoUKBoJuqvgnff1cQN0PUQQbojZfsVNsaa+tFYrImnVecZ HBlEpg5PxBCb15Sgd2F/KgrjFS6yPwGIyJWFmf/nJiiV899Aj8FMHAPnD+ehbSQgT3eT xT+TFNpPHF9W9R1aKfrASa3BzGs6ZXVa/iGyZ05+pua7QbR2WI8hI/4OWAVb8MDggqFr AL0Q== X-Forwarded-Encrypted: i=1; AJvYcCXU0weIWDTg/SdJSsAyk16vnBmjKeAfSB1r2OAGVmJNz2d4in6+/d1VptI4tJTsuyej9ChF/2+1RWb002D6++DoHdJdgiJixLaSJA== X-Gm-Message-State: AOJu0YyC8X8hBtINa9RylPcRjMWEIwFLPEtXsio3yh/9lSJBrURGZa98 1XcSm5frHaWyM5KC4iIoRaoBF8p4JFIQ4IUqEf3Hz+rDsBmxa+ee5rZCykgqSO0= X-Google-Smtp-Source: AGHT+IFeSayoJdfNCiZBWXKiMtSmsaH8Ne38oZHRgd3aBw8ywcYnlBz0y8Ts1iIsJ0n8+CYNFP1T3Q== X-Received: by 2002:a25:2b05:0:b0:de5:6a82:49dd with SMTP id r5-20020a252b05000000b00de56a8249ddmr69482ybr.13.1713895257306; Tue, 23 Apr 2024 11:00:57 -0700 (PDT) Received: from ghost ([50.146.0.2]) by smtp.gmail.com with ESMTPSA id e196-20020a25d3cd000000b00dcda3959006sm2659558ybf.33.2024.04.23.11.00.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 23 Apr 2024 11:00:56 -0700 (PDT) Date: Tue, 23 Apr 2024 14:00:53 -0400 From: Charlie Jenkins To: Andrew Jones Cc: Conor Dooley , linux-riscv@lists.infradead.org, kvm-riscv@lists.infradead.org, devicetree@vger.kernel.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, conor.dooley@microchip.com, anup@brainfault.org, atishp@atishpatra.org, robh@kernel.org, krzysztof.kozlowski+dt@linaro.org, conor+dt@kernel.org, christoph.muellner@vrull.eu, heiko@sntech.de, David.Laight@aculab.com, parri.andrea@gmail.com, luxu.kernel@bytedance.com Subject: Re: [PATCH v2 2/6] dt-bindings: riscv: Add Zawrs ISA extension description Message-ID: References: <20240419135321.70781-8-ajones@ventanamicro.com> <20240419135321.70781-10-ajones@ventanamicro.com> <20240419-chafe-leotard-e5daee19b1c8@spud> <20240419-8c6af6a169a7aa0b9aa5cac5@orel> <20240419-disdain-litmus-82874cc4872e@spud> <20240421-f63f90c2146deb0695b84441@orel> <20240423-ed9ddb701be1df4a25e29f31@orel> Precedence: bulk X-Mailing-List: devicetree@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240423-ed9ddb701be1df4a25e29f31@orel> On Tue, Apr 23, 2024 at 10:46:01AM +0200, Andrew Jones wrote: > On Mon, Apr 22, 2024 at 06:36:45PM -0400, Charlie Jenkins wrote: > > On Sun, Apr 21, 2024 at 12:20:03PM +0200, Andrew Jones wrote: > > > On Fri, Apr 19, 2024 at 12:40:01PM -0400, Charlie Jenkins wrote: > ... > > > > What would be the purpose of a vendor implementing WRS.NTO (and putting > > > > it in the DT) that never terminates? The spec says "Then a subsequent > > > > WRS.NTO instruction would cause the hart to temporarily stall execution > > > > in a low- power state until a store occurs to the reservation set or an > > > > interrupt is observed." Why is this wording for WRS.NTO not sufficient > > > > to assume that an implementation of this instruction would eventually > > > > terminate? > > > > > > > > > > We can invoke smp_cond_load_relaxed(addr, VAL || anything_we_want()). This > > > means we may not expect VAL ever to be written, which rules out "until a > > > store occurs". As for "an interrupt is observed", we don't know which one > > > to expect to arrive within a "reasonable" amount of time. We need to know > > > which one(s), since, while wrs.nto will terminate even when interrupts are > > > globally disabled, we still need to have the interrupt(s) we expect to be > > > locally enabled. And, the interrupts should arrive in a "reasonable" > > > amount of time since we want to poll anything_we_want() at a "reasonable" > > > frequency. > > > > > > So, we need firmware to promise to enable exceptions if there aren't any > > > such interrupts. Or, we could require hardware descriptions to identify > > > which interrupt(s) would be good to have enabled before calling wrs.nto. > > > Maybe there's already some way to describe something like that? > > > > > > Thanks, > > > drew > > > > Ahh okay I am caught up now. So the wording we are looking at in the > > spec is: > > > > "When executing in VS or VU mode, if the VTW bit is set in hstatus, the > > TW bit in mstatus is clear, and the WRS.NTO does not complete within an > > implementation-specific bounded time limit, the WRS.NTO instruction will > > cause a virtual instruction exception." > > That's what the hypervisor should promise to do when there's no other > guarantee of wrs.nto terminating (but the hypervisor likely wants to > anyway since it wants guests to trap on wrs.nto in order to potentially > schedule the lock holding VCPU). The firmware of the host should likewise > promise to set mstatus.TW when there's no guarantee of wrs.nto > terminating, but that's likely _not_ something it normally would want to > do, so hopefully there will always be implementation-specific "other > reasons" which guarantee termination. > > > > > With the concern being that it is possible for "implementation-specific > > bounded time limit" to be infinite/never times out, > > The implementation-defined short timeout cannot be infinite, but it only > applies to wrs.sto. While using wrs.sto would relieve the concern, it > cannot be configured to raise exceptions, which means it's not useful in > guests. If we want to use wrs.sto in hosts and wrs.nto in guests then we > need a paravirt channel which allows an "enlightened" guest to determine > that it is a guest and that the hypervisor has configured wrs.nto to > trap, which then indicates it's a good idea to patch wrs.sto to wrs.nto. > But, adding paravirt stuff should be avoided whenever possible since it > adds complexity we'd rather not maintain. > That still wouldn't solve this issue, because the wrs.nto guest may still never wakeup in the implementation-specific way? > > and the kernel > > enters a WRS where the reservation set is not required to be invalidated > > for the condition we are waiting on to become true. > > > > An option here would be to enforce in the spec that this time limit is > > finite. If the original intention of the spec was to have it be finite, > > then this would not be an issue. If the intention was to allow no time > > limit, then this would probably have to be a new extension. > > wrs.nto has been specified to never timeout. > wrs.sto has been specified to never raise exceptions. > > If we had an instruction which would timeout when mstatus.TW/hstatus.VTW > is clear and raise exceptions when set, then that's the one we'd choose. > Yes, this does seem like the ideal situtation. > > > > We are also able to change the kernel to not allow these conditions that > > would break this interpretation of WRS. I found three instances in the > > kernel that contain a condition that is not dependent on the wrs > > reservation. > > > > 1. > > # queued_spin_lock_slowpath() in kernel/locking/qspinlock.c > > val = atomic_cond_read_relaxed(&lock->val, > > (VAL != _Q_PENDING_VAL) || !cnt--); > > > > The first condition will only become true if lock->val changes which > > should invalidate the reservation. !cnt-- on the otherhand is a counter > > of the number of loops that happen under-the-hood in > > atomic_cond_read_relaxed. This seems like an abuse of the function and > > could be factored out into a new bounded-iteration cond_read macro. > > > > 2. > > # osq_lock() in kernel/locking/osq_lock.c > > if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() || > > vcpu_is_preempted(node_cpu(node->prev)))) > > > > VAL is the first condition and won't be a problem here since changes to > > it will cause the reservation to become invalid. arm64 has hard-coded > > vcpu_is_preempted to be false for the same exact reason that riscv would > > want to (the wait wouldn't be woken up). There is a comment that > > points this out in arch/arm64/include/asm/spinlock.h. riscv currently > > uses the default implementation which returns false. > > The operative word is 'currently'. I plan to eventually get riscv's > vcpu_is_preempted() working since we already laid the groundwork by > adding a preempted flag to the SBI STA struct. > > > > > need_resched() should not be a problem since this condition only changes > > when the hart recieves an IPI, so as long as the hart is able to receive > > an IPI while in WRS it will be fine. > > > > 3. > > # __arm_smmu_cmdq_poll_until_msi() in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > smp_cond_load_relaxed(cmd, !VAL || (ret = queue_poll(&qp))); > > > > arm driver, not relevant. > > > > > > > > The only case that would cause a problem in the current implementation > > of the kernel would be queued_spin_lock_slowpath() with the cnt check. > > We are able to either change this definition, change the spec, or leave > > it up to the vendor who would be hit by this issue to change it. > > We could attempt to document restrictions on the condition given to > smp_cond_load_relaxed() and then change the callers to honor those > restrictions, but that doesn't sound too easy. How will we remove > vcpu_is_preempted() on x86? The solution here seems like it would be to not use wrs for riscv in this case. We really would need an alternate extension that allows wrs in a guest to be guaranteed to trap into the host at some point. > > We should probably start the process of a new extension which has the > hybrid wrs.sto/nto instruction, but I'm reluctant to, since, in practice, > the current wrs.nto is probably fine. I don't really expect there to be > implementations that never terminate, even though I'd rather we document > that it's _required_ wrs.nto terminates, or that exceptions be raised. > Maybe I'm attempting to document it in the wrong place though. Maybe this > is more of a Documentation/arch/riscv/boot.rst type of thing. > wrs.nto is most likely sufficient. A new extension will take a long time. We could go ahead with wrs.nto, propose the extension, and when the extension is ready switch over to it. In the meantime have this behavior documented. > Thanks, > drew - Charlie