From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1F876C3A59D for ; Fri, 5 Apr 2024 23:26:12 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rssw1-0006RH-H2; Fri, 05 Apr 2024 19:25:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rssvy-0006Qu-V2 for qemu-devel@nongnu.org; Fri, 05 Apr 2024 19:25:23 -0400 Received: from fhigh8-smtp.messagingengine.com ([103.168.172.159]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rssvv-0000Pn-5I for qemu-devel@nongnu.org; Fri, 05 Apr 2024 19:25:22 -0400 Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailfhigh.nyi.internal (Postfix) with ESMTP id 6ED5911400A6; Fri, 5 Apr 2024 19:25:14 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Fri, 05 Apr 2024 19:25:14 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=buhman.org; h=cc :cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1712359514; x=1712445914; bh=6FCB+7nAKc Vdt0zrG5gQFTwT/5dczxNIQqh1yGPZW0E=; b=a7jQ6JGyoUF7Khm6wi2xhs4Bk4 n7Y1MlL+kMnTmGzy/TA5Pe2d/jenD9MXXjNXbg5wp01ldehoag5ddRuWv8uj6vAR /r7qbnQaPQ7M3eNcl2OyMPw4d6NJaOEaUMQFah+aZoR3ZzYgtDOvFZMRR/BjoDS0 G0bDVsyWi1QqSP/17SbfUSSN0X9DPOzRkEOTjTL+Y6RRtLTdF9412KLQAlfno5lG LqfCQQh6jjXrluUrvaub+nqqXWYbTAlnFOUp0md8jbVC8FSGy+EpFMeKd5ASmzfU NGoUcXCzkGFCElgnjriwvSmpY/swK/E7zg0DBsU5peDYNfEL/43UUW9iRLww== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1712359514; x=1712445914; bh=6FCB+7nAKcVdt0zrG5gQFTwT/5dc zxNIQqh1yGPZW0E=; b=UBKNrQsMf2TP6u5AnZUwcx0KhmvmCVGmqILQeAbxCbOH Oc1T3HO6i+S7nYaSaGFSt752gIGSGBYpjJ7vYlv5RXjfQnfkhbwSgMxsGDqtNYnu tUJ+8ff2dTW1oVPLXGayZ199dBldiqzy+eWeLTo/fAtWB7FsoenxLdvgRrcUGfmL KaW6girJ8CzAn+Bg63XAUmT/2+wKgLQBos+S/tT+8SDKqy8dWXz0OuJYaVb8BJMN 3TLihtVfcrLhHIbiQBjH84JIp7qYTEXV0Q+NssdkIZfVrHz7xnIFRQGMg+F4xqj5 QvIsZUy34GQaqK4E7zBzKybYk8l3XgEp8yHKzEWwKw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrudeguddgvdduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfhgfhffvvefuffgjkfggtgesthdtredttdertdenucfhrhhomhepkggrtghk uceuuhhhmhgrnhcuoeiirggtkhessghuhhhmrghnrdhorhhgqeenucggtffrrghtthgvrh hnpeejveejfffhteelvdehiedvieetteetudelgffhgeettddugeegudeugfdutefgkeen ucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpeiirggtkh essghuhhhmrghnrdhorhhg X-ME-Proxy: Feedback-ID: i1541475f:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 5 Apr 2024 19:25:11 -0400 (EDT) References: <20240405075404.2122-2-zack@buhman.org> User-agent: mu4e 1.10.6; emacs 28.2 From: Zack Buhman To: Peter Maydell Cc: qemu-devel@nongnu.org, ysato@users.sourceforge.jp Subject: Re: [PATCH] sh4: mac.w: implement saturation arithmetic logic Date: Sat, 06 Apr 2024 07:19:51 +0800 In-reply-to: Message-ID: MIME-Version: 1.0 Content-Type: text/plain Received-SPF: pass client-ip=103.168.172.159; envelope-from=zack@buhman.org; helo=fhigh8-smtp.messagingengine.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Peter Maydell writes: > On Fri, 5 Apr 2024 at 08:55, Zack Buhman wrote: >> >> The saturation arithmetic logic in helper_macw is not correct. >> >> I tested and verified this behavior on a SH7091, the general pattern >> is a code sequence such as: >> >> sets >> >> mov.l _mach,r2 >> lds r2,mach >> mov.l _macl,r2 >> lds r2,macl >> >> mova _n,r0 >> mov r0,r1 >> mova _m,r0 >> mac.w @r0+,@r1+ >> >> _mach: .long 0xffffffff >> _macl: .long 0xfffffffe >> _m: .word 0x0002 >> .word 0 >> _n: .word 0x0003 >> .word 0 >> >> test 0: >> (mach should not be modified if an overflow did not occur) >> >> given, prior to saturation mac.l: >> mach = 0xffffffff ; macl = 0xfffffffe >> @r0 = 0x0002 ; @r1 = 0x0003 >> >> expected saturation mac.w result: >> mach = 0xffffffff (unchanged) >> macl = 0x00000004 >> >> qemu saturation mac.w result (before this commit): >> mach = 0x00000001 >> macl = 0x80000000 >> >> In the context of the helper_macw implementation prior to this >> commit, initially this appears to be a surprising result. This is >> because (prior to unary negation) the C literal `0x80000000` (due to >> being outside the range of a `signed int`) is evaluated as an >> `unsigned int` whereas the literal `1` (due to being inside the >> range of `signed int`) is evaluated as `signed int`, as in: >> >> static_assert(1 < -0x80000000 == 1); >> static_assert(1 < -1 == 0); >> >> This is because the unary negation of an unsigned int is an >> unsigned int. > > So we could also fix this by getting the C literals right > so that they are correctly the signed 64 bit values that > the author intended, right? Making -0x80000000 signed as intended, as a standalone change without the other aspects of this patch, would not yield a completely correct mac.w implementation. For example, in saturation mode, the following logic does not exist prior to this patch: - MACH must not be overwritten if an overflow did not occur - MACH must not be included in the addition/accumulation operation (it is simply a 32-bit + 32-bit = 33-bit addition of MACL and the multiplication result) > >> In other words, if the `res < -0x80000000` comparison used >> infinite-precision literals, the saturation mac.w result would have >> been: >> >> mach = 0x00000000 >> macl = 0x00000004 >> >> Due to this (forgivable) misunderstanding of C literals, the >> following behavior also occurs: >> >> test 1: >> (`2 * 3 + 0` is not an overflow) >> >> given, prior to saturation mac.l: >> mach = 0x00000000 ; macl = 0x00000000 >> @r0 = 0x0002 ; @r1 = 0x0003 >> >> expected saturation mac.w result: >> mach = 0x00000000 (unchanged) >> macl = 0x00000006 >> >> qemu saturation mac.w result (before this commit): >> mach = 0x00000001 >> macl = 0x80000000 >> >> test 2: >> (mach should not be accumulated in saturation mode) >> (16-bit operands are sign-extended) >> >> given, prior to saturation mac.l: >> mach = 0x12345678 ; macl = 0x7ffffffe >> @r0 = 0x0002 ; @r1 = 0xfffd >> >> expected saturation mac.w result: >> mach = 0x12345678 (unchanged) >> macl = 0x7ffffff8 >> >> qemu saturation mac.w result (before this commit): >> mach = 0x00000001 >> macl = 0x7fffffff >> >> test 3: >> (macl should have the correct saturation value) >> >> given, prior to saturation mac.l: >> mach = 0xabcdef12 ; macl = 0x7ffffffa >> @r0 = 0x0002 ; @r1 = 0x0003 >> >> expected saturation mac.w result: >> mach = 0x00000001 (overwritten) >> macl = 0x7fffffff >> >> qemu saturation mac.w result (before this commit): >> mach = 0x00000001 >> macl = 0x80000000 >> >> All of the above also matches the description of MAC.W as documented >> in cd00147165-sh-4-32-bit-cpu-core-architecture-stmicroelectronics.pdf >> >> Signed-off-by: Zack Buhman >> --- >> target/sh4/op_helper.c | 41 +++++++++++++++++++++++++++++++---------- >> 1 file changed, 31 insertions(+), 10 deletions(-) >> >> diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c >> index ee16524083..b3c1e69f53 100644 >> --- a/target/sh4/op_helper.c >> +++ b/target/sh4/op_helper.c >> @@ -187,20 +187,41 @@ void helper_macl(CPUSH4State *env, uint32_t arg0, uint32_t arg1) >> >> void helper_macw(CPUSH4State *env, uint32_t arg0, uint32_t arg1) >> { >> - int64_t res; >> + int16_t value0 = (int16_t)arg0; >> + int16_t value1 = (int16_t)arg1; >> + int32_t mul = ((int32_t)value0) * ((int32_t)value1); >> >> - res = ((uint64_t) env->mach << 32) | env->macl; >> - res += (int64_t) (int16_t) arg0 *(int64_t) (int16_t) arg1; >> - env->mach = (res >> 32) & 0xffffffff; >> - env->macl = res & 0xffffffff; >> + /* Perform 32-bit saturation arithmetic if the S flag is set */ >> if (env->sr & (1u << SR_S)) { >> - if (res < -0x80000000) { >> - env->mach = 1; >> - env->macl = 0x80000000; >> - } else if (res > 0x000000007fffffff) { >> + const int64_t upper_bound = ((1ul << 31) - 1); >> + const int64_t lower_bound = -((1ul << 31) - 0); > > UL is usually the wrong suffix to use (and more generally, > in QEMU the "long" type is rarely the right type, because > it might be either 32 or 64 bits depending on the host). > Either we know the value fits in 32 bits and we want a 32-bit > type, in which case U is sufficient, or we want a 64-bit type, > in which case we need ULL. > >> + >> + /* >> + * In saturation arithmetic mode, the accumulator is 32-bit >> + * with carry. MACH is not considered during the addition >> + * operation nor the 32-bit saturation logic. >> + */ >> + int32_t mac = env->macl; >> + int32_t result; >> + bool overflow = sadd32_overflow(mac, mul, &result); >> + if (overflow) { >> + result = (mac < 0) ? lower_bound : upper_bound; >> + /* MACH is set to 1 to denote overflow */ >> + env->macl = result; >> env->mach = 1; >> - env->macl = 0x7fffffff; >> + } else { >> + result = MIN(MAX(result, lower_bound), upper_bound); > > Maybe I'm confused, but result is an int32_t, so when can it > be lower than lower_bound or higher than upper_bound ? Correct, this MIN(MAX(...)) is a no-operation in this case. I will send [PATCH v2] with this removed and your other suggestions included. > >> + /* If there was no overflow, MACH is unchanged */ >> + env->macl = result; >> } >> + } else { >> + /* In non-saturation arithmetic mode, the accumulator is 64-bit */ >> + int64_t mac = (((uint64_t)env->mach) << 32) | env->macl; >> + >> + /* The carry bit of the 64-bit addition is discarded */ >> + int64_t result = mac + (int64_t)mul; >> + env->macl = result; >> + env->mach = result >> 32; >> } >> } >> > > thanks > -- PMM