Date: Wed, 13 May 2026 10:49:01 +0100
From: David Laight
To: Milan Tripkovic
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
 Dusan Stojkovic, Milan Tripkovic, linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] riscv: lib: add memcmp() implementation
Message-ID: <20260513104901.719ac53a@pumpkin>
In-Reply-To: <20260512141007.1193033-1-milant2002@gmail.com>
References: <20260512141007.1193033-1-milant2002@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 12 May 2026 16:10:06 +0200
Milan Tripkovic wrote:

> From: Milan Tripkovic
>
> Add an assembly implementation of memcmp() for RISC-V. The implementation
> uses the ZBB extension for word-at-a-time comparison and an assembly
> fallback for non-ZBB systems.
>
> Benchmark results (QEMU TCG, rv64):
>
> Len  | Def   | NoZBB | ZBB   | %NoZBB | %ZBB
> -----|-------|-------|-------|--------|-------
> 1 B  | 22.4  | 24.6  | 23.2  | +9.8%  | +3.5%
> 7 B  | 96.9  | 108.5 | 107.3 | +12.0% | +10.7%
> 8 B  | 107.0 | 116.3 | 176.7 | +8.7%  | +65.1%
> 16 B | 148.4 | 172.8 | 315.6 | +16.4% | +112.6%
> 31 B | 182.2 | 217.1 | 377.6 | +19.2% | +107.2%
> 64 B | 220.6 | 239.4 | 874.2 | +8.5%  | +296.2%
> 127 B| 213.7 | 254.8 | 1042.9| +19.2% | +388.0%
> 512 B| 255.1 | 269.0 | 1778.6| +5.4%  | +597.2%
> 1024B| 252.3 | 280.9 | 1887.7| +11.3% | +648.1%
> 3173B| 241.3 | 288.7 | 2063.2| +19.6% | +755.0%
> 4096B| 240.9 | 280.5 | 2064.5| +16.4% | +756.9%
>
> Signed-off-by: Milan Tripkovic
> ---
>  arch/riscv/include/asm/string.h |   2 +
>  arch/riscv/lib/Makefile         |   1 +
>  arch/riscv/lib/memcmp.S         | 103 ++++++++++++++++++++++++++++++++
>  arch/riscv/purgatory/Makefile   |   5 +-
>  4 files changed, 110 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv/lib/memcmp.S
>
> diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> index 764ffe8f6..5c5299678 100644
> --- a/arch/riscv/include/asm/string.h
> +++ b/arch/riscv/include/asm/string.h
> @@ -18,6 +18,8 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
>  #define __HAVE_ARCH_MEMMOVE
>  extern asmlinkage void *memmove(void *, const void *, size_t);
>  extern asmlinkage void *__memmove(void *, const void *, size_t);
> +#define __HAVE_ARCH_MEMCMP
> +extern asmlinkage int memcmp(const void *, const void *, size_t);
>
>  #if !(defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))
>  #define __HAVE_ARCH_STRCMP
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 6f767b2a3..b529e1be1 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -3,6 +3,7 @@ lib-y += delay.o
>  lib-y += memcpy.o
>  lib-y += memset.o
>  lib-y += memmove.o
> +lib-y += memcmp.o
>  ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
>  lib-y += strcmp.o
>  lib-y += strlen.o
> diff --git a/arch/riscv/lib/memcmp.S b/arch/riscv/lib/memcmp.S
> new file mode 100644
> index 000000000..444b082d9
> --- /dev/null
> +++ b/arch/riscv/lib/memcmp.S
> @@ -0,0 +1,103 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include
> +#include
> +#include
> +#include
> +
> +/* int memcmp(const void *cs, const void *ct, size_t n) */
> +SYM_FUNC_START(memcmp)
> +
> +	__ALTERNATIVE_CFG("nop", "j memcmp_zbb", 0, RISCV_ISA_EXT_ZBB,
> +		IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB))
> +/*
> + * Parameters
> + *	a0 - Pointer to first memory block (cs), also return value
> + *	a1 - Pointer to second memory block (ct)
> + *	a2 - Number of bytes to compare (n), transformed to end pointer (a0 + n)
> + *
> + * Returns
> + *	a0 - 0 if equal, positive if cs > ct, negative if cs < ct
> + *
> + * Clobbers
> + *	t0, t1
> + */
> +	beqz a2, 2f
> +	add a2, a0, a2
> +1:
> +	lbu t0, 0(a0)
> +	lbu t1, 0(a1)
> +	bne t0, t1, 3f
> +	addi a0, a0, 1
> +	addi a1, a1, 1
> +	bne a0, a2, 1b
> +2:
> +	li a0, 0
> +	ret
> +3:
> +	sub a0, t0, t1
> +	ret
> +
> +
> +memcmp_zbb:
> +.option push
> +.option arch,+zbb
> +/*
> + * Parameters
> + *	a0 - Pointer to first memory block (cs), also return value
> + *	a1 - Pointer to second memory block (ct)
> + *	a2 - Number of bytes to compare (n), decremented during loop
> + *
> + * Returns
> + *	a0 - 0 if equal, positive if cs > ct, negative if cs < ct
> + *
> + * Clobbers
> + *	t0, t1, t2
> + */
> +	beq a0, a1, 4f

There is no point optimising for equal pointers.

> +
> +	li t0, SZREG
> +	bltu a2, t0, 5f
> +
> +1:
> +	REG_L t1, 0(a0)
> +	REG_L t2, 0(a1)

Aren't there some systems where misaligned reads are very expensive?
You might want to fall back to byte compares for misaligned buffers.

> +	bne t1, t2, 2f
> +
> +	addi a0, a0, SZREG
> +	addi a1, a1, SZREG
> +	addi a2, a2, -SZREG
> +	bgeu a2, t0, 1b

You've a loop with two comparisons in it.
Move the length one to the top and the check before the loop shouldn't
be needed.
If you calculate the end address of one of the buffers you only need
two increments in the loop, not three.
You might need to access -SZREG(a0) to get the data.

> +
> +5:
> +	beqz a2, 4f

If a0 and a1 are aligned you can read the next full word, shift right
(LE, left BE) and then compare.

> +6:
> +	lbu t1, 0(a0)
> +	lbu t2, 0(a1)
> +	bne t1, t2, 3f
> +	addi a0, a0, 1
> +	addi a1, a1, 1
> +	addi a2, a2, -1
> +	bnez a2, 6b
> +
> +4:	li a0, 0
> +	ret
> +2:
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +	rev8 t1, t1
> +	rev8 t2, t2
> +#endif

That looks like the only bit that needs zbb?
Is BIG_ENDIAN common enough to actually worry about?
You could just fall back to byte accesses (rereading memory) on BE.

-- David

> +	sltu a0, t2, t1
> +	sltu t0, t1, t2
> +	sub a0, a0, t0
> +	ret
> +
> +3:
> +	sub a0, t1, t2
> +	ret
> +
> +.option pop
> +
> +SYM_FUNC_END(memcmp)
> +SYM_FUNC_ALIAS(__pi_memcmp, memcmp)
> +EXPORT_SYMBOL(memcmp)
> diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
> index b0358a78f..456929971 100644
> --- a/arch/riscv/purgatory/Makefile
> +++ b/arch/riscv/purgatory/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>
> -purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
> +purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o memcmp.o
>  ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
>  purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o strchr.o strrchr.o
>  endif
> @@ -41,6 +41,9 @@ $(obj)/strchr.o: $(srctree)/arch/riscv/lib/strchr.S FORCE
>  $(obj)/strrchr.o: $(srctree)/arch/riscv/lib/strrchr.S FORCE
>  	$(call if_changed_rule,as_o_S)
>
> +$(obj)/memcmp.o: $(srctree)/arch/riscv/lib/memcmp.S FORCE
> +	$(call if_changed_rule,as_o_S)
> +
>  CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
>  CFLAGS_string.o := -D__DISABLE_EXPORTS
>  CFLAGS_ctype.o := -D__DISABLE_EXPORTS
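
For illustration only, the loop restructuring suggested in the review
comments (one length test at the top of the loop, a single end pointer
so that only two pointers are incremented, and a byte swap only on the
mismatch path, as the rev8 pair does) might look roughly like this in C.
This is a sketch, not the patch's code: memcmp_sketch() and load_word()
are hypothetical names, 64-bit words are assumed, the loads go through
memcpy() so misaligned buffers are safe, and __BYTE_ORDER__ and
__builtin_bswap64() are assumed to come from GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: memcpy-based load, so misaligned pointers are safe. */
static uint64_t load_word(const unsigned char *p)
{
	uint64_t w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Hypothetical name; this is not the kernel's memcmp(). */
static int memcmp_sketch(const void *cs, const void *ct, size_t n)
{
	const unsigned char *a = cs, *b = ct;
	const unsigned char *end = a + n;	/* one end pointer replaces the counter */

	/* Single length test at the top; no separate check before the loop. */
	while ((size_t)(end - a) >= sizeof(uint64_t)) {
		uint64_t x = load_word(a), y = load_word(b);

		if (x != y) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
			/* Like rev8: put the lowest-addressed byte in the top bits. */
			x = __builtin_bswap64(x);
			y = __builtin_bswap64(y);
#endif
			return (x > y) - (x < y);
		}
		a += sizeof(uint64_t);	/* two increments per iteration, not three */
		b += sizeof(uint64_t);
	}

	/* Byte-at-a-time tail for the remaining 0..7 bytes. */
	while (a != end) {
		if (*a != *b)
			return (int)*a - (int)*b;
		a++;
		b++;
	}
	return 0;
}
```

The word loop returns -1, 0 or 1 while the byte tail returns the byte
difference; both only promise the sign, which is all memcmp() callers
may rely on.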