From: Milan Tripkovic
To: pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu
Cc: alex@ghiti.fr, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Dusan.Stojkovic@rt-rk.com, Milan Tripkovic
Subject: [PATCH] riscv: lib: add strrchr() zbb implementation
Date: Thu, 14 May 2026 18:09:10 +0200
Message-ID: <20260514160910.1796966-1-milant2002@gmail.com>

From: Milan Tripkovic

Add a Zbb assembly implementation of strrchr() for RISC-V. It uses Zbb
bit-manipulation instructions such as orc.b, ctz and clz to examine a
whole register (SZREG bytes, i.e. 8 bytes on rv64) per iteration instead
of one byte at a time. Compared with the existing byte-by-byte
implementation, this is faster for strings of 8 bytes or more, reaching
about 6.5x the baseline at 4096 bytes, while very short strings (1 and
7 bytes in the results below) are slightly slower.

For testing, I used the existing string_bench_strrchr benchmark, but
changed the searched character from '\0' to 'a' to obtain more realistic
results: the Zbb variant handles c == '\0' on a separate path that only
looks for the terminating NUL, so a '\0' search would not exercise the
main matching loop.
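The word-at-a-time test described above can be modeled in portable C.
This is a hedged sketch, not the patch's code: orc_b(), ctz64(), clz64(),
load_le64() and strrchr_model() are hypothetical helpers that emulate the
Zbb instructions in software. It assumes rv64-style 8-byte words,
little-endian byte order (as on RISC-V Linux), and that the string lies in
a buffer extending to the next 8-byte boundary. In the kernel that last
point holds automatically, because an aligned load cannot cross a page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of Zbb orc.b: each non-zero byte becomes 0xff, each zero byte stays 0x00. */
static uint64_t orc_b(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++)
		if ((x >> (8 * i)) & 0xff)
			r |= UINT64_C(0xff) << (8 * i);
	return r;
}

/* Models of Zbb ctz/clz; this sketch only calls them with x != 0. */
static unsigned ctz64(uint64_t x)
{
	unsigned n = 0;

	assert(x != 0);
	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

static unsigned clz64(uint64_t x)
{
	unsigned n = 0;

	assert(x != 0);
	while (!(x >> 63)) {
		x <<= 1;
		n++;
	}
	return n;
}

/* Little-endian 8-byte load, standing in for REG_L on rv64. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 0; i < 8; i++)
		w |= (uint64_t)p[i] << (8 * i);
	return w;
}

/*
 * Hypothetical C model of strrchr_zbb. Assumes 8-byte words and that s
 * lies in a buffer padded to the next 8-byte boundary.
 */
static char *strrchr_model(const char *s, int c)
{
	const unsigned char *p = (const unsigned char *)s;
	unsigned char ch = (unsigned char)c;
	const char *last = NULL;

	/* c == '\0': the patch only searches for the terminator here. */
	if (ch == 0)
		return (char *)s + strlen(s);

	/* Byte-wise prologue until p is aligned (cf. .Lalign_loop). */
	for (; (uintptr_t)p % 8 != 0; p++) {
		if (*p == 0)
			return (char *)last;
		if (*p == ch)
			last = (const char *)p;
	}

	/* Broadcast c to every byte (the patch uses slli/or instead). */
	const uint64_t pattern = UINT64_C(0x0101010101010101) * ch;

	for (;; p += 8) {
		uint64_t w = load_le64(p);
		uint64_t zero = ~orc_b(w);            /* 0xff where the byte is 0 */
		uint64_t match = ~orc_b(w ^ pattern); /* 0xff where the byte is c */

		if (zero == 0 && match == 0)
			continue;                     /* neither NUL nor c: next word */
		if (zero != 0) {
			/* Keep only matches below the first NUL byte. */
			match &= ~(~UINT64_C(0) << ctz64(zero));
			if (match != 0)
				last = (const char *)p + (7 - clz64(match) / 8);
			return (char *)last;
		}
		/* Only matches: the highest set byte is the last occurrence. */
		last = (const char *)p + (7 - clz64(match) / 8);
	}
}
```

For example, with `_Alignas(8) char buf[16] = "abcabc";` the call
strrchr_model(buf, 'b') returns buf + 4, the same pointer strrchr()
returns. Masking with the bits below ctz(zero) mirrors the sll/andn pair
in the patch: it discards any byte of c that happens to sit after the
terminator in the same word.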
Benchmark results (QEMU TCG, rv64; string_bench_strrchr, higher is better):

Len    | Zbb    | No Zbb | Zbb vs. no Zbb
-------|--------|--------|---------------
1 B    |   20.0 |   22.9 |  -12.7%
7 B    |   87.5 |  110.1 |  -20.5%
8 B    |  166.8 |  130.3 |  +28.0%
16 B   |  329.5 |  189.1 |  +74.2%
31 B   |  366.9 |  195.7 |  +87.5%
64 B   |  870.3 |  231.5 | +275.9%
127 B  | 1007.0 |  278.9 | +261.1%
512 B  | 1751.9 |  305.5 | +473.5%
1024 B | 1841.9 |  294.7 | +525.0%
2048 B | 1955.4 |  310.4 | +530.0%
4096 B | 2034.6 |  312.5 | +551.1%

Signed-off-by: Milan Tripkovic
---
 arch/riscv/lib/strrchr.S | 129 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 128 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/lib/strrchr.S b/arch/riscv/lib/strrchr.S
index ac58b20ca21d..46ca232a6b43 100644
--- a/arch/riscv/lib/strrchr.S
+++ b/arch/riscv/lib/strrchr.S
@@ -6,13 +6,17 @@
 
 #include
 #include
+#include
+#include
 
 /* char *strrchr(const char *s, int c) */
 SYM_FUNC_START(strrchr)
+	__ALTERNATIVE_CFG("nop", "j strrchr_zbb", 0, RISCV_ISA_EXT_ZBB,
+		IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB))
 	/*
 	 * Parameters
 	 *   a0 - The string to be searched
-	 *   a1 - The character to seaerch for
+	 *   a1 - The character to search for
 	 *
 	 * Returns
 	 *   a0 - Address of last occurrence of 'c' or 0
@@ -31,6 +35,129 @@ SYM_FUNC_START(strrchr)
 	addi	t1, t1, 1
 	bnez	t0, 1b
 	ret
+
+/*
+ * Variant of strrchr using the ZBB extension if available
+ */
+
+strrchr_zbb:
+.option push
+.option arch,+zbb
+	/*
+	 * Parameters
+	 *   a0 - The string to be searched
+	 *   a1 - The character to search for
+	 *
+	 * Returns
+	 *   a0 - Address of last occurrence of 'c' or 0
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3, t4, t5, t6
+	 */
+	andi	a1, a1, 0xff
+	mv	t1, a0
+	li	a0, 0
+	beqz	a1, .Lfind_end_zbb
+
+	slli	t5, a1, 8
+	or	t5, t5, a1
+	slli	t2, t5, 16
+	or	t5, t5, t2
+#if __riscv_xlen == 64
+	slli	t2, t5, 32
+	or	t5, t5, t2
+#endif
+
+	andi	t2, t1, SZREG-1
+	bnez	t2, .Lmisaligned_start
+
+.Lmain_loop_pre:
+	li	t4, -1
+
+	.balign 16
+.Lmain_loop:
+	REG_L	t0, 0(t1)
+	addi	t1, t1, SZREG
+	xor	t6, t0, t5
+	orc.b	t2, t0
+	orc.b	t6, t6
+	and	t3, t2, t6
+	beq	t3, t4, .Lmain_loop
+
+	not	t2, t2
+	not	t6, t6
+
+	beqz	t2, .Lonly_matches
+
+	addi	t1, t1, -SZREG
+	ctz	t3, t2
+	sll	t4, t4, t3
+	andn	t6, t6, t4
+	beqz	t6, .Ldone
+
+	clz	t3, t6
+	srli	t3, t3, 3
+	xori	t3, t3, SZREG-1
+	add	a0, t1, t3
+.Ldone:
+	ret
+
+.Lonly_matches:
+	clz	t3, t6
+	srli	t3, t3, 3
+	not	t3, t3
+	add	a0, t1, t3
+	j	.Lmain_loop
+
+.Lfind_end_zbb:
+	andi	t2, t1, SZREG-1
+	bnez	t2, .Lmisaligned_end_start
+
+.Lfind_end_pre:
+	li	t4, -1
+
+	.balign 16
+.Lfind_end_loop:
+	REG_L	t0, 0(t1)
+	addi	t1, t1, SZREG
+	orc.b	t2, t0
+	beq	t2, t4, .Lfind_end_loop
+
+	addi	t1, t1, -SZREG
+	not	t2, t2
+	ctz	t3, t2
+	srli	t3, t3, 3
+	add	a0, t1, t3
+	ret
+
+.Lfound_zero:
+	mv	a0, t1
+	ret
+.Lmisaligned_start:
+	ori	t2, t1, SZREG-1
+	addi	t2, t2, 1
+.Lalign_loop:
+	lbu	t0, 0(t1)
+	beqz	t0, .Ldone
+	bne	t0, a1, 1f
+	mv	a0, t1
+1:
+	addi	t1, t1, 1
+	bne	t1, t2, .Lalign_loop
+	j	.Lmain_loop_pre
+
+.Lmisaligned_end_start:
+	ori	t2, t1, SZREG-1
+	addi	t2, t2, 1
+.Lfind_end_align:
+	lbu	t0, 0(t1)
+	beqz	t0, .Lfound_zero
+	addi	t1, t1, 1
+	bne	t1, t2, .Lfind_end_align
+	j	.Lfind_end_pre
+
+.option pop
+
 SYM_FUNC_END(strrchr)
 
 SYM_FUNC_ALIAS_WEAK(__pi_strrchr, strrchr)
-- 
2.43.0