Date: Thu, 7 May 2026 11:22:35 +0100
From: David Laight
To: Zongmin Zhou
Cc: pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu,
 alex@ghiti.fr, linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org, Zongmin Zhou
Subject: Re: [PATCH] riscv: lib: optimize strchr() with Zbb extension
Message-ID: <20260507112235.41e539fa@pumpkin>
In-Reply-To: <20260507020620.134225-1-min_halo@163.com>
References: <20260507020620.134225-1-min_halo@163.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 7 May 2026 10:06:20 +0800
Zongmin Zhou wrote:

> From: Zongmin Zhou
>
> Add a Zbb-powered optimization to the existing strchr() implementation
> using the 'orc.b' instruction, following the same pattern established
> by strnlen().
>
> The Zbb variant processes data in word-sized chunks using orc.b to
> detect both NUL terminators and target characters in parallel. On
> systems without Zbb support, the original byte-by-byte implementation
> is used as a fallback via the alternatives mechanism.
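For anyone following along without the Zbb spec to hand, a rough C model of the per-word test described above may help. It is only a sketch, assuming RV64 (XLEN = 64); orc_b(), broadcast_byte(), word_has_nul() and word_has_char() are hypothetical names, not kernel functions. orc.b sets every non-zero byte to 0xff and leaves zero bytes as 0x00, so a word contains a NUL exactly when the result is not all-ones. XORing the word with the character copied into every byte lane zeroes the matching bytes, so the same test then finds the target.

```c
#include <assert.h>
#include <stdint.h>

/* Model of Zbb orc.b on RV64: non-zero bytes -> 0xff, zero bytes -> 0x00. */
static uint64_t orc_b(uint64_t x)
{
	uint64_t r = 0;

	for (unsigned i = 0; i < 8; i++)
		if ((x >> (8 * i)) & 0xff)
			r |= (uint64_t)0xff << (8 * i);
	return r;
}

/* Copy the low byte of c into every byte lane, as the patch's mul does. */
static uint64_t broadcast_byte(int c)
{
	return (uint64_t)(c & 0xff) * 0x0101010101010101ULL;
}

/* A word holds a NUL iff orc.b of it is not all-ones (the 'bne t1, t4' test). */
static int word_has_nul(uint64_t word)
{
	return orc_b(word) != UINT64_MAX;
}

/* XOR zeroes every byte equal to c, so the same test finds the target. */
static int word_has_char(uint64_t word, int c)
{
	return orc_b(word ^ broadcast_byte(c)) != UINT64_MAX;
}
```

The patch builds the 0x0101010101010101 constant with li/slli/or and then uses mul for the broadcast; the C multiply above plays the same role.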
>
> Benchmark results (QEMU TCG, rv64):
> Length | zbb=off (MB/s) | zbb=on (MB/s) | Improvement
> -------|----------------|---------------|------------
> 1 B    | 27             | 25            | -7.4%
> 7 B    | 147            | 128           | -12.9%
> 16 B   | 216            | 372           | +72.2%
> 64 B   | 378            | 958           | +153.4%
> 512 B  | 480            | 1990          | +314.6%
> 4096 B | 501            | 2269          | +352.9%
>
> The regression on very short strings (1-7 bytes) is due to the fixed
> overhead of the word-level path: broadcasting the target character to
> all byte lanes via multiplication and checking pointer alignment before
> entering the main loop. For strings shorter than one machine word, this
> setup cost outweighs the benefit of parallel comparison. As string
> length increases beyond 16 bytes, the word-at-a-time processing shows
> significant gains.
>
> Signed-off-by: Zongmin Zhou
> ---
>  arch/riscv/lib/strchr.S | 115 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 115 insertions(+)
>
> diff --git a/arch/riscv/lib/strchr.S b/arch/riscv/lib/strchr.S
> index 48c3a9da53e3..600b19452bc2 100644
> --- a/arch/riscv/lib/strchr.S
> +++ b/arch/riscv/lib/strchr.S
> @@ -6,9 +6,15 @@
>  
>  #include
>  #include
> +#include
> +#include
>  
>  /* char *strchr(const char *s, int c) */
>  SYM_FUNC_START(strchr)
> +
> +	__ALTERNATIVE_CFG("nop", "j strchr_zbb", 0, RISCV_ISA_EXT_ZBB,
> +		IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB))
> +
>  	/*
>  	 * Parameters
>  	 *   a0 - The string to be searched
> @@ -29,6 +35,115 @@ SYM_FUNC_START(strchr)
>  	li	a0, 0
>  2:
>  	ret
> +
> +/*
> + * Variant of strchr using the ZBB extension if available
> + *
> + * This implementation uses orc.b to detect both NUL terminators and target
> + * characters in parallel, processing word-sized chunks for efficiency.
> + */
> +#if defined(CONFIG_RISCV_ISA_ZBB) && defined(CONFIG_TOOLCHAIN_HAS_ZBB)
> +strchr_zbb:
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +# define CZ	clz
> +#else
> +# define CZ	ctz
> +#endif
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - Address of first occurrence of 'c' or NULL
> +	 *
> +	 * Parameters
> +	 *   a0 - String to search
> +	 *   a1 - Character to find
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4
> +	 */
> +
> +	/*
> +	 * Prepare target character mask.
> +	 * Broadcast target character to all bytes using multiply.
> +	 */
> +	andi	a1, a1, 0xff
> +	li	t1, 0x01010101
> +#if __riscv_xlen == 64
> +	slli	t2, t1, 32
> +	or	t1, t1, t2
> +#endif
> +	mul	t2, a1, t1
> +
> +	/* All-ones mask for orc.b comparisons. */
> +	li	t4, -1
> +
> +	/* Check alignment. */
> +	andi	t0, a0, SZREG-1
> +	beqz	t0, 2f

It is almost certainly faster to jump 'out of line' for misaligned
strings and fall through for aligned ones.

> +
> +	/* Handle misaligned portion byte-by-byte. */
> +1:
> +	lbu	t1, 0(a0)
> +	beq	t1, a1, 9f
> +	beqz	t1, 8f
> +	addi	a0, a0, 1
> +	andi	t0, a0, SZREG-1
> +	bnez	t0, 1b
> +
> +	/* Main loop: process word-sized chunks. */

Tweak to remove a branch from the loop:
	addi	a0, a0, -SZREG

> +2:
> +	REG_L	t0, 0(a0)

Do the read first for better instruction scheduling:
	REG_L	t0, SZREG(a0)
	addi	a0, a0, SZREG

> +
> +	/* Check for NUL terminator. */
> +	orc.b	t1, t0
> +	bne	t1, t4, 3f
> +
> +	/* Check for target character. */
> +	xor	t1, t0, t2
> +	orc.b	t1, t1
> +	bne	t1, t4, 4f

	beq	t1, t4, 2b
and move the code at '4:' here.

-- David

> +
> +	addi	a0, a0, SZREG
> +	j	2b
> +
> +3:
> +	/* NUL found in current chunk. Check if target appears before NUL. */
> +	not	t1, t1
> +
> +	xor	t3, t0, t2
> +	orc.b	t3, t3
> +	not	t3, t3
> +
> +	CZ	t3, t3
> +	CZ	t1, t1
> +
> +	/* If NUL appears before target, character not found. */
> +	bltu	t1, t3, 8f
> +
> +	srli	t3, t3, 3
> +	add	a0, a0, t3
> +	ret
> +
> +4:
> +	/* Target found in chunk without NUL. */
> +	not	t1, t1
> +	CZ	t1, t1
> +	srli	t1, t1, 3
> +	add	a0, a0, t1
> +	ret
> +
> +8:
> +	/* Character not found, return NULL. */
> +	li	a0, 0
> +9:
> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strchr)
>  
>  SYM_FUNC_ALIAS_WEAK(__pi_strchr, strchr)
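Putting the pieces together, a hypothetical C model of the whole strchr_zbb flow may make the exit logic at '3:' easier to check. model_strchr_zbb() and cz() are illustrative names, not kernel code. The sketch assumes a 64-bit word and, like the assembly, loads whole aligned words, so it can read up to 7 bytes past the terminating NUL; the caller's buffer must cover those bytes. cz() mirrors the patch's CZ macro (ctz on little-endian, clz on big-endian) and returns 64 for a zero input, as the Zbb instructions do, whereas the GCC/Clang builtins it wraps are undefined for zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_BYTES sizeof(uint64_t)

/* Model of Zbb orc.b: each non-zero byte becomes 0xff, zero bytes stay 0. */
static uint64_t orc_b(uint64_t x)
{
	uint64_t r = 0;

	for (unsigned i = 0; i < 8; i++)
		if ((x >> (8 * i)) & 0xff)
			r |= (uint64_t)0xff << (8 * i);
	return r;
}

/*
 * Model of the patch's CZ macro: bit offset of the first byte in memory
 * order. Like Zbb ctz/clz (unlike the builtins), an input of 0 gives 64.
 */
static unsigned cz(uint64_t x)
{
	if (!x)
		return 64;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (unsigned)__builtin_clzll(x);
#else
	return (unsigned)__builtin_ctzll(x);
#endif
}

/*
 * Hypothetical C model of strchr_zbb. Reads whole aligned words, so the
 * buffer must be readable up to the end of the word holding the NUL.
 */
static const char *model_strchr_zbb(const char *s, int c)
{
	const unsigned char ch = (unsigned char)c;
	const uint64_t pattern = ch * 0x0101010101010101ULL;
	uint64_t word, nul, hit;

	/* Byte-by-byte until s is word aligned (label 1: in the patch). */
	for (; (uintptr_t)s % WORD_BYTES; s++) {
		if ((unsigned char)*s == ch)
			return s;
		if (*s == '\0')
			return NULL;
	}

	/* Word loop (label 2:): stop at the first word with a NUL or a match. */
	for (;; s += WORD_BYTES) {
		memcpy(&word, s, sizeof(word));
		nul = ~orc_b(word);           /* 0xff in each NUL byte */
		hit = ~orc_b(word ^ pattern); /* 0xff in each matching byte */
		if (nul | hit)
			break;
	}

	/*
	 * Labels 3: and 4: folded together: if the NUL comes strictly first,
	 * the character is absent. For c == 0, hit == nul, so the address of
	 * the NUL itself is returned.
	 */
	if (cz(nul) < cz(hit))
		return NULL;
	return s + cz(hit) / 8;
}
```

For brevity the model folds the '3:' and '4:' exits into one comparison: with no NUL in the word, cz(0) is 64, so any match wins. It models the results only, not the branch layout that the review comments are about.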