Date: Sun, 19 Apr 2026 11:41:17 +0100
From: David Laight
To: Andrew Morton, Kees Cook, Andy Shevchenko, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org, Linus Torvalds
Subject: Re: [PATCH next] string: Optimise strlen()
Message-ID: <20260419114117.7cf50b2b@pumpkin>
In-Reply-To: <20260327195737.89537-1-david.laight.linux@gmail.com>
References: <20260327195737.89537-1-david.laight.linux@gmail.com>

On Fri, 27 Mar 2026 19:57:37 +0000
david.laight.linux@gmail.com wrote:

> From: David Laight
>
> Unrolling the loop once significantly improves performance on some CPU.
> Userspace testing on a Zen-5 shows it runs at two bytes/clock rather than
> one byte/clock with only a marginal additional overhead.

I hate benchmarks.

I've finally got around to looking at this again (on x86-64).
I changed the order of the 'single byte' and 'two byte' tests and the
'two byte' loop slowed down massively - to pretty much the same speed as
the 'single byte' loop.

gcc had swapped over the two functions in the object file.
Swapping the order changed the alignment of the loop top between odd and
even multiples of 16 (this alignment is disabled in kernel to avoid bloat).
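
For reference, the two loop shapes being compared look roughly like this in C.
This is only a sketch: the function names are hypothetical, it is not the
patch itself, and the instructions the compiler emits (and hence the loop
size and alignment) will differ between builds.

```c
#include <stddef.h>

/* Hypothetical names; a sketch of the two loop shapes, not the actual patch. */

/* 'single byte': one byte tested per loop iteration. */
static size_t strlen_single_byte(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return (size_t)(p - s);
}

/* 'two byte': the same loop unrolled once, two bytes tested per iteration. */
static size_t strlen_two_byte(const char *s)
{
	const char *p = s;

	for (;;) {
		if (!p[0])
			return (size_t)(p - s);
		/* p[1] is only read once p[0] is known to be non-zero. */
		if (!p[1])
			return (size_t)(p + 1 - s);
		p += 2;
	}
}
```

Both return the same value as strlen(); only the number of compares per
taken branch differs.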

The loop in the 'two byte' code is 17 bytes. In the slow case the loop top
is aligned to an odd multiple of 16, so the last byte of the loop is in a
different 32 byte code block - which is presumably slow.
Changing the two 'cmpb $0, mem' to (say) 'cmpb %cl, mem' would reduce the
loop to 15 bytes, so it wouldn't cross a 16 byte boundary.
(The 'single byte' loop doesn't cross a 16 byte boundary in the test program.)

The kernel build I just looked at has strlen() aligned to a 16 byte boundary
with the branch crossing the next 16 byte boundary.
So, if it behaves the same way as my test program, strlen() will run a lot
slower on 50% of kernel builds.
(And other CPUs may have costs associated with the 16 byte boundary.)

Mostly this means that however hard you try you are guaranteed to lose
somewhere :-(

>
> Using 'byte masking' is faster for longer strings - the break-even point
> is around 56 bytes on the same Zen-5 (there is much larger overhead, then
> it runs at 16 bytes in 3 clocks).
> But the majority of kernel calls won't be near that length.
> There will also be extra overhead for big-endian systems and those
> without a fast ffs().

I've had a further thought on that as well.
The 'byte masking' code is somewhat larger (112 rather than 32 or 48).
While the extra overhead is ~20 clocks, that is less than the 'branch
mispredict' penalty that the byte loop suffers every time the length changes.
So for randomly changing lengths I'm beginning to think the 'byte mask'
version is better.

I ran the code on a Haswell a while back; the break-even length there was
somewhat shorter (I'm remembering 32 bytes).

This all means the byte masking code may actually be sensible, provided:
- LE, or BE with a byte-swapping memory read.
- fast ffsl()
- 64bit

	David
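
For illustration, the 'byte masking' approach can be sketched as below. This
is not the code that was benchmarked: the function name is hypothetical, it
assumes a 64bit little-endian target, it uses the GCC/Clang builtin
__builtin_ctzll() in place of ffsl(), and it assumes that reading the rest of
an aligned 8 byte word past the terminating NUL is harmless (true for page
protection, but not something ISO C promises).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ull
#define HIGHS 0x8080808080808080ull

/* Hypothetical name; sketch for 64bit little-endian only. */
static size_t strlen_byte_mask(const char *s)
{
	const char *p = s;
	uint64_t w, zero;

	/* Byte loop until p is 8 byte aligned, so word reads never cross a page. */
	while ((uintptr_t)p % sizeof(w)) {
		if (!*p)
			return (size_t)(p - s);
		p++;
	}

	for (;;) {
		memcpy(&w, p, sizeof(w));	/* aligned 8 byte load */
		/*
		 * Non-zero iff some byte of w is zero. The lowest set bit is
		 * the 0x80 bit of the first (lowest addressed) zero byte;
		 * bits above it may be false positives caused by borrow.
		 */
		zero = (w - ONES) & ~w & HIGHS;
		if (zero)
			break;
		p += sizeof(w);
	}

	/* Bit index / 8 is the byte index of the first zero byte (LE). */
	return (size_t)(p - s) + ((size_t)__builtin_ctzll(zero) >> 3);
}
```

The alignment prologue and the final bit scan are the fixed overhead; the
word loop has one branch per 8 bytes rather than one per byte or per two
bytes. A big-endian version would need a byte-swapped load or a different
final step.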