Date: Fri, 27 Mar 2026 22:49:28 +0000
From: David Laight
To: Linus Torvalds
Cc: Andrew Morton, Kees Cook, Andy Shevchenko,
 linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH next] string: Optimise strlen()
Message-ID: <20260327224928.7c4220cb@pumpkin>
References: <20260327195737.89537-1-david.laight.linux@gmail.com>

On Fri, 27 Mar 2026 13:37:29 -0700
Linus Torvalds wrote:

> On Fri, 27 Mar 2026 at 12:57, wrote:
> >
> > Using 'byte masking' is faster for longer strings - the break-even point
> > is around 56 bytes on the same Zen-5 (there is much larger overhead, then
> > it runs at 16 bytes in 3 clocks).
>
> What byte masking approach did you actually use?

This is the code I was testing.
It does aligned accesses - I did measure it without the alignment code,
made little/no difference.
The OPTIMIZER_HIDE_VAR() is needed to stop gcc generating different
64bit constants and to make it generate the constant in a sane way
(especially on architectures with only 16bit immediates).

size_t strlen_longs(const char *s)
{
	unsigned int off = (unsigned long)s % sizeof (long);
	const unsigned long *p = (void *)(s - off);
	unsigned long ones = 0x01010101ul;
	unsigned long val;
	unsigned long mask;
	int first = 1;

	OPTIMIZER_HIDE_VAR(ones);
	ones |= ones << 16 << 16;
	mask = (~0ul >> 8) >> 8 * (sizeof (long) - 1 - off);
	// I've just realised that might be better as:
	//	mask = ones >> 1 + 8 * (sizeof (long) - 1 - off);
	// which has the right properties and stops the compiler generating
	// 0x00ffffffffffffff
	val = *p | mask;
	do {
		if (!first)
			val = *++p;
		first = 0;
		mask = (val - ones) & ~val & (ones << 7);
	} while (!mask);

	off = (__builtin_ffsl(mask) - 1)/8;
	return (const char *)p + off - s;
}

That loop is the one that compiled best, ISTR it has a 'spare' register
move in it ('first' gets optimised out).
On many BE systems doing a byteswapping memory read may be best.

> We have 'lib/strnlen_user.c', which is actually the only strlen() in
> the kernel that I've really ever seen in profiles (it shows up for
> execve() with lots of arguments).
>
> That has tons of extra overhead due to the whole user access setup,
> but the core loop should be pretty good with that has_zero() thing.

I've not measured strnlen(), but it wouldn't surprise me if argv[]
processing were faster with something like the strlen() in this patch.
After all, arguments are usually relatively short.

If you were going to use the above then both 'ones' and 'ones << 7'
need to be calculated once and kept in registers.

> I do agree that we shouldn't use 'rep scas'. It goes back to the
> *very* original linux kernel sources, though, and I've never seen it
> in profiles because very few things in the kernel actually use strings
> a lot.

True, and most are short.
strscpy() is next on the list...

And the arm64 strlen() has special code to optimise crossing page
boundaries. God knows how slow it is on your typical 10 character string.

	David

>
>              Linus
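
For anyone wanting to poke at the two masks in strlen_longs() in
isolation, here is a minimal userspace sketch. It assumes a 64-bit
little-endian target and gcc/clang (for __builtin_ctzll). The helper
names zero_byte_mask(), first_zero_byte() and lead_mask() are made up
for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define ONES	0x0101010101010101ull	/* 0x01 in every byte */
#define HIGHS	0x8080808080808080ull	/* 0x80 in every byte, i.e. ONES << 7 */

/*
 * Non-zero iff at least one byte of 'v' is zero.
 * Bytes above the first zero byte can be flagged falsely (borrow
 * propagation), but the lowest set bit always marks the first zero byte.
 */
static inline uint64_t zero_byte_mask(uint64_t v)
{
	return (v - ONES) & ~v & HIGHS;
}

/* Index (0..7, little-endian order) of the first zero byte, or 8 if none. */
static inline unsigned int first_zero_byte(uint64_t v)
{
	uint64_t m = zero_byte_mask(v);

	return m ? (unsigned int)__builtin_ctzll(m) / 8 : 8;
}

/*
 * Mask that forces the 'off' low bytes of an aligned word to be non-zero,
 * so bytes that precede the start of the string are never seen as a NUL.
 */
static inline uint64_t lead_mask(unsigned int off)
{
	assert(off < 8);	/* a larger offset would make the shift invalid */
	return (~0ull >> 8) >> 8 * (7 - off);
}
```

OR-ing lead_mask(off) into the first aligned word and then looping on
zero_byte_mask() is the same shape as the loop above. The byte index
calculation is only valid on little-endian, hence the remark about
byteswapping reads on BE.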