From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f45.google.com (mail-wm1-f45.google.com [209.85.128.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DDA0525A2CF for ; Sun, 11 Jan 2026 23:57:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.45 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768175874; cv=none; b=gZVv4O92I5dOD6LOUdcoAA3Tz5M9hIMX51uTltnxa+ZIiHN5btFFg+ORLekReTT8NNmsh2S1ZarrINUb39ayegk4leokBc3lWNfKYlfFu7G5LfViq6jJ1hlL2DjS8P1anA9aTRMZerIiuCfsUPVYOtEnNc13Q7XEoYepaK6G3qs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768175874; c=relaxed/simple; bh=lRgOyHOLfdp6FjUPlcweg2QqyV1F3yOnGIj7KoKg/Ok=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=niTFL/UqsbjHzIlO8wQD/D63PZdNuhO9RnODfR5btxle5LrVRPK5OTI6I5/BYBwm2MNyZrpK56Mmu07fGF4wEf4cG5wywyLV5EJMnaH8pWSy1fEmfCaLh7JWqwe8JlAwsudh0SR1zqA2YIS+fK2vOOx6xUQ2UCEP8ZU57RXMeQE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mC4Myb+p; arc=none smtp.client-ip=209.85.128.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mC4Myb+p" Received: by mail-wm1-f45.google.com with SMTP id 5b1f17b1804b1-477a2ab455fso53324345e9.3 for ; Sun, 11 Jan 2026 15:57:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768175871; x=1768780671; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=mNLTTCKX0Dd+kcg9LG3wJtwsRHM82cDAkp6Wh3sE4oo=; b=mC4Myb+pCcrP8hPKtPrQbSu4spZ1+bMRnxti6ku8jinThq/rzgqlB8E8s8iHjT6azm X7lb8kiOp4U+8oqGivVXwKfvReCqeS3AV9x0FXRyd+95ygEKjK5HuoIOSkwERmw2bqkE GIgNzMRsPOXfTJLiW3VDT2jW0VDzrSGZrXODC5yKAigkxoSRrJAAvs7w8n02yQsq5a6K LW0t0xaKFcJbXFvSnGY5KoBIylz3xnGkTQeexGoTllPBZgYPb8O2BTSnHwqWcLeeHJQy eZEQpSkO6FxzVVrmwqv/Rg5bYWZlYbkiMzEvyNNot8PntI6cyzY7NWRclaJapUo/Lo7Q sqCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768175871; x=1768780671; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=mNLTTCKX0Dd+kcg9LG3wJtwsRHM82cDAkp6Wh3sE4oo=; b=TXStn6QEvpu4OWx9OjwhoPV2WHV9emneol48O/XQaxrMf8kgibPEYRBJU4BUy+aUWY Vbwq6rWH5Ip2o06IKf0K976FP+IhaZvBZVIocwVM4wD/mmXFdltQa8MFlJaHnK3VmkSp jVBhiNY27FsA6tgWqUEBZVHUlIhmp5wjezjkAc4q1kVsRmSl8XzToWUpW3+7gKDr2VBI YxGJ9nmAWQh0eZ1/uHuGokDz16rX8ur7kX3qQIbR+iPfeKjgxK1YjaisOa4KC+CCfJXB YMJtNnadT82ywfCCykrss2n/c57PNHDtpK0m+hYh/glDggl09MXywSzeH/hVK8OYYD26 FBYg== X-Forwarded-Encrypted: i=1; AJvYcCXuwRGVoCGdAssAN9bzoHI62UsAdQz7KM7/gj94OmVKd3k5VYjkITdhsbwSC+lBvHUBbkf2cS0Zf8nyzqw=@vger.kernel.org X-Gm-Message-State: AOJu0YxCDMMO3kx2CL51qOgDHNa1Q2f72FTZCE+qnJtYBDDHdw/j33UD S/IRhnKKJNO0ufBlrVdp1tFKK3Y0x0yyfxKsYtRqprb+wZjByoL/JUhi X-Gm-Gg: AY/fxX5OrA3/pint/zEDFbIT16E8zTblOnB+l1Os7wYpPBso+KL/mOWAbtQic03LUMK aLotaJ63+EtSXxNsF7bUOMZuMf/wH704lnv9GH2cDcCJ4qStNRxzhw0ARueLPpVINF9K89Z81XX Nqx82Hcf9iXphoQV3o5dLJ3Upk2UQTsV403vImIN5bXCp0XHQD2gkIrRhXC64ksHXqPcNGQzVXu a58v5sPnw2m1NvhId40gj8byFMh5Hgb2HstVlWlg38U5QfKu+z7968KC/wugqHAz1WPPqFIt8FT M+Lyw71Ph4oS061M0j0uHxoBC/JNV7E+Y+4po+fv7wVLWyTfk6Y/Jo3R+YlVF0NyiCoWSrsMRYU GfM+0xeDkBIq2zczDB0BVeeoTDD+HDlZ8tCDoXPe8S5M7D1lCAAAoA12GE509fXIF1sTFN904Te AAh9qddo5eLVJSIkPhdykgOMJwdtcjT9zpYcK/chGakYiTrzZOLORy3bytgYhqRYw= X-Google-Smtp-Source: AGHT+IFwJeb0QhiFQBH+nu5LeniBU/zQ6kOlB7s8xpq+ZeAjh6sYrxJwYLk0/eegc+LpQ7Y2MRh46A== X-Received: by 2002:adf:f311:0:b0:431:808:2d45 with SMTP id ffacd0b85a97d-432c374f499mr13982743f8f.2.1768175870961; Sun, 11 Jan 2026 15:57:50 -0800 (PST) Received: from pumpkin (82-69-66-36.dsl.in-addr.zen.co.uk. [82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-432bd5ee893sm35007582f8f.37.2026.01.11.15.57.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Jan 2026 15:57:50 -0800 (PST) Date: Sun, 11 Jan 2026 23:57:45 +0000 From: David Laight To: "Maciej W. Rozycki" Cc: Yury Norov , Petr Tesarik , Yury Norov , Rasmus Villemoes , Richard Henderson , Matt Turner , Magnus Lindholm , Vineet Gupta , Geert Uytterhoeven , Thomas Bogendoerfer , Madhavan Srinivasan , Michael Ellerman , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Chris Zankel , Max Filippov , Patrik Jakobsson , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Simona Vetter , Robin Murphy , Joerg Roedel , Will Deacon , Jakub Kicinski , Andrew Lunn , "David S. Miller" , Eric Dumazet , Paolo Abeni , Oliver Neukum , Arnd Bergmann , Kuan-Wei Chiu , Andrew Morton , Marcel Holtmann , Johan Hedberg , Luiz Augusto von Dentz , Pablo Neira Ayuso , Florian Westphal , linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/2] treewide, bits: use ffs_val() where it is open-coded Message-ID: <20260111235745.53d16a59@pumpkin> In-Reply-To: References: <1ce341045ad0487c66ca21003f9974916aba0bce.1767975412.git.ptesarik@suse.com> <20260110115439.184b7a66@pumpkin> <20260111104002.3e017fe2@pumpkin> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; arm-unknown-linux-gnueabihf) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Sun, 11 Jan 2026 21:22:03 +0000 (GMT) "Maciej W. Rozycki" wrote: > On Sun, 11 Jan 2026, David Laight wrote: > > > > It's 4 MIPS instructions and exactly the same machine code produced as > > > compiler output from `is_power_of_2'. > > > > The compiler must be detecting that x == (x & -x) is the same as > > (x & (x - 1)) == 0. > > There are 3 basic operations either way, so unless a CPU has an odd > instruction that combines any, there's no way to reduce the instruction > count. > > > Although for MIPS the former is negate, and, beq(x,y) - so 3 instructions. > > On x86 it is negate, and, cmp, beq(zero flag) - one extra instruction. > > (The 4th MIPS instruction will be beq(syn,0).) > > This code only ever runs on R2000/R3000 and R4000/R4400 processors, on > which the cost of an untaken branch is nil. The cost of a taken branch on > the former ones is nil too, with their 4-stage pipelines, and while there > is a fixed penalty on the latter ones, the compiler should be able to > arrange code here such that at most one branch is taken anyway. > > > So maybe s/Badly/In an unusual way/ > > What's unusual in this way of determining that exactly one bit is set in > a word? It's as good as any I suppose. > > > > If you know of a better alternative > > > > Doing is_power_of_2() with only one conditional branch would be a gain. > > But I've not thought about it hard enough to find one. > > Sure you can, either of: > > (x != 0) & (x == (x & -x)) > > and: > > (x != 0) & ((x & (x - 1)) == 0) > > will do, but it's 5 instructions on MIPS, so no gain here. > > If you have a population count instruction such as CTPOP on EV67 Alpha, > you can do it in two instructions, such as: > > ctpop $16, $0 > cmpeq $0, 1, $0 > > so there seems to be room for improvement here, but GCC stubbornly refuses > to produce this instruction sequence for this code: > > bool is_power_of_2(unsigned long n) > { > return __builtin_popcountl(n) == 1; > } > > apparently owing to no insn cost set for the POPCOUNT RTX in the backend > causing the middle end to pick up some default. Instead GCC comes up with > what can be coded in C as: > > bool is_power_of_2(unsigned long n) > { > return n - 1 < (n ^ (n - 1)); > } Not seen that one - and not thought of popcnt, far too new for me. (I learnt asm for a pdp11 first...) > which seems an interesting alternative that produces 3 instructions only > across Alpha, MIPS and RISC-V targets. It's 5 instructions on POWER and > x86-64 vs 8 and 9 respectively with our current implementation. There are > no branches produced here, although an inline expansion will likely end > with one. I'm seeing: is_power_of_2: leaq -1(%rdi), %rax xorq %rax, %rdi cmpq %rdi, %rax setb %al retq on x86-64 - which is really 3 instructions. While using the popcnt instruction would reduce it by one, it will only be faster on amd zen cpu, on intel cpu popcnt has a latency of 3 and only executes on p1 (according to Agner). > > Overall it seems a worthwhile improvement even if still an instruction > longer than necessary for EV67 Alpha (and also for POWER, which also has > a population count operation, and which GCC does pick up in this case). > > FWIW on VAX, notorious for the lack of conditional set operations, it's 7 > instructions and 1 branch, down from 14 and 2 respectively. I worked for ICL through the 1980s so missed out on VAX. Picked up 8085, x86, sparc32 and m68k. > > I suspect there may not be one, but all these 'tricks' come from the 1970s > > (or earlier) and don't allow for the behaviour of modern cpu with multiple > > execution units and long pipelines. > > As we can see you never know. I've sent a code update proposal. I'd suggest moving is_power_of_2() to bits.h as: #define is_power_of_2(n) ({ auto _n = n; _n - 1 < _n ^ (_n - 1); }) and add: #define is_zero_or_power_of_2(n) ({ auto _n = n; _n & (_n - 1) != 0; }) > > The use of the population count operation can be considered separately, > but we'd have to be careful here as the quality of code produced by the > intrinsic varies, e.g. with pre-EV67 Alpha, RISC-V and VAX (!) GCC emits > code equivalent to my proposal, but with MIPS or x86-64 it resorts to a > libcall, so a guarding condition would be required. So I'd leave it for > another occasion. I'd guess that many of those popcnt instructions aren't actually as fast as the dec/xor pair. But the libcall is just plain stupid. I think that happens elsewhere - which is one reason __ffs() is x86 asm. > > Thanks for sparking this discussion overall! Nice wet Sunday bikeshed... David > > Maciej >