From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98498C83F11 for ; Sat, 26 Aug 2023 03:18:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231694AbjHZDSM (ORCPT ); Fri, 25 Aug 2023 23:18:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59928 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231769AbjHZDSB (ORCPT ); Fri, 25 Aug 2023 23:18:01 -0400 Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5646010EF for ; Fri, 25 Aug 2023 20:17:58 -0700 (PDT) Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1b52875b8d9so83055ad.0 for ; Fri, 25 Aug 2023 20:17:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693019878; x=1693624678; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=NTpWWoAzCcGiRnwVxW+wyPGfNXm/UmsRStzWL2401ms=; b=teQLc7wAUgGrER8vTD1pcB6PLRVdVTwkVG9WKMNyN6+TCd5oCGlJESlTncu6f8VdQP qUYGszWCYKw4qYMn5yc0yrthvYgBgSEjpKMKYnCGVHpK2dSNWIYAOBz09QLbTDTK/1Q1 e+bYeitkmm9iaYQYhYBPqMj2y0l7jOevVP32SGNN4ABk2Tj4mQsEobj1s5kMItYq8nJe 4PoByq3CP7G/b9yV8es943koSrvFzG260D6zZwop7NaSILAS5xQVzp/8V2AjKtGPDYsg 7Bbp6tmLgIdhfeHhg3uup4RLFA6hYT+jQ9mmApTUGNAFlg1hGDuhfthJ45DULTmXbV4G iMHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693019878; x=1693624678; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=NTpWWoAzCcGiRnwVxW+wyPGfNXm/UmsRStzWL2401ms=; b=km7XrWc39lThTQriceVMijMGsAvFEDYP9XVOsjCCV0tTwpvU1lmuxv9ANm+sUGp5hW DEVPIHO+uqzNUpk1H7ZlqZEiWlHYB9liq9ZSVba2Erbvqv+5nV1eMs6uk3xQa9iRlnbw 7/zh81IYS+hMGnh2UXPpWZmEANU+G4EubeppLJnqk+AaqIlHiQPcH2ve0ahTaID6xrkx NI2JkpMLyc1fngarIY/IS0zeh82VBq3eCec4Cqnpdim+Pqew6nakyMpVV5VoWZloRVMb hJpf3lLNYGY9AB73ZxG/YvZRBY0n1oOKGFl/SypFaImfhBezumaN+Zv6LfiUFkSd4eB0 CNEA== X-Gm-Message-State: AOJu0YxnpdM4MKOKxzysKAVkH7rKLgDQUsdd4rcvRXUwF7rf0aDAqQsV VijgAZI3igp5OQaKimTWb6wVdg== X-Google-Smtp-Source: AGHT+IFgTjL85NMW7QVTfT1LIEunufc8m3DLUqvJfm8kEADFDEa+5LGQPIut13OdOpGUuedULTjPsw== X-Received: by 2002:a17:902:c703:b0:1bd:b75a:e95f with SMTP id p3-20020a170902c70300b001bdb75ae95fmr128179plp.0.1693019877605; Fri, 25 Aug 2023 20:17:57 -0700 (PDT) Received: from google.com ([2620:15c:2d3:205:6671:452e:1913:d899]) by smtp.gmail.com with ESMTPSA id a14-20020a62bd0e000000b006870ff20254sm2251739pff.125.2023.08.25.20.17.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 25 Aug 2023 20:17:57 -0700 (PDT) Date: Fri, 25 Aug 2023 20:17:52 -0700 From: Fangrui Song To: Linus Torvalds Cc: Nick Desaulniers , Bill Wendling , Helge Deller , Nathan Chancellor , linux-kernel@vger.kernel.org, linux-parisc@vger.kernel.org, Andrew Morton , Chanho Min , Geert Uytterhoeven , Greg Kroah-Hartman , Kees Cook , clang-built-linux , "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" Subject: Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels Message-ID: <20230826031752.gongmxkr56zyycol@google.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-parisc@vger.kernel.org On 2023-08-25, Linus Torvalds wrote: >On Fri, 25 Aug 2023 at 17:52, Nick Desaulniers wrote: >> >> So 2 concerns where "I'll do it in inline asm" can pessimize codegen: >> 1. You alluded to this, but what happens when one of these functions >> is called with a constant? > >This is why our headers have a lot of __builtin_constant_p()'s in them.. > >In this particular case, see the x86 asm/bitops.h code: > > #define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : >variable_ffs(x)) For the curious (like me), __builtin_ffs https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fffs says Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined. The hangling of 0 seems the cause that __builtin_ffs codegen is not as well as inline asm. Clang implemented the builtin in 2008 and took the same constraint (penalty). GCC compiles __builtin_ctzl(x) to xorl %eax, %eax; tzcntq %rdi, %rax on most Intel processors (AMD -march= values are unaffected). The extra xor is due to a false dependency issue https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=73543b2286027da1de561902440b53f775a03a86 Inline asm wins here as well since we know the argument 0 is undefined. In May 2023, https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=cc6eb8b51f9568ae0caf46b80e2a0aff050030ce "Disable avoid_false_dep_for_bmi for atom and icelake(and later) core processors." removed the extra xor for icelake (and later) core processors. >but this is actually quite a common pattern, and it's often not about >something like __builtin_ffs() at all. > >See all the other __builtin_constant_p()'s that we have in that same >file because we basically just use different code sequences for >constants. > >And that file isn't even unusual. We use it quite a lot when we care >about code generation for some particular case. > >> 2. by providing the definition of a symbol typically provided by libc >> (and then not building with -ffreestanding) pessimizes libcall >> optimization. > >.. and this is partly why we often avoid libgcc things, and do certain >things by hand. > >The classic rule is "Don't do 64-bit divisions using the C '/' operator". > >So in the kernel you have to use do_div() and friends, because the >library versions are often disgusting and might not know that 64/32 is >much much cheaper and is what you want. > >And quite often we simply use other names - but then we also do *not* >build with -freestanding, because -freestanding has at least >traditionally meant that the compiler won't optimize the simple and >obvious cases (typically things like "memcpy with a constant size"). > >So we mix and match and pick the best option. > >The kernel really doesn't care about architecture portability, because >honestly, something like "ffs()" is entirely *trivial* to get right, >compared to the real differences between architectures (eg VM and IO >differences etc). > > Linus >