From: David Laight
To: 'Segher Boessenkool', Linus Torvalds
CC: Andrew Cooper, Nick Desaulniers, "H. Peter Anvin", Bill Wendling, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)", Nathan Chancellor, "Juergen Gross", Peter Zijlstra, "Andy Lutomirski", "llvm@lists.linux.dev", LKML, linux-toolchains
Subject: RE: [PATCH v5] x86: use builtins to read eflags
Date: Fri, 18 Mar 2022 23:52:01 +0000
Message-ID: <04f65d1a90f640d4943c810f37016b01@AcuMS.aculab.com>
References: <20220210223134.233757-1-morbo@google.com> <20220301201903.4113977-1-morbo@google.com> <20220318230425.GT614@gate.crashing.org>
In-Reply-To: <20220318230425.GT614@gate.crashing.org>

From: Segher Boessenkool
> Sent: 18 March 2022 23:04
...
> The vast majority of compiler builtins are for simple transformations
> that the machine can do, for example with vector instructions.  Using
> such builtins does *not* instruct the compiler to use those machine
> insns, even if the builtin name would suggest that; instead, it asks to
> have code generated that has such semantics.  So it can be optimised by
> the compiler, much more than what can be done with inline asm.

Bah.
I wrote some small functions to convert blocks of 80 audio samples
between C 'float' and the 8-bit u-law and A-law floating point formats
- one set uses the F16C conversions for denormalised values.
I really want the instructions I've asked for in the order I've asked
for them. I don't want the compiler doing stupid things.
(Like deciding to try to vectorise the bit of code at the end that
handled non-80-byte blocks.)

> It also can be optimised better by the compiler than if you would
> open-code the transforms (if you ask to frobnicate something, the
> compiler will know you want to frobnicate that thing, and it will not
> always recognise that is what you want if you just write it out in more
> general code).

Yep.
If I write 'for (i = 0; i < n; i++) foo[i] = bar[i]' I want a loop
- not a call to memcpy().
If I want a memcpy() I'll call memcpy().

And if I write:
	do {
		sum64a += buff32[0];
		sum64b += buff32[1];
		sum64a += buff32[2];
		sum64b += buff32[3];
		buff32 += 4;
	} while (buff32 != lim);
I don't want to see 'buff32[1] + buff32[2]' anywhere!
That loop has half a chance of running at 8 bytes/clock.
But not how gcc compiles it.

> Well-chosen builtin names are also much more readable than the best
> inline asm can ever be, and it can express much more in a much smaller
> space, without so much opportunity to make mistakes, either.

Hmmm...
Trying to write that SSE2/AVX code was a nightmare.
Chase through the cpu instruction set trying to sort out the name of
the required instruction.
Then search through the 'intrinsic' header to find the name of the builtin.
Then disassemble the code to check that I'd got the right one.
I'm pretty sure the asm would have been shorter and needed just
as many comments.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
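
For reference, the two-accumulator loop quoted in the message can be written out as a self-contained function. This is only a sketch: the function name sum_words, the nwords parameter, and the uint32_t/uint64_t types are assumptions made for illustration, as the message shows just the loop body. It shows the intended data flow (even-indexed words into one accumulator, odd-indexed words into the other) and makes no claim about the code any particular compiler generates for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wrapper around the loop from the message: adds 32-bit
 * words into two independent 64-bit accumulators so that the two
 * dependency chains can, in principle, proceed in parallel.
 *
 * Assumption: nwords is a non-zero multiple of 4 (the do/while loop
 * always executes at least once and steps by 4 words).
 */
static uint64_t sum_words(const uint32_t *buff32, size_t nwords)
{
	const uint32_t *lim = buff32 + nwords;
	uint64_t sum64a = 0, sum64b = 0;

	assert(nwords != 0 && nwords % 4 == 0);

	do {
		sum64a += buff32[0];
		sum64b += buff32[1];
		sum64a += buff32[2];
		sum64b += buff32[3];
		buff32 += 4;
	} while (buff32 != lim);

	/* Combine the two partial sums only once, after the loop. */
	return sum64a + sum64b;
}
```

Keeping sum64a and sum64b separate until the end is the point of the construction. Whether the generated code preserves that separation has to be checked in the disassembly, as the message describes.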