Date: Fri, 16 Apr 2021 14:40:08 -0400 (EDT)
From: Mathieu Desnoyers
To: paulmck
Cc: Peter Zijlstra, Will Deacon, linux-kernel, lttng-dev, carlos
Message-ID: <2089952450.84139.1618598408015.JavaMail.zimbra@efficios.com>
In-Reply-To: <20210416160139.GF4212@paulmck-ThinkPad-P17-Gen-1>
References: <1680415903.81652.1618584736742.JavaMail.zimbra@efficios.com> <20210416160139.GF4212@paulmck-ThinkPad-P17-Gen-1>
Subject: Re: liburcu: LTO breaking rcu_dereference on arm64 and possibly other architectures ?
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Apr 16, 2021, at 12:01 PM, paulmck paulmck@kernel.org wrote:

> On Fri, Apr 16, 2021 at 05:17:11PM +0200, Peter Zijlstra wrote:
>> On Fri, Apr 16, 2021 at 10:52:16AM -0400, Mathieu Desnoyers wrote:
>> > Hi Paul, Will, Peter,
>> >
>> > I noticed in this discussion https://lkml.org/lkml/2021/4/16/118 that LTO
>> > is able to break rcu_dereference. This seems to be taken care of by
>> > arch/arm64/include/asm/rwonce.h on arm64 in the Linux kernel tree.
>> >
>> > In the liburcu user-space library, we have this comment near
>> > rcu_dereference() in include/urcu/static/pointer.h:
>> >
>> > * The compiler memory barrier in CMM_LOAD_SHARED() ensures that
>> > * value-speculative optimizations (e.g. VSS: Value Speculation
>> > * Scheduling) does not perform the data read before the pointer read
>> > * by speculating the value of the pointer. Correct ordering is ensured
>> > * because the pointer is read as a volatile access. This acts as a
>> > * global side-effect operation, which forbids reordering of dependent
>> > * memory operations. Note that such concern about dependency-breaking
>> > * optimizations will eventually be taken care of by the
>> > * "memory_order_consume" addition to forthcoming C++ standard.
>> >
>> > (note: CMM_LOAD_SHARED() is the equivalent of READ_ONCE(), but was
>> > introduced in liburcu as a public API before READ_ONCE() existed in the
>> > Linux kernel)
>> >
>> > Peter tells me the "memory_order_consume" is not something which can be
>> > used today.
>> > Any information on its status at C/C++ standard levels and
>> > implementation-wise ?
>
> Actually, you really can use memory_order_consume. All current
> implementations will compile it as if it was memory_order_acquire.
> This will work correctly, but may be slower than you would like on ARM,
> PowerPC, and so on.
>
> On things like x86, the penalty is forgone optimizations, so less
> of a problem there.

OK

>
>> > Pragmatically speaking, what should we change in liburcu to ensure we
>> > don't generate broken code when LTO is enabled ? I suspect there are a
>> > few options here:
>> >
>> > 1) Fail to build if LTO is enabled,
>> > 2) Generate slower code for rcu_dereference, either on all architectures
>> >    or only on weakly-ordered architectures,
>> > 3) Generate different code depending on whether LTO is enabled or not.
>> >    AFAIU this would only work if every compile unit is aware that it
>> >    will end up being optimized with LTO.
>> >    Not sure how this could be done in the context of user-space.
>> > 4) [ Insert better idea here. ]
>
> Use memory_order_consume if LTO is enabled. That will work now, and
> might generate good code in some hoped-for future.

In the context of a user-space library, how does one check whether LTO
is enabled with preprocessor directives ? A quick test with gcc seems to
show that builds with and without -flto cannot be distinguished from a
preprocessor POV, e.g. the output of both

gcc --std=c11 -O2 -dM -E - < /dev/null

and

gcc --std=c11 -O2 -flto -dM -E - < /dev/null

is exactly the same. Am I missing something here ?

If we accept using memory_order_consume all the time in both C and C++
code starting from C11 and C++11, the following code snippet could do the
trick:

#define CMM_ACCESS_ONCE(x) (*(__volatile__ __typeof__(x) *)&(x))
#define CMM_LOAD_SHARED(p) CMM_ACCESS_ONCE(p)

#if defined (__cplusplus)
# if __cplusplus >= 201103L
#  include <atomic>
#  define rcu_dereference(x) ((std::atomic<__typeof__(x)>)(x)).load(std::memory_order_consume)
# else
#  define rcu_dereference(x) CMM_LOAD_SHARED(x)
# endif
#else
# if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)
#  include <stdatomic.h>
#  define rcu_dereference(x) atomic_load_explicit(&(x), memory_order_consume)
# else
#  define rcu_dereference(x) CMM_LOAD_SHARED(x)
# endif
#endif

This uses the volatile approach prior to C11/C++11, and moves to
memory_order_consume afterwards. This will bring a performance penalty
on weakly-ordered architectures even when -flto is not specified though.

The burden is then pushed onto the compiler people to eventually
implement an efficient memory_order_consume.

Is that acceptable ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
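
For illustration only, here is a minimal sketch of how the C11 branch of the snippet above could be exercised as a publish/read pair. It is not liburcu code: the names struct config, shared_config, publish_config(), deref_config() and read_value_or() are hypothetical. It also assumes the shared pointer is declared _Atomic, because the C11 atomic generic functions are specified for atomic types; the macro above instead takes the address of whatever lvalue it is given.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload type, used only for this illustration. */
struct config {
	int value;
};

/*
 * Assumption: the shared pointer is declared _Atomic so that the C11
 * atomic generic functions apply to it portably. Static storage means
 * it starts out as a valid NULL pointer.
 */
static _Atomic(struct config *) shared_config;

/*
 * Writer side: initialize the object first, then publish the pointer
 * with release semantics so readers never see a half-initialized object.
 */
static void publish_config(struct config *c, int value)
{
	assert(c != NULL);
	c->value = value;
	atomic_store_explicit(&shared_config, c, memory_order_release);
}

/*
 * Reader side: consume load of the pointer. As noted earlier in the
 * thread, current compilers compile this as an acquire load.
 */
static struct config *deref_config(void)
{
	return atomic_load_explicit(&shared_config, memory_order_consume);
}

/* Return the published value, or fallback if nothing is published yet. */
static int read_value_or(int fallback)
{
	struct config *c = deref_config();

	return c ? c->value : fallback;
}
```

A reader would call read_value_or() (or dereference the result of deref_config() directly) inside its read-side critical section. Grace-period handling and reclamation are deliberately left out of this sketch.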