Date: Wed, 7 Oct 2020 12:30:33 +0200
From: Willy Tarreau
To: Peter Zijlstra
Cc: Florian Weimer, linux-toolchains@vger.kernel.org, Will Deacon,
	Paul McKenney, linux-kernel@vger.kernel.org,
	stern@rowland.harvard.edu, parri.andrea@gmail.com,
	boqun.feng@gmail.com, npiggin@gmail.com, dhowells@redhat.com,
	j.alglave@ucl.ac.uk, luc.maranget@inria.fr, akiyks@gmail.com,
	dlustig@nvidia.com, joel@joelfernandes.org,
	torvalds@linux-foundation.org
Subject: Re: Control Dependencies vs C Compilers
Message-ID: <20201007103033.GB6550@1wt.eu>
References: <20201006114710.GQ2628@hirez.programming.kicks-ass.net>
	<875z7nm4qm.fsf@oldenburg2.str.redhat.com>
	<20201007093243.GB2628@hirez.programming.kicks-ass.net>
In-Reply-To: <20201007093243.GB2628@hirez.programming.kicks-ass.net>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Mailing-List: linux-toolchains@vger.kernel.org

On Wed, Oct 07, 2020 at 11:32:43AM +0200, Peter Zijlstra wrote:
> A branch that cannot be optimized away and prohibits lifting stores
> over. One possible suggestion would be allowing the volatile keyword as
> a qualifier to if.
>
> 	x = *foo;
> 	volatile if (x > 42)
> 		*bar = 1;
>
> This would tell the compiler that the condition is special in that it
> must emit a conditional branch instruction and that it must not lift
> stores (or sequence points) over it.

This test is interesting, because if foo and bar are of the same type,
nothing prevents them from aliasing and the compiler cannot make wild
guesses on them (i.e. they may be plain memory as well as memory-mapped
registers).

Extending it like this shows a difference between the use of volatile
and __atomic_{load,store}_n. While both are correct in that each access
is properly performed, for an unknown reason the compiler decided to
implement two distinct branches in the atomic case and to inflate the
code:

  $ gcc -v
  gcc version 9.3.0 (GCC)

  $ cat foo-volatile.c
  long foobar(long *foo, long *bar)
  {
          *(volatile long *)bar = 10;
          if (*(volatile long *)foo <= 42)
                  *(volatile long *)bar = 64;
          if (*(volatile long *)foo > 42)
                  *(volatile long *)bar = 0;
          return *(volatile long *)bar;
  }
  $ gcc -c -O2 foo-volatile.c
  $ objdump -dr foo-volatile.o
  0000000000000000 <foobar>:
     0:   48 c7 06 0a 00 00 00    movq   $0xa,(%rsi)
     7:   48 8b 07                mov    (%rdi),%rax
     a:   48 83 f8 2a             cmp    $0x2a,%rax
     e:   7f 07                   jg     17 <foobar+0x17>
    10:   48 c7 06 40 00 00 00    movq   $0x40,(%rsi)
    17:   48 8b 07                mov    (%rdi),%rax
    1a:   48 83 f8 2a             cmp    $0x2a,%rax
    1e:   7e 07                   jle    27 <foobar+0x27>
    20:   48 c7 06 00 00 00 00    movq   $0x0,(%rsi)
    27:   48 8b 06                mov    (%rsi),%rax
    2a:   c3                      retq

  $ cat foo-atomic.c
  long foobar(long *foo, long *bar)
  {
          __atomic_store_n(bar, 10, __ATOMIC_RELAXED);
          if (__atomic_load_n(foo, __ATOMIC_RELAXED) <= 42)
                  __atomic_store_n(bar, 64, __ATOMIC_RELAXED);
          if (__atomic_load_n(foo, __ATOMIC_RELAXED) > 42)
                  __atomic_store_n(bar, 0, __ATOMIC_RELAXED);
          return __atomic_load_n(bar, __ATOMIC_RELAXED);
  }
  $ objdump -dr foo-atomic.o
  0000000000000000 <foobar>:
     0:   48 c7 06 0a 00 00 00    movq   $0xa,(%rsi)
     7:   48 8b 07                mov    (%rdi),%rax
     a:   48 83 f8 2a             cmp    $0x2a,%rax
     e:   7e 10                   jle    20 <foobar+0x20>
    10:   48 8b 07                mov    (%rdi),%rax
    13:   48 83 f8 2a             cmp    $0x2a,%rax
    17:   7f 17                   jg     30 <foobar+0x30>
    19:   48 8b 06                mov    (%rsi),%rax
    1c:   c3                      retq
    1d:   0f 1f 00                nopl   (%rax)
    20:   48 c7 06 40 00 00 00    movq   $0x40,(%rsi)
    27:   48 8b 07                mov    (%rdi),%rax
    2a:   48 83 f8 2a             cmp    $0x2a,%rax
    2e:   7e e9                   jle    19 <foobar+0x19>
    30:   48 c7 06 00 00 00 00    movq   $0x0,(%rsi)
    37:   48 8b 06                mov    (%rsi),%rax
    3a:   c3                      retq

When building at -Os both produce the same code as the volatile version
above. It *seems* to me that the volatile version always produces more
optimal code, but is it always correct?

This is just an illustration of how tricky this can currently be and how
confusing it can sometimes be for the developer to make sure the desired
code is emitted in a few special cases. And just for this, having the
compiler support more easily predictable constructs would be a nice
improvement.

Willy
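
For anyone wanting to reproduce the comparison, here is a minimal sketch
that puts both variants into one file and checks that they return the
same values when run single-threaded, including the case where foo and
bar alias. The names LOAD_ONCE, STORE_ONCE, foobar_volatile,
foobar_atomic and check_variants_agree are made up for this
illustration, and it assumes GCC or Clang for the __atomic builtins. It
only shows that the two forms behave the same here; it says nothing
about which instructions a given compiler emits.

```c
#include <assert.h>

/* Hypothetical helpers for this sketch: force each access through a
 * volatile-qualified lvalue, as in foo-volatile.c above. */
#define LOAD_ONCE(p)     (*(volatile long *)(p))
#define STORE_ONCE(p, v) (*(volatile long *)(p) = (v))

/* Volatile variant: every load and store must be performed as written. */
long foobar_volatile(long *foo, long *bar)
{
	STORE_ONCE(bar, 10);
	if (LOAD_ONCE(foo) <= 42)
		STORE_ONCE(bar, 64);
	if (LOAD_ONCE(foo) > 42)
		STORE_ONCE(bar, 0);
	return LOAD_ONCE(bar);
}

/* Relaxed-atomic variant using the GCC/Clang __atomic builtins. */
long foobar_atomic(long *foo, long *bar)
{
	__atomic_store_n(bar, 10, __ATOMIC_RELAXED);
	if (__atomic_load_n(foo, __ATOMIC_RELAXED) <= 42)
		__atomic_store_n(bar, 64, __ATOMIC_RELAXED);
	if (__atomic_load_n(foo, __ATOMIC_RELAXED) > 42)
		__atomic_store_n(bar, 0, __ATOMIC_RELAXED);
	return __atomic_load_n(bar, __ATOMIC_RELAXED);
}

/* Single-threaded sanity check: both variants must agree, both for
 * distinct pointers and for foo == bar (the aliasing case). */
void check_variants_agree(void)
{
	static const long inputs[] = { -1, 0, 42, 43, 1000 };
	unsigned int i;

	for (i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++) {
		long foo_v = inputs[i], bar_v = 0;
		long foo_a = inputs[i], bar_a = 0;

		assert(foobar_volatile(&foo_v, &bar_v) ==
		       foobar_atomic(&foo_a, &bar_a));

		/* Aliased: the store of 10 is visible to the first load,
		 * and the store of 64 is visible to the second load. */
		assert(foobar_volatile(&foo_v, &foo_v) ==
		       foobar_atomic(&foo_a, &foo_a));
	}
}
```

In the aliased case the second load observes the 64 just stored, so
both conditions end up true in sequence. This is why the compiler has
to reload foo and cannot merge the two tests into a single if/else.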