Re: Supporting New Memory Barrier Types in BPF

From: Peilin Ye <yepeilin@google.com>
To: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Yonghong Song <yonghong.song@linux.dev>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	bpf <bpf@vger.kernel.org>, Josh Don <joshdon@google.com>,
	Barret Rhoden <brho@google.com>, Neel Natu <neelnatu@google.com>,
	Benjamin Segall <bsegall@google.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	David Vernet <dvernet@meta.com>,
	Dave Marchevsky <davemarchevsky@meta.com>
Subject: Re: Supporting New Memory Barrier Types in BPF
Date: Thu, 1 Aug 2024 22:00:04 +0000
Message-ID: <ZqwFZFbWxSNEUHfp@google.com>
In-Reply-To: <87v80kfhox.fsf@gnu.org>

Hi Jose,

On Thu, Aug 01, 2024 at 04:20:30PM +0200, Jose E. Marchesi wrote:
> > GCC behaves similarly.
> >
> > For program A:
> >
> >   long foo;
> >   
> >   long func () {
> >         return __sync_fetch_and_add(&foo, 1);
> >   }
> >
> > bpf-unknown-none-gcc -O2 compiles to:
> >
> >   0000000000000000 <func>:
> >      0:	18 00 00 00 00 00 00 00 	r0=0 ll
> >      8:	00 00 00 00 00 00 00 00 
> >     10:	b7 01 00 00 01 00 00 00 	r1=1
> >     18:	db 10 00 00 01 00 00 00 	r1=atomic_fetch_add((u64*)(r0+0),r1)
> >     20:	bf 10 00 00 00 00 00 00 	r0=r1
> >     28:	95 00 00 00 00 00 00 00 	exit
> >
> > And for program B:
> >
> >   long foo;
> >   
> >   long func () {
> >        __sync_fetch_and_add(&foo, 1);
> >         return foo;
> >   }
> >
> > bpf-unknown-none-gcc -O2 compiles to:
> >
> >   0000000000000000 <func>:
> >      0:	18 00 00 00 00 00 00 00 	r0=0 ll
> >      8:	00 00 00 00 00 00 00 00 
> >     10:	b7 01 00 00 01 00 00 00 	r1=1
> >     18:	db 10 00 00 00 00 00 00 	lock *(u64*)(r0+0)+=r1
> >     20:	79 00 00 00 00 00 00 00 	r0=*(u64*)(r0+0)
> >     28:	95 00 00 00 00 00 00 00 	exit
> >
> > Internally:
> >
> > - When compiling program A, GCC decides to emit an
> >   `atomic_fetch_addDI' insn, documented as:
> >
> >   'atomic_fetch_addMODE', 'atomic_fetch_subMODE'
> >   'atomic_fetch_orMODE', 'atomic_fetch_andMODE'
> >   'atomic_fetch_xorMODE', 'atomic_fetch_nandMODE'
> >
> >      These patterns emit code for an atomic operation on memory with
> >      memory model semantics, and return the original value.  Operand 0
> >      is an output operand which contains the value of the memory
> >      location before the operation was performed.  Operand 1 is the
> >      memory on which the atomic operation is performed.  Operand 2 is
> >      the second operand to the binary operator.  Operand 3 is the memory
> >      model to be used by the operation.
> >
> >   The BPF backend defines atomic_fetch_add for DI modes (long) to expand
> >   to this BPF instruction:
> >
> >       %w0 = atomic_fetch_add((<smop> *)%1, %w0)
> >
> > - When compiling program B, GCC decides to emit an `atomic_addDI'
> >   insn, documented as:
> >
> >   'atomic_addMODE', 'atomic_subMODE'
> >   'atomic_orMODE', 'atomic_andMODE'
> >   'atomic_xorMODE', 'atomic_nandMODE'
> >
> >      These patterns emit code for an atomic operation on memory with
> >      memory model semantics.  Operand 0 is the memory on which the
> >      atomic operation is performed.  Operand 1 is the second operand to
> >      the binary operator.  Operand 2 is the memory model to be used by
> >      the operation.
> >
> >   The BPF backend defines atomic_add for DI modes (long) to expand
> >   to this BPF instruction:
> >
> >       lock *(<smop> *)%w0 += %w1
> >
> > This is done for all targets. In x86-64, for example, case A compiles
> > to:
> >
> >   0000000000000000 <func>:
> >      0:	b8 01 00 00 00       	mov    $0x1,%eax
> >      5:	f0 48 0f c1 05 00 00 	lock xadd %rax,0x0(%rip)        # e <func+0xe>
> >      c:	00 00 
> >      e:	c3                   	retq   
> >
> > And case B compiles to:
> >
> >   0000000000000000 <func>:
> >      0:	f0 48 83 05 00 00 00 	lock addq $0x1,0x0(%rip)        # 9 <func+0x9>
> >      7:	00 01 
> >      9:	48 8b 05 00 00 00 00 	mov    0x0(%rip),%rax        # 10 <func+0x10>
> >     10:	c3                   	retq   
> >
> > Why wouldn't the compiler be allowed to optimize from atomic_fetch_add
> > to atomic_add in this case?
> 
> Ok I see.  The generic compiler optimization is ok.  It is the backend
> that is buggy because it emits BPF instruction sequences with different
> memory ordering semantics for atomic_OP and atomic_fetch_OP.
> 
> The only difference between the fetching and non-fetching insns is that
> the fetching ones return the original value, while the non-fetching ones
> return nothing.  Other than that they should be equivalent.
> 
> For ARM64, GCC generates for case A:
> 
>   0000000000000000 <func>:
>      0:	90000001 	adrp	x1, 0 <func>
>      4:	d2800020 	mov	x0, #0x1                   	// #1
>      8:	91000021 	add	x1, x1, #0x0
>      c:	f8e00020 	ldaddal	x0, x0, [x1]
>     10:	d65f03c0 	ret
> 
> And this for case B:
> 
>   0000000000000000 <func>:
>      0:	90000000 	adrp	x0, 0 <func>
>      4:	d2800022 	mov	x2, #0x1                   	// #1
>      8:	91000001 	add	x1, x0, #0x0
>      c:	f8e20021 	ldaddal	x2, x1, [x1]
>     10:	f9400000 	ldr	x0, [x0]
>     14:	d65f03c0 	ret
> 
> i.e. GCC emits LDADDAL for both the atomic_add and atomic_fetch_add
> internal insns.  As on x86-64, both sequences have the same memory
> ordering semantics.
> 
> All right, we are changing GCC to always emit the fetch versions of the
> sequences for all the supported atomic operations: add, and, or, xor.
> After the change, the `lock' versions of the instructions will not be
> generated by the compiler at all outside of inline asm.
> 
> Will send a heads-up when done.

Thanks for taking care of this!
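
For reference, the two cases discussed above can be put side by side as a
small host-side sketch.  The names func_a, func_b and
check_single_threaded are made up for illustration and do not appear in
the thread.  The check is single-threaded, so it only shows which value
each function returns.  It cannot observe the memory ordering difference
between the fetching and non-fetching BPF instructions.

```c
#include <assert.h>

/* Shared counter, as in programs A and B quoted above. */
long foo;

/* Program A: the old value is used, so the compiler has to keep the
 * fetching form (atomic_fetch_add). */
long func_a(void)
{
	return __sync_fetch_and_add(&foo, 1);
}

/* Program B: the result of the builtin is discarded, so the compiler may
 * select the non-fetching form (atomic_add).  foo is then re-read with a
 * plain load. */
long func_b(void)
{
	__sync_fetch_and_add(&foo, 1);
	return foo;
}

/* Hypothetical single-threaded self-check, not part of the thread. */
void check_single_threaded(void)
{
	foo = 0;
	assert(func_a() == 0);	/* value before the add */
	assert(foo == 1);
	assert(func_b() == 2);	/* value re-read after the add */
}
```

Building this with bpf-unknown-none-gcc -O2 and disassembling func_a and
func_b should show which BPF instruction the backend picks for each case
on a given GCC version.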

Peilin Ye
