From: Peilin Ye <yepeilin@google.com>
To: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Yonghong Song <yonghong.song@linux.dev>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
bpf <bpf@vger.kernel.org>, Josh Don <joshdon@google.com>,
Barret Rhoden <brho@google.com>, Neel Natu <neelnatu@google.com>,
Benjamin Segall <bsegall@google.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
David Vernet <dvernet@meta.com>,
Dave Marchevsky <davemarchevsky@meta.com>
Subject: Re: Supporting New Memory Barrier Types in BPF
Date: Thu, 1 Aug 2024 22:00:04 +0000 [thread overview]
Message-ID: <ZqwFZFbWxSNEUHfp@google.com> (raw)
In-Reply-To: <87v80kfhox.fsf@gnu.org>
Hi Jose,
On Thu, Aug 01, 2024 at 04:20:30PM +0200, Jose E. Marchesi wrote:
> > GCC behaves similarly.
> >
> > For program A:
> >
> > long foo;
> >
> > long func () {
> > return __sync_fetch_and_add(&foo, 1);
> > }
> >
> > bpf-unknown-none-gcc -O2 compiles to:
> >
> > 0000000000000000 <func>:
> > 0: 18 00 00 00 00 00 00 00 r0=0 ll
> > 8: 00 00 00 00 00 00 00 00
> > 10: b7 01 00 00 01 00 00 00 r1=1
> > 18: db 10 00 00 01 00 00 00 r1=atomic_fetch_add((u64*)(r0+0),r1)
> > 20: bf 10 00 00 00 00 00 00 r0=r1
> > 28: 95 00 00 00 00 00 00 00 exit
> >
> > And for program B:
> >
> > long foo;
> >
> > long func () {
> > __sync_fetch_and_add(&foo, 1);
> > return foo;
> > }
> >
> > bpf-unknown-none-gcc -O2 compiles to:
> >
> > 0000000000000000 <func>:
> > 0: 18 00 00 00 00 00 00 00 r0=0 ll
> > 8: 00 00 00 00 00 00 00 00
> > 10: b7 01 00 00 01 00 00 00 r1=1
> > 18: db 10 00 00 00 00 00 00 lock *(u64*)(r0+0)+=r1
> > 20: 79 00 00 00 00 00 00 00 r0=*(u64*)(r0+0)
> > 28: 95 00 00 00 00 00 00 00 exit
> >
> > Internally:
> >
> > - When compiling the program A GCC decides to emit an
> > `atomic_fetch_addDI' insn, documented as:
> >
> > 'atomic_fetch_addMODE', 'atomic_fetch_subMODE'
> > 'atomic_fetch_orMODE', 'atomic_fetch_andMODE'
> > 'atomic_fetch_xorMODE', 'atomic_fetch_nandMODE'
> >
> > These patterns emit code for an atomic operation on memory with
> > memory model semantics, and return the original value. Operand 0
> > is an output operand which contains the value of the memory
> > location before the operation was performed. Operand 1 is the
> > memory on which the atomic operation is performed. Operand 2 is
> > the second operand to the binary operator. Operand 3 is the memory
> > model to be used by the operation.
> >
> > The BPF backend defines atomic_fetch_add for DI modes (long) to expand
> > to this BPF instruction:
> >
> > %w0 = atomic_fetch_add((<smop> *)%1, %w0)
> >
> > - When compiling the program B GCC decides to emit an `atomic_addDI'
> > insn, documented as:
> >
> > 'atomic_addMODE', 'atomic_subMODE'
> > 'atomic_orMODE', 'atomic_andMODE'
> > 'atomic_xorMODE', 'atomic_nandMODE'
> >
> > These patterns emit code for an atomic operation on memory with
> > memory model semantics. Operand 0 is the memory on which the
> > atomic operation is performed. Operand 1 is the second operand to
> > the binary operator. Operand 2 is the memory model to be used by
> > the operation.
> >
> > The BPF backend defines atomic_fetch_add for DI modes (long) to expand
> > to this BPF instruction:
> >
> > lock *(<smop> *)%w0 += %w1
> >
> > This is done for all targets. In x86-64, for example, case A compiles
> > to:
> >
> > 0000000000000000 <func>:
> > 0: b8 01 00 00 00 mov $0x1,%eax
> > 5: f0 48 0f c1 05 00 00 lock xadd %rax,0x0(%rip) # e <func+0xe>
> > c: 00 00
> > e: c3 retq
> >
> > And case B compiles to:
> >
> > 0000000000000000 <func>:
> > 0: f0 48 83 05 00 00 00 lock addq $0x1,0x0(%rip) # 9 <func+0x9>
> > 7: 00 01
> > 9: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 10 <func+0x10>
> > 10: c3 retq
> >
> > Why wouldn't the compiler be allowed to optimize from atomic_fetch_add
> > to atomic_add in this case?
>
> Ok I see. The generic compiler optimization is ok. It is the backend
> that is buggy because it emits BPF instruction sequences with different
> memory ordering semantics for atomic_OP and atomic_fetch_OP.
>
> The only difference between fetching and non-fetching builtins is that
> in one case the original value is returned, in the other the new value.
> Other than that they should be equivalent.
>
> For ARM64, GCC generates for case A:
>
> 0000000000000000 <func>:
> 0: 90000001 adrp x1, 0 <func>
> 4: d2800020 mov x0, #0x1 // #1
> 8: 91000021 add x1, x1, #0x0
> c: f8e00020 ldaddal x0, x0, [x1]
> 10: d65f03c0 ret
>
> And this for case B:
>
> 0000000000000000 <func>:
> 0: 90000000 adrp x0, 0 <func>
> 4: d2800022 mov x2, #0x1 // #1
> 8: 91000001 add x1, x0, #0x0
> c: f8e20021 ldaddal x2, x1, [x1]
> 10: f9400000 ldr x0, [x0]
> 14: d65f03c0 ret
>
> i.e. GCC emits LDADDAL for both atomic_add and atomic_fetch_add internal
> insns. Like in x86-64, both sequences have same memory ordering
> semantics.
>
> Allright we are changing GCC to always emit fetch versions of sequences
> for all the supported atomic operations: add, and, or, xor. After the
> change the `lock' versions of the instructions will not be generated by
> the compiler at all out of inline asm.
>
> Will send a headsup when done.
Thanks for taking care of this!
Peilin Ye
next prev parent reply other threads:[~2024-08-01 22:00 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-29 18:32 Supporting New Memory Barrier Types in BPF Peilin Ye
2024-07-30 1:28 ` Alexei Starovoitov
2024-07-30 3:49 ` Paul E. McKenney
2024-07-30 4:03 ` Alexei Starovoitov
2024-07-30 5:14 ` Yonghong Song
2024-07-31 1:19 ` Alexei Starovoitov
2024-07-31 3:51 ` Yonghong Song
2024-07-31 20:44 ` Peilin Ye
2024-07-31 23:17 ` Yonghong Song
2024-08-01 0:11 ` Peilin Ye
2024-08-01 12:47 ` Jose E. Marchesi
2024-08-01 14:20 ` Jose E. Marchesi
2024-08-01 16:44 ` Yonghong Song
2024-08-05 16:13 ` Jose E. Marchesi
2024-08-01 22:00 ` Peilin Ye [this message]
2024-08-06 19:22 ` Peilin Ye
2024-08-08 16:33 ` Alexei Starovoitov
2024-08-08 20:59 ` Peilin Ye
2024-09-16 21:14 ` Peilin Ye
2024-09-17 0:08 ` Peilin Ye
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZqwFZFbWxSNEUHfp@google.com \
--to=yepeilin@google.com \
--cc=alexei.starovoitov@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brho@google.com \
--cc=bsegall@google.com \
--cc=davemarchevsky@meta.com \
--cc=dvernet@meta.com \
--cc=jemarch@gnu.org \
--cc=joshdon@google.com \
--cc=neelnatu@google.com \
--cc=paulmck@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox