From: Mark Rutland <mark.rutland@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: will@kernel.org, boqun.feng@gmail.com,
linux-kernel@vger.kernel.org, x86@kernel.org, elver@google.com,
keescook@chromium.org, hch@infradead.org,
torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
Date: Thu, 9 Dec 2021 17:00:18 +0000 [thread overview]
Message-ID: <YbI2IjU7qoSTlCH6@FVFF77S0Q05N> (raw)
In-Reply-To: <YbIB7aJU5ngCcaNj@FVFF77S0Q05N>
On Thu, Dec 09, 2021 at 01:17:33PM +0000, Mark Rutland wrote:
> On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> > Use the shiny new atomic_*_ofl() functions in order to have better
> > code-gen.
> >
> > Notably refcount_inc() case no longer distinguishes between
> > inc-from-zero and inc-negative in the fast path, this improves
> > code-gen:
> >
> > 4b88: b8 01 00 00 00 mov $0x1,%eax
> > 4b8d: f0 0f c1 43 28 lock xadd %eax,0x28(%rbx)
> > 4b92: 85 c0 test %eax,%eax
> > 4b94: 74 1b je 4bb1 <alloc_perf_context+0xf1>
> > 4b96: 8d 50 01 lea 0x1(%rax),%edx
> > 4b99: 09 c2 or %eax,%edx
> > 4b9b: 78 20 js 4bbd <alloc_perf_context+0xfd>
> >
> > to:
> >
> > 4768: b8 01 00 00 00 mov $0x1,%eax
> > 476d: f0 0f c1 43 28 lock xadd %eax,0x28(%rbx)
> > 4772: 85 c0 test %eax,%eax
> > 4774: 7e 14 jle 478a <alloc_perf_context+0xea>
>
> For comparison, I generated the same for arm64 below with kernel.org crosstool
> GCC 11.1.0 and defconfig.
>
> For arm64 there's an existing sub-optimiality for inc/dec where the register
> for `1` or `-1` gets generated with a `MOV;MOV` chain or `MOV;NEG` chain rather
> than a single `MOV` in either case. I think taht's due to the way we build
> LSE/LL-SC variants of add() and build a common inc() atop, and the compiler
> just loses the opportunity to constant-fold. I'll take a look at how to make
> that neater.
With some improvement's to arm64's LSE atomics, that becomes a comparable
sequence to x86's:
2df8: 52800021 mov w1, #0x1 // #1
...
2e20: b8e10002 ldaddal w1, w2, [x0]
2e24: 7100005f cmp w2, #0x0
2e28: 5400012d b.le 2e4c <alloc_perf_context+0xc8>
> > without sacrificing on functionality; the only thing that suffers is
> > the reported error condition, which might now 'spuriously' report
> > 'saturated' instead of 'inc-from-zero'.
> >
> > refcount_dec_and_test() is also improved:
> >
> > aa40: b8 ff ff ff ff mov $0xffffffff,%eax
> > aa45: f0 0f c1 07 lock xadd %eax,(%rdi)
> > aa49: 83 f8 01 cmp $0x1,%eax
> > aa4c: 74 05 je aa53 <ring_buffer_put+0x13>
> > aa4e: 85 c0 test %eax,%eax
> > aa50: 7e 1e jle aa70 <ring_buffer_put+0x30>
> > aa52: c3 ret
> >
> > to:
> >
> > a980: b8 ff ff ff ff mov $0xffffffff,%eax
> > a985: f0 0f c1 07 lock xadd %eax,(%rdi)
> > a989: 83 e8 01 sub $0x1,%eax
> > a98c: 78 20 js a9ae <ring_buffer_put+0x2e>
> > a98e: 74 01 je a991 <ring_buffer_put+0x11>
> > a990: c3 ret
Likewise I can get the arm64 equivalent down to:
bebc: 12800001 mov w1, #0xffffffff // #-1
...
becc: b8e10001 ldaddal w1, w1, [x0]
bed0: 71000421 subs w1, w1, #0x1
bed4: 540000c4 b.mi beec <ring_buffer_put+0x3c> // b.first
bed8: 54000120 b.eq befc <ring_buffer_put+0x4c> // b.none
I've pushed my WIP patches to:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/atomics/improvements
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/atomics/improvements
... and I'll try to get those cleaned up and posted soon.
Thanks,
Mark.
next prev parent reply other threads:[~2021-12-09 17:00 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
2021-12-08 18:36 ` [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() Peter Zijlstra
2021-12-09 12:42 ` Mark Rutland
2021-12-09 13:34 ` Peter Zijlstra
2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra
2021-12-08 19:19 ` Peter Zijlstra
2021-12-08 20:56 ` Peter Zijlstra
2021-12-09 13:17 ` Mark Rutland
2021-12-09 17:00 ` Mark Rutland [this message]
2021-12-08 18:36 ` [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen Peter Zijlstra
2021-12-09 8:33 ` Peter Zijlstra
2021-12-09 17:51 ` Linus Torvalds
2021-12-08 18:36 ` [RFC][PATCH 4/5] atomic,x86: Implement atomic_dec_and_test_ofl() Peter Zijlstra
2021-12-08 18:37 ` [RFC][PATCH 5/5] atomic: Document the atomic_{}_ofl() functions Peter Zijlstra
2021-12-09 8:25 ` [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
2021-12-09 16:19 ` Jens Axboe
2021-12-09 16:51 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YbI2IjU7qoSTlCH6@FVFF77S0Q05N \
--to=mark.rutland@arm.com \
--cc=boqun.feng@gmail.com \
--cc=elver@google.com \
--cc=hch@infradead.org \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=will@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox