All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: will@kernel.org, boqun.feng@gmail.com,
	linux-kernel@vger.kernel.org, x86@kernel.org, elver@google.com,
	keescook@chromium.org, hch@infradead.org,
	torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 2/5] refcount: Use atomic_*_ofl()
Date: Thu, 9 Dec 2021 13:17:33 +0000	[thread overview]
Message-ID: <YbIB7aJU5ngCcaNj@FVFF77S0Q05N> (raw)
In-Reply-To: <20211208183906.468934357@infradead.org>

On Wed, Dec 08, 2021 at 07:36:57PM +0100, Peter Zijlstra wrote:
> Use the shiny new atomic_*_ofl() functions in order to have better
> code-gen.
> 
> Notably refcount_inc() case no longer distinguishes between
> inc-from-zero and inc-negative in the fast path, this improves
> code-gen:
> 
>     4b88:       b8 01 00 00 00          mov    $0x1,%eax
>     4b8d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
>     4b92:       85 c0                   test   %eax,%eax
>     4b94:       74 1b                   je     4bb1 <alloc_perf_context+0xf1>
>     4b96:       8d 50 01                lea    0x1(%rax),%edx
>     4b99:       09 c2                   or     %eax,%edx
>     4b9b:       78 20                   js     4bbd <alloc_perf_context+0xfd>
> 
> to:
> 
>     4768:       b8 01 00 00 00          mov    $0x1,%eax
>     476d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
>     4772:       85 c0                   test   %eax,%eax
>     4774:       7e 14                   jle    478a <alloc_perf_context+0xea>

For comparison, I generated the same for arm64 below with kernel.org crosstool
GCC 11.1.0 and defconfig.

For arm64 there's an existing sub-optimiality for inc/dec where the register
for `1` or `-1` gets generated with a `MOV;MOV` chain or `MOV;NEG` chain rather
than a single `MOV` in either case. I think taht's due to the way we build
LSE/LL-SC variants of add() and build a common inc() atop, and the compiler
just loses the opportunity to constant-fold. I'll take a look at how to make
that neater.

Regardless, the code goes from:

    51f4:       52800024        mov     w4, #0x1                        // #1
    ...
    5224:       2a0403e1        mov     w1, w4
    5228:       b8210001        ldadd   w1, w1, [x0]
    522c:       34000261        cbz     w1, 5278 <alloc_perf_context+0xf8>
    5230:       11000422        add     w2, w1, #0x1
    5234:       2a010041        orr     w1, w2, w1
    5238:       37f80181        tbnz    w1, #31, 5268 <alloc_perf_context+0xe8>

to:

    40e8:       52800024        mov     w4, #0x1                        // #1
    ...
    4118:       2a0403e1        mov     w1, w4
    411c:       b8e10001        ldaddal w1, w1, [x0]
    4120:       7100003f        cmp     w1, #0x0
    4124:       5400018d        b.le    4154 <alloc_perf_context+0xe0>

> without sacrificing on functionality; the only thing that suffers is
> the reported error condition, which might now 'spuriously' report
> 'saturated' instead of 'inc-from-zero'.
> 
> refcount_dec_and_test() is also improved:
> 
>     aa40:       b8 ff ff ff ff          mov    $0xffffffff,%eax
>     aa45:       f0 0f c1 07             lock xadd %eax,(%rdi)
>     aa49:       83 f8 01                cmp    $0x1,%eax
>     aa4c:       74 05                   je     aa53 <ring_buffer_put+0x13>
>     aa4e:       85 c0                   test   %eax,%eax
>     aa50:       7e 1e                   jle    aa70 <ring_buffer_put+0x30>
>     aa52:       c3                      ret
> 
> to:
> 
>     a980:       b8 ff ff ff ff          mov    $0xffffffff,%eax
>     a985:       f0 0f c1 07             lock xadd %eax,(%rdi)
>     a989:       83 e8 01                sub    $0x1,%eax
>     a98c:       78 20                   js     a9ae <ring_buffer_put+0x2e>
>     a98e:       74 01                   je     a991 <ring_buffer_put+0x11>
>     a990:       c3                      ret

For arm64 (with your fixlet applied) that goes from:

    c42c:       52800021        mov     w1, #0x1                        // #1
    c430:       4b0103e1        neg     w1, w1
    c434:       b8610001        ldaddl  w1, w1, [x0]
    c438:       7100043f        cmp     w1, #0x1
    c43c:       54000140        b.eq    c464 <ring_buffer_put+0x50>  // b.none
    c440:       7100003f        cmp     w1, #0x0
    c444:       5400028d        b.le    c494 <ring_buffer_put+0x80>

to:

    c3dc:       52800021        mov     w1, #0x1                        // #1
    c3e0:       4b0103e1        neg     w1, w1
    c3e4:       b8e10002        ldaddal w1, w2, [x0]
    c3e8:       0b020021        add     w1, w1, w2
    c3ec:       7100003f        cmp     w1, #0x0
    c3f0:       5400012b        b.lt    c414 <ring_buffer_put+0x50>  // b.tstop
    c3f4:       540001a0        b.eq    c428 <ring_buffer_put+0x64>  // b.none

I think the add here is due to  the change in your fixlet:

-       if (unlikely(old <= 1))                                                                                                                                                                             
+       if (unlikely(old - 1 <= 0)) 

> XXX atomic_dec_and_test_ofl() is strictly stronger ordered than
> refcount_dec_and_test() is defined -- Power and Arrghh64 ?

I'll leave the ordering to Will.

Thanks,
Mark.

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/refcount.h |   15 ++++++++++++---
>  lib/refcount.c           |    5 +++++
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -264,7 +264,10 @@ static inline void __refcount_inc(refcou
>   */
>  static inline void refcount_inc(refcount_t *r)
>  {
> -	__refcount_inc(r, NULL);
> +	atomic_inc_ofl(&r->refs, Eoverflow);
> +	return;
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
>  }
>  
>  static inline __must_check bool __refcount_sub_and_test(int i, refcount_t *r, int *oldp)
> @@ -330,7 +333,10 @@ static inline __must_check bool __refcou
>   */
>  static inline __must_check bool refcount_dec_and_test(refcount_t *r)
>  {
> -	return __refcount_dec_and_test(r, NULL);
> +	return atomic_dec_and_test_ofl(&r->refs, Eoverflow);
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> +	return false;
>  }
>  
>  static inline void __refcount_dec(refcount_t *r, int *oldp)
> @@ -356,7 +362,10 @@ static inline void __refcount_dec(refcou
>   */
>  static inline void refcount_dec(refcount_t *r)
>  {
> -	__refcount_dec(r, NULL);
> +	atomic_dec_ofl(&r->refs, Eoverflow);
> +	return;
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
>  }
>  
>  extern __must_check bool refcount_dec_if_one(refcount_t *r);
> --- a/lib/refcount.c
> +++ b/lib/refcount.c
> @@ -12,8 +12,13 @@
>  
>  void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
>  {
> +	int old = refcount_read(r);
>  	refcount_set(r, REFCOUNT_SATURATED);
>  
> +	/* racy; who cares */
> +	if (old == 1 && t == REFCOUNT_ADD_OVF)
> +		t = REFCOUNT_ADD_UAF;
> +
>  	switch (t) {
>  	case REFCOUNT_ADD_NOT_ZERO_OVF:
>  		REFCOUNT_WARN("saturated; leaking memory");
> 
> 

  parent reply	other threads:[~2021-12-09 13:17 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08 18:36 [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
2021-12-08 18:36 ` [RFC][PATCH 1/5] atomic: Introduce atomic_{inc,dec,dec_and_test}_ofl() Peter Zijlstra
2021-12-09 12:42   ` Mark Rutland
2021-12-09 13:34     ` Peter Zijlstra
2021-12-08 18:36 ` [RFC][PATCH 2/5] refcount: Use atomic_*_ofl() Peter Zijlstra
2021-12-08 19:19   ` Peter Zijlstra
2021-12-08 20:56     ` Peter Zijlstra
2021-12-09 13:17   ` Mark Rutland [this message]
2021-12-09 17:00     ` Mark Rutland
2021-12-08 18:36 ` [RFC][PATCH 3/5] refcount: Improve out-of-line code-gen Peter Zijlstra
2021-12-09  8:33   ` Peter Zijlstra
2021-12-09 17:51     ` Linus Torvalds
2021-12-08 18:36 ` [RFC][PATCH 4/5] atomic,x86: Implement atomic_dec_and_test_ofl() Peter Zijlstra
2021-12-08 18:37 ` [RFC][PATCH 5/5] atomic: Document the atomic_{}_ofl() functions Peter Zijlstra
2021-12-09  8:25 ` [RFC][PATCH 0/5] refcount: Improve code-gen Peter Zijlstra
2021-12-09 16:19   ` Jens Axboe
2021-12-09 16:51     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YbIB7aJU5ngCcaNj@FVFF77S0Q05N \
    --to=mark.rutland@arm.com \
    --cc=boqun.feng@gmail.com \
    --cc=elver@google.com \
    --cc=hch@infradead.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.