All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yeoreum Yun <yeoreum.yun@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>,
	broonie@kernel.org, maz@kernel.org, oliver.upton@linux.dev,
	joey.gouly@arm.com, james.morse@arm.com, ardb@kernel.org,
	scott@os.amperecomputing.com, suzuki.poulose@arm.com,
	yuzenghui@huawei.com, mark.rutland@arm.com,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND v7 4/6] arm64: futex: refactor futex atomic operation
Date: Tue, 16 Sep 2025 10:24:42 +0100	[thread overview]
Message-ID: <aMks2haYCZia8LR/@e129823.arm.com> (raw)
In-Reply-To: <aMkqt1LMUd5QPc3S@e129823.arm.com>

Sorry, Ignore this. I've sent wrong this :(
I'll send it again.

> Hi,
>
> > On Mon, Sep 15, 2025 at 09:35:55PM +0100, Will Deacon wrote:
> > > On Mon, Sep 15, 2025 at 08:40:33PM +0100, Catalin Marinas wrote:
> > > > On Mon, Sep 15, 2025 at 11:32:39AM +0100, Yeoreum Yun wrote:
> > > > > So I think it would be better to keep the current LLSC implementation
> > > > > in LSUI.
> > > >
> > > > I think the code would look simpler with LL/SC but you can give it a try
> > > > and post the code sample here (not in a new series).
> > >
> > > If you stick the cas*t instruction in its own helper say, cmpxchg_user(),
> > > then you can do all the shifting/masking in C and I don't reckon it's
> > > that bad. It means we (a) get rid of exclusives, which is the whole
> > > point of this and (b) don't have to mess around with PAN.
> >
> > We get rid of PAN toggling already since FEAT_LSUI introduces
> > LDTXR/STTXR. But, I'm all for CAS if it doesn't look too bad. Easier
> > I think if we do a get_user() of a u64 and combine it with the futex u32
> > while taking care of CPU endianness. All in a loop. Hopefully the
> > compiler is smart enough to reduce masking/or'ing to fewer instructions.
> >
>
> Hmm, I think sure shifting/masking can be replace by single bfi
> instruction like:
>
> diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
> index 1d6d9f856ac5..30da0006c0c8 100644
> --- a/arch/arm64/include/asm/futex.h
> +++ b/arch/arm64/include/asm/futex.h
> @@ -126,6 +126,59 @@ LSUI_FUTEX_ATOMIC_OP(or, ldtset, al)
>  LSUI_FUTEX_ATOMIC_OP(andnot, ldtclr, al)
>  LSUI_FUTEX_ATOMIC_OP(set, swpt, al)
>
> +
> +#define LSUI_CMPXCHG_HELPER(suffix, start_bit)                                 \
> +static __always_inline int                                                     \
> +__lsui_cmpxchg_helper_##suffix(u64 __user *uaddr, u32 oldval, u32 newval)      \
> +{                                                                              \
> +       int ret = 0;                                                            \
> +       u64 oval, nval, tmp;                                                    \
> +                                                                               \
> +       asm volatile("//__lsui_cmpxchg_helper_" #suffix "\n"                    \
> +       __LSUI_PREAMBLE                                                         \
> +"      prfm    pstl1strm, %2\n"                                                \
> +"1:    ldtr    %x1, %2\n"                                                      \
> +"      bfi     %x1, %x5, #" #start_bit ", #32\n"                               \
> +"      bfi     %x1, %x6, #" #start_bit ", #32\n"                               \
> +"      mov     %x4, %x5\n"                                                     \
> +"2:    caslt   %x5, %x6, %2\n"                                                 \
> +"      sub     %x4, %x4, %x5\n"                                                \
> +"      cbz     %x4, 3f\n"                                                      \
> +"      mov     %w0, %w7\n"                                                     \
> +"3:\n"                                                                         \
> +"      dmb     ish\n"                                                          \
> +"4:\n"                                                                         \
> +       _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)                                   \
> +       _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)                                   \
> +       : "+r" (ret), "=&r" (oval), "+Q" (*uaddr), "=&r" (nval), "=&r" (tmp)    \
> +       : "r" (oldval), "r" (newval), "Ir" (-EAGAIN)                            \
> +       : "memory");                                                            \
> +                                                                               \
> +       return ret;                                                             \
> +}
> +
> +LSUI_CMPXCHG_HELPER(lo, 0)
> +LSUI_CMPXCHG_HELPER(hi, 32)
> +
> +static __always_inline int
> +__lsui_cmpxchg_helper(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
> +{
> +       int ret;
> +       unsigned long uaddr_al;
> +
> +       uaddr_al = ALIGN_DOWN((unsigned long)uaddr, sizeof(u64));
> +
> +       if (uaddr_al != (unsigned long)uaddr)
> +               ret = __lsui_cmpxchg_helper_hi((u64 __user *)uaddr_al, oldval, newval);
> +       else
> +               ret = __lsui_cmpxchg_helper_lo((u64 __user *)uaddr_al, oldval, newval);
> +
> +       if (!ret)
> +               *oval = oldval;
> +
> +       return ret;
> +}
> +
>  static __always_inline int
>  __lsui_futex_atomic_and(int oparg, u32 __user *uaddr, int *oval)
>  {
> @@ -135,71 +188,25 @@ __lsui_futex_atomic_and(int oparg, u32 __user *uaddr, int *oval)
>  static __always_inline int
>  __lsui_futex_atomic_eor(int oparg, u32 __user *uaddr, int *oval)
>  {
> -       unsigned int loops = FUTEX_MAX_LOOPS;
> -       int ret, oldval, tmp;
> +       int ret = -EAGAIN;
> +       u32 oldval, newval;
>
>         /*
>          * there are no ldteor/stteor instructions...
>          */
> -       asm volatile("// __lsui_futex_atomic_eor\n"
> -       __LSUI_PREAMBLE
> -"      prfm    pstl1strm, %2\n"
> -"1:    ldtxr   %w1, %2\n"
> -"      eor     %w3, %w1, %w5\n"
> -"2:    stltxr  %w0, %w3, %2\n"
> -"      cbz     %w0, 3f\n"
> -"      sub     %w4, %w4, %w0\n"
> -"      cbnz    %w4, 1b\n"
> -"      mov     %w0, %w6\n"
> -"3:\n"
> -"      dmb     ish\n"
> -       _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
> -       _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
> -       : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp),
> -         "+r" (loops)
> -       : "r" (oparg), "Ir" (-EAGAIN)
> -       : "memory");
> +       unsafe_get_user(oldval, uaddr, err_fault);
> +       newval = oldval ^ oparg;
>
> -       if (!ret)
> -               *oval = oldval;
> +       ret = __lsui_cmpxchg_helper(uaddr, oldval, newval, oval);
>
> +err_fault:
>         return ret;
>  }
>
>  static __always_inline int
>  __lsui_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
>  {
> -       int ret = 0;
> -       unsigned int loops = FUTEX_MAX_LOOPS;
> -       u32 val, tmp;
> -
> -       /*
> -        * cas{al}t doesn't support word size...
> -        */
> -       asm volatile("//__lsui_futex_cmpxchg\n"
> -       __LSUI_PREAMBLE
> -"      prfm    pstl1strm, %2\n"
> -"1:    ldtxr   %w1, %2\n"
> -"      eor     %w3, %w1, %w5\n"
> -"      cbnz    %w3, 4f\n"
> -"2:    stltxr  %w3, %w6, %2\n"
> -"      cbz     %w3, 3f\n"
> -"      sub     %w4, %w4, %w3\n"
> -"      cbnz    %w4, 1b\n"
> -"      mov     %w0, %w7\n"
> -"3:\n"
> -"      dmb     ish\n"
> -"4:\n"
> -       _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
> -       _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
> -       : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
> -       : "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
> -       : "memory");
> -
> -       if (!ret)
> -               *oval = oldval;
> -
> -       return ret;
> +       return __lsui_cmpxchg_helper(uaddr, oldval, newval, oval);
>  }
>
>  #define __lsui_llsc_body(op, ...)                                      \
>
>
> This is based on the patch #6.
>
> Am I missing something?
>
> --
> Sincerely,
> Yeoreum Yun
>

--
Sincerely,
Yeoreum Yun

  reply	other threads:[~2025-09-16  9:25 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 1/6] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
2025-09-12 16:12   ` Catalin Marinas
2025-08-16 15:19 ` [PATCH RESEND v7 2/6] KVM: arm64: expose FEAT_LSUI to guest Yeoreum Yun
2025-09-12 16:25   ` Catalin Marinas
2025-08-16 15:19 ` [PATCH RESEND v7 3/6] arm64: Kconfig: add LSUI Kconfig Yeoreum Yun
2025-09-12 16:24   ` Catalin Marinas
2025-09-15 10:42     ` Yeoreum Yun
2025-09-15 11:32       ` Will Deacon
2025-09-15 11:41         ` Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 4/6] arm64: futex: refactor futex atomic operation Yeoreum Yun
2025-09-11 15:38   ` Will Deacon
2025-09-11 16:04     ` Yeoreum Yun
2025-09-12 16:44   ` Catalin Marinas
2025-09-12 17:01     ` Catalin Marinas
2025-09-15 10:39     ` Yeoreum Yun
2025-09-12 16:53   ` Catalin Marinas
2025-09-15 10:32     ` Yeoreum Yun
2025-09-15 19:40       ` Catalin Marinas
2025-09-15 20:35         ` Will Deacon
2025-09-16  7:02           ` Catalin Marinas
2025-09-16  9:15             ` Yeoreum Yun
2025-09-16  9:24               ` Yeoreum Yun [this message]
2025-09-16 10:02             ` Yeoreum Yun
2025-09-16 10:16               ` Will Deacon
2025-09-16 12:50                 ` Yeoreum Yun
2025-09-17  9:32                   ` Yeoreum Yun
2025-09-16 12:47               ` Mark Rutland
2025-09-16 13:27                 ` Yeoreum Yun
2025-09-16 13:45                   ` Mark Rutland
2025-09-16 13:58                     ` Yeoreum Yun
2025-09-16 14:07                       ` Mark Rutland
2025-09-16 14:15                         ` Yeoreum Yun
2025-09-15 22:34         ` Yeoreum Yun
2025-09-16 12:53           ` Catalin Marinas
2025-08-16 15:19 ` [PATCH v7 RESEND 5/6] arm64: futex: small optimisation for __llsc_futex_atomic_set() Yeoreum Yun
2025-09-11 15:28   ` Will Deacon
2025-09-11 16:19     ` Yeoreum Yun
2025-09-12 16:36       ` Catalin Marinas
2025-09-15 10:41         ` Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 6/6] arm64: futex: support futex with FEAT_LSUI Yeoreum Yun
2025-09-11 15:22   ` Will Deacon
2025-09-11 16:45     ` Yeoreum Yun
2025-09-12 17:16     ` Catalin Marinas
2025-09-15  9:15       ` Yeoreum Yun
2025-09-12 17:09   ` Catalin Marinas
2025-09-15  8:24     ` Yeoreum Yun
2025-09-01 10:06 ` [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
2025-09-11 15:09 ` Will Deacon
2025-09-11 16:22   ` Catalin Marinas
2025-09-15 20:37     ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aMks2haYCZia8LR/@e129823.arm.com \
    --to=yeoreum.yun@arm.com \
    --cc=ardb@kernel.org \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=james.morse@arm.com \
    --cc=joey.gouly@arm.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=scott@os.amperecomputing.com \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.