From: Michael Ellerman <mpe@ellerman.id.au>
To: "Nysal Jan K.A." <nysal@linux.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: linuxppc-dev@lists.ozlabs.org,
"Nysal Jan K.A" <nysal@linux.ibm.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Bill Wendling <morbo@google.com>,
Justin Stitt <justinstitt@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Kent Overstreet <kent.overstreet@linux.dev>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
linux-kernel@vger.kernel.org, llvm@lists.linux.dev
Subject: Re: [PATCH] sched/membarrier: Fix redundant load of membarrier_state
Date: Fri, 25 Oct 2024 11:29:38 +1100 [thread overview]
Message-ID: <87frolja8d.fsf@mail.lhotse> (raw)
In-Reply-To: <20241007053936.833392-1-nysal@linux.ibm.com>
[To += Mathieu]
"Nysal Jan K.A." <nysal@linux.ibm.com> writes:
> From: "Nysal Jan K.A" <nysal@linux.ibm.com>
>
> On architectures where ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
> is not selected, sync_core_before_usermode() is a no-op.
> In membarrier_mm_sync_core_before_usermode() the compiler does not
> eliminate redundant branches and the load of mm->membarrier_state
> for this case as the atomic_read() cannot be optimized away.
I was wondering if this was caused by powerpc's arch_atomic_read() which
uses asm volatile.
But replacing arch_atomic_read() with READ_ONCE() makes no difference,
presumably because the compiler still can't see that the READ_ONCE() is
unnecessary (which is kind of by design).
> Here's a snippet of the code generated for finish_task_switch() on powerpc:
>
> 1b786c: ld r26,2624(r30) # mm = rq->prev_mm;
> .......
> 1b78c8: cmpdi cr7,r26,0
> 1b78cc: beq cr7,1b78e4 <finish_task_switch+0xd0>
> 1b78d0: ld r9,2312(r13) # current
> 1b78d4: ld r9,1888(r9) # current->mm
> 1b78d8: cmpd cr7,r26,r9
> 1b78dc: beq cr7,1b7a70 <finish_task_switch+0x25c>
> 1b78e0: hwsync
> 1b78e4: cmplwi cr7,r27,128
> .......
> 1b7a70: lwz r9,176(r26) # atomic_read(&mm->membarrier_state)
> 1b7a74: b 1b78e0 <finish_task_switch+0xcc>
>
> This was found while analyzing "perf c2c" reports on kernels prior
> to commit c1753fd02a00 ("mm: move mm_count into its own cache line")
> where mm_count was false sharing with membarrier_state.
So it was causing a noticable performance blip? But isn't anymore?
> There is a minor improvement in the size of finish_task_switch().
> The following are results from bloat-o-meter:
>
> GCC 7.5.0:
> ----------
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32)
> Function old new delta
> finish_task_switch 884 852 -32
>
> GCC 12.2.1:
> -----------
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32)
> Function old new delta
> finish_task_switch.isra 852 820 -32
GCC 12 is a couple of years old, I assume GCC 14 behaves similarly?
> LLVM 17.0.6:
> ------------
> add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-36 (-36)
> Function old new delta
> rt_mutex_schedule 120 104 -16
> finish_task_switch 792 772 -20
>
> Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com>
> ---
> include/linux/sched/mm.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 07bb8d4181d7..042e60ab853a 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -540,6 +540,8 @@ enum {
>
> static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
> {
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE))
> + return;
> if (current->mm != mm)
> return;
> if (likely(!(atomic_read(&mm->membarrier_state) &
The other option would be to have a completely separate stub, eg:
#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
if (current->mm != mm)
return;
if (likely(!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
return;
sync_core_before_usermode();
}
#else
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) { }
#endif
Not sure what folks prefer.
In either case I think it's probably worth a short comment explaining
why it's worth the trouble (ie. that the atomic_read() prevents the
compiler from doing DCE).
cheers
next prev parent reply other threads:[~2024-10-25 0:29 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-07 5:39 [PATCH] sched/membarrier: Fix redundant load of membarrier_state Nysal Jan K.A.
2024-10-25 0:29 ` Michael Ellerman [this message]
2024-10-25 2:40 ` Stephen Rothwell
2024-10-25 3:22 ` Segher Boessenkool
2024-10-25 12:40 ` Mathieu Desnoyers
2024-10-25 18:30 ` Nysal Jan K.A.
2024-10-29 5:51 ` [PATCH v2] " Nysal Jan K.A.
2024-10-29 17:51 ` Mathieu Desnoyers
2024-10-29 23:29 ` Michael Ellerman
2024-10-30 13:33 ` Segher Boessenkool
2024-11-18 9:04 ` Michal Hocko
2024-11-18 9:25 ` Peter Zijlstra
2024-11-18 9:51 ` Michal Hocko
2025-01-09 8:46 ` Michal Hocko
2025-03-03 6:04 ` [PATCH v2 RESEND] " Nysal Jan K.A.
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87frolja8d.fsf@mail.lhotse \
--to=mpe@ellerman.id.au \
--cc=akpm@linux-foundation.org \
--cc=justinstitt@google.com \
--cc=kent.overstreet@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=llvm@lists.linux.dev \
--cc=mathieu.desnoyers@efficios.com \
--cc=morbo@google.com \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=nysal@linux.ibm.com \
--cc=rick.p.edgecombe@intel.com \
--cc=roman.gushchin@linux.dev \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.