From: Ingo Molnar <mingo@kernel.org>
To: Brian Gerst <brgerst@gmail.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
"H . Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>, Ard Biesheuvel <ardb@kernel.org>,
Uros Bizjak <ubizjak@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 00/11] Add a percpu subsection for hot data
Date: Sun, 23 Feb 2025 10:36:57 +0100 [thread overview]
Message-ID: <Z7rsOVaxhfCJdn2P@gmail.com> (raw)
In-Reply-To: <20250222190623.262689-1-brgerst@gmail.com>
* Brian Gerst <brgerst@gmail.com> wrote:
> Add a new percpu subsection for data that is frequently accessed and
> exclusive to each processor. This is intended to replace the pcpu_hot
> struct on X86, and is available to all architectures.
>
> The one caveat with this approach is that it depends on the linker to
> effeciently pack data that is smaller than machine word size. The
> binutils linker does this properly:
>
> ffffffff842f6000 D __per_cpu_hot_start
> ffffffff842f6000 D softirq_pending
> ffffffff842f6002 D hardirq_stack_inuse
> ffffffff842f6008 D hardirq_stack_ptr
> ffffffff842f6010 D __ref_stack_chk_guard
> ffffffff842f6010 D __stack_chk_guard
> ffffffff842f6018 D const_cpu_current_top_of_stack
> ffffffff842f6018 D cpu_current_top_of_stack
> ffffffff842f6020 D const_current_task
> ffffffff842f6020 D current_task
> ffffffff842f6028 D __preempt_count
> ffffffff842f602c D cpu_number
> ffffffff842f6030 D this_cpu_off
> ffffffff842f6038 D __x86_call_depth
> ffffffff842f6040 D __per_cpu_hot_end
>
> The LLVM linker doesn't do as well with packing smaller data objects,
> causing it to spill over into a second cacheline.
Ok, so I like it how it decentralizes the decision about what is 'hot'
and what is not:
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_PER_CPU_SYMBOL(irq_stat);
+DEFINE_PER_CPU_HOT(u16, softirq_pending);
This can also be a drawback if it's abused by random driver code - so I
think it should at minimum be documented to be used by core & arch
code. Maybe add a build #error too if it's defined in modular code?
Other variants like DEFINE_PER_CPU_SHARED_ALIGNED aren't being abused
really AFAICS, so maybe this isn't too much of a concern.
One potential drawback would be that previously the section was
hand-ordered:
struct pcpu_hot {
union {
struct {
struct task_struct *current_task;
int preempt_count;
int cpu_number;
#ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
u64 call_depth;
#endif
unsigned long top_of_stack;
void *hardirq_stack_ptr;
u16 softirq_pending;
#ifdef CONFIG_X86_64
bool hardirq_stack_inuse;
#else
void *softirq_stack_ptr;
#endif
};
u8 pad[64];
};
};
... while now it's linker-ordered. But on the other hand that can be an
advantage too: the linker will try to (or at least has a chance to)
order the fields optimally for cache density, while the hand-packing
always has the potential to bitrot without much of an outside,
actionable indicator for the bitrot.
One naming suggestion, wouldn't it be better to make it explicit that
the 'hot' qualifier is about cache locality:
+DEFINE_PER_CPU_CACHE_HOT(u16, softirq_pending);
Makes it more of a mouthful to write definitions/declarations, but the
actual per-cpu usage sites are unaffected as this too is otherwise part
of the generic percpu namespace.
... and yes, DEFINE_PER_CPU_ALIGNED should probably have been named
DEFINE_PER_CPU_CACHE_ALIGNED too. (Because 'aligned' often means
machine word unit, so the naming is a bit ambiguous.)
I.e. in an ideal world the complete set of DEFINE_PER_CPU_XXX
attributes should be something like:
DEFINE_PER_CPU_CACHE_HOT
DEFINE_PER_CPU_CACHE_ALIGNED # was: DEFINE_PER_CPU_ALIGNED
DEFINE_PER_CPU_CACHE_ALIGNED_SHARED # was: DEFINE_PER_CPU_SHARED_ALIGNED
DEFINE_PER_CPU_PAGE_ALIGNED
DEFINE_PER_CPU_READ_MOSTLY
DEFINE_PER_CPU_DECRYPTED
But I digress...
Anyway, I've Cc:-ed various potentially interested parties, please
speak up now or forever hold your peace. ;-)
Thanks,
Ingo
next prev parent reply other threads:[~2025-02-23 9:37 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 01/11] percpu: Introduce percpu hot section Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 02/11] x86/preempt: Move preempt count to " Brian Gerst
2025-02-23 10:05 ` kernel test robot
2025-02-23 10:49 ` kernel test robot
2025-02-23 11:31 ` kernel test robot
2025-02-22 19:06 ` [RFC PATCH 03/11] x86/smp: Move cpu number " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 04/11] x86/retbleed: Move call depth " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 05/11] x86/percpu: Move top_of_stack " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 06/11] x86/percpu: Move current_task " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 07/11] x86/softirq: Move softirq_pending " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 08/11] x86/irq: Move irq stacks " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 09/11] x86/percpu: Remove pcpu_hot Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 10/11] x86/stackprotector: Move __stack_chk_guard to percpu hot section Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 11/11] x86/smp: Move this_cpu_off " Brian Gerst
2025-02-23 9:36 ` Ingo Molnar [this message]
2025-02-23 10:20 ` [RFC PATCH 00/11] Add a percpu subsection for hot data Ard Biesheuvel
2025-02-23 10:30 ` Uros Bizjak
2025-02-23 17:25 ` Brian Gerst
2025-02-23 17:30 ` Ard Biesheuvel
2025-02-23 14:44 ` Brian Gerst
2025-02-23 18:00 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z7rsOVaxhfCJdn2P@gmail.com \
--to=mingo@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=ardb@kernel.org \
--cc=bp@alien8.de \
--cc=brgerst@gmail.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=ubizjak@gmail.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.