From: Yury Norov <yury.norov@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>, Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Tim Chen <tim.c.chen@intel.com>,
Nitin Tekchandani <nitin.tekchandani@intel.com>,
Yu Chen <yu.c.chen@intel.com>, Waiman Long <longman@redhat.com>,
linux-kernel@vger.kernel.org, andriy.shevchenko@linux.intel.com,
linux@rasmusvillemoes.dk, rppt@kernel.org
Subject: Re: [RFC PATCH 2/4] sched/fair: Make tg->load_avg per node
Date: Wed, 19 Jul 2023 08:59:57 -0700 [thread overview]
Message-ID: <ZLgIIo/Q0UzA4ROr@yury-ThinkPad> (raw)
In-Reply-To: <20230719115358.GB3529734@hirez.programming.kicks-ass.net>
On Wed, Jul 19, 2023 at 01:53:58PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 18, 2023 at 09:41:18PM +0800, Aaron Lu wrote:
> > +#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
> > +static inline long tg_load_avg(struct task_group *tg)
> > +{
> > + long load_avg = 0;
> > + int i;
> > +
> > + /*
> > + * The only path that can give us a root_task_group
> > + * here is from print_cfs_rq() thus unlikely.
> > + */
> > + if (unlikely(tg == &root_task_group))
> > + return 0;
> > +
> > + for_each_node(i)
> > + load_avg += atomic_long_read(&tg->node_info[i]->load_avg);
> > +
> > + return load_avg;
> > +}
> > +#endif
>
> So I was working on something else numa and noticed that for_each_node()
> (and most of the nodemask stuff) is quite moronic, afaict we should do
> something like the below.
>
> I now see Mike added the nr_node_ids thing fairly recently, but given
> distros have NODES_SHIFT=10 and actual machines typically only have <=4
> nodes, this would save a factor of 256 in scanning.
>
> Specifically, your for_each_node() would scan the full 1024 bit bitmap
> looking for more bits that would never be there.
>
> ---
>
> diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
> index 8d07116caaf1..c23c0889b8cf 100644
> --- a/include/linux/nodemask.h
> +++ b/include/linux/nodemask.h
> @@ -109,7 +109,7 @@ extern nodemask_t _unused_nodemask_arg_;
> __nodemask_pr_bits(maskp)
> static inline unsigned int __nodemask_pr_numnodes(const nodemask_t *m)
> {
> - return m ? MAX_NUMNODES : 0;
> + return m ? nr_node_ids : 0;
> }
> static inline const unsigned long *__nodemask_pr_bits(const nodemask_t *m)
> {
> @@ -137,13 +137,13 @@ static inline void __node_clear(int node, volatile nodemask_t *dstp)
> clear_bit(node, dstp->bits);
> }
>
> -#define nodes_setall(dst) __nodes_setall(&(dst), MAX_NUMNODES)
> +#define nodes_setall(dst) __nodes_setall(&(dst), nr_node_ids)
> static inline void __nodes_setall(nodemask_t *dstp, unsigned int nbits)
> {
> bitmap_fill(dstp->bits, nbits);
> }
>
> -#define nodes_clear(dst) __nodes_clear(&(dst), MAX_NUMNODES)
> +#define nodes_clear(dst) __nodes_clear(&(dst), nr_node_ids)
> static inline void __nodes_clear(nodemask_t *dstp, unsigned int nbits)
> {
> bitmap_zero(dstp->bits, nbits);
> @@ -160,7 +160,7 @@ static inline bool __node_test_and_set(int node, nodemask_t *addr)
> }
>
> #define nodes_and(dst, src1, src2) \
> - __nodes_and(&(dst), &(src1), &(src2), MAX_NUMNODES)
> + __nodes_and(&(dst), &(src1), &(src2), nr_node_ids)
> static inline void __nodes_and(nodemask_t *dstp, const nodemask_t *src1p,
> const nodemask_t *src2p, unsigned int nbits)
> {
This would break the small_const_nbits() optimization for those configuring
their kernels properly. This is very similar to the cpumask vs. nr_cpu_ids
problem.
See 596ff4a09b8 ("cpumask: re-introduce constant-sized cpumask optimizations")
Thanks,
Yury