Re: [RFC PATCH -V2] NUMA balancing: fix NUMA topology for systems with CPU-less nodes

From: kernel test robot <lkp@intel.com>
To: kbuild-all@lists.01.org
Subject: Re: [RFC PATCH -V2] NUMA balancing: fix NUMA topology for systems with CPU-less nodes
Date: Wed, 09 Feb 2022 00:51:33 +0800
Message-ID: <202202090047.ctMW1o2G-lkp@intel.com>
In-Reply-To: <20220208122322.604285-1-ying.huang@intel.com>

Hi Huang,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on tip/sched/core]
[also build test ERROR on linux/master linus/master v5.17-rc3 next-20220208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Huang-Ying/NUMA-balancing-fix-NUMA-topology-for-systems-with-CPU-less-nodes/20220208-212402
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git c8eaf6ac76f40f6c59fc7d056e2e08c4a57ea9c7
config: arc-randconfig-r035-20220208 (https://download.01.org/0day-ci/archive/20220209/202202090047.ctMW1o2G-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ed82092e509333870d756fc8e53d816885922fc4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Huang-Ying/NUMA-balancing-fix-NUMA-topology-for-systems-with-CPU-less-nodes/20220208-212402
        git checkout ed82092e509333870d756fc8e53d816885922fc4
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/core.c:3454:6: warning: no previous prototype for 'sched_set_stop_task' [-Wmissing-prototypes]
    3454 | void sched_set_stop_task(int cpu, struct task_struct *stop)
         |      ^~~~~~~~~~~~~~~~~~~
   kernel/sched/core.c: In function 'sched_cpu_activate':
>> kernel/sched/core.c:9055:17: error: implicit declaration of function 'sched_reinit_numa'; did you mean 'sched_init_numa'? [-Werror=implicit-function-declaration]
    9055 |                 sched_reinit_numa(true, cpu);
         |                 ^~~~~~~~~~~~~~~~~
         |                 sched_init_numa
   kernel/sched/core.c: In function 'sched_init_smp':
>> kernel/sched/core.c:9241:9: error: too many arguments to function 'sched_init_numa'
    9241 |         sched_init_numa(NUMA_NO_NODE);
         |         ^~~~~~~~~~~~~~~
   In file included from kernel/sched/core.c:13:
   kernel/sched/sched.h:1671:20: note: declared here
    1671 | static inline void sched_init_numa(void) { }
         |                    ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
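
Both errors appear to share one cause. This randconfig builds without CONFIG_NUMA, so the callers in core.c see the no-argument stub at kernel/sched/sched.h:1671 and no declaration of sched_reinit_numa() at all. The following is a minimal userspace sketch of the usual remedy, which is to keep the !CONFIG_NUMA stubs in step with the new signatures. The parameter names and types (offline_node, online, cpu) are assumptions inferred from the two call sites, not taken from the patch, and the counter and simulate_call_sites() are hypothetical helpers that exist only to make the sketch observable.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)	/* same value as include/linux/numa.h */

/* Demo-only counter; real kernel stubs would have empty bodies. */
static int numa_stub_calls;

#ifdef CONFIG_NUMA
/* With NUMA enabled, the real implementations would be linked in. */
void sched_init_numa(int offline_node);
void sched_reinit_numa(bool online, int cpu);
#else
/*
 * Without NUMA, the stubs must accept the same arguments as the real
 * functions. Signatures here are assumed from the call sites.
 */
static inline void sched_init_numa(int offline_node)
{
	(void)offline_node;
	numa_stub_calls++;
}

static inline void sched_reinit_numa(bool online, int cpu)
{
	(void)online;
	(void)cpu;
	numa_stub_calls++;
}
#endif

/* Mimics the two failing call sites; returns how many stub calls ran. */
static int simulate_call_sites(int cpu)
{
	numa_stub_calls = 0;
	sched_init_numa(NUMA_NO_NODE);	/* as in sched_init_smp() */
	sched_reinit_numa(true, cpu);	/* as in sched_cpu_activate() */
	return numa_stub_calls;
}
```

With CONFIG_NUMA undefined, calling simulate_call_sites(0) compiles cleanly and goes through both stubs. Reverting the first stub to take (void) brings back the "too many arguments" error, and deleting the second brings back the implicit declaration error.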


vim +9055 kernel/sched/core.c

  9033	
  9034	int sched_cpu_activate(unsigned int cpu)
  9035	{
  9036		struct rq *rq = cpu_rq(cpu);
  9037		struct rq_flags rf;
  9038	
  9039		/*
  9040		 * Clear the balance_push callback and prepare to schedule
  9041		 * regular tasks.
  9042		 */
  9043		balance_push_set(cpu, false);
  9044	
  9045	#ifdef CONFIG_SCHED_SMT
  9046		/*
  9047		 * When going up, increment the number of cores with SMT present.
  9048		 */
  9049		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
  9050			static_branch_inc_cpuslocked(&sched_smt_present);
  9051	#endif
  9052		set_cpu_active(cpu, true);
  9053	
  9054		if (sched_smp_initialized) {
> 9055			sched_reinit_numa(true, cpu);
  9056			sched_domains_numa_masks_set(cpu);
  9057			cpuset_cpu_active();
  9058		}
  9059	
  9060		/*
  9061		 * Put the rq online, if not already. This happens:
  9062		 *
  9063		 * 1) In the early boot process, because we build the real domains
  9064		 *    after all CPUs have been brought up.
  9065		 *
  9066		 * 2) At runtime, if cpuset_cpu_active() fails to rebuild the
  9067		 *    domains.
  9068		 */
  9069		rq_lock_irqsave(rq, &rf);
  9070		if (rq->rd) {
  9071			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
  9072			set_rq_online(rq);
  9073		}
  9074		rq_unlock_irqrestore(rq, &rf);
  9075	
  9076		return 0;
  9077	}
  9078	
  9079	int sched_cpu_deactivate(unsigned int cpu)
  9080	{
  9081		struct rq *rq = cpu_rq(cpu);
  9082		struct rq_flags rf;
  9083		int ret;
  9084	
  9085		/*
  9086		 * Remove CPU from nohz.idle_cpus_mask to prevent participating in
  9087		 * load balancing when not active
  9088		 */
  9089		nohz_balance_exit_idle(rq);
  9090	
  9091		set_cpu_active(cpu, false);
  9092	
  9093		/*
  9094		 * From this point forward, this CPU will refuse to run any task that
  9095		 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
  9096		 * push those tasks away until this gets cleared, see
  9097		 * sched_cpu_dying().
  9098		 */
  9099		balance_push_set(cpu, true);
  9100	
  9101		/*
  9102		 * We've cleared cpu_active_mask / set balance_push, wait for all
  9103		 * preempt-disabled and RCU users of this state to go away such that
  9104		 * all new such users will observe it.
  9105		 *
  9106		 * Specifically, we rely on ttwu to no longer target this CPU, see
  9107		 * ttwu_queue_cond() and is_cpu_allowed().
  9108		 *
  9109		 * Do sync before park smpboot threads to take care the rcu boost case.
  9110		 */
  9111		synchronize_rcu();
  9112	
  9113		rq_lock_irqsave(rq, &rf);
  9114		if (rq->rd) {
  9115			update_rq_clock(rq);
  9116			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
  9117			set_rq_offline(rq);
  9118		}
  9119		rq_unlock_irqrestore(rq, &rf);
  9120	
  9121	#ifdef CONFIG_SCHED_SMT
  9122		/*
  9123		 * When going down, decrement the number of cores with SMT present.
  9124		 */
  9125		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
  9126			static_branch_dec_cpuslocked(&sched_smt_present);
  9127	
  9128		sched_core_cpu_deactivate(cpu);
  9129	#endif
  9130	
  9131		if (!sched_smp_initialized)
  9132			return 0;
  9133	
  9134		sched_reinit_numa(false, cpu);
  9135		ret = cpuset_cpu_inactive(cpu);
  9136		if (ret) {
  9137			balance_push_set(cpu, false);
  9138			set_cpu_active(cpu, true);
  9139			return ret;
  9140		}
  9141		sched_domains_numa_masks_clear(cpu);
  9142		return 0;
  9143	}
  9144	
  9145	static void sched_rq_cpu_starting(unsigned int cpu)
  9146	{
  9147		struct rq *rq = cpu_rq(cpu);
  9148	
  9149		rq->calc_load_update = calc_load_update;
  9150		update_max_interval();
  9151	}
  9152	
  9153	int sched_cpu_starting(unsigned int cpu)
  9154	{
  9155		sched_core_cpu_starting(cpu);
  9156		sched_rq_cpu_starting(cpu);
  9157		sched_tick_start(cpu);
  9158		return 0;
  9159	}
  9160	
  9161	#ifdef CONFIG_HOTPLUG_CPU
  9162	
  9163	/*
  9164	 * Invoked immediately before the stopper thread is invoked to bring the
  9165	 * CPU down completely. At this point all per CPU kthreads except the
  9166	 * hotplug thread (current) and the stopper thread (inactive) have been
  9167	 * either parked or have been unbound from the outgoing CPU. Ensure that
  9168	 * any of those which might be on the way out are gone.
  9169	 *
  9170	 * If after this point a bound task is being woken on this CPU then the
  9171	 * responsible hotplug callback has failed to do it's job.
  9172	 * sched_cpu_dying() will catch it with the appropriate fireworks.
  9173	 */
  9174	int sched_cpu_wait_empty(unsigned int cpu)
  9175	{
  9176		balance_hotplug_wait();
  9177		return 0;
  9178	}
  9179	
  9180	/*
  9181	 * Since this CPU is going 'away' for a while, fold any nr_active delta we
  9182	 * might have. Called from the CPU stopper task after ensuring that the
  9183	 * stopper is the last running task on the CPU, so nr_active count is
  9184	 * stable. We need to take the teardown thread which is calling this into
  9185	 * account, so we hand in adjust = 1 to the load calculation.
  9186	 *
  9187	 * Also see the comment "Global load-average calculations".
  9188	 */
  9189	static void calc_load_migrate(struct rq *rq)
  9190	{
  9191		long delta = calc_load_fold_active(rq, 1);
  9192	
  9193		if (delta)
  9194			atomic_long_add(delta, &calc_load_tasks);
  9195	}
  9196	
  9197	static void dump_rq_tasks(struct rq *rq, const char *loglvl)
  9198	{
  9199		struct task_struct *g, *p;
  9200		int cpu = cpu_of(rq);
  9201	
  9202		lockdep_assert_rq_held(rq);
  9203	
  9204		printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, rq->nr_running);
  9205		for_each_process_thread(g, p) {
  9206			if (task_cpu(p) != cpu)
  9207				continue;
  9208	
  9209			if (!task_on_rq_queued(p))
  9210				continue;
  9211	
  9212			printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
  9213		}
  9214	}
  9215	
  9216	int sched_cpu_dying(unsigned int cpu)
  9217	{
  9218		struct rq *rq = cpu_rq(cpu);
  9219		struct rq_flags rf;
  9220	
  9221		/* Handle pending wakeups and then migrate everything off */
  9222		sched_tick_stop(cpu);
  9223	
  9224		rq_lock_irqsave(rq, &rf);
  9225		if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
  9226			WARN(true, "Dying CPU not properly vacated!");
  9227			dump_rq_tasks(rq, KERN_WARNING);
  9228		}
  9229		rq_unlock_irqrestore(rq, &rf);
  9230	
  9231		calc_load_migrate(rq);
  9232		update_max_interval();
  9233		hrtick_clear(rq);
  9234		sched_core_cpu_dying(cpu);
  9235		return 0;
  9236	}
  9237	#endif
  9238	
  9239	void __init sched_init_smp(void)
  9240	{
> 9241		sched_init_numa(NUMA_NO_NODE);
  9242	
  9243		/*
  9244		 * There's no userspace yet to cause hotplug operations; hence all the
  9245		 * CPU masks are stable and all blatant races in the below code cannot
  9246		 * happen.
  9247		 */
  9248		mutex_lock(&sched_domains_mutex);
  9249		sched_init_domains(cpu_active_mask);
  9250		mutex_unlock(&sched_domains_mutex);
  9251	
  9252		/* Move init over to a non-isolated CPU */
  9253		if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_DOMAIN)) < 0)
  9254			BUG();
  9255		current->flags &= ~PF_NO_SETAFFINITY;
  9256		sched_init_granularity();
  9257	
  9258		init_sched_rt_class();
  9259		init_sched_dl_class();
  9260	
  9261		sched_smp_initialized = true;
  9262	}
  9263	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
