From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 9 Feb 2022 01:23:17 +0800
From: kernel test robot
To: Huang Ying
Cc: llvm@lists.linux.dev, kbuild-all@lists.01.org
Subject: Re: [RFC PATCH -V2] NUMA balancing: fix NUMA topology for systems with CPU-less nodes
Message-ID: <202202090152.0LoawqhI-lkp@intel.com>
References: <20220208122322.604285-1-ying.huang@intel.com>
Precedence: bulk
X-Mailing-List: llvm@lists.linux.dev
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20220208122322.604285-1-ying.huang@intel.com>
User-Agent: Mutt/1.10.1 (2018-07-13)

Hi Huang,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on tip/sched/core]
[also build test ERROR on linux/master linus/master v5.17-rc3 next-20220208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting the patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Huang-Ying/NUMA-balancing-fix-NUMA-topology-for-systems-with-CPU-less-nodes/20220208-212402
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git c8eaf6ac76f40f6c59fc7d056e2e08c4a57ea9c7
config: hexagon-randconfig-r045-20220208 (https://download.01.org/0day-ci/archive/20220209/202202090152.0LoawqhI-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project e8bff9ae54a55b4dbfeb6ba55f723abbd81bf494)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ed82092e509333870d756fc8e53d816885922fc4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Huang-Ying/NUMA-balancing-fix-NUMA-topology-for-systems-with-CPU-less-nodes/20220208-212402
        git checkout ed82092e509333870d756fc8e53d816885922fc4
        # save the config file to the linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/sched/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   kernel/sched/core.c:3454:6: warning: no previous prototype for function 'sched_set_stop_task' [-Wmissing-prototypes]
   void sched_set_stop_task(int cpu, struct task_struct *stop)
        ^
   kernel/sched/core.c:3454:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void sched_set_stop_task(int cpu, struct task_struct *stop)
   ^
   static
>> kernel/sched/core.c:9055:3: error: implicit declaration of function 'sched_reinit_numa' [-Werror,-Wimplicit-function-declaration]
                   sched_reinit_numa(true, cpu);
                   ^
   kernel/sched/core.c:9055:3: note: did you mean 'sched_init_numa'?
   kernel/sched/sched.h:1671:20: note: 'sched_init_numa' declared here
   static inline void sched_init_numa(void) { }
                      ^
   kernel/sched/core.c:9134:2: error: implicit declaration of function 'sched_reinit_numa' [-Werror,-Wimplicit-function-declaration]
           sched_reinit_numa(false, cpu);
           ^
>> kernel/sched/core.c:9241:18: error: too many arguments to function call, expected 0, have 1
           sched_init_numa(NUMA_NO_NODE);
           ~~~~~~~~~~~~~~~ ^~~~~~~~~~~~
   include/linux/numa.h:14:22: note: expanded from macro 'NUMA_NO_NODE'
   #define NUMA_NO_NODE    (-1)
                           ^~~~
   kernel/sched/sched.h:1671:20: note: 'sched_init_numa' declared here
   static inline void sched_init_numa(void) { }
                      ^
   1 warning and 3 errors generated.


vim +/sched_reinit_numa +9055 kernel/sched/core.c

  9033
  9034  int sched_cpu_activate(unsigned int cpu)
  9035  {
  9036          struct rq *rq = cpu_rq(cpu);
  9037          struct rq_flags rf;
  9038
  9039          /*
  9040           * Clear the balance_push callback and prepare to schedule
  9041           * regular tasks.
  9042           */
  9043          balance_push_set(cpu, false);
  9044
  9045  #ifdef CONFIG_SCHED_SMT
  9046          /*
  9047           * When going up, increment the number of cores with SMT present.
  9048           */
  9049          if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
  9050                  static_branch_inc_cpuslocked(&sched_smt_present);
  9051  #endif
  9052          set_cpu_active(cpu, true);
  9053
  9054          if (sched_smp_initialized) {
> 9055                  sched_reinit_numa(true, cpu);
  9056                  sched_domains_numa_masks_set(cpu);
  9057                  cpuset_cpu_active();
  9058          }
  9059
  9060          /*
  9061           * Put the rq online, if not already. This happens:
  9062           *
  9063           * 1) In the early boot process, because we build the real domains
  9064           *    after all CPUs have been brought up.
  9065           *
  9066           * 2) At runtime, if cpuset_cpu_active() fails to rebuild the
  9067           *    domains.
  9068           */
  9069          rq_lock_irqsave(rq, &rf);
  9070          if (rq->rd) {
  9071                  BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
  9072                  set_rq_online(rq);
  9073          }
  9074          rq_unlock_irqrestore(rq, &rf);
  9075
  9076          return 0;
  9077  }
  9078
  9079  int sched_cpu_deactivate(unsigned int cpu)
  9080  {
  9081          struct rq *rq = cpu_rq(cpu);
  9082          struct rq_flags rf;
  9083          int ret;
  9084
  9085          /*
  9086           * Remove CPU from nohz.idle_cpus_mask to prevent participating in
  9087           * load balancing when not active
  9088           */
  9089          nohz_balance_exit_idle(rq);
  9090
  9091          set_cpu_active(cpu, false);
  9092
  9093          /*
  9094           * From this point forward, this CPU will refuse to run any task that
  9095           * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
  9096           * push those tasks away until this gets cleared, see
  9097           * sched_cpu_dying().
  9098           */
  9099          balance_push_set(cpu, true);
  9100
  9101          /*
  9102           * We've cleared cpu_active_mask / set balance_push, wait for all
  9103           * preempt-disabled and RCU users of this state to go away such that
  9104           * all new such users will observe it.
  9105           *
  9106           * Specifically, we rely on ttwu to no longer target this CPU, see
  9107           * ttwu_queue_cond() and is_cpu_allowed().
  9108           *
  9109           * Do sync before park smpboot threads to take care the rcu boost case.
  9110           */
  9111          synchronize_rcu();
  9112
  9113          rq_lock_irqsave(rq, &rf);
  9114          if (rq->rd) {
  9115                  update_rq_clock(rq);
  9116                  BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
  9117                  set_rq_offline(rq);
  9118          }
  9119          rq_unlock_irqrestore(rq, &rf);
  9120
  9121  #ifdef CONFIG_SCHED_SMT
  9122          /*
  9123           * When going down, decrement the number of cores with SMT present.
  9124           */
  9125          if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
  9126                  static_branch_dec_cpuslocked(&sched_smt_present);
  9127
  9128          sched_core_cpu_deactivate(cpu);
  9129  #endif
  9130
  9131          if (!sched_smp_initialized)
  9132                  return 0;
  9133
  9134          sched_reinit_numa(false, cpu);
  9135          ret = cpuset_cpu_inactive(cpu);
  9136          if (ret) {
  9137                  balance_push_set(cpu, false);
  9138                  set_cpu_active(cpu, true);
  9139                  return ret;
  9140          }
  9141          sched_domains_numa_masks_clear(cpu);
  9142          return 0;
  9143  }
  9144
  9145  static void sched_rq_cpu_starting(unsigned int cpu)
  9146  {
  9147          struct rq *rq = cpu_rq(cpu);
  9148
  9149          rq->calc_load_update = calc_load_update;
  9150          update_max_interval();
  9151  }
  9152
  9153  int sched_cpu_starting(unsigned int cpu)
  9154  {
  9155          sched_core_cpu_starting(cpu);
  9156          sched_rq_cpu_starting(cpu);
  9157          sched_tick_start(cpu);
  9158          return 0;
  9159  }
  9160
  9161  #ifdef CONFIG_HOTPLUG_CPU
  9162
  9163  /*
  9164   * Invoked immediately before the stopper thread is invoked to bring the
  9165   * CPU down completely. At this point all per CPU kthreads except the
  9166   * hotplug thread (current) and the stopper thread (inactive) have been
  9167   * either parked or have been unbound from the outgoing CPU. Ensure that
  9168   * any of those which might be on the way out are gone.
  9169   *
  9170   * If after this point a bound task is being woken on this CPU then the
  9171   * responsible hotplug callback has failed to do it's job.
  9172   * sched_cpu_dying() will catch it with the appropriate fireworks.
  9173   */
  9174  int sched_cpu_wait_empty(unsigned int cpu)
  9175  {
  9176          balance_hotplug_wait();
  9177          return 0;
  9178  }
  9179
  9180  /*
  9181   * Since this CPU is going 'away' for a while, fold any nr_active delta we
  9182   * might have. Called from the CPU stopper task after ensuring that the
  9183   * stopper is the last running task on the CPU, so nr_active count is
  9184   * stable. We need to take the teardown thread which is calling this into
  9185   * account, so we hand in adjust = 1 to the load calculation.
  9186   *
  9187   * Also see the comment "Global load-average calculations".
  9188   */
  9189  static void calc_load_migrate(struct rq *rq)
  9190  {
  9191          long delta = calc_load_fold_active(rq, 1);
  9192
  9193          if (delta)
  9194                  atomic_long_add(delta, &calc_load_tasks);
  9195  }
  9196
  9197  static void dump_rq_tasks(struct rq *rq, const char *loglvl)
  9198  {
  9199          struct task_struct *g, *p;
  9200          int cpu = cpu_of(rq);
  9201
  9202          lockdep_assert_rq_held(rq);
  9203
  9204          printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, rq->nr_running);
  9205          for_each_process_thread(g, p) {
  9206                  if (task_cpu(p) != cpu)
  9207                          continue;
  9208
  9209                  if (!task_on_rq_queued(p))
  9210                          continue;
  9211
  9212                  printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
  9213          }
  9214  }
  9215
  9216  int sched_cpu_dying(unsigned int cpu)
  9217  {
  9218          struct rq *rq = cpu_rq(cpu);
  9219          struct rq_flags rf;
  9220
  9221          /* Handle pending wakeups and then migrate everything off */
  9222          sched_tick_stop(cpu);
  9223
  9224          rq_lock_irqsave(rq, &rf);
  9225          if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
  9226                  WARN(true, "Dying CPU not properly vacated!");
  9227                  dump_rq_tasks(rq, KERN_WARNING);
  9228          }
  9229          rq_unlock_irqrestore(rq, &rf);
  9230
  9231          calc_load_migrate(rq);
  9232          update_max_interval();
  9233          hrtick_clear(rq);
  9234          sched_core_cpu_dying(cpu);
  9235          return 0;
  9236  }
  9237  #endif
  9238
  9239  void __init sched_init_smp(void)
  9240  {
> 9241          sched_init_numa(NUMA_NO_NODE);
  9242
  9243          /*
  9244           * There's no userspace yet to cause hotplug operations; hence all the
  9245           * CPU masks are stable and all blatant races in the below code cannot
  9246           * happen.
  9247           */
  9248          mutex_lock(&sched_domains_mutex);
  9249          sched_init_domains(cpu_active_mask);
  9250          mutex_unlock(&sched_domains_mutex);
  9251
  9252          /* Move init over to a non-isolated CPU */
  9253          if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_DOMAIN)) < 0)
  9254                  BUG();
  9255          current->flags &= ~PF_NO_SETAFFINITY;
  9256          sched_init_granularity();
  9257
  9258          init_sched_rt_class();
  9259          init_sched_dl_class();
  9260
  9261          sched_smp_initialized = true;
  9262  }
  9263

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org