From: Chen Yu
To: peterz@infradead.org, akpm@linux-foundation.org
Cc: mkoutny@suse.com, mingo@redhat.com, tj@kernel.org, hannes@cmpxchg.org,
 corbet@lwn.net, mgorman@suse.de, mhocko@kernel.org, muchun.song@linux.dev,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, tim.c.chen@intel.com,
 aubrey.li@intel.com, libo.chen@oracle.com, kprateek.nayak@amd.com,
 vineethr@linux.ibm.com, venkat88@linux.ibm.com, ayushjai@amd.com,
 cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, yu.chen.surf@foxmail.com, Ayush Jain, Chen Yu
Subject: [PATCH v5 1/2] sched/numa: fix task swap by skipping kernel threads
Date: Fri, 23 May 2025 20:51:01 +0800
X-Mailer: git-send-email 2.25.1

From: Libo Chen

Task swapping is triggered when there are no idle CPUs in task A's
preferred node. In this case, the NUMA load balancer chooses a task B
on A's preferred node and swaps B with A. This helps improve NUMA
locality without introducing load imbalance between nodes.

The current implementation does not require B to have a NUMA node
preference, so a kernel thread might be incorrectly chosen as B.
However, kernel threads and user-space threads that do not have an mm
are not supposed to be covered by NUMA balancing, because NUMA
balancing only considers user pages via VMAs.

Following Peter's suggestion for fixing this issue, use PF_KTHREAD to
skip kernel threads. Also check cur->mm, because user_mode_thread()
might create a user thread without an mm.

As per Prateek's analysis, after adding the PF_KTHREAD check, there is
no need to further check the PF_IDLE flag:

"
- play_idle_precise() already ensures PF_KTHREAD is set before adding
  PF_IDLE

- cpu_startup_entry() is only called from the startup thread which
  should be marked with PF_KTHREAD (based on my understanding looking
  at commit cff9b2332ab7 ("kernel/sched: Modify initial boot task idle
  setup"))
"

In summary, the check in task_numa_compare() now aligns with
task_tick_numa().

Suggested-by: Michal Koutny
Tested-by: Ayush Jain
Signed-off-by: Libo Chen
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Chen Yu
---
v4->v5:
Add PF_KTHREAD check, and remove PF_IDLE check.
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fb9bf995a47..03d9a49a68b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2273,7 +2273,8 @@ static bool task_numa_compare(struct task_numa_env *env,
 
 	rcu_read_lock();
 	cur = rcu_dereference(dst_rq->curr);
-	if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
+	if (cur && ((cur->flags & (PF_EXITING | PF_KTHREAD)) ||
+	    !cur->mm))
 		cur = NULL;
 
 	/*
-- 
2.25.1
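
For readers who want to reason about the new condition outside the kernel
tree, here is a minimal user-space sketch of the predicate. It is not kernel
code: struct model_task, the MODEL_PF_* bits and model_check_rejects() are
hypothetical stand-ins for struct task_struct, the PF_* flags and the
condition in task_numa_compare(), and the flag values are placeholders
chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits (assumption): stand-ins for the kernel's PF_* flags. */
#define MODEL_PF_EXITING 0x1u
#define MODEL_PF_KTHREAD 0x2u

/* Hypothetical, minimal stand-in for struct task_struct. */
struct model_task {
	unsigned int flags;
	void *mm;	/* NULL when the task has no address space */
};

/*
 * Mirrors the shape of the patched condition: returns true when cur is
 * non-NULL but must not be used as the swap candidate (the kernel then
 * sets cur = NULL). A NULL cur returns false because there is nothing
 * to reject; no candidate exists in the first place.
 */
static bool model_check_rejects(const struct model_task *cur)
{
	return cur && ((cur->flags & (MODEL_PF_EXITING | MODEL_PF_KTHREAD)) ||
		       !cur->mm);
}
```

Under these assumptions, a task with an mm and neither flag set is kept as a
candidate, while an exiting task, a kernel thread, or a user thread without
an mm is rejected. This matches the intent stated in the changelog, where
the !mm test covers the user_mode_thread() case that the flag test alone
would miss.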