From: Huang Ying
To: Peter Zijlstra, Mel Gorman
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, Mel Gorman, Valentin Schneider, Greg Kroah-Hartman
Subject: [PATCH -V3 RESEND] numa balancing: move some document to make it consistent with the code
Date: Tue, 25 Jan 2022 13:51:50 +0800
Message-Id: <20220125055150.2935392-1-ying.huang@intel.com>

After commit 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs"),
some NUMA balancing sysctls enclosed with SCHED_DEBUG have been moved to
debugfs. This patch moves the documentation for these sysctls from
Documentation/admin-guide/sysctl/kernel.rst to
Documentation/scheduler/sched-debug.rst to make the documentation
consistent with the code.

Signed-off-by: "Huang, Ying"
Acked-by: Mel Gorman
Reviewed-by: Valentin Schneider
Cc: Peter Zijlstra (Intel)
Cc: Greg Kroah-Hartman
---
 Documentation/admin-guide/sysctl/kernel.rst | 46 +----------------
 Documentation/scheduler/index.rst           |  1 +
 Documentation/scheduler/sched-debug.rst     | 54 +++++++++++++++++++++
 3 files changed, 56 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/scheduler/sched-debug.rst

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d359bcfadd39..8551aeca1574 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -609,51 +609,7 @@ be migrated to a local memory node.
 The unmapping of pages and trapping faults incur additional overhead that
 ideally is offset by improved memory locality but there is no universal
 guarantee. If the target workload is already bound to NUMA nodes then this
-feature should be disabled. Otherwise, if the system overhead from the
-feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the `numa_balancing_scan_period_min_ms,
-numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
-
-
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
-===============================================================================================================================
-
-
-Automatic NUMA balancing scans tasks address space and unmaps pages to
-detect if pages are properly placed or if the data should be migrated to a
-memory node local to where the task is running.  Every "scan delay" the task
-scans the next "scan size" number of pages in its address space. When the
-end of the address space is reached the scanner restarts from the beginning.
-
-In combination, the "scan delay" and "scan size" determine the scan rate.
-When "scan delay" decreases, the scan rate increases.  The scan delay and
-hence the scan rate of every task is adaptive and depends on historical
-behaviour. If pages are properly placed then the scan delay increases,
-otherwise the scan delay decreases.  The "scan size" is not adaptive but
-the higher the "scan size", the higher the scan rate.
-
-Higher scan rates incur higher system overhead as page faults must be
-trapped and potentially data must be migrated. However, the higher the scan
-rate, the more quickly a tasks memory is migrated to a local node if the
-workload pattern changes and minimises performance impact due to remote
-memory accesses. These sysctls control the thresholds for scan delays and
-the number of pages scanned.
-
-``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the maximum scanning
-rate for each task.
-
-``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
-when it initially forks.
-
-``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the minimum scanning
-rate for each task.
-
-``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
-scanned for a given scan.
-
+feature should be disabled.
 
 oops_all_cpu_backtrace
 ======================
diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 88900aabdbf7..30cca8a37b3b 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -17,6 +17,7 @@ Linux Scheduler
     sched-nice-design
     sched-rt-group
     sched-stats
+    sched-debug
 
     text_files
 
diff --git a/Documentation/scheduler/sched-debug.rst b/Documentation/scheduler/sched-debug.rst
new file mode 100644
index 000000000000..4d3d24f2a439
--- /dev/null
+++ b/Documentation/scheduler/sched-debug.rst
@@ -0,0 +1,54 @@
+=================
+Scheduler debugfs
+=================
+
+Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
+scheduler specific debug files under /sys/kernel/debug/sched. Some of
+those files are described below.
+
+numa_balancing
+==============
+
+`numa_balancing` directory is used to hold files to control NUMA
+balancing feature. If the system overhead from the feature is too
+high then the rate the kernel samples for NUMA hinting faults may be
+controlled by the `scan_period_min_ms, scan_delay_ms,
+scan_period_max_ms, scan_size_mb` files.
+
+
+scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb
+-------------------------------------------------------------------
+
+Automatic NUMA balancing scans tasks address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running.  Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases.  The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases.  The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a tasks memory is migrated to a local node if the
+workload pattern changes and minimises performance impact due to remote
+memory accesses. These files control the thresholds for scan delays and
+the number of pages scanned.
+
+``scan_period_min_ms`` is the minimum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+``scan_delay_ms`` is the starting "scan delay" used for a task when it
+initially forks.
+
+``scan_period_max_ms`` is the maximum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+``scan_size_mb`` is how many megabytes worth of pages are scanned for
+a given scan.
-- 
2.30.2
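
Not part of the patch: a rough sketch of the relationship the moved text describes between "scan size", the scan period and the resulting scan rate. It assumes the simplest reading of the documentation, namely that one scan of ``scan_size_mb`` megabytes happens once per period, and it ignores the adaptive behaviour of the real scanner. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope model of the NUMA balancing scan rate.
# Assumption (not taken from the kernel source): exactly one scan of
# `scan_size_mb` megabytes is performed every `scan_period_ms` milliseconds.


def scan_rate_mb_per_s(scan_size_mb: float, scan_period_ms: float) -> float:
    """Approximate scan rate of one task in MB/s under the model above."""
    if scan_period_ms <= 0:
        raise ValueError("scan_period_ms must be positive")
    return scan_size_mb * 1000.0 / scan_period_ms


def scan_rate_bounds(scan_size_mb: float,
                     scan_period_min_ms: float,
                     scan_period_max_ms: float) -> tuple[float, float]:
    """Return (lowest, highest) scan rate implied by the period limits.

    The shortest period gives the highest rate and the longest period
    gives the lowest rate, matching the description of
    scan_period_min_ms and scan_period_max_ms.
    """
    lowest = scan_rate_mb_per_s(scan_size_mb, scan_period_max_ms)
    highest = scan_rate_mb_per_s(scan_size_mb, scan_period_min_ms)
    return lowest, highest


if __name__ == "__main__":
    # Illustrative values only; read the actual files under
    # /sys/kernel/debug/sched/numa_balancing/ on a real system.
    low, high = scan_rate_bounds(scan_size_mb=256,
                                 scan_period_min_ms=1000,
                                 scan_period_max_ms=60000)
    print(f"scan rate between {low:.2f} and {high:.2f} MB/s")
```

Under this model, doubling ``scan_size_mb`` or halving the period doubles the scan rate, which is why the text says both knobs trade overhead against how quickly memory follows the task.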