From: Saravanan D <saravanand@fb.com>
To: <x86@kernel.org>, <dave.hansen@linux.intel.com>,
<luto@kernel.org>, <peterz@infradead.org>, <corbet@lwn.net>
Cc: <linux-kernel@vger.kernel.org>, <kernel-team@fb.com>,
<linux-doc@vger.kernel.org>, Saravanan D <saravanand@fb.com>
Subject: [PATCH V4] x86/mm: Tracking linear mapping split events
Date: Wed, 27 Jan 2021 20:35:47 -0800
Message-ID: <20210128043547.1560435-1-saravanand@fb.com>
In-Reply-To: <a936a943-9d8f-7e3c-af38-1c99ae176e1f@intel.com>
To help debug the sluggishness caused by TLB misses and reloads,
introduce monotonic lifetime counts of hugepage split events. The counts
cover splits that occur once the system state reaches SYSTEM_RUNNING and
are displayed as part of /proc/vmstat on x86 servers.
The lifetime split event information is displayed at the bottom of
/proc/vmstat:
....
swap_ra 0
swap_ra_hit 0
direct_map_level2_splits 94
direct_map_level3_splits 4
nr_unstable 0
....
Tracing is one of the many sources of huge page splits, and its effect is
lasting because split pages are not coalesced back. Granular page
attribute/permission changes force the kernel to split code segments
mapped to huge pages into smaller ones, increasing the probability of TLB
misses and reloads even after tracing has been stopped.
Documentation regarding linear mapping split events added to admin-guide
as requested in V3 of the patch.
Signed-off-by: Saravanan D <saravanand@fb.com>
---
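Not part of the commit: a minimal user-space sketch of how the counters
described above could be read, assuming /proc/vmstat keeps its usual
"name value" line layout. The helper name read_vmstat_counter() is
hypothetical and does not exist in the kernel tree or in this patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of this patch): scan "name value" pairs
 * from an already opened stream such as /proc/vmstat and return the value
 * recorded for `name`. Returns -1 if the counter is absent, which is the
 * expected result on kernels without this patch or on non-x86 builds.
 */
static long long read_vmstat_counter(FILE *fp, const char *name)
{
	char key[64];
	unsigned long long val;

	/* Start from the top so the helper can be called repeatedly. */
	rewind(fp);
	while (fscanf(fp, "%63s %llu", key, &val) == 2) {
		if (strcmp(key, name) == 0)
			return (long long)val;
	}
	return -1;
}
```

A caller would fopen("/proc/vmstat", "r") and ask for
"direct_map_level2_splits" and "direct_map_level3_splits". Sampling both
before and after a tracing session and comparing the values gives the
number of splits that happened in between, since the counters only grow.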
.../admin-guide/mm/direct_mapping_splits.rst | 59 +++++++++++++++++++
Documentation/admin-guide/mm/index.rst | 1 +
arch/x86/mm/pat/set_memory.c | 13 ++++
include/linux/vm_event_item.h | 4 ++
mm/vmstat.c | 4 ++
5 files changed, 81 insertions(+)
create mode 100644 Documentation/admin-guide/mm/direct_mapping_splits.rst
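Also not part of the commit: a rough sketch of reading the DirectMap
lines that the new documentation file points to in /proc/meminfo. The
struct direct_map_sizes type and the read_direct_map() helper are
hypothetical names chosen for illustration. The sketch assumes the
"DirectMapNN: value kB" layout and ignores the DirectMap4M variant.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three sizes, all in kB. */
struct direct_map_sizes {
	unsigned long long kb_4k;
	unsigned long long kb_2m;
	unsigned long long kb_1g;
};

/*
 * Hypothetical helper: parse the DirectMap lines from an opened stream
 * such as /proc/meminfo. Fields that are not reported stay at zero.
 * Returns the number of DirectMap lines that were recognised.
 */
static int read_direct_map(FILE *fp, struct direct_map_sizes *out)
{
	char line[256];
	unsigned long long kb;
	int found = 0;

	memset(out, 0, sizeof(*out));
	rewind(fp);
	while (fgets(line, sizeof(line), fp)) {
		if (sscanf(line, "DirectMap4k: %llu kB", &kb) == 1) {
			out->kb_4k = kb;
			found++;
		} else if (sscanf(line, "DirectMap2M: %llu kB", &kb) == 1) {
			out->kb_2m = kb;
			found++;
		} else if (sscanf(line, "DirectMap1G: %llu kB", &kb) == 1) {
			out->kb_1g = kb;
			found++;
		}
	}
	return found;
}
```

Read together with the split counters, a growing DirectMap4k value next
to a shrinking DirectMap2M or DirectMap1G value is what a split looks
like from user space.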
diff --git a/Documentation/admin-guide/mm/direct_mapping_splits.rst b/Documentation/admin-guide/mm/direct_mapping_splits.rst
new file mode 100644
index 000000000000..298751391deb
--- /dev/null
+++ b/Documentation/admin-guide/mm/direct_mapping_splits.rst
@@ -0,0 +1,59 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Direct Mapping Splits
+=====================
+
+The kernel maps all of physical memory in linear/direct mapped pages, so
+translating a kernel virtual address to a physical address only requires
+subtracting a fixed offset. CPUs keep these translations in fast caches
+called TLBs. CPU architectures like x86 allow direct mapping large
+portions of memory with hugepages (2M, 1G, etc.) at various page table
+levels.
+
+Maintaining huge direct mapped pages greatly reduces TLB miss pressure.
+Splintering huge direct pages into smaller ones results in a measurable
+performance hit caused by frequent TLB misses and reloads.
+
+Tracing is one of the many sources of huge page splits, and its effect is
+lasting because split pages are not coalesced back. Granular page
+attribute/permission changes force the kernel to split code segments
+mapped to hugepages into smaller ones, increasing the probability of TLB
+misses and reloads even after tracing has been stopped.
+
+On x86 systems, we can track the splitting of huge direct mapped pages
+through lifetime event counters in ``/proc/vmstat``::
+
+ direct_map_level2_splits xxx
+ direct_map_level3_splits yyy
+
+where:
+
+direct_map_level2_splits
+ are 2M/4M hugepage split events
+direct_map_level3_splits
+ are 1G hugepage split events
+
+The distribution of direct mapped system memory across page sizes after
+splits can be viewed through ``/proc/meminfo``, whose output includes
+the following lines, depending on what the CPU architecture
+supports::
+
+ DirectMap4k: xxxxx kB
+ DirectMap2M: yyyyy kB
+ DirectMap1G: zzzzz kB
+
+where:
+
+DirectMap4k
+ is the total amount of direct mapped memory (in kB)
+ accessed through 4k pages
+DirectMap2M
+ is the total amount of direct mapped memory (in kB)
+ accessed through 2M pages
+DirectMap1G
+ is the total amount of direct mapped memory (in kB)
+ accessed through 1G pages
+
+
+-- Saravanan D, Jan 27, 2021
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 4b14d8b50e9e..9439780f3f07 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -38,3 +38,4 @@ the Linux memory management.
soft-dirty
transhuge
userfaultfd
+ direct_mapping_splits
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..767cade53bdc 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -16,6 +16,8 @@
#include <linux/pci.h>
#include <linux/vmalloc.h>
#include <linux/libnvdimm.h>
+#include <linux/vmstat.h>
+#include <linux/kernel.h>
#include <asm/e820/api.h>
#include <asm/processor.h>
@@ -85,12 +87,23 @@ void update_page_count(int level, unsigned long pages)
spin_unlock(&pgd_lock);
}
+static void update_split_page_event_count(int level)
+{
+ if (system_state == SYSTEM_RUNNING) {
+ if (level == PG_LEVEL_2M)
+ count_vm_event(DIRECT_MAP_LEVEL2_SPLIT);
+ else if (level == PG_LEVEL_1G)
+ count_vm_event(DIRECT_MAP_LEVEL3_SPLIT);
+ }
+}
+
static void split_page_count(int level)
{
if (direct_pages_count[level] == 0)
return;
direct_pages_count[level]--;
+ update_split_page_event_count(level);
direct_pages_count[level - 1] += PTRS_PER_PTE;
}
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 18e75974d4e3..7c06c2bdc33b 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -120,6 +120,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_SWAP
SWAP_RA,
SWAP_RA_HIT,
+#endif
+#ifdef CONFIG_X86
+ DIRECT_MAP_LEVEL2_SPLIT,
+ DIRECT_MAP_LEVEL3_SPLIT,
#endif
NR_VM_EVENT_ITEMS
};
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f8942160fc95..a43ac4ac98a2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1350,6 +1350,10 @@ const char * const vmstat_text[] = {
"swap_ra",
"swap_ra_hit",
#endif
+#ifdef CONFIG_X86
+ "direct_map_level2_splits",
+ "direct_map_level3_splits",
+#endif
#endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
};
#endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
--
2.24.1