From: Vladimir Davydov <vdavydov@parallels.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andres Lagar-Cavilla <andreslc@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
	Michel Lespinasse <walken@google.com>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelyanov <xemul@parallels.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, Kees Cook <keescook@chromium.org>
Subject: Re: [PATCH -mm v9 0/8] idle memory tracking
Date: Sat, 25 Jul 2015 19:24:56 +0300
Message-ID: <20150725162456.GM8100@esperanza>
In-Reply-To: <20150722162353.GM23374@esperanza>

On Wed, Jul 22, 2015 at 07:23:53PM +0300, Vladimir Davydov wrote:
> On Tue, Jul 21, 2015 at 04:34:02PM -0700, Andrew Morton wrote:
> > On Sun, 19 Jul 2015 15:31:09 +0300 Vladimir Davydov <vdavydov@parallels.com> wrote:

> > > Documentation/vm/pagemap.txt           |  22 ++-
> > 
> > I think we'll need quite a lot more than this to fully describe the
> > interface?
> 
> Agree, the documentation sucks :-( Will try to forge something more
> thorough.

The incremental patch is attached. Could you please merge it into
proc-add-kpageidle-file?
---
From: Vladimir Davydov <vdavydov@parallels.com>
Subject: [PATCH] Documentation: Add idle page tracking description

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>

diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index 081c49777abb..6a5e2a102a45 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -14,6 +14,8 @@ hugetlbpage.txt
 	- a brief summary of hugetlbpage support in the Linux kernel.
 hwpoison.txt
 	- explains what hwpoison is
+idle_page_tracking.txt
+	- description of the idle page tracking feature.
 ksm.txt
 	- how to use the Kernel Samepage Merging feature.
 numa
diff --git a/Documentation/vm/idle_page_tracking.txt b/Documentation/vm/idle_page_tracking.txt
new file mode 100644
index 000000000000..d0f332d544c4
--- /dev/null
+++ b/Documentation/vm/idle_page_tracking.txt
@@ -0,0 +1,94 @@
+MOTIVATION
+
+The idle page tracking feature allows tracking which memory pages are being
+accessed by a workload and which are idle. This information can be useful for
+estimating the workload's working set size, which, in turn, can be taken into
+account when configuring the workload parameters, setting memory cgroup limits,
+or deciding where to place the workload within a compute cluster.
+
+USER API
+
+If CONFIG_IDLE_PAGE_TRACKING was enabled at compile time, a new read-write file
+is present on the proc filesystem, /proc/kpageidle.
+
+The file implements a bitmap where each bit corresponds to a memory page. The
+bitmap is represented by an array of 8-byte integers, and the page at PFN #i is
+mapped to bit #i%64 of array element #i/64; byte order is native. When a bit is
+set, the corresponding page is idle.
+
+A page is considered idle if it has not been accessed since it was marked idle
+(for more details on what "accessed" actually means see the IMPLEMENTATION
+DETAILS section). To mark a page idle one has to set the bit corresponding to
+the page by writing to the file. A value written to the file is OR-ed with the
+current bitmap value.
+
+Only accesses to user memory pages are tracked. These are pages mapped to a
+process address space, page cache and buffer pages, and swap cache pages. For
+other page types (e.g. SLAB pages) an attempt to mark a page idle is silently
+ignored, and hence such pages are never reported idle.
+
+For huge pages the idle flag is set only on the head page, so one has to read
+/proc/kpageflags in order to correctly count idle huge pages.
+
+Reading from or writing to /proc/kpageidle will return -EINVAL if you are not
+starting the read/write on an 8-byte boundary, or if the size of the read/write
+is not a multiple of 8 bytes. Writing to this file beyond max PFN will return
+-ENXIO.
+
+That said, in order to estimate the number of pages that are not used by a
+workload one should:
+
+ 1. Mark all the workload's pages as idle by setting corresponding bits in the
+    /proc/kpageidle bitmap. The pages can be found by reading /proc/pid/pagemap
+    if the workload is represented by a process, or by filtering out alien pages
+    using /proc/kpagecgroup in case the workload is placed in a memory cgroup.
+
+ 2. Wait until the workload accesses its working set.
+
+ 3. Read /proc/kpageidle and count the number of bits set. If one wants to
+    ignore certain types of pages, e.g. mlocked pages since they are not
+    reclaimable, one can filter them out using /proc/kpageflags.
+
+See Documentation/vm/pagemap.txt for more information about /proc/pid/pagemap,
+/proc/kpageflags, and /proc/kpagecgroup.
+
+IMPLEMENTATION DETAILS
+
+The kernel internally keeps track of accesses to user memory pages in order to
+reclaim unreferenced pages first under memory shortage conditions. A page is
+considered referenced if it has been recently accessed via a process address
+space, in which case one or more PTEs it is mapped to will have the Accessed bit
+set, or marked accessed explicitly by the kernel (see mark_page_accessed()). The
+latter happens when:
+
+ - a userspace process reads or writes a page using a system call (e.g. read(2)
+   or write(2))
+
+ - a page that is used for storing filesystem buffers is read or written,
+   because a process needs filesystem metadata stored in it (e.g. lists a
+   directory tree)
+
+ - a page is accessed by a device driver using get_user_pages()
+
+When a dirty page is written to swap or disk as a result of memory reclaim or
+exceeding the dirty memory limit, it is not marked referenced.
+
+The idle page tracking feature adds a new page flag, the Idle flag. This flag
+is set manually, by writing to /proc/kpageidle (see the USER API section), and
+cleared automatically whenever a page is referenced as defined above.
+
+When a page is marked idle, the Accessed bit must be cleared in all PTEs it is
+mapped to, otherwise we will not be able to detect accesses to the page coming
+from a process address space. To avoid interference with the reclaimer, which,
+as noted above, uses the Accessed bit to promote actively referenced pages, one
+more page flag is introduced, the Young flag. When the PTE Accessed bit is
+cleared as a result of setting or updating a page's Idle flag, the Young flag
+is set on the page. The reclaimer treats the Young flag as an extra PTE
+Accessed bit and therefore will consider such a page as referenced.
+
+Since the idle page tracking feature is based on the memory reclaimer logic,
+it only works with pages that are on an LRU list; other pages are silently
+ignored. That means it will ignore a user memory page if it is isolated, but
+since there are usually not many of them, it should not affect the overall
+result noticeably. In order not to stall scanning of /proc/kpageidle, locked
+pages may be skipped too.
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 538735465693..cff513e28a13 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -71,15 +71,8 @@ There are five components to pagemap:
    memory cgroup each page is charged to, indexed by PFN. Only available when
    CONFIG_MEMCG is set.
 
- * /proc/kpageidle.  This file implements a bitmap where each bit corresponds
-   to a page, indexed by PFN. When the bit is set, the corresponding page is
-   idle. A page is considered idle if it has not been accessed since it was
-   marked idle. To mark a page idle one should set the bit corresponding to the
-   page by writing to the file. A value written to the file is OR-ed with the
-   current bitmap value. Only user memory pages can be marked idle, for other
-   page types input is silently ignored. Writing to this file beyond max PFN
-   results in the ENXIO error. Only available when CONFIG_IDLE_PAGE_TRACKING is
-   set.
+ * /proc/kpageidle.  This file provides the API of the idle page tracking
+   feature. See Documentation/vm/idle_page_tracking.txt for more details.
 
 Short descriptions to the page flags:
 
diff --git a/mm/Kconfig b/mm/Kconfig
index a1de09926171..90fa89175102 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -666,4 +666,4 @@ config IDLE_PAGE_TRACKING
 	  be useful to tune memory cgroup limits and/or for job placement
 	  within a compute cluster.
 
-	  See Documentation/vm/pagemap.txt for more details.
+	  See Documentation/vm/idle_page_tracking.txt for more details.
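
For illustration only (not part of the patch): a minimal userspace sketch of
the bitmap addressing described in the USER API section, i.e. PFN #i lives in
bit #i%64 of 8-byte array element #i/64, in native byte order. The function
names are hypothetical, and the file object is assumed to be either
/proc/kpageidle opened unbuffered by a privileged user or any seekable binary
stream standing in for it. Every read and write is exactly one aligned 8-byte
word, which satisfies the alignment and size rules that otherwise yield
-EINVAL.

```python
import struct

WORD_BYTES = 8   # the bitmap is an array of 8-byte integers
WORD_BITS = 64


def pfn_to_word_and_bit(pfn):
    """Return (array element index, bit index) for a PFN: (#i/64, #i%64)."""
    return pfn // WORD_BITS, pfn % WORD_BITS


def build_idle_words(pfns):
    """Group PFNs into {array element index: 64-bit mask}."""
    words = {}
    for pfn in pfns:
        index, bit = pfn_to_word_and_bit(pfn)
        words[index] = words.get(index, 0) | (1 << bit)
    return words


def mark_idle(bitmap, pfns):
    """Set the idle bit of every PFN, one aligned 8-byte write per word."""
    for index, mask in sorted(build_idle_words(pfns).items()):
        bitmap.seek(index * WORD_BYTES)
        bitmap.write(struct.pack("=Q", mask))  # "=Q": native byte order


def count_idle(bitmap, pfns):
    """Count how many of the given PFNs still have their idle bit set."""
    idle = 0
    for index, mask in sorted(build_idle_words(pfns).items()):
        bitmap.seek(index * WORD_BYTES)
        data = bitmap.read(WORD_BYTES)
        if len(data) < WORD_BYTES:
            continue  # short read: offset is past the end of the bitmap
        (word,) = struct.unpack("=Q", data)
        idle += bin(word & mask).count("1")
    return idle
```

On a real system this would presumably be used as
open("/proc/kpageidle", "r+b", buffering=0), followed by mark_idle(f, pfns),
a wait while the workload runs, and then count_idle(f, pfns). Because the
kernel ORs written values into the bitmap, writing only the bits of interest
does not disturb the idle state of neighbouring pages.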

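Also for illustration only: step 1 of the procedure needs the PFNs behind a
process's virtual address range. The sketch below decodes /proc/pid/pagemap
entries using the layout documented in Documentation/vm/pagemap.txt (one
64-bit entry per virtual page, bits 0-54 holding the PFN when bit 63, "page
present", is set). The helper names and the 4096-byte default page size are
assumptions, and the caller is assumed to have sufficient privileges to see
PFNs.

```python
import struct

PAGEMAP_ENTRY_BYTES = 8
PFN_MASK = (1 << 55) - 1   # bits 0-54: PFN, valid only if the page is present
PAGE_PRESENT = 1 << 63     # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Return the PFN stored in a pagemap entry, or None if not present."""
    if not entry & PAGE_PRESENT:
        return None
    return entry & PFN_MASK


def pfns_for_range(pagemap, start_vaddr, end_vaddr, page_size=4096):
    """Collect PFNs backing [start_vaddr, end_vaddr) from a pagemap stream."""
    first = start_vaddr // page_size
    count = (end_vaddr - start_vaddr) // page_size
    pagemap.seek(first * PAGEMAP_ENTRY_BYTES)
    data = pagemap.read(count * PAGEMAP_ENTRY_BYTES)
    # Drop a trailing partial entry, if any, so iter_unpack gets whole words.
    usable = len(data) // PAGEMAP_ENTRY_BYTES * PAGEMAP_ENTRY_BYTES
    pfns = []
    for (entry,) in struct.iter_unpack("=Q", data[:usable]):
        pfn = decode_pagemap_entry(entry)
        if pfn is not None:
            pfns.append(pfn)
    return pfns
```

The resulting list can be used to build the words written to /proc/kpageidle.
Pages that are swapped out or not yet faulted in have no PFN and are skipped.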

2015-07-29 21:30                   ` Andrew Morton
2015-07-29 21:30                   ` Andrew Morton
2015-07-30  9:12                   ` Vladimir Davydov
2015-07-30  9:12                     ` Vladimir Davydov
2015-07-30 13:01                     ` Vladimir Davydov
2015-07-30 13:01                       ` Vladimir Davydov
2015-07-30 13:01                       ` Vladimir Davydov
2015-07-31  9:34                       ` Vladimir Davydov
2015-07-31  9:34                         ` Vladimir Davydov
2015-07-31  9:34                         ` Vladimir Davydov
2015-07-30  9:07                 ` Michal Hocko
2015-07-30  9:07                   ` Michal Hocko
2015-07-30  9:07                   ` Michal Hocko
     [not found]                   ` <20150730090708.GE9387-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2015-07-30  9:31                     ` Vladimir Davydov
2015-07-30  9:31                       ` Vladimir Davydov
2015-07-30  9:31                       ` Vladimir Davydov
2015-07-29 15:55           ` Andres Lagar-Cavilla
2015-07-29 15:55             ` Andres Lagar-Cavilla
     [not found]             ` <CAJu=L59RdowYjTyVM0Vhz79A4d=d8=ZmU7PB59CmEj5B0_c48Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-29 16:37               ` Vladimir Davydov
2015-07-29 16:37                 ` Vladimir Davydov
2015-07-29 16:37                 ` Vladimir Davydov
