Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* [PATCH 28/32] docs/vm: z3fold.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/z3fold.txt | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/z3fold.txt b/Documentation/vm/z3fold.txt
index 38e4dac..224e3c6 100644
--- a/Documentation/vm/z3fold.txt
+++ b/Documentation/vm/z3fold.txt
@@ -1,5 +1,8 @@
+.. _z3fold:
+
+======
 z3fold
-------
+======
 
 z3fold is a special purpose allocator for storing compressed pages.
 It is designed to store up to three compressed pages per physical page.
@@ -7,6 +10,7 @@ It is a zbud derivative which allows for higher compression
 ratio keeping the simplicity and determinism of its predecessor.
 
 The main differences between z3fold and zbud are:
+
 * unlike zbud, z3fold allows for up to PAGE_SIZE allocations
 * z3fold can hold up to 3 compressed pages in its page
 * z3fold doesn't export any API itself and is thus intended to be used
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 30/32] docs/vm: zswap.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/zswap.txt | 71 +++++++++++++++++++++++++++-------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/Documentation/vm/zswap.txt b/Documentation/vm/zswap.txt
index 0b3a114..1444ecd 100644
--- a/Documentation/vm/zswap.txt
+++ b/Documentation/vm/zswap.txt
@@ -1,4 +1,11 @@
-Overview:
+.. _zswap:
+
+=====
+zswap
+=====
+
+Overview
+========
 
 Zswap is a lightweight compressed cache for swap pages. It takes pages that are
 in the process of being swapped out and attempts to compress them into a
@@ -7,32 +14,34 @@ for potentially reduced swap I/O.  This trade-off can also result in a
 significant performance improvement if reads from the compressed cache are
 faster than reads from a swap device.
 
-NOTE: Zswap is a new feature as of v3.11 and interacts heavily with memory
-reclaim.  This interaction has not been fully explored on the large set of
-potential configurations and workloads that exist.  For this reason, zswap
-is a work in progress and should be considered experimental.
+.. note::
+   Zswap is a new feature as of v3.11 and interacts heavily with memory
+   reclaim.  This interaction has not been fully explored on the large set of
+   potential configurations and workloads that exist.  For this reason, zswap
+   is a work in progress and should be considered experimental.
+
+   Some potential benefits:
 
-Some potential benefits:
 * Desktop/laptop users with limited RAM capacities can mitigate the
-    performance impact of swapping.
+  performance impact of swapping.
 * Overcommitted guests that share a common I/O resource can
-    dramatically reduce their swap I/O pressure, avoiding heavy handed I/O
-    throttling by the hypervisor. This allows more work to get done with less
-    impact to the guest workload and guests sharing the I/O subsystem
+  dramatically reduce their swap I/O pressure, avoiding heavy handed I/O
+  throttling by the hypervisor. This allows more work to get done with less
+  impact to the guest workload and guests sharing the I/O subsystem
 * Users with SSDs as swap devices can extend the life of the device by
-    drastically reducing life-shortening writes.
+  drastically reducing life-shortening writes.
 
 Zswap evicts pages from compressed cache on an LRU basis to the backing swap
 device when the compressed pool reaches its size limit.  This requirement had
 been identified in prior community discussions.
 
 Zswap is disabled by default but can be enabled at boot time by setting
-the "enabled" attribute to 1 at boot time. ie: zswap.enabled=1.  Zswap
+the ``enabled`` attribute to 1 at boot time. ie: ``zswap.enabled=1``.  Zswap
 can also be enabled and disabled at runtime using the sysfs interface.
 An example command to enable zswap at runtime, assuming sysfs is mounted
-at /sys, is:
+at ``/sys``, is::
 
-echo 1 > /sys/module/zswap/parameters/enabled
+	echo 1 > /sys/module/zswap/parameters/enabled
 
 When zswap is disabled at runtime it will stop storing pages that are
 being swapped out.  However, it will _not_ immediately write out or fault
@@ -43,7 +52,8 @@ pages out of the compressed pool, a swapoff on the swap device(s) will
 fault back into memory all swapped out pages, including those in the
 compressed pool.
 
-Design:
+Design
+======
 
 Zswap receives pages for compression through the Frontswap API and is able to
 evict pages from its own compressed pool on an LRU basis and write them back to
@@ -53,12 +63,12 @@ Zswap makes use of zpool for the managing the compressed memory pool.  Each
 allocation in zpool is not directly accessible by address.  Rather, a handle is
 returned by the allocation routine and that handle must be mapped before being
 accessed.  The compressed memory pool grows on demand and shrinks as compressed
-pages are freed.  The pool is not preallocated.  By default, a zpool of type
-zbud is created, but it can be selected at boot time by setting the "zpool"
-attribute, e.g. zswap.zpool=zbud.  It can also be changed at runtime using the
-sysfs "zpool" attribute, e.g.
+pages are freed.  The pool is not preallocated.  By default, a zpool
+of type zbud is created, but it can be selected at boot time by
+setting the ``zpool`` attribute, e.g. ``zswap.zpool=zbud``. It can
+also be changed at runtime using the sysfs ``zpool`` attribute, e.g.::
 
-echo zbud > /sys/module/zswap/parameters/zpool
+	echo zbud > /sys/module/zswap/parameters/zpool
 
 The zbud type zpool allocates exactly 1 page to store 2 compressed pages, which
 means the compression ratio will always be 2:1 or worse (because of half-full
@@ -83,14 +93,16 @@ via frontswap, to free the compressed entry.
 
 Zswap seeks to be simple in its policies.  Sysfs attributes allow for one user
 controlled policy:
+
 * max_pool_percent - The maximum percentage of memory that the compressed
-    pool can occupy.
+  pool can occupy.
 
-The default compressor is lzo, but it can be selected at boot time by setting
-the “compressor” attribute, e.g. zswap.compressor=lzo.  It can also be changed
-at runtime using the sysfs "compressor" attribute, e.g.
+The default compressor is lzo, but it can be selected at boot time by
+setting the ``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
+It can also be changed at runtime using the sysfs "compressor"
+attribute, e.g.::
 
-echo lzo > /sys/module/zswap/parameters/compressor
+	echo lzo > /sys/module/zswap/parameters/compressor
 
 When the zpool and/or compressor parameter is changed at runtime, any existing
 compressed pages are not modified; they are left in their own zpool.  When a
@@ -106,11 +118,12 @@ compressed length of the page is set to zero and the pattern or same-filled
 value is stored.
 
 Same-value filled pages identification feature is enabled by default and can be
-disabled at boot time by setting the "same_filled_pages_enabled" attribute to 0,
-e.g. zswap.same_filled_pages_enabled=0. It can also be enabled and disabled at
-runtime using the sysfs "same_filled_pages_enabled" attribute, e.g.
+disabled at boot time by setting the ``same_filled_pages_enabled`` attribute
+to 0, e.g. ``zswap.same_filled_pages_enabled=0``. It can also be enabled and
+disabled at runtime using the sysfs ``same_filled_pages_enabled``
+attribute, e.g.::
 
-echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
+	echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
 
 When zswap same-filled page identification is disabled at runtime, it will stop
 checking for the same-value filled pages during store operation. However, the
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 27/32] docs/vm: userfaultfd.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/userfaultfd.txt | 66 ++++++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 27 deletions(-)

diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm/userfaultfd.txt
index bb2f945..5048cf6 100644
--- a/Documentation/vm/userfaultfd.txt
+++ b/Documentation/vm/userfaultfd.txt
@@ -1,6 +1,11 @@
-= Userfaultfd =
+.. _userfaultfd:
 
-== Objective ==
+===========
+Userfaultfd
+===========
+
+Objective
+=========
 
 Userfaults allow the implementation of on-demand paging from userland
 and more generally they allow userland to take control of various
@@ -9,7 +14,8 @@ memory page faults, something otherwise only the kernel code could do.
 For example userfaults allows a proper and more optimal implementation
 of the PROT_NONE+SIGSEGV trick.
 
-== Design ==
+Design
+======
 
 Userfaults are delivered and resolved through the userfaultfd syscall.
 
@@ -41,7 +47,8 @@ different processes without them being aware about what is going on
 themselves on the same region the manager is already tracking, which
 is a corner case that would currently return -EBUSY).
 
-== API ==
+API
+===
 
 When first opened the userfaultfd must be enabled invoking the
 UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or
@@ -101,7 +108,8 @@ UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
 half copied page since it'll keep userfaulting until the copy has
 finished.
 
-== QEMU/KVM ==
+QEMU/KVM
+========
 
 QEMU/KVM is using the userfaultfd syscall to implement postcopy live
 migration. Postcopy live migration is one form of memory
@@ -163,7 +171,8 @@ sending the same page twice (in case the userfault is read by the
 postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration
 thread).
 
-== Non-cooperative userfaultfd ==
+Non-cooperative userfaultfd
+===========================
 
 When the userfaultfd is monitored by an external manager, the manager
 must be able to track changes in the process virtual memory
@@ -172,27 +181,30 @@ the same read(2) protocol as for the page fault notifications. The
 manager has to explicitly enable these events by setting appropriate
 bits in uffdio_api.features passed to UFFDIO_API ioctl:
 
-UFFD_FEATURE_EVENT_FORK - enable userfaultfd hooks for fork(). When
-this feature is enabled, the userfaultfd context of the parent process
-is duplicated into the newly created process. The manager receives
-UFFD_EVENT_FORK with file descriptor of the new userfaultfd context in
-the uffd_msg.fork.
-
-UFFD_FEATURE_EVENT_REMAP - enable notifications about mremap()
-calls. When the non-cooperative process moves a virtual memory area to
-a different location, the manager will receive UFFD_EVENT_REMAP. The
-uffd_msg.remap will contain the old and new addresses of the area and
-its original length.
-
-UFFD_FEATURE_EVENT_REMOVE - enable notifications about
-madvise(MADV_REMOVE) and madvise(MADV_DONTNEED) calls. The event
-UFFD_EVENT_REMOVE will be generated upon these calls to madvise. The
-uffd_msg.remove will contain start and end addresses of the removed
-area.
-
-UFFD_FEATURE_EVENT_UNMAP - enable notifications about memory
-unmapping. The manager will get UFFD_EVENT_UNMAP with uffd_msg.remove
-containing start and end addresses of the unmapped area.
+UFFD_FEATURE_EVENT_FORK
+	enable userfaultfd hooks for fork(). When this feature is
+	enabled, the userfaultfd context of the parent process is
+	duplicated into the newly created process. The manager
+	receives UFFD_EVENT_FORK with file descriptor of the new
+	userfaultfd context in the uffd_msg.fork.
+
+UFFD_FEATURE_EVENT_REMAP
+	enable notifications about mremap() calls. When the
+	non-cooperative process moves a virtual memory area to a
+	different location, the manager will receive
+	UFFD_EVENT_REMAP. The uffd_msg.remap will contain the old and
+	new addresses of the area and its original length.
+
+UFFD_FEATURE_EVENT_REMOVE
+	enable notifications about madvise(MADV_REMOVE) and
+	madvise(MADV_DONTNEED) calls. The event UFFD_EVENT_REMOVE will
+	be generated upon these calls to madvise. The uffd_msg.remove
+	will contain start and end addresses of the removed area.
+
+UFFD_FEATURE_EVENT_UNMAP
+	enable notifications about memory unmapping. The manager will
+	get UFFD_EVENT_UNMAP with uffd_msg.remove containing start and
+	end addresses of the unmapped area.
 
 Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP
 are pretty similar, they quite differ in the action expected from the
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 24/32] docs/vm: swap_numa.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/swap_numa.txt | 55 +++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/Documentation/vm/swap_numa.txt b/Documentation/vm/swap_numa.txt
index d5960c9..e0466f2 100644
--- a/Documentation/vm/swap_numa.txt
+++ b/Documentation/vm/swap_numa.txt
@@ -1,5 +1,8 @@
+.. _swap_numa:
+
+===========================================
 Automatically bind swap device to numa node
--------------------------------------------
+===========================================
 
 If the system has more than one swap device and swap device has the node
 information, we can make use of this information to decide which swap
@@ -7,15 +10,16 @@ device to use in get_swap_pages() to get better performance.
 
 
 How to use this feature
------------------------
+=======================
 
 Swap device has priority and that decides the order of it to be used. To make
 use of automatically binding, there is no need to manipulate priority settings
 for swap devices. e.g. on a 2 node machine, assume 2 swap devices swapA and
 swapB, with swapA attached to node 0 and swapB attached to node 1, are going
-to be swapped on. Simply swapping them on by doing:
-# swapon /dev/swapA
-# swapon /dev/swapB
+to be swapped on. Simply swapping them on by doing::
+
+	# swapon /dev/swapA
+	# swapon /dev/swapB
 
 Then node 0 will use the two swap devices in the order of swapA then swapB and
 node 1 will use the two swap devices in the order of swapB then swapA. Note
@@ -24,32 +28,39 @@ that the order of them being swapped on doesn't matter.
 A more complex example on a 4 node machine. Assume 6 swap devices are going to
 be swapped on: swapA and swapB are attached to node 0, swapC is attached to
 node 1, swapD and swapE are attached to node 2 and swapF is attached to node3.
-The way to swap them on is the same as above:
-# swapon /dev/swapA
-# swapon /dev/swapB
-# swapon /dev/swapC
-# swapon /dev/swapD
-# swapon /dev/swapE
-# swapon /dev/swapF
-
-Then node 0 will use them in the order of:
-swapA/swapB -> swapC -> swapD -> swapE -> swapF
+The way to swap them on is the same as above::
+
+	# swapon /dev/swapA
+	# swapon /dev/swapB
+	# swapon /dev/swapC
+	# swapon /dev/swapD
+	# swapon /dev/swapE
+	# swapon /dev/swapF
+
+Then node 0 will use them in the order of::
+
+	swapA/swapB -> swapC -> swapD -> swapE -> swapF
+
 swapA and swapB will be used in a round robin mode before any other swap device.
 
-node 1 will use them in the order of:
-swapC -> swapA -> swapB -> swapD -> swapE -> swapF
+node 1 will use them in the order of::
+
+	swapC -> swapA -> swapB -> swapD -> swapE -> swapF
+
+node 2 will use them in the order of::
+
+	swapD/swapE -> swapA -> swapB -> swapC -> swapF
 
-node 2 will use them in the order of:
-swapD/swapE -> swapA -> swapB -> swapC -> swapF
 Similaly, swapD and swapE will be used in a round robin mode before any
 other swap devices.
 
-node 3 will use them in the order of:
-swapF -> swapA -> swapB -> swapC -> swapD -> swapE
+node 3 will use them in the order of::
+
+	swapF -> swapA -> swapB -> swapC -> swapD -> swapE
 
 
 Implementation details
-----------------------
+======================
 
 The current code uses a priority based list, swap_avail_list, to decide
 which swap device to use and if multiple swap devices share the same
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 25/32] docs/vm: transhuge.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/transhuge.txt | 286 ++++++++++++++++++++++++-----------------
 1 file changed, 166 insertions(+), 120 deletions(-)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 4dde03b..569d182 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -1,6 +1,11 @@
-= Transparent Hugepage Support =
+.. _transhuge:
 
-== Objective ==
+============================
+Transparent Hugepage Support
+============================
+
+Objective
+=========
 
 Performance critical computing applications dealing with large memory
 working sets are already running on top of libhugetlbfs and in turn
@@ -33,7 +38,8 @@ are using hugepages but a significant speedup already happens if only
 one of the two is using hugepages just because of the fact the TLB
 miss is going to run faster.
 
-== Design ==
+Design
+======
 
 - "graceful fallback": mm components which don't have transparent hugepage
   knowledge fall back to breaking huge pmd mapping into table of ptes and,
@@ -88,16 +94,17 @@ Applications that gets a lot of benefit from hugepages and that don't
 risk to lose memory by using hugepages, should use
 madvise(MADV_HUGEPAGE) on their critical mmapped regions.
 
-== sysfs ==
+sysfs
+=====
 
 Transparent Hugepage Support for anonymous memory can be entirely disabled
 (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
 regions (to avoid the risk of consuming more memory resources) or enabled
-system wide. This can be achieved with one of:
+system wide. This can be achieved with one of::
 
-echo always >/sys/kernel/mm/transparent_hugepage/enabled
-echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
-echo never >/sys/kernel/mm/transparent_hugepage/enabled
+	echo always >/sys/kernel/mm/transparent_hugepage/enabled
+	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
+	echo never >/sys/kernel/mm/transparent_hugepage/enabled
 
 It's also possible to limit defrag efforts in the VM to generate
 anonymous hugepages in case they're not immediately free to madvise
@@ -108,44 +115,53 @@ use hugepages later instead of regular pages. This isn't always
 guaranteed, but it may be more likely in case the allocation is for a
 MADV_HUGEPAGE region.
 
-echo always >/sys/kernel/mm/transparent_hugepage/defrag
-echo defer >/sys/kernel/mm/transparent_hugepage/defrag
-echo defer+madvise >/sys/kernel/mm/transparent_hugepage/defrag
-echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
-echo never >/sys/kernel/mm/transparent_hugepage/defrag
-
-"always" means that an application requesting THP will stall on allocation
-failure and directly reclaim pages and compact memory in an effort to
-allocate a THP immediately. This may be desirable for virtual machines
-that benefit heavily from THP use and are willing to delay the VM start
-to utilise them.
-
-"defer" means that an application will wake kswapd in the background
-to reclaim pages and wake kcompactd to compact memory so that THP is
-available in the near future. It's the responsibility of khugepaged
-to then install the THP pages later.
-
-"defer+madvise" will enter direct reclaim and compaction like "always", but
-only for regions that have used madvise(MADV_HUGEPAGE); all other regions
-will wake kswapd in the background to reclaim pages and wake kcompactd to
-compact memory so that THP is available in the near future.
-
-"madvise" will enter direct reclaim like "always" but only for regions
-that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.
-
-"never" should be self-explanatory.
+::
+
+	echo always >/sys/kernel/mm/transparent_hugepage/defrag
+	echo defer >/sys/kernel/mm/transparent_hugepage/defrag
+	echo defer+madvise >/sys/kernel/mm/transparent_hugepage/defrag
+	echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
+	echo never >/sys/kernel/mm/transparent_hugepage/defrag
+
+always
+	means that an application requesting THP will stall on
+	allocation failure and directly reclaim pages and compact
+	memory in an effort to allocate a THP immediately. This may be
+	desirable for virtual machines that benefit heavily from THP
+	use and are willing to delay the VM start to utilise them.
+
+defer
+	means that an application will wake kswapd in the background
+	to reclaim pages and wake kcompactd to compact memory so that
+	THP is available in the near future. It's the responsibility
+	of khugepaged to then install the THP pages later.
+
+defer+madvise
+	will enter direct reclaim and compaction like ``always``, but
+	only for regions that have used madvise(MADV_HUGEPAGE); all
+	other regions will wake kswapd in the background to reclaim
+	pages and wake kcompactd to compact memory so that THP is
+	available in the near future.
+
+madvise
+	will enter direct reclaim like ``always`` but only for regions
+	that are have used madvise(MADV_HUGEPAGE). This is the default
+	behaviour.
+
+never
+	should be self-explanatory.
 
 By default kernel tries to use huge zero page on read page fault to
 anonymous mapping. It's possible to disable huge zero page by writing 0
-or enable it back by writing 1:
+or enable it back by writing 1::
 
-echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
-echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
+	echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
+	echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
 
 Some userspace (such as a test program, or an optimized memory allocation
-library) may want to know the size (in bytes) of a transparent hugepage:
+library) may want to know the size (in bytes) of a transparent hugepage::
 
-cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
+	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
@@ -155,84 +171,86 @@ khugepaged runs usually at low frequency so while one may not want to
 invoke defrag algorithms synchronously during the page faults, it
 should be worth invoking defrag at least in khugepaged. However it's
 also possible to disable defrag in khugepaged by writing 0 or enable
-defrag in khugepaged by writing 1:
+defrag in khugepaged by writing 1::
 
-echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
-echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
+	echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
+	echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
 
 You can also control how many pages khugepaged should scan at each
-pass:
+pass::
 
-/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
+	/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
 
 and how many milliseconds to wait in khugepaged between each pass (you
-can set this to 0 to run khugepaged at 100% utilization of one core):
+can set this to 0 to run khugepaged at 100% utilization of one core)::
 
-/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
+	/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
 
 and how many milliseconds to wait in khugepaged if there's an hugepage
-allocation failure to throttle the next allocation attempt.
+allocation failure to throttle the next allocation attempt::
 
-/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
+	/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
 
-The khugepaged progress can be seen in the number of pages collapsed:
+The khugepaged progress can be seen in the number of pages collapsed::
 
-/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
+	/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
 
-for each pass:
+for each pass::
 
-/sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
+	/sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
 
-max_ptes_none specifies how many extra small pages (that are
+``max_ptes_none`` specifies how many extra small pages (that are
 not already mapped) can be allocated when collapsing a group
-of small pages into one large page.
+of small pages into one large page::
 
-/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
+	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
 
 A higher value leads to use additional memory for programs.
 A lower value leads to gain less thp performance. Value of
 max_ptes_none can waste cpu time very little, you can
 ignore it.
 
-max_ptes_swap specifies how many pages can be brought in from
-swap when collapsing a group of pages into a transparent huge page.
+``max_ptes_swap`` specifies how many pages can be brought in from
+swap when collapsing a group of pages into a transparent huge page::
 
-/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap
+	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap
 
 A higher value can cause excessive swap IO and waste
 memory. A lower value can prevent THPs from being
 collapsed, resulting fewer pages being collapsed into
 THPs, and lower memory access performance.
 
-== Boot parameter ==
+Boot parameter
+==============
 
 You can change the sysfs boot time defaults of Transparent Hugepage
-Support by passing the parameter "transparent_hugepage=always" or
-"transparent_hugepage=madvise" or "transparent_hugepage=never"
-(without "") to the kernel command line.
+Support by passing the parameter ``transparent_hugepage=always`` or
+``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
+to the kernel command line.
 
-== Hugepages in tmpfs/shmem ==
+Hugepages in tmpfs/shmem
+========================
 
 You can control hugepage allocation policy in tmpfs with mount option
-"huge=". It can have following values:
+``huge=``. It can have following values:
 
-  - "always":
+always
     Attempt to allocate huge pages every time we need a new page;
 
-  - "never":
+never
     Do not allocate huge pages;
 
-  - "within_size":
+within_size
     Only allocate huge page if it will be fully within i_size.
     Also respect fadvise()/madvise() hints;
 
-  - "advise:
+advise
     Only allocate huge pages if requested with fadvise()/madvise();
 
-The default policy is "never".
+The default policy is ``never``.
 
-"mount -o remount,huge= /mountpoint" works fine after mount: remounting
-huge=never will not attempt to break up huge pages at all, just stop more
+``mount -o remount,huge= /mountpoint`` works fine after mount: remounting
+``huge=never`` will not attempt to break up huge pages at all, just stop more
 from being allocated.
 
 There's also sysfs knob to control hugepage allocation policy for internal
@@ -243,110 +261,130 @@ MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem.
 In addition to policies listed above, shmem_enabled allows two further
 values:
 
-  - "deny":
+deny
     For use in emergencies, to force the huge option off from
     all mounts;
-  - "force":
+force
     Force the huge option on for all - very useful for testing;
 
-== Need of application restart ==
+Need of application restart
+===========================
 
 The transparent_hugepage/enabled values and tmpfs mount option only affect
 future behavior. So to make them effective you need to restart any
 application that could have been using hugepages. This also applies to the
 regions registered in khugepaged.
 
-== Monitoring usage ==
+Monitoring usage
+================
 
 The number of anonymous transparent huge pages currently used by the
-system is available by reading the AnonHugePages field in /proc/meminfo.
+system is available by reading the AnonHugePages field in ``/proc/meminfo``.
 To identify what applications are using anonymous transparent huge pages,
-it is necessary to read /proc/PID/smaps and count the AnonHugePages fields
+it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages fields
 for each mapping.
 
 The number of file transparent huge pages mapped to userspace is available
-by reading ShmemPmdMapped and ShmemHugePages fields in /proc/meminfo.
+by reading ShmemPmdMapped and ShmemHugePages fields in ``/proc/meminfo``.
 To identify what applications are mapping file transparent huge pages, it
-is necessary to read /proc/PID/smaps and count the FileHugeMapped fields
+is necessary to read ``/proc/PID/smaps`` and count the FileHugeMapped fields
 for each mapping.
 
 Note that reading the smaps file is expensive and reading it
 frequently will incur overhead.
 
-There are a number of counters in /proc/vmstat that may be used to
+There are a number of counters in ``/proc/vmstat`` that may be used to
 monitor how successfully the system is providing huge pages for use.
 
-thp_fault_alloc is incremented every time a huge page is successfully
+thp_fault_alloc
+	is incremented every time a huge page is successfully
 	allocated to handle a page fault. This applies to both the
 	first time a page is faulted and for COW faults.
 
-thp_collapse_alloc is incremented by khugepaged when it has found
+thp_collapse_alloc
+	is incremented by khugepaged when it has found
 	a range of pages to collapse into one huge page and has
 	successfully allocated a new huge page to store the data.
 
-thp_fault_fallback is incremented if a page fault fails to allocate
+thp_fault_fallback
+	is incremented if a page fault fails to allocate
 	a huge page and instead falls back to using small pages.
 
-thp_collapse_alloc_failed is incremented if khugepaged found a range
+thp_collapse_alloc_failed
+	is incremented if khugepaged found a range
 	of pages that should be collapsed into one huge page but failed
 	the allocation.
 
-thp_file_alloc is incremented every time a file huge page is successfully
+thp_file_alloc
+	is incremented every time a file huge page is successfully
 	allocated.
 
-thp_file_mapped is incremented every time a file huge page is mapped into
+thp_file_mapped
+	is incremented every time a file huge page is mapped into
 	user address space.
 
-thp_split_page is incremented every time a huge page is split into base
+thp_split_page
+	is incremented every time a huge page is split into base
 	pages. This can happen for a variety of reasons but a common
 	reason is that a huge page is old and is being reclaimed.
 	This action implies splitting all PMD the page mapped with.
 
-thp_split_page_failed is incremented if kernel fails to split huge
+thp_split_page_failed
+	is incremented if kernel fails to split huge
 	page. This can happen if the page was pinned by somebody.
 
-thp_deferred_split_page is incremented when a huge page is put onto split
+thp_deferred_split_page
+	is incremented when a huge page is put onto split
 	queue. This happens when a huge page is partially unmapped and
 	splitting it would free up some memory. Pages on split queue are
 	going to be split under memory pressure.
 
-thp_split_pmd is incremented every time a PMD split into table of PTEs.
+thp_split_pmd
+	is incremented every time a PMD split into table of PTEs.
 	This can happen, for instance, when application calls mprotect() or
 	munmap() on part of huge page. It doesn't split huge page, only
 	page table entry.
 
-thp_zero_page_alloc is incremented every time a huge zero page is
+thp_zero_page_alloc
+	is incremented every time a huge zero page is
 	successfully allocated. It includes allocations which where
 	dropped due race with other allocation. Note, it doesn't count
 	every map of the huge zero page, only its allocation.
 
-thp_zero_page_alloc_failed is incremented if kernel fails to allocate
+thp_zero_page_alloc_failed
+	is incremented if kernel fails to allocate
 	huge zero page and falls back to using small pages.
 
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
-huge page for use. There are some counters in /proc/vmstat to help
+huge page for use. There are some counters in ``/proc/vmstat`` to help
 monitor this overhead.
 
-compact_stall is incremented every time a process stalls to run
+compact_stall
+	is incremented every time a process stalls to run
 	memory compaction so that a huge page is free for use.
 
-compact_success is incremented if the system compacted memory and
+compact_success
+	is incremented if the system compacted memory and
 	freed a huge page for use.
 
-compact_fail is incremented if the system tries to compact memory
+compact_fail
+	is incremented if the system tries to compact memory
 	but failed.
 
-compact_pages_moved is incremented each time a page is moved. If
+compact_pages_moved
+	is incremented each time a page is moved. If
 	this value is increasing rapidly, it implies that the system
 	is copying a lot of data to satisfy the huge page allocation.
 	It is possible that the cost of copying exceeds any savings
 	from reduced TLB misses.
 
-compact_pagemigrate_failed is incremented when the underlying mechanism
+compact_pagemigrate_failed
+	is incremented when the underlying mechanism
 	for moving a page failed.
 
-compact_blocks_moved is incremented each time memory compaction examines
+compact_blocks_moved
+	is incremented each time memory compaction examines
 	a huge page aligned range of pages.
 
 It is possible to establish how long the stalls were using the function
@@ -354,7 +392,8 @@ tracer to record how long was spent in __alloc_pages_nodemask and
 using the mm_page_alloc tracepoint to identify which allocations were
 for huge pages.
 
-== get_user_pages and follow_page ==
+get_user_pages and follow_page
+==============================
 
 get_user_pages and follow_page if run on a hugepage, will return the
 head or tail pages as usual (exactly as they would do on
@@ -367,10 +406,11 @@ for the head page and not the tail page), it should be updated to jump
 to check head page instead. Taking reference on any head/tail page would
 prevent page from being split by anyone.
 
-NOTE: these aren't new constraints to the GUP API, and they match the
-same constrains that applies to hugetlbfs too, so any driver capable
-of handling GUP on hugetlbfs will also work fine on transparent
-hugepage backed mappings.
+.. note::
+   these aren't new constraints to the GUP API, and they match the
+   same constrains that applies to hugetlbfs too, so any driver capable
+   of handling GUP on hugetlbfs will also work fine on transparent
+   hugepage backed mappings.
 
 In case you can't handle compound pages if they're returned by
 follow_page, the FOLL_SPLIT bit can be specified as parameter to
@@ -383,13 +423,15 @@ hugepages being returned (as it's not only checking the pfn of the
 page and pinning it during the copy but it pretends to migrate the
 memory in regular page sizes and with regular pte/pmd mappings).
 
-== Optimizing the applications ==
+Optimizing the applications
+===========================
 
 To be guaranteed that the kernel will map a 2M page immediately in any
 memory region, the mmap region has to be hugepage naturally
 aligned. posix_memalign() can provide that guarantee.
 
-== Hugetlbfs ==
+Hugetlbfs
+=========
 
 You can use hugetlbfs on a kernel that has transparent hugepage
 support enabled just fine as always. No difference can be noted in
@@ -397,7 +439,8 @@ hugetlbfs other than there will be less overall fragmentation. All
 usual features belonging to hugetlbfs are preserved and
 unaffected. libhugetlbfs will also work fine as usual.
 
-== Graceful fallback ==
+Graceful fallback
+=================
 
 Code walking pagetables but unaware about huge pmds can simply call
 split_huge_pmd(vma, pmd, addr) where the pmd is the one returned by
@@ -415,20 +458,21 @@ it tries to swapout the hugepage for example. split_huge_page() can fail
 if the page is pinned and you must handle this correctly.
 
 Example to make mremap.c transparent hugepage aware with a one liner
-change:
+change::
 
-diff --git a/mm/mremap.c b/mm/mremap.c
---- a/mm/mremap.c
-+++ b/mm/mremap.c
-@@ -41,6 +41,7 @@ static pmd_t *get_old_pmd(struct mm_stru
-		return NULL;
+	diff --git a/mm/mremap.c b/mm/mremap.c
+	--- a/mm/mremap.c
+	+++ b/mm/mremap.c
+	@@ -41,6 +41,7 @@ static pmd_t *get_old_pmd(struct mm_stru
+			return NULL;
 
-	pmd = pmd_offset(pud, addr);
-+	split_huge_pmd(vma, pmd, addr);
-	if (pmd_none_or_clear_bad(pmd))
-		return NULL;
+		pmd = pmd_offset(pud, addr);
+	+	split_huge_pmd(vma, pmd, addr);
+		if (pmd_none_or_clear_bad(pmd))
+			return NULL;
 
-== Locking in hugepage aware code ==
+Locking in hugepage aware code
+==============================
 
 We want as much code as possible hugepage aware, as calling
 split_huge_page() or split_huge_pmd() has a cost.
@@ -448,7 +492,8 @@ should just drop the page table lock and fallback to the old code as
 before. Otherwise you can proceed to process the huge pmd and the
 hugepage natively. Once finished you can drop the page table lock.
 
-== Refcounts and transparent huge pages ==
+Refcounts and transparent huge pages
+====================================
 
 Refcounting on THP is mostly consistent with refcounting on other compound
 pages:
@@ -510,7 +555,8 @@ clear where reference should go after split: it will stay on head page.
 Note that split_huge_pmd() doesn't have any limitation on refcounting:
 pmd can be split at any point and never fails.
 
-== Partial unmap and deferred_split_huge_page() ==
+Partial unmap and deferred_split_huge_page()
+============================================
 
 Unmapping part of THP (with munmap() or other way) is not going to free
 memory immediately. Instead, we detect that a subpage of THP is not in use
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 23/32] docs/vm: split_page_table_lock: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/split_page_table_lock | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock
index 62842a8..889b00b 100644
--- a/Documentation/vm/split_page_table_lock
+++ b/Documentation/vm/split_page_table_lock
@@ -1,3 +1,6 @@
+.. _split_page_table_lock:
+
+=====================
 Split page table lock
 =====================
 
@@ -11,6 +14,7 @@ access to the table. At the moment we use split lock for PTE and PMD
 tables. Access to higher level tables protected by mm->page_table_lock.
 
 There are helpers to lock/unlock a table and other accessor functions:
+
  - pte_offset_map_lock()
 	maps pte and takes PTE table lock, returns pointer to the taken
 	lock;
@@ -34,12 +38,13 @@ Split page table lock for PMD tables is enabled, if it's enabled for PTE
 tables and the architecture supports it (see below).
 
 Hugetlb and split page table lock
----------------------------------
+=================================
 
 Hugetlb can support several page sizes. We use split lock only for PMD
 level, but not for PUD.
 
 Hugetlb-specific helpers:
+
  - huge_pte_lock()
 	takes pmd split lock for PMD_SIZE page, mm->page_table_lock
 	otherwise;
@@ -47,7 +52,7 @@ Hugetlb-specific helpers:
 	returns pointer to table lock;
 
 Support of split page table lock by an architecture
----------------------------------------------------
+===================================================
 
 There's no need in special enabling of PTE split page table lock:
 everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
@@ -73,7 +78,7 @@ NOTE: pgtable_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
 be handled properly.
 
 page->ptl
----------
+=========
 
 page->ptl is used to access split page table lock, where 'page' is struct
 page of page containing the table. It shares storage with page->private
@@ -81,6 +86,7 @@ page of page containing the table. It shares storage with page->private
 
 To avoid increasing size of struct page and have best performance, we use a
 trick:
+
  - if spinlock_t fits into long, we use page->ptr as spinlock, so we
    can avoid indirect access and save a cache line.
  - if size of spinlock_t is bigger then size of long, we use page->ptl as
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 22/32] docs/vm: soft-dirty.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/soft-dirty.txt | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/Documentation/vm/soft-dirty.txt b/Documentation/vm/soft-dirty.txt
index 55684d1..cb0cfd6 100644
--- a/Documentation/vm/soft-dirty.txt
+++ b/Documentation/vm/soft-dirty.txt
@@ -1,34 +1,38 @@
-                            SOFT-DIRTY PTEs
+.. _soft_dirty:
 
-  The soft-dirty is a bit on a PTE which helps to track which pages a task
+===============
+Soft-Dirty PTEs
+===============
+
+The soft-dirty is a bit on a PTE which helps to track which pages a task
 writes to. In order to do this tracking one should
 
   1. Clear soft-dirty bits from the task's PTEs.
 
-     This is done by writing "4" into the /proc/PID/clear_refs file of the
+     This is done by writing "4" into the ``/proc/PID/clear_refs`` file of the
      task in question.
 
   2. Wait some time.
 
   3. Read soft-dirty bits from the PTEs.
 
-     This is done by reading from the /proc/PID/pagemap. The bit 55 of the
+     This is done by reading from the ``/proc/PID/pagemap``. The bit 55 of the
      64-bit qword is the soft-dirty one. If set, the respective PTE was
      written to since step 1.
 
 
-  Internally, to do this tracking, the writable bit is cleared from PTEs
+Internally, to do this tracking, the writable bit is cleared from PTEs
 when the soft-dirty bit is cleared. So, after this, when the task tries to
 modify a page at some virtual address the #PF occurs and the kernel sets
 the soft-dirty bit on the respective PTE.
 
-  Note, that although all the task's address space is marked as r/o after the
+Note, that although all the task's address space is marked as r/o after the
 soft-dirty bits clear, the #PF-s that occur after that are processed fast.
 This is so, since the pages are still mapped to physical memory, and thus all
 the kernel does is finds this fact out and puts both writable and soft-dirty
 bits on the PTE.
 
-  While in most cases tracking memory changes by #PF-s is more than enough
+While in most cases tracking memory changes by #PF-s is more than enough
 there is still a scenario when we can lose soft dirty bits -- a task
 unmaps a previously mapped memory region and then maps a new one at exactly
 the same place. When unmap is called, the kernel internally clears PTE values
@@ -36,7 +40,7 @@ including soft dirty bits. To notify user space application about such
 memory region renewal the kernel always marks new memory regions (and
 expanded regions) as soft dirty.
 
-  This feature is actively used by the checkpoint-restore project. You
+This feature is actively used by the checkpoint-restore project. You
 can find more details about it on http://criu.org
 
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 20/32] docs/vm: remap_file_pages.txt: conert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/remap_file_pages.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/vm/remap_file_pages.txt b/Documentation/vm/remap_file_pages.txt
index f609142..7bef671 100644
--- a/Documentation/vm/remap_file_pages.txt
+++ b/Documentation/vm/remap_file_pages.txt
@@ -1,3 +1,9 @@
+.. _remap_file_pages:
+
+==============================
+remap_file_pages() system call
+==============================
+
 The remap_file_pages() system call is used to create a nonlinear mapping,
 that is, a mapping in which the pages of the file are mapped into a
 nonsequential order in memory. The advantage of using remap_file_pages()
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 21/32] docs/vm: slub.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/slub.txt | 357 ++++++++++++++++++++++++----------------------
 1 file changed, 188 insertions(+), 169 deletions(-)

diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index 8465241..3a775fd 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -1,5 +1,8 @@
+.. _slub:
+
+==========================
 Short users guide for SLUB
---------------------------
+==========================
 
 The basic philosophy of SLUB is very different from SLAB. SLAB
 requires rebuilding the kernel to activate debug options for all
@@ -8,18 +11,19 @@ SLUB can enable debugging only for selected slabs in order to avoid
 an impact on overall system performance which may make a bug more
 difficult to find.
 
-In order to switch debugging on one can add an option "slub_debug"
+In order to switch debugging on one can add an option ``slub_debug``
 to the kernel command line. That will enable full debugging for
 all slabs.
 
-Typically one would then use the "slabinfo" command to get statistical
-data and perform operation on the slabs. By default slabinfo only lists
+Typically one would then use the ``slabinfo`` command to get statistical
+data and perform operation on the slabs. By default ``slabinfo`` only lists
 slabs that have data in them. See "slabinfo -h" for more options when
-running the command. slabinfo can be compiled with
+running the command. ``slabinfo`` can be compiled with
+::
 
-gcc -o slabinfo tools/vm/slabinfo.c
+	gcc -o slabinfo tools/vm/slabinfo.c
 
-Some of the modes of operation of slabinfo require that slub debugging
+Some of the modes of operation of ``slabinfo`` require that slub debugging
 be enabled on the command line. F.e. no tracking information will be
 available without debugging on and validation can only partially
 be performed if debugging was not switched on.
@@ -27,14 +31,17 @@ be performed if debugging was not switched on.
 Some more sophisticated uses of slub_debug:
 -------------------------------------------
 
-Parameters may be given to slub_debug. If none is specified then full
+Parameters may be given to ``slub_debug``. If none is specified then full
 debugging is enabled. Format:
 
-slub_debug=<Debug-Options>       Enable options for all slabs
+slub_debug=<Debug-Options>
+	Enable options for all slabs
 slub_debug=<Debug-Options>,<slab name>
-				Enable options only for select slabs
+	Enable options only for select slabs
+
+
+Possible debug options are::
 
-Possible debug options are
 	F		Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
 			Sorry SLAB legacy issues)
 	Z		Red zoning
@@ -47,18 +54,18 @@ Possible debug options are
 	-		Switch all debugging off (useful if the kernel is
 			configured with CONFIG_SLUB_DEBUG_ON)
 
-F.e. in order to boot just with sanity checks and red zoning one would specify:
+F.e. in order to boot just with sanity checks and red zoning one would specify::
 
 	slub_debug=FZ
 
-Trying to find an issue in the dentry cache? Try
+Trying to find an issue in the dentry cache? Try::
 
 	slub_debug=,dentry
 
 to only enable debugging on the dentry cache.
 
 Red zoning and tracking may realign the slab.  We can just apply sanity checks
-to the dentry cache with
+to the dentry cache with::
 
 	slub_debug=F,dentry
 
@@ -66,15 +73,15 @@ Debugging options may require the minimum possible slab order to increase as
 a result of storing the metadata (for example, caches with PAGE_SIZE object
 sizes).  This has a higher liklihood of resulting in slab allocation errors
 in low memory situations or if there's high fragmentation of memory.  To
-switch off debugging for such caches by default, use
+switch off debugging for such caches by default, use::
 
 	slub_debug=O
 
 In case you forgot to enable debugging on the kernel command line: It is
 possible to enable debugging manually when the kernel is up. Look at the
-contents of:
+contents of::
 
-/sys/kernel/slab/<slab name>/
+	/sys/kernel/slab/<slab name>/
 
 Look at the writable files. Writing 1 to them will enable the
 corresponding debug option. All options can be set on a slab that does
@@ -86,98 +93,103 @@ Careful with tracing: It may spew out lots of information and never stop if
 used on the wrong slab.
 
 Slab merging
-------------
+============
 
 If no debug options are specified then SLUB may merge similar slabs together
 in order to reduce overhead and increase cache hotness of objects.
-slabinfo -a displays which slabs were merged together.
+``slabinfo -a`` displays which slabs were merged together.
 
 Slab validation
----------------
+===============
 
 SLUB can validate all object if the kernel was booted with slub_debug. In
-order to do so you must have the slabinfo tool. Then you can do
+order to do so you must have the ``slabinfo`` tool. Then you can do
+::
 
-slabinfo -v
+	slabinfo -v
 
 which will test all objects. Output will be generated to the syslog.
 
 This also works in a more limited way if boot was without slab debug.
-In that case slabinfo -v simply tests all reachable objects. Usually
+In that case ``slabinfo -v`` simply tests all reachable objects. Usually
 these are in the cpu slabs and the partial slabs. Full slabs are not
 tracked by SLUB in a non debug situation.
 
 Getting more performance
-------------------------
+========================
 
 To some degree SLUB's performance is limited by the need to take the
 list_lock once in a while to deal with partial slabs. That overhead is
 governed by the order of the allocation for each slab. The allocations
 can be influenced by kernel parameters:
 
-slub_min_objects=x		(default 4)
-slub_min_order=x		(default 0)
-slub_max_order=x		(default 3 (PAGE_ALLOC_COSTLY_ORDER))
-
-slub_min_objects allows to specify how many objects must at least fit
-into one slab in order for the allocation order to be acceptable.
-In general slub will be able to perform this number of allocations
-on a slab without consulting centralized resources (list_lock) where
-contention may occur.
-
-slub_min_order specifies a minim order of slabs. A similar effect like
-slub_min_objects.
-
-slub_max_order specified the order at which slub_min_objects should no
-longer be checked. This is useful to avoid SLUB trying to generate
-super large order pages to fit slub_min_objects of a slab cache with
-large object sizes into one high order page. Setting command line
-parameter debug_guardpage_minorder=N (N > 0), forces setting
-slub_max_order to 0, what cause minimum possible order of slabs
-allocation.
+.. slub_min_objects=x		(default 4)
+.. slub_min_order=x		(default 0)
+.. slub_max_order=x		(default 3 (PAGE_ALLOC_COSTLY_ORDER))
+
+``slub_min_objects``
+	allows to specify how many objects must at least fit into one
+	slab in order for the allocation order to be acceptable.  In
+	general slub will be able to perform this number of
+	allocations on a slab without consulting centralized resources
+	(list_lock) where contention may occur.
+
+``slub_min_order``
+	specifies a minim order of slabs. A similar effect like
+	``slub_min_objects``.
+
+``slub_max_order``
+	specified the order at which ``slub_min_objects`` should no
+	longer be checked. This is useful to avoid SLUB trying to
+	generate super large order pages to fit ``slub_min_objects``
+	of a slab cache with large object sizes into one high order
+	page. Setting command line parameter
+	``debug_guardpage_minorder=N`` (N > 0), forces setting
+	``slub_max_order`` to 0, what cause minimum possible order of
+	slabs allocation.
 
 SLUB Debug output
------------------
-
-Here is a sample of slub debug output:
-
-====================================================================
-BUG kmalloc-8: Redzone overwritten
---------------------------------------------------------------------
-
-INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
-INFO: Slab 0xc528c530 flags=0x400000c3 inuse=61 fp=0xc90f6d58
-INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
-INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554
-
-Bytes b4 0xc90f6d10:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
-  Object 0xc90f6d20:  31 30 31 39 2e 30 30 35                         1019.005
- Redzone 0xc90f6d28:  00 cc cc cc                                     .
- Padding 0xc90f6d50:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
-
-  [<c010523d>] dump_trace+0x63/0x1eb
-  [<c01053df>] show_trace_log_lvl+0x1a/0x2f
-  [<c010601d>] show_trace+0x12/0x14
-  [<c0106035>] dump_stack+0x16/0x18
-  [<c017e0fa>] object_err+0x143/0x14b
-  [<c017e2cc>] check_object+0x66/0x234
-  [<c017eb43>] __slab_free+0x239/0x384
-  [<c017f446>] kfree+0xa6/0xc6
-  [<c02e2335>] get_modalias+0xb9/0xf5
-  [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
-  [<c027866a>] dev_uevent+0x1ad/0x1da
-  [<c0205024>] kobject_uevent_env+0x20a/0x45b
-  [<c020527f>] kobject_uevent+0xa/0xf
-  [<c02779f1>] store_uevent+0x4f/0x58
-  [<c027758e>] dev_attr_store+0x29/0x2f
-  [<c01bec4f>] sysfs_write_file+0x16e/0x19c
-  [<c0183ba7>] vfs_write+0xd1/0x15a
-  [<c01841d7>] sys_write+0x3d/0x72
-  [<c0104112>] sysenter_past_esp+0x5f/0x99
-  [<b7f7b410>] 0xb7f7b410
-  =======================
-
-FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc
+=================
+
+Here is a sample of slub debug output::
+
+ ====================================================================
+ BUG kmalloc-8: Redzone overwritten
+ --------------------------------------------------------------------
+
+ INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
+ INFO: Slab 0xc528c530 flags=0x400000c3 inuse=61 fp=0xc90f6d58
+ INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
+ INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554
+
+ Bytes b4 0xc90f6d10:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
+   Object 0xc90f6d20:  31 30 31 39 2e 30 30 35                         1019.005
+  Redzone 0xc90f6d28:  00 cc cc cc                                     .
+  Padding 0xc90f6d50:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
+
+   [<c010523d>] dump_trace+0x63/0x1eb
+   [<c01053df>] show_trace_log_lvl+0x1a/0x2f
+   [<c010601d>] show_trace+0x12/0x14
+   [<c0106035>] dump_stack+0x16/0x18
+   [<c017e0fa>] object_err+0x143/0x14b
+   [<c017e2cc>] check_object+0x66/0x234
+   [<c017eb43>] __slab_free+0x239/0x384
+   [<c017f446>] kfree+0xa6/0xc6
+   [<c02e2335>] get_modalias+0xb9/0xf5
+   [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
+   [<c027866a>] dev_uevent+0x1ad/0x1da
+   [<c0205024>] kobject_uevent_env+0x20a/0x45b
+   [<c020527f>] kobject_uevent+0xa/0xf
+   [<c02779f1>] store_uevent+0x4f/0x58
+   [<c027758e>] dev_attr_store+0x29/0x2f
+   [<c01bec4f>] sysfs_write_file+0x16e/0x19c
+   [<c0183ba7>] vfs_write+0xd1/0x15a
+   [<c01841d7>] sys_write+0x3d/0x72
+   [<c0104112>] sysenter_past_esp+0x5f/0x99
+   [<b7f7b410>] 0xb7f7b410
+   =======================
+
+ FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc
 
 If SLUB encounters a corrupted object (full detection requires the kernel
 to be booted with slub_debug) then the following output will be dumped
@@ -185,38 +197,38 @@ into the syslog:
 
 1. Description of the problem encountered
 
-This will be a message in the system log starting with
+   This will be a message in the system log starting with::
 
-===============================================
-BUG <slab cache affected>: <What went wrong>
------------------------------------------------
+     ===============================================
+     BUG <slab cache affected>: <What went wrong>
+     -----------------------------------------------
 
-INFO: <corruption start>-<corruption_end> <more info>
-INFO: Slab <address> <slab information>
-INFO: Object <address> <object information>
-INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by
+     INFO: <corruption start>-<corruption_end> <more info>
+     INFO: Slab <address> <slab information>
+     INFO: Object <address> <object information>
+     INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by
 	cpu> pid=<pid of the process>
-INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu>
-	 pid=<pid of the process>
+     INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu>
+	pid=<pid of the process>
 
-(Object allocation / free information is only available if SLAB_STORE_USER is
-set for the slab. slub_debug sets that option)
+   (Object allocation / free information is only available if SLAB_STORE_USER is
+   set for the slab. slub_debug sets that option)
 
 2. The object contents if an object was involved.
 
-Various types of lines can follow the BUG SLUB line:
+   Various types of lines can follow the BUG SLUB line:
 
-Bytes b4 <address> : <bytes>
+   Bytes b4 <address> : <bytes>
 	Shows a few bytes before the object where the problem was detected.
 	Can be useful if the corruption does not stop with the start of the
 	object.
 
-Object <address> : <bytes>
+   Object <address> : <bytes>
 	The bytes of the object. If the object is inactive then the bytes
 	typically contain poison values. Any non-poison value shows a
 	corruption by a write after free.
 
-Redzone <address> : <bytes>
+   Redzone <address> : <bytes>
 	The Redzone following the object. The Redzone is used to detect
 	writes after the object. All bytes should always have the same
 	value. If there is any deviation then it is due to a write after
@@ -225,7 +237,7 @@ Redzone <address> : <bytes>
 	(Redzone information is only available if SLAB_RED_ZONE is set.
 	slub_debug sets that option)
 
-Padding <address> : <bytes>
+   Padding <address> : <bytes>
 	Unused data to fill up the space in order to get the next object
 	properly aligned. In the debug case we make sure that there are
 	at least 4 bytes of padding. This allows the detection of writes
@@ -233,29 +245,29 @@ Padding <address> : <bytes>
 
 3. A stackdump
 
-The stackdump describes the location where the error was detected. The cause
-of the corruption is may be more likely found by looking at the function that
-allocated or freed the object.
+   The stackdump describes the location where the error was detected. The cause
+   of the corruption is may be more likely found by looking at the function that
+   allocated or freed the object.
 
 4. Report on how the problem was dealt with in order to ensure the continued
-operation of the system.
+   operation of the system.
 
-These are messages in the system log beginning with
+   These are messages in the system log beginning with::
 
-FIX <slab cache affected>: <corrective action taken>
+	FIX <slab cache affected>: <corrective action taken>
 
-In the above sample SLUB found that the Redzone of an active object has
-been overwritten. Here a string of 8 characters was written into a slab that
-has the length of 8 characters. However, a 8 character string needs a
-terminating 0. That zero has overwritten the first byte of the Redzone field.
-After reporting the details of the issue encountered the FIX SLUB message
-tells us that SLUB has restored the Redzone to its proper value and then
-system operations continue.
+   In the above sample SLUB found that the Redzone of an active object has
+   been overwritten. Here a string of 8 characters was written into a slab that
+   has the length of 8 characters. However, a 8 character string needs a
+   terminating 0. That zero has overwritten the first byte of the Redzone field.
+   After reporting the details of the issue encountered the FIX SLUB message
+   tells us that SLUB has restored the Redzone to its proper value and then
+   system operations continue.
 
-Emergency operations:
----------------------
+Emergency operations
+====================
 
-Minimal debugging (sanity checks alone) can be enabled by booting with
+Minimal debugging (sanity checks alone) can be enabled by booting with::
 
 	slub_debug=F
 
@@ -270,73 +282,80 @@ No guarantees. The kernel component still needs to be fixed. Performance
 may be optimized further by locating the slab that experiences corruption
 and enabling debugging only for that cache
 
-I.e.
+I.e.::
 
 	slub_debug=F,dentry
 
 If the corruption occurs by writing after the end of the object then it
 may be advisable to enable a Redzone to avoid corrupting the beginning
-of other objects.
+of other objects::
 
 	slub_debug=FZ,dentry
 
 Extended slabinfo mode and plotting
------------------------------------
+===================================
 
-The slabinfo tool has a special 'extended' ('-X') mode that includes:
+The ``slabinfo`` tool has a special 'extended' ('-X') mode that includes:
  - Slabcache Totals
  - Slabs sorted by size (up to -N <num> slabs, default 1)
  - Slabs sorted by loss (up to -N <num> slabs, default 1)
 
-Additionally, in this mode slabinfo does not dynamically scale sizes (G/M/K)
-and reports everything in bytes (this functionality is also available to
-other slabinfo modes via '-B' option) which makes reporting more precise and
-accurate. Moreover, in some sense the `-X' mode also simplifies the analysis
-of slabs' behaviour, because its output can be plotted using the
-slabinfo-gnuplot.sh script. So it pushes the analysis from looking through
-the numbers (tons of numbers) to something easier -- visual analysis.
+Additionally, in this mode ``slabinfo`` does not dynamically scale
+sizes (G/M/K) and reports everything in bytes (this functionality is
+also available to other slabinfo modes via '-B' option) which makes
+reporting more precise and accurate. Moreover, in some sense the `-X'
+mode also simplifies the analysis of slabs' behaviour, because its
+output can be plotted using the ``slabinfo-gnuplot.sh`` script. So it
+pushes the analysis from looking through the numbers (tons of numbers)
+to something easier -- visual analysis.
 
 To generate plots:
-a) collect slabinfo extended records, for example:
-
-  while [ 1 ]; do slabinfo -X >> FOO_STATS; sleep 1; done
-
-b) pass stats file(-s) to slabinfo-gnuplot.sh script:
-  slabinfo-gnuplot.sh FOO_STATS [FOO_STATS2 .. FOO_STATSN]
-
-The slabinfo-gnuplot.sh script will pre-processes the collected records
-and generates 3 png files (and 3 pre-processing cache files) per STATS
-file:
- - Slabcache Totals: FOO_STATS-totals.png
- - Slabs sorted by size: FOO_STATS-slabs-by-size.png
- - Slabs sorted by loss: FOO_STATS-slabs-by-loss.png
-
-Another use case, when slabinfo-gnuplot can be useful, is when you need
-to compare slabs' behaviour "prior to" and "after" some code modification.
-To help you out there, slabinfo-gnuplot.sh script can 'merge' the
-`Slabcache Totals` sections from different measurements. To visually
-compare N plots:
-
-a) Collect as many STATS1, STATS2, .. STATSN files as you need
-  while [ 1 ]; do slabinfo -X >> STATS<X>; sleep 1; done
-
-b) Pre-process those STATS files
-  slabinfo-gnuplot.sh STATS1 STATS2 .. STATSN
-
-c) Execute slabinfo-gnuplot.sh in '-t' mode, passing all of the
-generated pre-processed *-totals
-  slabinfo-gnuplot.sh -t STATS1-totals STATS2-totals .. STATSN-totals
-
-This will produce a single plot (png file).
-
-Plots, expectedly, can be large so some fluctuations or small spikes
-can go unnoticed. To deal with that, `slabinfo-gnuplot.sh' has two
-options to 'zoom-in'/'zoom-out':
- a) -s %d,%d  overwrites the default image width and heigh
- b) -r %d,%d  specifies a range of samples to use (for example,
-              in `slabinfo -X >> FOO_STATS; sleep 1;' case, using
-              a "-r 40,60" range will plot only samples collected
-              between 40th and 60th seconds).
+
+a) collect slabinfo extended records, for example::
+
+	while [ 1 ]; do slabinfo -X >> FOO_STATS; sleep 1; done
+
+b) pass stats file(-s) to ``slabinfo-gnuplot.sh`` script::
+
+	slabinfo-gnuplot.sh FOO_STATS [FOO_STATS2 .. FOO_STATSN]
+
+   The ``slabinfo-gnuplot.sh`` script will pre-processes the collected records
+   and generates 3 png files (and 3 pre-processing cache files) per STATS
+   file:
+   - Slabcache Totals: FOO_STATS-totals.png
+   - Slabs sorted by size: FOO_STATS-slabs-by-size.png
+   - Slabs sorted by loss: FOO_STATS-slabs-by-loss.png
+
+Another use case, when ``slabinfo-gnuplot.sh`` can be useful, is when you
+need to compare slabs' behaviour "prior to" and "after" some code
+modification.  To help you out there, ``slabinfo-gnuplot.sh`` script
+can 'merge' the `Slabcache Totals` sections from different
+measurements. To visually compare N plots:
+
+a) Collect as many STATS1, STATS2, .. STATSN files as you need::
+
+	while [ 1 ]; do slabinfo -X >> STATS<X>; sleep 1; done
+
+b) Pre-process those STATS files::
+
+	slabinfo-gnuplot.sh STATS1 STATS2 .. STATSN
+
+c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
+   generated pre-processed \*-totals::
+
+	slabinfo-gnuplot.sh -t STATS1-totals STATS2-totals .. STATSN-totals
+
+   This will produce a single plot (png file).
+
+   Plots, expectedly, can be large so some fluctuations or small spikes
+   can go unnoticed. To deal with that, ``slabinfo-gnuplot.sh`` has two
+   options to 'zoom-in'/'zoom-out':
+
+   a) ``-s %d,%d`` -- overwrites the default image width and heigh
+   b) ``-r %d,%d`` -- specifies a range of samples to use (for example,
+      in ``slabinfo -X >> FOO_STATS; sleep 1;`` case, using a ``-r
+      40,60`` range will plot only samples collected between 40th and
+      60th seconds).
 
 Christoph Lameter, May 30, 2007
 Sergey Senozhatsky, October 23, 2015
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 18/32] docs/vm: page_migration: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/page_migration | 149 +++++++++++++++++++++-------------------
 1 file changed, 77 insertions(+), 72 deletions(-)

diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index 0478ae2..07b67a8 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -1,5 +1,8 @@
+.. _page_migration:
+
+==============
 Page migration
---------------
+==============
 
 Page migration allows the moving of the physical location of pages between
 nodes in a numa system while the process is running. This means that the
@@ -20,7 +23,7 @@ Page migration functions are provided by the numactl package by Andi Kleen
 (a version later than 0.9.3 is required. Get it from
 ftp://oss.sgi.com/www/projects/libnuma/download/). numactl provides libnuma
 which provides an interface similar to other numa functionality for page
-migration.  cat /proc/<pid>/numa_maps allows an easy review of where the
+migration.  cat ``/proc/<pid>/numa_maps`` allows an easy review of where the
 pages of a process are located. See also the numa_maps documentation in the
 proc(5) man page.
 
@@ -56,8 +59,8 @@ description for those trying to use migrate_pages() from the kernel
 (for userspace usage see the Andi Kleen's numactl package mentioned above)
 and then a low level description of how the low level details work.
 
-A. In kernel use of migrate_pages()
------------------------------------
+In kernel use of migrate_pages()
+================================
 
 1. Remove pages from the LRU.
 
@@ -78,8 +81,8 @@ A. In kernel use of migrate_pages()
    the new page for each page that is considered for
    moving.
 
-B. How migrate_pages() works
-----------------------------
+How migrate_pages() works
+=========================
 
 migrate_pages() does several passes over its list of pages. A page is moved
 if all references to a page are removable at the time. The page has
@@ -142,8 +145,8 @@ Steps:
 20. The new page is moved to the LRU and can be scanned by the swapper
     etc again.
 
-C. Non-LRU page migration
--------------------------
+Non-LRU page migration
+======================
 
 Although original migration aimed for reducing the latency of memory access
 for NUMA, compaction who want to create high-order page is also main customer.
@@ -164,89 +167,91 @@ migration path.
 If a driver want to make own pages movable, it should define three functions
 which are function pointers of struct address_space_operations.
 
-1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
+1. ``bool (*isolate_page) (struct page *page, isolate_mode_t mode);``
 
-What VM expects on isolate_page function of driver is to return *true*
-if driver isolates page successfully. On returing true, VM marks the page
-as PG_isolated so concurrent isolation in several CPUs skip the page
-for isolation. If a driver cannot isolate the page, it should return *false*.
+   What VM expects on isolate_page function of driver is to return *true*
+   if driver isolates page successfully. On returing true, VM marks the page
+   as PG_isolated so concurrent isolation in several CPUs skip the page
+   for isolation. If a driver cannot isolate the page, it should return *false*.
 
-Once page is successfully isolated, VM uses page.lru fields so driver
-shouldn't expect to preserve values in that fields.
+   Once page is successfully isolated, VM uses page.lru fields so driver
+   shouldn't expect to preserve values in that fields.
 
-2. int (*migratepage) (struct address_space *mapping,
-		struct page *newpage, struct page *oldpage, enum migrate_mode);
+2. ``int (*migratepage) (struct address_space *mapping,``
+|	``struct page *newpage, struct page *oldpage, enum migrate_mode);``
 
-After isolation, VM calls migratepage of driver with isolated page.
-The function of migratepage is to move content of the old page to new page
-and set up fields of struct page newpage. Keep in mind that you should
-indicate to the VM the oldpage is no longer movable via __ClearPageMovable()
-under page_lock if you migrated the oldpage successfully and returns
-MIGRATEPAGE_SUCCESS. If driver cannot migrate the page at the moment, driver
-can return -EAGAIN. On -EAGAIN, VM will retry page migration in a short time
-because VM interprets -EAGAIN as "temporal migration failure". On returning
-any error except -EAGAIN, VM will give up the page migration without retrying
-in this time.
+   After isolation, VM calls migratepage of driver with isolated page.
+   The function of migratepage is to move content of the old page to new page
+   and set up fields of struct page newpage. Keep in mind that you should
+   indicate to the VM the oldpage is no longer movable via __ClearPageMovable()
+   under page_lock if you migrated the oldpage successfully and returns
+   MIGRATEPAGE_SUCCESS. If driver cannot migrate the page at the moment, driver
+   can return -EAGAIN. On -EAGAIN, VM will retry page migration in a short time
+   because VM interprets -EAGAIN as "temporal migration failure". On returning
+   any error except -EAGAIN, VM will give up the page migration without retrying
+   in this time.
 
-Driver shouldn't touch page.lru field VM using in the functions.
+   Driver shouldn't touch page.lru field VM using in the functions.
 
-3. void (*putback_page)(struct page *);
+3. ``void (*putback_page)(struct page *);``
 
-If migration fails on isolated page, VM should return the isolated page
-to the driver so VM calls driver's putback_page with migration failed page.
-In this function, driver should put the isolated page back to the own data
-structure.
+   If migration fails on isolated page, VM should return the isolated page
+   to the driver so VM calls driver's putback_page with migration failed page.
+   In this function, driver should put the isolated page back to the own data
+   structure.
 
 4. non-lru movable page flags
 
-There are two page flags for supporting non-lru movable page.
+   There are two page flags for supporting non-lru movable page.
 
-* PG_movable
+   * PG_movable
 
-Driver should use the below function to make page movable under page_lock.
+     Driver should use the below function to make page movable under page_lock::
 
 	void __SetPageMovable(struct page *page, struct address_space *mapping)
 
-It needs argument of address_space for registering migration family functions
-which will be called by VM. Exactly speaking, PG_movable is not a real flag of
-struct page. Rather than, VM reuses page->mapping's lower bits to represent it.
+     It needs argument of address_space for registering migration
+     family functions which will be called by VM. Exactly speaking,
+     PG_movable is not a real flag of struct page. Rather than, VM
+     reuses page->mapping's lower bits to represent it.
 
+::
 	#define PAGE_MAPPING_MOVABLE 0x2
 	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
 
-so driver shouldn't access page->mapping directly. Instead, driver should
-use page_mapping which mask off the low two bits of page->mapping under
-page lock so it can get right struct address_space.
-
-For testing of non-lru movable page, VM supports __PageMovable function.
-However, it doesn't guarantee to identify non-lru movable page because
-page->mapping field is unified with other variables in struct page.
-As well, if driver releases the page after isolation by VM, page->mapping
-doesn't have stable value although it has PAGE_MAPPING_MOVABLE
-(Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
-page is LRU or non-lru movable once the page has been isolated. Because
-LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
-good for just peeking to test non-lru movable pages before more expensive
-checking with lock_page in pfn scanning to select victim.
-
-For guaranteeing non-lru movable page, VM provides PageMovable function.
-Unlike __PageMovable, PageMovable functions validates page->mapping and
-mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
-destroying of page->mapping.
-
-Driver using __SetPageMovable should clear the flag via __ClearMovablePage
-under page_lock before the releasing the page.
-
-* PG_isolated
-
-To prevent concurrent isolation among several CPUs, VM marks isolated page
-as PG_isolated under lock_page. So if a CPU encounters PG_isolated non-lru
-movable page, it can skip it. Driver doesn't need to manipulate the flag
-because VM will set/clear it automatically. Keep in mind that if driver
-sees PG_isolated page, it means the page have been isolated by VM so it
-shouldn't touch page.lru field.
-PG_isolated is alias with PG_reclaim flag so driver shouldn't use the flag
-for own purpose.
+     so driver shouldn't access page->mapping directly. Instead, driver should
+     use page_mapping which mask off the low two bits of page->mapping under
+     page lock so it can get right struct address_space.
+
+     For testing of non-lru movable page, VM supports __PageMovable function.
+     However, it doesn't guarantee to identify non-lru movable page because
+     page->mapping field is unified with other variables in struct page.
+     As well, if driver releases the page after isolation by VM, page->mapping
+     doesn't have stable value although it has PAGE_MAPPING_MOVABLE
+     (Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
+     page is LRU or non-lru movable once the page has been isolated. Because
+     LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
+     good for just peeking to test non-lru movable pages before more expensive
+     checking with lock_page in pfn scanning to select victim.
+
+     For guaranteeing non-lru movable page, VM provides PageMovable function.
+     Unlike __PageMovable, PageMovable functions validates page->mapping and
+     mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
+     destroying of page->mapping.
+
+     Driver using __SetPageMovable should clear the flag via __ClearMovablePage
+     under page_lock before the releasing the page.
+
+   * PG_isolated
+
+     To prevent concurrent isolation among several CPUs, VM marks isolated page
+     as PG_isolated under lock_page. So if a CPU encounters PG_isolated non-lru
+     movable page, it can skip it. Driver doesn't need to manipulate the flag
+     because VM will set/clear it automatically. Keep in mind that if driver
+     sees PG_isolated page, it means the page have been isolated by VM so it
+     shouldn't touch page.lru field.
+     PG_isolated is alias with PG_reclaim flag so driver shouldn't use the flag
+     for own purpose.
 
 Christoph Lameter, May 8, 2006.
 Minchan Kim, Mar 28, 2016.
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 15/32] docs/vm: page_frags convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/page_frags | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/page_frags b/Documentation/vm/page_frags
index a671456..637cc49 100644
--- a/Documentation/vm/page_frags
+++ b/Documentation/vm/page_frags
@@ -1,5 +1,8 @@
+.. _page_frags:
+
+==============
 Page fragments
---------------
+==============
 
 A page fragment is an arbitrary-length arbitrary-offset area of memory
 which resides within a 0 or higher order compound page.  Multiple
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 17/32] docs/vm: pagemap.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/pagemap.txt | 164 +++++++++++++++++++++++--------------------
 1 file changed, 89 insertions(+), 75 deletions(-)

diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index eafcefa..bd6d717 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -1,13 +1,16 @@
-pagemap, from the userspace perspective
----------------------------------------
+.. _pagemap:
+
+======================================
+pagemap from the Userspace Perspective
+======================================
 
 pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
 userspace programs to examine the page tables and related information by
-reading files in /proc.
+reading files in ``/proc``.
 
 There are four components to pagemap:
 
- * /proc/pid/pagemap.  This file lets a userspace process find out which
+ * ``/proc/pid/pagemap``.  This file lets a userspace process find out which
    physical frame each virtual page is mapped to.  It contains one 64-bit
    value for each virtual page, containing the following data (from
    fs/proc/task_mmu.c, above pagemap_read):
@@ -37,24 +40,24 @@ There are four components to pagemap:
    determine which areas of memory are actually mapped and llseek to
    skip over unmapped regions.
 
- * /proc/kpagecount.  This file contains a 64-bit count of the number of
+ * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
    times each page is mapped, indexed by PFN.
 
- * /proc/kpageflags.  This file contains a 64-bit set of flags for each
+ * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
    page, indexed by PFN.
 
-   The flags are (from fs/proc/page.c, above kpageflags_read):
-
-     0. LOCKED
-     1. ERROR
-     2. REFERENCED
-     3. UPTODATE
-     4. DIRTY
-     5. LRU
-     6. ACTIVE
-     7. SLAB
-     8. WRITEBACK
-     9. RECLAIM
+   The flags are (from ``fs/proc/page.c``, above kpageflags_read):
+
+    0. LOCKED
+    1. ERROR
+    2. REFERENCED
+    3. UPTODATE
+    4. DIRTY
+    5. LRU
+    6. ACTIVE
+    7. SLAB
+    8. WRITEBACK
+    9. RECLAIM
     10. BUDDY
     11. MMAP
     12. ANON
@@ -72,98 +75,108 @@ There are four components to pagemap:
     24. ZERO_PAGE
     25. IDLE
 
- * /proc/kpagecgroup.  This file contains a 64-bit inode number of the
+ * ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
    memory cgroup each page is charged to, indexed by PFN. Only available when
    CONFIG_MEMCG is set.
 
 Short descriptions to the page flags:
-
- 0. LOCKED
-    page is being locked for exclusive access, eg. by undergoing read/write IO
-
- 7. SLAB
-    page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
-    When compound page is used, SLUB/SLQB will only set this flag on the head
-    page; SLOB will not flag it at all.
-
-10. BUDDY
+=====================================
+
+0 - LOCKED
+   page is being locked for exclusive access, eg. by undergoing read/write IO
+7 - SLAB
+   page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
+   When compound page is used, SLUB/SLQB will only set this flag on the head
+   page; SLOB will not flag it at all.
+10 - BUDDY
     a free memory block managed by the buddy system allocator
     The buddy system organizes free memory in blocks of various orders.
     An order N block has 2^N physically contiguous pages, with the BUDDY flag
     set for and _only_ for the first page.
-
-15. COMPOUND_HEAD
-16. COMPOUND_TAIL
+15 - COMPOUND_HEAD
     A compound page with order N consists of 2^N physically contiguous pages.
     A compound page with order 2 takes the form of "HTTT", where H donates its
     head page and T donates its tail page(s).  The major consumers of compound
     pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc.
     memory allocators and various device drivers. However in this interface,
     only huge/giga pages are made visible to end users.
-17. HUGE
+16 - COMPOUND_TAIL
+    A compound page tail (see description above).
+17 - HUGE
     this is an integral part of a HugeTLB page
-
-19. HWPOISON
+19 - HWPOISON
     hardware detected memory corruption on this page: don't touch the data!
-
-20. NOPAGE
+20 - NOPAGE
     no page frame exists at the requested address
-
-21. KSM
+21 - KSM
     identical memory pages dynamically shared between one or more processes
-
-22. THP
+22 - THP
     contiguous pages which construct transparent hugepages
-
-23. BALLOON
+23 - BALLOON
     balloon compaction page
-
-24. ZERO_PAGE
+24 - ZERO_PAGE
     zero page for pfn_zero or huge_zero page
-
-25. IDLE
+25 - IDLE
     page has not been accessed since it was marked idle (see
     Documentation/vm/idle_page_tracking.txt). Note that this flag may be
     stale in case the page was accessed via a PTE. To make sure the flag
-    is up-to-date one has to read /sys/kernel/mm/page_idle/bitmap first.
-
-    [IO related page flags]
- 1. ERROR     IO error occurred
- 3. UPTODATE  page has up-to-date data
-              ie. for file backed page: (in-memory data revision >= on-disk one)
- 4. DIRTY     page has been written to, hence contains new data
-              ie. for file backed page: (in-memory data revision >  on-disk one)
- 8. WRITEBACK page is being synced to disk
-
-    [LRU related page flags]
- 5. LRU         page is in one of the LRU lists
- 6. ACTIVE      page is in the active LRU list
-18. UNEVICTABLE page is in the unevictable (non-)LRU list
-                It is somehow pinned and not a candidate for LRU page reclaims,
-		eg. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments
- 2. REFERENCED  page has been referenced since last LRU list enqueue/requeue
- 9. RECLAIM     page will be reclaimed soon after its pageout IO completed
-11. MMAP        a memory mapped page
-12. ANON        a memory mapped page that is not part of a file
-13. SWAPCACHE   page is mapped to swap space, ie. has an associated swap entry
-14. SWAPBACKED  page is backed by swap/RAM
+    is up-to-date one has to read ``/sys/kernel/mm/page_idle/bitmap`` first.
+
+IO related page flags
+---------------------
+
+1 - ERROR
+   IO error occurred
+3 - UPTODATE
+   page has up-to-date data
+   ie. for file backed page: (in-memory data revision >= on-disk one)
+4 - DIRTY
+   page has been written to, hence contains new data
+   ie. for file backed page: (in-memory data revision >  on-disk one)
+8 - WRITEBACK
+   page is being synced to disk
+
+LRU related page flags
+----------------------
+
+5 - LRU
+   page is in one of the LRU lists
+6 - ACTIVE
+   page is in the active LRU list
+18 - UNEVICTABLE
+   page is in the unevictable (non-)LRU list It is somehow pinned and
+   not a candidate for LRU page reclaims, eg. ramfs pages,
+   shmctl(SHM_LOCK) and mlock() memory segments
+2 - REFERENCED
+   page has been referenced since last LRU list enqueue/requeue
+9 - RECLAIM
+   page will be reclaimed soon after its pageout IO completed
+11 - MMAP
+   a memory mapped page
+12 - ANON
+   a memory mapped page that is not part of a file
+13 - SWAPCACHE
+   page is mapped to swap space, ie. has an associated swap entry
+14 - SWAPBACKED
+   page is backed by swap/RAM
 
 The page-types tool in the tools/vm directory can be used to query the
 above flags.
 
-Using pagemap to do something useful:
+Using pagemap to do something useful
+====================================
 
 The general procedure for using pagemap to find out about a process' memory
 usage goes like this:
 
- 1. Read /proc/pid/maps to determine which parts of the memory space are
+ 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
     mapped to what.
  2. Select the maps you are interested in -- all of them, or a particular
     library, or the stack or the heap, etc.
- 3. Open /proc/pid/pagemap and seek to the pages you would like to examine.
+ 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine.
  4. Read a u64 for each page from pagemap.
- 5. Open /proc/kpagecount and/or /proc/kpageflags.  For each PFN you just
-    read, seek to that entry in the file, and read the data you want.
+ 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``.  For each PFN you
+    just read, seek to that entry in the file, and read the data you want.
 
 For example, to find the "unique set size" (USS), which is the amount of
 memory that a process is using that is not shared with any other process,
@@ -171,7 +184,8 @@ you can go through every map in the process, find the PFNs, look those up
 in kpagecount, and tally up the number of pages that are only referenced
 once.
 
-Other notes:
+Other notes
+===========
 
 Reading from any of the files will return -EINVAL if you are not starting
 the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 16/32] docs/vm: numa: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/numa | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/vm/numa b/Documentation/vm/numa
index a31b85b..c81e7c5 100644
--- a/Documentation/vm/numa
+++ b/Documentation/vm/numa
@@ -1,6 +1,10 @@
+.. _numa:
+
 Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>
 
+=============
 What is NUMA?
+=============
 
 This question can be answered from a couple of perspectives:  the
 hardware view and the Linux software view.
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 13/32] docs/vm: numa_memory_policy.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/numa_memory_policy.txt | 533 +++++++++++++++++---------------
 1 file changed, 283 insertions(+), 250 deletions(-)

diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
index 622b927..8cd942c 100644
--- a/Documentation/vm/numa_memory_policy.txt
+++ b/Documentation/vm/numa_memory_policy.txt
@@ -1,5 +1,11 @@
+.. _numa_memory_policy:
+
+===================
+Linux Memory Policy
+===================
 
 What is Linux Memory Policy?
+============================
 
 In the Linux kernel, "memory policy" determines from which node the kernel will
 allocate memory in a NUMA system or in an emulated NUMA system.  Linux has
@@ -9,35 +15,36 @@ document attempts to describe the concepts and APIs of the 2.6 memory policy
 support.
 
 Memory policies should not be confused with cpusets
-(Documentation/cgroup-v1/cpusets.txt)
+(``Documentation/cgroup-v1/cpusets.txt``)
 which is an administrative mechanism for restricting the nodes from which
 memory may be allocated by a set of processes. Memory policies are a
 programming interface that a NUMA-aware application can take advantage of.  When
 both cpusets and policies are applied to a task, the restrictions of the cpuset
-takes priority.  See "MEMORY POLICIES AND CPUSETS" below for more details.
+takes priority.  See :ref:`Memory Policies and cpusets <mem_pol_and_cpusets>`
+below for more details.
 
-MEMORY POLICY CONCEPTS
+Memory Policy Concepts
+======================
 
 Scope of Memory Policies
+------------------------
 
 The Linux kernel supports _scopes_ of memory policy, described here from
 most general to most specific:
 
-    System Default Policy:  this policy is "hard coded" into the kernel.  It
-    is the policy that governs all page allocations that aren't controlled
-    by one of the more specific policy scopes discussed below.  When the
-    system is "up and running", the system default policy will use "local
-    allocation" described below.  However, during boot up, the system
-    default policy will be set to interleave allocations across all nodes
-    with "sufficient" memory, so as not to overload the initial boot node
-    with boot-time allocations.
-
-    Task/Process Policy:  this is an optional, per-task policy.  When defined
-    for a specific task, this policy controls all page allocations made by or
-    on behalf of the task that aren't controlled by a more specific scope.
-    If a task does not define a task policy, then all page allocations that
-    would have been controlled by the task policy "fall back" to the System
-    Default Policy.
+System Default Policy
+	this policy is "hard coded" into the kernel.  It is the policy
+	that governs all page allocations that aren't controlled by
+	one of the more specific policy scopes discussed below.  When
+	the system is "up and running", the system default policy will
+	use "local allocation" described below.  However, during boot
+	up, the system default policy will be set to interleave
+	allocations across all nodes with "sufficient" memory, so as
+	not to overload the initial boot node with boot-time
+	allocations.
+
+Task/Process Policy
+	this is an optional, per-task policy.  When defined for a specific task, this policy controls all page allocations made by or on behalf of the task that aren't controlled by a more specific scope. If a task does not define a task policy, then all page allocations that would have been controlled by the task policy "fall back" to the System Default Policy.
 
 	The task policy applies to the entire address space of a task. Thus,
 	it is inheritable, and indeed is inherited, across both fork()
@@ -58,56 +65,66 @@ most general to most specific:
 	changes its task policy remain where they were allocated based on
 	the policy at the time they were allocated.
 
-    VMA Policy:  A "VMA" or "Virtual Memory Area" refers to a range of a task's
-    virtual address space.  A task may define a specific policy for a range
-    of its virtual address space.   See the MEMORY POLICIES APIS section,
-    below, for an overview of the mbind() system call used to set a VMA
-    policy.
-
-    A VMA policy will govern the allocation of pages that back this region of
-    the address space.  Any regions of the task's address space that don't
-    have an explicit VMA policy will fall back to the task policy, which may
-    itself fall back to the System Default Policy.
-
-    VMA policies have a few complicating details:
-
-	VMA policy applies ONLY to anonymous pages.  These include pages
-	allocated for anonymous segments, such as the task stack and heap, and
-	any regions of the address space mmap()ed with the MAP_ANONYMOUS flag.
-	If a VMA policy is applied to a file mapping, it will be ignored if
-	the mapping used the MAP_SHARED flag.  If the file mapping used the
-	MAP_PRIVATE flag, the VMA policy will only be applied when an
-	anonymous page is allocated on an attempt to write to the mapping--
-	i.e., at Copy-On-Write.
-
-	VMA policies are shared between all tasks that share a virtual address
-	space--a.k.a. threads--independent of when the policy is installed; and
-	they are inherited across fork().  However, because VMA policies refer
-	to a specific region of a task's address space, and because the address
-	space is discarded and recreated on exec*(), VMA policies are NOT
-	inheritable across exec().  Thus, only NUMA-aware applications may
-	use VMA policies.
-
-	A task may install a new VMA policy on a sub-range of a previously
-	mmap()ed region.  When this happens, Linux splits the existing virtual
-	memory area into 2 or 3 VMAs, each with it's own policy.
-
-	By default, VMA policy applies only to pages allocated after the policy
-	is installed.  Any pages already faulted into the VMA range remain
-	where they were allocated based on the policy at the time they were
-	allocated.  However, since 2.6.16, Linux supports page migration via
-	the mbind() system call, so that page contents can be moved to match
-	a newly installed policy.
-
-    Shared Policy:  Conceptually, shared policies apply to "memory objects"
-    mapped shared into one or more tasks' distinct address spaces.  An
-    application installs a shared policies the same way as VMA policies--using
-    the mbind() system call specifying a range of virtual addresses that map
-    the shared object.  However, unlike VMA policies, which can be considered
-    to be an attribute of a range of a task's address space, shared policies
-    apply directly to the shared object.  Thus, all tasks that attach to the
-    object share the policy, and all pages allocated for the shared object,
-    by any task, will obey the shared policy.
+.. _vma_policy:
+
+VMA Policy
+	A "VMA" or "Virtual Memory Area" refers to a range of a task's
+	virtual address space.  A task may define a specific policy for a range
+	of its virtual address space.   See the MEMORY POLICIES APIS section,
+	below, for an overview of the mbind() system call used to set a VMA
+	policy.
+
+	A VMA policy will govern the allocation of pages that back
+	this region ofthe address space.  Any regions of the task's
+	address space that don't have an explicit VMA policy will fall
+	back to the task policy, which may itself fall back to the
+	System Default Policy.
+
+	VMA policies have a few complicating details:
+
+	* VMA policy applies ONLY to anonymous pages.  These include
+	  pages allocated for anonymous segments, such as the task
+	  stack and heap, and any regions of the address space
+	  mmap()ed with the MAP_ANONYMOUS flag.  If a VMA policy is
+	  applied to a file mapping, it will be ignored if the mapping
+	  used the MAP_SHARED flag.  If the file mapping used the
+	  MAP_PRIVATE flag, the VMA policy will only be applied when
+	  an anonymous page is allocated on an attempt to write to the
+	  mapping-- i.e., at Copy-On-Write.
+
+	* VMA policies are shared between all tasks that share a
+	  virtual address space--a.k.a. threads--independent of when
+	  the policy is installed; and they are inherited across
+	  fork().  However, because VMA policies refer to a specific
+	  region of a task's address space, and because the address
+	  space is discarded and recreated on exec*(), VMA policies
+	  are NOT inheritable across exec().  Thus, only NUMA-aware
+	  applications may use VMA policies.
+
+	* A task may install a new VMA policy on a sub-range of a
+	  previously mmap()ed region.  When this happens, Linux splits
+	  the existing virtual memory area into 2 or 3 VMAs, each with
+	  it's own policy.
+
+	* By default, VMA policy applies only to pages allocated after
+	  the policy is installed.  Any pages already faulted into the
+	  VMA range remain where they were allocated based on the
+	  policy at the time they were allocated.  However, since
+	  2.6.16, Linux supports page migration via the mbind() system
+	  call, so that page contents can be moved to match a newly
+	  installed policy.
+
+Shared Policy
+	Conceptually, shared policies apply to "memory objects" mapped
+	shared into one or more tasks' distinct address spaces.  An
+	application installs a shared policies the same way as VMA
+	policies--using the mbind() system call specifying a range of
+	virtual addresses that map the shared object.  However, unlike
+	VMA policies, which can be considered to be an attribute of a
+	range of a task's address space, shared policies apply
+	directly to the shared object.  Thus, all tasks that attach to
+	the object share the policy, and all pages allocated for the
+	shared object, by any task, will obey the shared policy.
 
 	As of 2.6.22, only shared memory segments, created by shmget() or
 	mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  When shared
@@ -118,11 +135,12 @@ most general to most specific:
 	Although hugetlbfs segments now support lazy allocation, their support
 	for shared policy has not been completed.
 
-	As mentioned above [re: VMA policies], allocations of page cache
-	pages for regular files mmap()ed with MAP_SHARED ignore any VMA
-	policy installed on the virtual address range backed by the shared
-	file mapping.  Rather, shared page cache pages, including pages backing
-	private mappings that have not yet been written by the task, follow
+	As mentioned above :ref:`VMA policies <vma_policy>`,
+	allocations of page cache pages for regular files mmap()ed
+	with MAP_SHARED ignore any VMA policy installed on the virtual
+	address range backed by the shared file mapping.  Rather,
+	shared page cache pages, including pages backing private
+	mappings that have not yet been written by the task, follow
 	task policy, if any, else System Default Policy.
 
 	The shared policy infrastructure supports different policies on subset
@@ -135,164 +153,175 @@ most general to most specific:
 	one or more ranges of the region.
 
 Components of Memory Policies
-
-    A Linux memory policy consists of a "mode", optional mode flags, and an
-    optional set of nodes.  The mode determines the behavior of the policy,
-    the optional mode flags determine the behavior of the mode, and the
-    optional set of nodes can be viewed as the arguments to the policy
-    behavior.
-
-   Internally, memory policies are implemented by a reference counted
-   structure, struct mempolicy.  Details of this structure will be discussed
-   in context, below, as required to explain the behavior.
-
-   Linux memory policy supports the following 4 behavioral modes:
-
-	Default Mode--MPOL_DEFAULT:  This mode is only used in the memory
-	policy APIs.  Internally, MPOL_DEFAULT is converted to the NULL
-	memory policy in all policy scopes.  Any existing non-default policy
-	will simply be removed when MPOL_DEFAULT is specified.  As a result,
-	MPOL_DEFAULT means "fall back to the next most specific policy scope."
-
-	    For example, a NULL or default task policy will fall back to the
-	    system default policy.  A NULL or default vma policy will fall
-	    back to the task policy.
-
-	    When specified in one of the memory policy APIs, the Default mode
-	    does not use the optional set of nodes.
-
-	    It is an error for the set of nodes specified for this policy to
-	    be non-empty.
-
-	MPOL_BIND:  This mode specifies that memory must come from the
-	set of nodes specified by the policy.  Memory will be allocated from
-	the node in the set with sufficient free memory that is closest to
-	the node where the allocation takes place.
-
-	MPOL_PREFERRED:  This mode specifies that the allocation should be
-	attempted from the single node specified in the policy.  If that
-	allocation fails, the kernel will search other nodes, in order of
-	increasing distance from the preferred node based on information
-	provided by the platform firmware.
-
-	    Internally, the Preferred policy uses a single node--the
-	    preferred_node member of struct mempolicy.  When the internal
-	    mode flag MPOL_F_LOCAL is set, the preferred_node is ignored and
-	    the policy is interpreted as local allocation.  "Local" allocation
-	    policy can be viewed as a Preferred policy that starts at the node
-	    containing the cpu where the allocation takes place.
-
-	    It is possible for the user to specify that local allocation is
-	    always preferred by passing an empty nodemask with this mode.
-	    If an empty nodemask is passed, the policy cannot use the
-	    MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags described
-	    below.
-
-	MPOL_INTERLEAVED:  This mode specifies that page allocations be
-	interleaved, on a page granularity, across the nodes specified in
-	the policy.  This mode also behaves slightly differently, based on
-	the context where it is used:
-
-	    For allocation of anonymous pages and shared memory pages,
-	    Interleave mode indexes the set of nodes specified by the policy
-	    using the page offset of the faulting address into the segment
-	    [VMA] containing the address modulo the number of nodes specified
-	    by the policy.  It then attempts to allocate a page, starting at
-	    the selected node, as if the node had been specified by a Preferred
-	    policy or had been selected by a local allocation.  That is,
-	    allocation will follow the per node zonelist.
-
-	    For allocation of page cache pages, Interleave mode indexes the set
-	    of nodes specified by the policy using a node counter maintained
-	    per task.  This counter wraps around to the lowest specified node
-	    after it reaches the highest specified node.  This will tend to
-	    spread the pages out over the nodes specified by the policy based
-	    on the order in which they are allocated, rather than based on any
-	    page offset into an address range or file.  During system boot up,
-	    the temporary interleaved system default policy works in this
-	    mode.
-
-   Linux memory policy supports the following optional mode flags:
-
-	MPOL_F_STATIC_NODES:  This flag specifies that the nodemask passed by
+-----------------------------
+
+A Linux memory policy consists of a "mode", optional mode flags, and
+an optional set of nodes.  The mode determines the behavior of the
+policy, the optional mode flags determine the behavior of the mode,
+and the optional set of nodes can be viewed as the arguments to the
+policy behavior.
+
+Internally, memory policies are implemented by a reference counted
+structure, struct mempolicy.  Details of this structure will be
+discussed in context, below, as required to explain the behavior.
+
+Linux memory policy supports the following 4 behavioral modes:
+
+Default Mode--MPOL_DEFAULT
+	This mode is only used in the memory policy APIs.  Internally,
+	MPOL_DEFAULT is converted to the NULL memory policy in all
+	policy scopes.  Any existing non-default policy will simply be
+	removed when MPOL_DEFAULT is specified.  As a result,
+	MPOL_DEFAULT means "fall back to the next most specific policy
+	scope."
+
+	For example, a NULL or default task policy will fall back to the
+	system default policy.  A NULL or default vma policy will fall
+	back to the task policy.
+
+	When specified in one of the memory policy APIs, the Default mode
+	does not use the optional set of nodes.
+
+	It is an error for the set of nodes specified for this policy to
+	be non-empty.
+
+MPOL_BIND
+	This mode specifies that memory must come from the set of
+	nodes specified by the policy.  Memory will be allocated from
+	the node in the set with sufficient free memory that is
+	closest to the node where the allocation takes place.
+
+MPOL_PREFERRED
+	This mode specifies that the allocation should be attempted
+	from the single node specified in the policy.  If that
+	allocation fails, the kernel will search other nodes, in order
+	of increasing distance from the preferred node based on
+	information provided by the platform firmware.
+
+	Internally, the Preferred policy uses a single node--the
+	preferred_node member of struct mempolicy.  When the internal
+	mode flag MPOL_F_LOCAL is set, the preferred_node is ignored
+	and the policy is interpreted as local allocation.  "Local"
+	allocation policy can be viewed as a Preferred policy that
+	starts at the node containing the cpu where the allocation
+	takes place.
+
+	It is possible for the user to specify that local allocation
+	is always preferred by passing an empty nodemask with this
+	mode.  If an empty nodemask is passed, the policy cannot use
+	the MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags
+	described below.
+
+MPOL_INTERLEAVED
+	This mode specifies that page allocations be interleaved, on a
+	page granularity, across the nodes specified in the policy.
+	This mode also behaves slightly differently, based on the
+	context where it is used:
+
+	For allocation of anonymous pages and shared memory pages,
+	Interleave mode indexes the set of nodes specified by the
+	policy using the page offset of the faulting address into the
+	segment [VMA] containing the address modulo the number of
+	nodes specified by the policy.  It then attempts to allocate a
+	page, starting at the selected node, as if the node had been
+	specified by a Preferred policy or had been selected by a
+	local allocation.  That is, allocation will follow the per
+	node zonelist.
+
+	For allocation of page cache pages, Interleave mode indexes
+	the set of nodes specified by the policy using a node counter
+	maintained per task.  This counter wraps around to the lowest
+	specified node after it reaches the highest specified node.
+	This will tend to spread the pages out over the nodes
+	specified by the policy based on the order in which they are
+	allocated, rather than based on any page offset into an
+	address range or file.  During system boot up, the temporary
+	interleaved system default policy works in this mode.
+
+Linux memory policy supports the following optional mode flags:
+
+MPOL_F_STATIC_NODES
+	This flag specifies that the nodemask passed by
 	the user should not be remapped if the task or VMA's set of allowed
 	nodes changes after the memory policy has been defined.
 
-	    Without this flag, anytime a mempolicy is rebound because of a
-	    change in the set of allowed nodes, the node (Preferred) or
-	    nodemask (Bind, Interleave) is remapped to the new set of
-	    allowed nodes.  This may result in nodes being used that were
-	    previously undesired.
-
-	    With this flag, if the user-specified nodes overlap with the
-	    nodes allowed by the task's cpuset, then the memory policy is
-	    applied to their intersection.  If the two sets of nodes do not
-	    overlap, the Default policy is used.
-
-	    For example, consider a task that is attached to a cpuset with
-	    mems 1-3 that sets an Interleave policy over the same set.  If
-	    the cpuset's mems change to 3-5, the Interleave will now occur
-	    over nodes 3, 4, and 5.  With this flag, however, since only node
-	    3 is allowed from the user's nodemask, the "interleave" only
-	    occurs over that node.  If no nodes from the user's nodemask are
-	    now allowed, the Default behavior is used.
-
-	    MPOL_F_STATIC_NODES cannot be combined with the
-	    MPOL_F_RELATIVE_NODES flag.  It also cannot be used for
-	    MPOL_PREFERRED policies that were created with an empty nodemask
-	    (local allocation).
-
-	MPOL_F_RELATIVE_NODES:  This flag specifies that the nodemask passed
+	Without this flag, anytime a mempolicy is rebound because of a
+	change in the set of allowed nodes, the node (Preferred) or
+	nodemask (Bind, Interleave) is remapped to the new set of
+	allowed nodes.  This may result in nodes being used that were
+	previously undesired.
+
+	With this flag, if the user-specified nodes overlap with the
+	nodes allowed by the task's cpuset, then the memory policy is
+	applied to their intersection.  If the two sets of nodes do not
+	overlap, the Default policy is used.
+
+	For example, consider a task that is attached to a cpuset with
+	mems 1-3 that sets an Interleave policy over the same set.  If
+	the cpuset's mems change to 3-5, the Interleave will now occur
+	over nodes 3, 4, and 5.  With this flag, however, since only node
+	3 is allowed from the user's nodemask, the "interleave" only
+	occurs over that node.  If no nodes from the user's nodemask are
+	now allowed, the Default behavior is used.
+
+	MPOL_F_STATIC_NODES cannot be combined with the
+	MPOL_F_RELATIVE_NODES flag.  It also cannot be used for
+	MPOL_PREFERRED policies that were created with an empty nodemask
+	(local allocation).
+
+MPOL_F_RELATIVE_NODES
+	This flag specifies that the nodemask passed
 	by the user will be mapped relative to the set of the task or VMA's
 	set of allowed nodes.  The kernel stores the user-passed nodemask,
 	and if the allowed nodes changes, then that original nodemask will
 	be remapped relative to the new set of allowed nodes.
 
-	    Without this flag (and without MPOL_F_STATIC_NODES), anytime a
-	    mempolicy is rebound because of a change in the set of allowed
-	    nodes, the node (Preferred) or nodemask (Bind, Interleave) is
-	    remapped to the new set of allowed nodes.  That remap may not
-	    preserve the relative nature of the user's passed nodemask to its
-	    set of allowed nodes upon successive rebinds: a nodemask of
-	    1,3,5 may be remapped to 7-9 and then to 1-3 if the set of
-	    allowed nodes is restored to its original state.
-
-	    With this flag, the remap is done so that the node numbers from
-	    the user's passed nodemask are relative to the set of allowed
-	    nodes.  In other words, if nodes 0, 2, and 4 are set in the user's
-	    nodemask, the policy will be effected over the first (and in the
-	    Bind or Interleave case, the third and fifth) nodes in the set of
-	    allowed nodes.  The nodemask passed by the user represents nodes
-	    relative to task or VMA's set of allowed nodes.
-
-	    If the user's nodemask includes nodes that are outside the range
-	    of the new set of allowed nodes (for example, node 5 is set in
-	    the user's nodemask when the set of allowed nodes is only 0-3),
-	    then the remap wraps around to the beginning of the nodemask and,
-	    if not already set, sets the node in the mempolicy nodemask.
-
-	    For example, consider a task that is attached to a cpuset with
-	    mems 2-5 that sets an Interleave policy over the same set with
-	    MPOL_F_RELATIVE_NODES.  If the cpuset's mems change to 3-7, the
-	    interleave now occurs over nodes 3,5-7.  If the cpuset's mems
-	    then change to 0,2-3,5, then the interleave occurs over nodes
-	    0,2-3,5.
-
-	    Thanks to the consistent remapping, applications preparing
-	    nodemasks to specify memory policies using this flag should
-	    disregard their current, actual cpuset imposed memory placement
-	    and prepare the nodemask as if they were always located on
-	    memory nodes 0 to N-1, where N is the number of memory nodes the
-	    policy is intended to manage.  Let the kernel then remap to the
-	    set of memory nodes allowed by the task's cpuset, as that may
-	    change over time.
-
-	    MPOL_F_RELATIVE_NODES cannot be combined with the
-	    MPOL_F_STATIC_NODES flag.  It also cannot be used for
-	    MPOL_PREFERRED policies that were created with an empty nodemask
-	    (local allocation).
-
-MEMORY POLICY REFERENCE COUNTING
+	Without this flag (and without MPOL_F_STATIC_NODES), anytime a
+	mempolicy is rebound because of a change in the set of allowed
+	nodes, the node (Preferred) or nodemask (Bind, Interleave) is
+	remapped to the new set of allowed nodes.  That remap may not
+	preserve the relative nature of the user's passed nodemask to its
+	set of allowed nodes upon successive rebinds: a nodemask of
+	1,3,5 may be remapped to 7-9 and then to 1-3 if the set of
+	allowed nodes is restored to its original state.
+
+	With this flag, the remap is done so that the node numbers from
+	the user's passed nodemask are relative to the set of allowed
+	nodes.  In other words, if nodes 0, 2, and 4 are set in the user's
+	nodemask, the policy will be effected over the first (and in the
+	Bind or Interleave case, the third and fifth) nodes in the set of
+	allowed nodes.  The nodemask passed by the user represents nodes
+	relative to task or VMA's set of allowed nodes.
+
+	If the user's nodemask includes nodes that are outside the range
+	of the new set of allowed nodes (for example, node 5 is set in
+	the user's nodemask when the set of allowed nodes is only 0-3),
+	then the remap wraps around to the beginning of the nodemask and,
+	if not already set, sets the node in the mempolicy nodemask.
+
+	For example, consider a task that is attached to a cpuset with
+	mems 2-5 that sets an Interleave policy over the same set with
+	MPOL_F_RELATIVE_NODES.  If the cpuset's mems change to 3-7, the
+	interleave now occurs over nodes 3,5-7.  If the cpuset's mems
+	then change to 0,2-3,5, then the interleave occurs over nodes
+	0,2-3,5.
+
+	Thanks to the consistent remapping, applications preparing
+	nodemasks to specify memory policies using this flag should
+	disregard their current, actual cpuset imposed memory placement
+	and prepare the nodemask as if they were always located on
+	memory nodes 0 to N-1, where N is the number of memory nodes the
+	policy is intended to manage.  Let the kernel then remap to the
+	set of memory nodes allowed by the task's cpuset, as that may
+	change over time.
+
+	MPOL_F_RELATIVE_NODES cannot be combined with the
+	MPOL_F_STATIC_NODES flag.  It also cannot be used for
+	MPOL_PREFERRED policies that were created with an empty nodemask
+	(local allocation).
+
+Memory Policy Reference Counting
+================================
 
 To resolve use/free races, struct mempolicy contains an atomic reference
 count field.  Internal interfaces, mpol_get()/mpol_put() increment and
@@ -360,60 +389,62 @@ follows:
    or by prefaulting the entire shared memory region into memory and locking
    it down.  However, this might not be appropriate for all applications.
 
-MEMORY POLICY APIs
+Memory Policy APIs
 
 Linux supports 3 system calls for controlling memory policy.  These APIS
 always affect only the calling task, the calling task's address space, or
 some shared object mapped into the calling task's address space.
 
-	Note:  the headers that define these APIs and the parameter data types
-	for user space applications reside in a package that is not part of
-	the Linux kernel.  The kernel system call interfaces, with the 'sys_'
-	prefix, are defined in <linux/syscalls.h>; the mode and flag
-	definitions are defined in <linux/mempolicy.h>.
+.. note::
+   the headers that define these APIs and the parameter data types for
+   user space applications reside in a package that is not part of the
+   Linux kernel.  The kernel system call interfaces, with the 'sys\_'
+   prefix, are defined in <linux/syscalls.h>; the mode and flag
+   definitions are defined in <linux/mempolicy.h>.
 
-Set [Task] Memory Policy:
+Set [Task] Memory Policy::
 
 	long set_mempolicy(int mode, const unsigned long *nmask,
 					unsigned long maxnode);
 
-	Set's the calling task's "task/process memory policy" to mode
-	specified by the 'mode' argument and the set of nodes defined
-	by 'nmask'.  'nmask' points to a bit mask of node ids containing
-	at least 'maxnode' ids.  Optional mode flags may be passed by
-	combining the 'mode' argument with the flag (for example:
-	MPOL_INTERLEAVE | MPOL_F_STATIC_NODES).
+Set's the calling task's "task/process memory policy" to mode
+specified by the 'mode' argument and the set of nodes defined by
+'nmask'.  'nmask' points to a bit mask of node ids containing at least
+'maxnode' ids.  Optional mode flags may be passed by combining the
+'mode' argument with the flag (for example: MPOL_INTERLEAVE |
+MPOL_F_STATIC_NODES).
 
-	See the set_mempolicy(2) man page for more details
+See the set_mempolicy(2) man page for more details
 
 
-Get [Task] Memory Policy or Related Information
+Get [Task] Memory Policy or Related Information::
 
 	long get_mempolicy(int *mode,
 			   const unsigned long *nmask, unsigned long maxnode,
 			   void *addr, int flags);
 
-	Queries the "task/process memory policy" of the calling task, or
-	the policy or location of a specified virtual address, depending
-	on the 'flags' argument.
+Queries the "task/process memory policy" of the calling task, or the
+policy or location of a specified virtual address, depending on the
+'flags' argument.
 
-	See the get_mempolicy(2) man page for more details
+See the get_mempolicy(2) man page for more details
 
 
-Install VMA/Shared Policy for a Range of Task's Address Space
+Install VMA/Shared Policy for a Range of Task's Address Space::
 
 	long mbind(void *start, unsigned long len, int mode,
 		   const unsigned long *nmask, unsigned long maxnode,
 		   unsigned flags);
 
-	mbind() installs the policy specified by (mode, nmask, maxnodes) as
-	a VMA policy for the range of the calling task's address space
-	specified by the 'start' and 'len' arguments.  Additional actions
-	may be requested via the 'flags' argument.
+mbind() installs the policy specified by (mode, nmask, maxnodes) as a
+VMA policy for the range of the calling task's address space specified
+by the 'start' and 'len' arguments.  Additional actions may be
+requested via the 'flags' argument.
 
-	See the mbind(2) man page for more details.
+See the mbind(2) man page for more details.
 
-MEMORY POLICY COMMAND LINE INTERFACE
+Memory Policy Command Line Interface
+====================================
 
 Although not strictly part of the Linux implementation of memory policy,
 a command line tool, numactl(8), exists that allows one to:
@@ -428,8 +459,10 @@ containing the memory policy system call wrappers.  Some distributions
 package the headers and compile-time libraries in a separate development
 package.
 
+.. _mem_pol_and_cpusets:
 
-MEMORY POLICIES AND CPUSETS
+Memory Policies and cpusets
+===========================
 
 Memory policies work within cpusets as described above.  For memory policies
 that require a node or set of nodes, the nodes are restricted to the set of
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 10/32] docs/vm: idle_page_tracking.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/idle_page_tracking.txt | 55 +++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 19 deletions(-)

diff --git a/Documentation/vm/idle_page_tracking.txt b/Documentation/vm/idle_page_tracking.txt
index 85dcc3b..9cbe6f8 100644
--- a/Documentation/vm/idle_page_tracking.txt
+++ b/Documentation/vm/idle_page_tracking.txt
@@ -1,4 +1,11 @@
-MOTIVATION
+.. _idle_page_tracking:
+
+==================
+Idle Page Tracking
+==================
+
+Motivation
+==========
 
 The idle page tracking feature allows to track which memory pages are being
 accessed by a workload and which are idle. This information can be useful for
@@ -8,10 +15,14 @@ or deciding where to place the workload within a compute cluster.
 
 It is enabled by CONFIG_IDLE_PAGE_TRACKING=y.
 
-USER API
+.. _user_api:
 
-The idle page tracking API is located at /sys/kernel/mm/page_idle. Currently,
-it consists of the only read-write file, /sys/kernel/mm/page_idle/bitmap.
+User API
+========
+
+The idle page tracking API is located at ``/sys/kernel/mm/page_idle``.
+Currently, it consists of the only read-write file,
+``/sys/kernel/mm/page_idle/bitmap``.
 
 The file implements a bitmap where each bit corresponds to a memory page. The
 bitmap is represented by an array of 8-byte integers, and the page at PFN #i is
@@ -19,8 +30,9 @@ mapped to bit #i%64 of array element #i/64, byte order is native. When a bit is
 set, the corresponding page is idle.
 
 A page is considered idle if it has not been accessed since it was marked idle
-(for more details on what "accessed" actually means see the IMPLEMENTATION
-DETAILS section). To mark a page idle one has to set the bit corresponding to
+(for more details on what "accessed" actually means see the :ref:`Implementation
+Details <impl_details>` section).
+To mark a page idle one has to set the bit corresponding to
 the page by writing to the file. A value written to the file is OR-ed with the
 current bitmap value.
 
@@ -30,9 +42,9 @@ page types (e.g. SLAB pages) an attempt to mark a page idle is silently ignored,
 and hence such pages are never reported idle.
 
 For huge pages the idle flag is set only on the head page, so one has to read
-/proc/kpageflags in order to correctly count idle huge pages.
+``/proc/kpageflags`` in order to correctly count idle huge pages.
 
-Reading from or writing to /sys/kernel/mm/page_idle/bitmap will return
+Reading from or writing to ``/sys/kernel/mm/page_idle/bitmap`` will return
 -EINVAL if you are not starting the read/write on an 8-byte boundary, or
 if the size of the read/write is not a multiple of 8 bytes. Writing to
 this file beyond max PFN will return -ENXIO.
@@ -41,21 +53,25 @@ That said, in order to estimate the amount of pages that are not used by a
 workload one should:
 
  1. Mark all the workload's pages as idle by setting corresponding bits in
-    /sys/kernel/mm/page_idle/bitmap. The pages can be found by reading
-    /proc/pid/pagemap if the workload is represented by a process, or by
-    filtering out alien pages using /proc/kpagecgroup in case the workload is
-    placed in a memory cgroup.
+    ``/sys/kernel/mm/page_idle/bitmap``. The pages can be found by reading
+    ``/proc/pid/pagemap`` if the workload is represented by a process, or by
+    filtering out alien pages using ``/proc/kpagecgroup`` in case the workload
+    is placed in a memory cgroup.
 
  2. Wait until the workload accesses its working set.
 
- 3. Read /sys/kernel/mm/page_idle/bitmap and count the number of bits set. If
-    one wants to ignore certain types of pages, e.g. mlocked pages since they
-    are not reclaimable, he or she can filter them out using /proc/kpageflags.
+ 3. Read ``/sys/kernel/mm/page_idle/bitmap`` and count the number of bits set.
+    If one wants to ignore certain types of pages, e.g. mlocked pages since they
+    are not reclaimable, he or she can filter them out using
+    ``/proc/kpageflags``.
+
+See Documentation/vm/pagemap.txt for more information about
+``/proc/pid/pagemap``, ``/proc/kpageflags``, and ``/proc/kpagecgroup``.
 
-See Documentation/vm/pagemap.txt for more information about /proc/pid/pagemap,
-/proc/kpageflags, and /proc/kpagecgroup.
+.. _impl_details:
 
-IMPLEMENTATION DETAILS
+Implementation Details
+======================
 
 The kernel internally keeps track of accesses to user memory pages in order to
 reclaim unreferenced pages first on memory shortage conditions. A page is
@@ -77,7 +93,8 @@ When a dirty page is written to swap or disk as a result of memory reclaim or
 exceeding the dirty memory limit, it is not marked referenced.
 
 The idle memory tracking feature adds a new page flag, the Idle flag. This flag
-is set manually, by writing to /sys/kernel/mm/page_idle/bitmap (see the USER API
+is set manually, by writing to ``/sys/kernel/mm/page_idle/bitmap`` (see the
+:ref:`User API <user_api>`
 section), and cleared automatically whenever a page is referenced as defined
 above.
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 12/32] docs/vm: mmu_notifier.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/mmu_notifier.txt | 108 ++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 51 deletions(-)

diff --git a/Documentation/vm/mmu_notifier.txt b/Documentation/vm/mmu_notifier.txt
index 23b4625..47baa1c 100644
--- a/Documentation/vm/mmu_notifier.txt
+++ b/Documentation/vm/mmu_notifier.txt
@@ -1,7 +1,10 @@
+.. _mmu_notifier:
+
 When do you need to notify inside page table lock ?
+===================================================
 
 When clearing a pte/pmd we are given a choice to notify the event through
-(notify version of *_clear_flush call mmu_notifier_invalidate_range) under
+(notify version of \*_clear_flush call mmu_notifier_invalidate_range) under
 the page table lock. But that notification is not necessary in all cases.
 
 For secondary TLB (non CPU TLB) like IOMMU TLB or device TLB (when device use
@@ -18,6 +21,7 @@ a page that might now be used by some completely different task.
 
 Case B is more subtle. For correctness it requires the following sequence to
 happen:
+
   - take page table lock
   - clear page table entry and notify ([pmd/pte]p_huge_clear_flush_notify())
   - set page table entry to point to new page
@@ -28,58 +32,60 @@ the device.
 
 Consider the following scenario (device use a feature similar to ATS/PASID):
 
-Two address addrA and addrB such that |addrA - addrB| >= PAGE_SIZE we assume
+Two address addrA and addrB such that \|addrA - addrB\| >= PAGE_SIZE we assume
 they are write protected for COW (other case of B apply too).
 
-[Time N] --------------------------------------------------------------------
-CPU-thread-0  {try to write to addrA}
-CPU-thread-1  {try to write to addrB}
-CPU-thread-2  {}
-CPU-thread-3  {}
-DEV-thread-0  {read addrA and populate device TLB}
-DEV-thread-2  {read addrB and populate device TLB}
-[Time N+1] ------------------------------------------------------------------
-CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
-CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
-CPU-thread-2  {}
-CPU-thread-3  {}
-DEV-thread-0  {}
-DEV-thread-2  {}
-[Time N+2] ------------------------------------------------------------------
-CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}}
-CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}}
-CPU-thread-2  {}
-CPU-thread-3  {}
-DEV-thread-0  {}
-DEV-thread-2  {}
-[Time N+3] ------------------------------------------------------------------
-CPU-thread-0  {preempted}
-CPU-thread-1  {preempted}
-CPU-thread-2  {write to addrA which is a write to new page}
-CPU-thread-3  {}
-DEV-thread-0  {}
-DEV-thread-2  {}
-[Time N+3] ------------------------------------------------------------------
-CPU-thread-0  {preempted}
-CPU-thread-1  {preempted}
-CPU-thread-2  {}
-CPU-thread-3  {write to addrB which is a write to new page}
-DEV-thread-0  {}
-DEV-thread-2  {}
-[Time N+4] ------------------------------------------------------------------
-CPU-thread-0  {preempted}
-CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
-CPU-thread-2  {}
-CPU-thread-3  {}
-DEV-thread-0  {}
-DEV-thread-2  {}
-[Time N+5] ------------------------------------------------------------------
-CPU-thread-0  {preempted}
-CPU-thread-1  {}
-CPU-thread-2  {}
-CPU-thread-3  {}
-DEV-thread-0  {read addrA from old page}
-DEV-thread-2  {read addrB from new page}
+::
+
+ [Time N] --------------------------------------------------------------------
+ CPU-thread-0  {try to write to addrA}
+ CPU-thread-1  {try to write to addrB}
+ CPU-thread-2  {}
+ CPU-thread-3  {}
+ DEV-thread-0  {read addrA and populate device TLB}
+ DEV-thread-2  {read addrB and populate device TLB}
+ [Time N+1] ------------------------------------------------------------------
+ CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
+ CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
+ CPU-thread-2  {}
+ CPU-thread-3  {}
+ DEV-thread-0  {}
+ DEV-thread-2  {}
+ [Time N+2] ------------------------------------------------------------------
+ CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}}
+ CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}}
+ CPU-thread-2  {}
+ CPU-thread-3  {}
+ DEV-thread-0  {}
+ DEV-thread-2  {}
+ [Time N+3] ------------------------------------------------------------------
+ CPU-thread-0  {preempted}
+ CPU-thread-1  {preempted}
+ CPU-thread-2  {write to addrA which is a write to new page}
+ CPU-thread-3  {}
+ DEV-thread-0  {}
+ DEV-thread-2  {}
+ [Time N+3] ------------------------------------------------------------------
+ CPU-thread-0  {preempted}
+ CPU-thread-1  {preempted}
+ CPU-thread-2  {}
+ CPU-thread-3  {write to addrB which is a write to new page}
+ DEV-thread-0  {}
+ DEV-thread-2  {}
+ [Time N+4] ------------------------------------------------------------------
+ CPU-thread-0  {preempted}
+ CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
+ CPU-thread-2  {}
+ CPU-thread-3  {}
+ DEV-thread-0  {}
+ DEV-thread-2  {}
+ [Time N+5] ------------------------------------------------------------------
+ CPU-thread-0  {preempted}
+ CPU-thread-1  {}
+ CPU-thread-2  {}
+ CPU-thread-3  {}
+ DEV-thread-0  {read addrA from old page}
+ DEV-thread-2  {read addrB from new page}
 
 So here because at time N+2 the clear page table entry was not pair with a
 notification to invalidate the secondary TLB, the device see the new value for
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 09/32] docs/vm: hwpoison.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/hwpoison.txt | 141 +++++++++++++++++++++---------------------
 1 file changed, 70 insertions(+), 71 deletions(-)

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
index e912d7e..b1a8c24 100644
--- a/Documentation/vm/hwpoison.txt
+++ b/Documentation/vm/hwpoison.txt
@@ -1,7 +1,14 @@
+.. hwpoison:
+
+========
+hwpoison
+========
+
 What is hwpoison?
+=================
 
 Upcoming Intel CPUs have support for recovering from some memory errors
-(``MCA recovery''). This requires the OS to declare a page "poisoned",
+(``MCA recovery``). This requires the OS to declare a page "poisoned",
 kill the processes associated with it and avoid using it in the future.
 
 This patchkit implements the necessary infrastructure in the VM.
@@ -46,9 +53,10 @@ address. This in theory allows other applications to handle
 memory failures too. The expection is that near all applications
 won't do that, but some very specialized ones might.
 
----
+Failure recovery modes
+======================
 
-There are two (actually three) modi memory failure recovery can be in:
+There are two (actually three) modes memory failure recovery can be in:
 
 vm.memory_failure_recovery sysctl set to zero:
 	All memory failures cause a panic. Do not attempt recovery.
@@ -67,9 +75,8 @@ late kill
 	This is best for memory error unaware applications and default
 	Note some pages are always handled as late kill.
 
----
-
-User control:
+User control
+============
 
 vm.memory_failure_recovery
 	See sysctl.txt
@@ -79,11 +86,19 @@ vm.memory_failure_early_kill
 
 PR_MCE_KILL
 	Set early/late kill mode/revert to system default
-	arg1: PR_MCE_KILL_CLEAR: Revert to system default
-	arg1: PR_MCE_KILL_SET: arg2 defines thread specific mode
-		PR_MCE_KILL_EARLY: Early kill
-		PR_MCE_KILL_LATE:  Late kill
-		PR_MCE_KILL_DEFAULT: Use system global default
+
+	arg1: PR_MCE_KILL_CLEAR:
+		Revert to system default
+	arg1: PR_MCE_KILL_SET:
+		arg2 defines thread specific mode
+
+		PR_MCE_KILL_EARLY:
+			Early kill
+		PR_MCE_KILL_LATE:
+			Late kill
+		PR_MCE_KILL_DEFAULT
+			Use system global default
+
 	Note that if you want to have a dedicated thread which handles
 	the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
 	call prctl(PR_MCE_KILL_EARLY) on the designated thread. Otherwise,
@@ -92,77 +107,64 @@ PR_MCE_KILL
 PR_MCE_KILL_GET
 	return current mode
 
+Testing
+=======
 
----
-
-Testing:
-
-madvise(MADV_HWPOISON, ....)
-	(as root)
-	Poison a page in the process for testing
-
+* madvise(MADV_HWPOISON, ....) (as root) - Poison a page in the
+  process for testing
 
-hwpoison-inject module through debugfs
+* hwpoison-inject module through debugfs ``/sys/kernel/debug/hwpoison/``
 
-/sys/kernel/debug/hwpoison/
+  corrupt-pfn
+	Inject hwpoison fault at PFN echoed into this file. This does
+	some early filtering to avoid corrupted unintended pages in test suites.
 
-corrupt-pfn
+  unpoison-pfn
+	Software-unpoison page at PFN echoed into this file. This way
+	a page can be reused again.  This only works for Linux
+	injected failures, not for real memory failures.
 
-Inject hwpoison fault at PFN echoed into this file. This does
-some early filtering to avoid corrupted unintended pages in test suites.
+  Note these injection interfaces are not stable and might change between
+  kernel versions
 
-unpoison-pfn
+  corrupt-filter-dev-major, corrupt-filter-dev-minor
+	Only handle memory failures to pages associated with the file
+	system defined by block device major/minor.  -1U is the
+	wildcard value.  This should be only used for testing with
+	artificial injection.
 
-Software-unpoison page at PFN echoed into this file. This
-way a page can be reused again.
-This only works for Linux injected failures, not for real
-memory failures.
+  corrupt-filter-memcg
+	Limit injection to pages owned by memgroup. Specified by inode
+	number of the memcg.
 
-Note these injection interfaces are not stable and might change between
-kernel versions
+	Example::
 
-corrupt-filter-dev-major
-corrupt-filter-dev-minor
+		mkdir /sys/fs/cgroup/mem/hwpoison
 
-Only handle memory failures to pages associated with the file system defined
-by block device major/minor.  -1U is the wildcard value.
-This should be only used for testing with artificial injection.
+	        usemem -m 100 -s 1000 &
+		echo `jobs -p` > /sys/fs/cgroup/mem/hwpoison/tasks
 
-corrupt-filter-memcg
+		memcg_ino=$(ls -id /sys/fs/cgroup/mem/hwpoison | cut -f1 -d' ')
+		echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg
 
-Limit injection to pages owned by memgroup. Specified by inode number
-of the memcg.
+		page-types -p `pidof init`   --hwpoison  # shall do nothing
+		page-types -p `pidof usemem` --hwpoison  # poison its pages
 
-Example:
-        mkdir /sys/fs/cgroup/mem/hwpoison
+  corrupt-filter-flags-mask, corrupt-filter-flags-value
+	When specified, only poison pages if ((page_flags & mask) ==
+	value).  This allows stress testing of many kinds of
+	pages. The page_flags are the same as in /proc/kpageflags. The
+	flag bits are defined in include/linux/kernel-page-flags.h and
+	documented in Documentation/vm/pagemap.txt
 
-        usemem -m 100 -s 1000 &
-        echo `jobs -p` > /sys/fs/cgroup/mem/hwpoison/tasks
+* Architecture specific MCE injector
 
-        memcg_ino=$(ls -id /sys/fs/cgroup/mem/hwpoison | cut -f1 -d' ')
-        echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg
+  x86 has mce-inject, mce-test
 
-        page-types -p `pidof init`   --hwpoison  # shall do nothing
-        page-types -p `pidof usemem` --hwpoison  # poison its pages
+  Some portable hwpoison test programs in mce-test, see below.
 
-corrupt-filter-flags-mask
-corrupt-filter-flags-value
-
-When specified, only poison pages if ((page_flags & mask) == value).
-This allows stress testing of many kinds of pages. The page_flags
-are the same as in /proc/kpageflags. The flag bits are defined in
-include/linux/kernel-page-flags.h and documented in
-Documentation/vm/pagemap.txt
-
-Architecture specific MCE injector
-
-x86 has mce-inject, mce-test
-
-Some portable hwpoison test programs in mce-test, see blow.
-
----
-
-References:
+References
+==========
 
 http://halobates.de/mce-lc09-2.pdf
 	Overview presentation from LinuxCon 09
@@ -174,14 +176,11 @@ git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
 	x86 specific injector
 
 
----
-
-Limitations:
-
+Limitations
+===========
 - Not all page types are supported and never will. Most kernel internal
-objects cannot be recovered, only LRU pages for now.
+  objects cannot be recovered, only LRU pages for now.
 - Right now hugepage support is missing.
 
 ---
 Andi Kleen, Oct 2009
-
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 08/32] docs/vm: hugetlbfs_reserv.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/hugetlbfs_reserv.txt | 212 ++++++++++++++++++++++------------
 1 file changed, 135 insertions(+), 77 deletions(-)

diff --git a/Documentation/vm/hugetlbfs_reserv.txt b/Documentation/vm/hugetlbfs_reserv.txt
index 9aca09a..36a87a2 100644
--- a/Documentation/vm/hugetlbfs_reserv.txt
+++ b/Documentation/vm/hugetlbfs_reserv.txt
@@ -1,6 +1,13 @@
-Hugetlbfs Reservation Overview
-------------------------------
-Huge pages as described at 'Documentation/vm/hugetlbpage.txt' are typically
+.. _hugetlbfs_reserve:
+
+=====================
+Hugetlbfs Reservation
+=====================
+
+Overview
+========
+
+Huge pages as described at :ref:`hugetlbpage` are typically
 preallocated for application use.  These huge pages are instantiated in a
 task's address space at page fault time if the VMA indicates huge pages are
 to be used.  If no huge page exists at page fault time, the task is sent
@@ -17,47 +24,55 @@ describe how huge page reserve processing is done in the v4.10 kernel.
 
 
 Audience
---------
+========
 This description is primarily targeted at kernel developers who are modifying
 hugetlbfs code.
 
 
 The Data Structures
--------------------
+===================
+
 resv_huge_pages
 	This is a global (per-hstate) count of reserved huge pages.  Reserved
 	huge pages are only available to the task which reserved them.
 	Therefore, the number of huge pages generally available is computed
-	as (free_huge_pages - resv_huge_pages).
+	as (``free_huge_pages - resv_huge_pages``).
 Reserve Map
-	A reserve map is described by the structure:
-	struct resv_map {
-		struct kref refs;
-		spinlock_t lock;
-		struct list_head regions;
-		long adds_in_progress;
-		struct list_head region_cache;
-		long region_cache_count;
-	};
+	A reserve map is described by the structure::
+
+		struct resv_map {
+			struct kref refs;
+			spinlock_t lock;
+			struct list_head regions;
+			long adds_in_progress;
+			struct list_head region_cache;
+			long region_cache_count;
+		};
+
 	There is one reserve map for each huge page mapping in the system.
 	The regions list within the resv_map describes the regions within
-	the mapping.  A region is described as:
-	struct file_region {
-		struct list_head link;
-		long from;
-		long to;
-	};
+	the mapping.  A region is described as::
+
+		struct file_region {
+			struct list_head link;
+			long from;
+			long to;
+		};
+
 	The 'from' and 'to' fields of the file region structure are huge page
 	indices into the mapping.  Depending on the type of mapping, a
 	region in the reserv_map may indicate reservations exist for the
 	range, or reservations do not exist.
 Flags for MAP_PRIVATE Reservations
 	These are stored in the bottom bits of the reservation map pointer.
-	#define HPAGE_RESV_OWNER    (1UL << 0) Indicates this task is the
-		owner of the reservations associated with the mapping.
-	#define HPAGE_RESV_UNMAPPED (1UL << 1) Indicates task originally
-		mapping this range (and creating reserves) has unmapped a
-		page from this task (the child) due to a failed COW.
+
+	``#define HPAGE_RESV_OWNER    (1UL << 0)``
+		Indicates this task is the owner of the reservations
+		associated with the mapping.
+	``#define HPAGE_RESV_UNMAPPED (1UL << 1)``
+		Indicates task originally mapping this range (and creating
+		reserves) has unmapped a page from this task (the child)
+		due to a failed COW.
 Page Flags
 	The PagePrivate page flag is used to indicate that a huge page
 	reservation must be restored when the huge page is freed.  More
@@ -65,12 +80,14 @@ Page Flags
 
 
 Reservation Map Location (Private or Shared)
---------------------------------------------
+============================================
+
 A huge page mapping or segment is either private or shared.  If private,
 it is typically only available to a single address space (task).  If shared,
 it can be mapped into multiple address spaces (tasks).  The location and
 semantics of the reservation map is significantly different for two types
 of mappings.  Location differences are:
+
 - For private mappings, the reservation map hangs off the the VMA structure.
   Specifically, vma->vm_private_data.  This reserve map is created at the
   time the mapping (mmap(MAP_PRIVATE)) is created.
@@ -82,15 +99,15 @@ of mappings.  Location differences are:
 
 
 Creating Reservations
----------------------
+=====================
 Reservations are created when a huge page backed shared memory segment is
 created (shmget(SHM_HUGETLB)) or a mapping is created via mmap(MAP_HUGETLB).
-These operations result in a call to the routine hugetlb_reserve_pages()
+These operations result in a call to the routine hugetlb_reserve_pages()::
 
-int hugetlb_reserve_pages(struct inode *inode,
-					long from, long to,
-					struct vm_area_struct *vma,
-					vm_flags_t vm_flags)
+	int hugetlb_reserve_pages(struct inode *inode,
+				  long from, long to,
+				  struct vm_area_struct *vma,
+				  vm_flags_t vm_flags)
 
 The first thing hugetlb_reserve_pages() does is check for the NORESERVE
 flag was specified in either the shmget() or mmap() call.  If NORESERVE
@@ -105,6 +122,7 @@ the 'from' and 'to' arguments have been adjusted by this offset.
 
 One of the big differences between PRIVATE and SHARED mappings is the way
 in which reservations are represented in the reservation map.
+
 - For shared mappings, an entry in the reservation map indicates a reservation
   exists or did exist for the corresponding page.  As reservations are
   consumed, the reservation map is not modified.
@@ -121,12 +139,13 @@ to indicate this VMA owns the reservations.
 The reservation map is consulted to determine how many huge page reservations
 are needed for the current mapping/segment.  For private mappings, this is
 always the value (to - from).  However, for shared mappings it is possible that some reservations may already exist within the range (to - from).  See the
-section "Reservation Map Modifications" for details on how this is accomplished.
+section :ref:`Reservation Map Modifications <resv_map_modifications>`
+for details on how this is accomplished.
 
 The mapping may be associated with a subpool.  If so, the subpool is consulted
 to ensure there is sufficient space for the mapping.  It is possible that the
 subpool has set aside reservations that can be used for the mapping.  See the
-section "Subpool Reservations" for more details.
+section :ref:`Subpool Reservations <sub_pool_resv>` for more details.
 
 After consulting the reservation map and subpool, the number of needed new
 reservations is known.  The routine hugetlb_acct_memory() is called to check
@@ -135,9 +154,11 @@ calls into routines that potentially allocate and adjust surplus page counts.
 However, within those routines the code is simply checking to ensure there
 are enough free huge pages to accommodate the reservation.  If there are,
 the global reservation count resv_huge_pages is adjusted something like the
-following.
+following::
+
 	if (resv_needed <= (resv_huge_pages - free_huge_pages))
 		resv_huge_pages += resv_needed;
+
 Note that the global lock hugetlb_lock is held when checking and adjusting
 these counters.
 
@@ -152,14 +173,18 @@ If hugetlb_reserve_pages() was successful, the global reservation count and
 reservation map associated with the mapping will be modified as required to
 ensure reservations exist for the range 'from' - 'to'.
 
+.. _consume_resv:
 
 Consuming Reservations/Allocating a Huge Page
----------------------------------------------
+=============================================
+
 Reservations are consumed when huge pages associated with the reservations
 are allocated and instantiated in the corresponding mapping.  The allocation
-is performed within the routine alloc_huge_page().
-struct page *alloc_huge_page(struct vm_area_struct *vma,
-                                    unsigned long addr, int avoid_reserve)
+is performed within the routine alloc_huge_page()::
+
+	struct page *alloc_huge_page(struct vm_area_struct *vma,
+				     unsigned long addr, int avoid_reserve)
+
 alloc_huge_page is passed a VMA pointer and a virtual address, so it can
 consult the reservation map to determine if a reservation exists.  In addition,
 alloc_huge_page takes the argument avoid_reserve which indicates reserves
@@ -170,8 +195,9 @@ page are being allocated.
 
 The helper routine vma_needs_reservation() is called to determine if a
 reservation exists for the address within the mapping(vma).  See the section
-"Reservation Map Helper Routines" for detailed information on what this
-routine does.  The value returned from vma_needs_reservation() is generally
+:ref:`Reservation Map Helper Routines <resv_map_helpers>` for detailed
+information on what this routine does.
+The value returned from vma_needs_reservation() is generally
 0 or 1.  0 if a reservation exists for the address, 1 if no reservation exists.
 If a reservation does not exist, and there is a subpool associated with the
 mapping the subpool is consulted to determine if it contains reservations.
@@ -180,21 +206,25 @@ However, in every case the avoid_reserve argument overrides the use of
 a reservation for the allocation.  After determining whether a reservation
 exists and can be used for the allocation, the routine dequeue_huge_page_vma()
 is called.  This routine takes two arguments related to reservations:
+
 - avoid_reserve, this is the same value/argument passed to alloc_huge_page()
 - chg, even though this argument is of type long only the values 0 or 1 are
   passed to dequeue_huge_page_vma.  If the value is 0, it indicates a
   reservation exists (see the section "Memory Policy and Reservations" for
   possible issues).  If the value is 1, it indicates a reservation does not
   exist and the page must be taken from the global free pool if possible.
+
 The free lists associated with the memory policy of the VMA are searched for
 a free page.  If a page is found, the value free_huge_pages is decremented
 when the page is removed from the free list.  If there was a reservation
-associated with the page, the following adjustments are made:
+associated with the page, the following adjustments are made::
+
 	SetPagePrivate(page);	/* Indicates allocating this page consumed
 				 * a reservation, and if an error is
 				 * encountered such that the page must be
 				 * freed, the reservation will be restored. */
 	resv_huge_pages--;	/* Decrement the global reservation count */
+
 Note, if no huge page can be found that satisfies the VMA's memory policy
 an attempt will be made to allocate one using the buddy allocator.  This
 brings up the issue of surplus huge pages and overcommit which is beyond
@@ -222,12 +252,14 @@ mapping.  In such cases, the reservation count and subpool free page count
 will be off by one.  This rare condition can be identified by comparing the
 return value from vma_needs_reservation and vma_commit_reservation.  If such
 a race is detected, the subpool and global reserve counts are adjusted to
-compensate.  See the section "Reservation Map Helper Routines" for more
+compensate.  See the section
+:ref:`Reservation Map Helper Routines <resv_map_helpers>` for more
 information on these routines.
 
 
 Instantiate Huge Pages
-----------------------
+======================
+
 After huge page allocation, the page is typically added to the page tables
 of the allocating task.  Before this, pages in a shared mapping are added
 to the page cache and pages in private mappings are added to an anonymous
@@ -237,7 +269,8 @@ to the global reservation count (resv_huge_pages).
 
 
 Freeing Huge Pages
-------------------
+==================
+
 Huge page freeing is performed by the routine free_huge_page().  This routine
 is the destructor for hugetlbfs compound pages.  As a result, it is only
 passed a pointer to the page struct.  When a huge page is freed, reservation
@@ -247,7 +280,8 @@ on an error path where a global reserve count must be restored.
 
 The page->private field points to any subpool associated with the page.
 If the PagePrivate flag is set, it indicates the global reserve count should
-be adjusted (see the section "Consuming Reservations/Allocating a Huge Page"
+be adjusted (see the section
+:ref:`Consuming Reservations/Allocating a Huge Page <consume_resv>`
 for information on how these are set).
 
 The routine first calls hugepage_subpool_put_pages() for the page.  If this
@@ -259,9 +293,11 @@ Therefore, the global resv_huge_pages counter is incremented in this case.
 If the PagePrivate flag was set in the page, the global resv_huge_pages counter
 will always be incremented.
 
+.. _sub_pool_resv:
 
 Subpool Reservations
---------------------
+====================
+
 There is a struct hstate associated with each huge page size.  The hstate
 tracks all huge pages of the specified size.  A subpool represents a subset
 of pages within a hstate that is associated with a mounted hugetlbfs
@@ -295,7 +331,8 @@ the global pools.
 
 
 COW and Reservations
---------------------
+====================
+
 Since shared mappings all point to and use the same underlying pages, the
 biggest reservation concern for COW is private mappings.  In this case,
 two tasks can be pointing at the same previously allocated page.  One task
@@ -326,30 +363,36 @@ faults on a non-present page.  But, the original owner of the
 mapping/reservation will behave as expected.
 
 
+.. _resv_map_modifications:
+
 Reservation Map Modifications
------------------------------
+=============================
+
 The following low level routines are used to make modifications to a
 reservation map.  Typically, these routines are not called directly.  Rather,
 a reservation map helper routine is called which calls one of these low level
 routines.  These low level routines are fairly well documented in the source
-code (mm/hugetlb.c).  These routines are:
-long region_chg(struct resv_map *resv, long f, long t);
-long region_add(struct resv_map *resv, long f, long t);
-void region_abort(struct resv_map *resv, long f, long t);
-long region_count(struct resv_map *resv, long f, long t);
+code (mm/hugetlb.c).  These routines are::
+
+	long region_chg(struct resv_map *resv, long f, long t);
+	long region_add(struct resv_map *resv, long f, long t);
+	void region_abort(struct resv_map *resv, long f, long t);
+	long region_count(struct resv_map *resv, long f, long t);
 
 Operations on the reservation map typically involve two operations:
+
 1) region_chg() is called to examine the reserve map and determine how
    many pages in the specified range [f, t) are NOT currently represented.
 
    The calling code performs global checks and allocations to determine if
    there are enough huge pages for the operation to succeed.
 
-2a) If the operation can succeed, region_add() is called to actually modify
-    the reservation map for the same range [f, t) previously passed to
-    region_chg().
-2b) If the operation can not succeed, region_abort is called for the same range
-    [f, t) to abort the operation.
+2)
+  a) If the operation can succeed, region_add() is called to actually modify
+     the reservation map for the same range [f, t) previously passed to
+     region_chg().
+  b) If the operation can not succeed, region_abort is called for the same
+     range [f, t) to abort the operation.
 
 Note that this is a two step process where region_add() and region_abort()
 are guaranteed to succeed after a prior call to region_chg() for the same
@@ -371,6 +414,7 @@ and make the appropriate adjustments.
 
 The routine region_del() is called to remove regions from a reservation map.
 It is typically called in the following situations:
+
 - When a file in the hugetlbfs filesystem is being removed, the inode will
   be released and the reservation map freed.  Before freeing the reservation
   map, all the individual file_region structures must be freed.  In this case
@@ -384,6 +428,7 @@ It is typically called in the following situations:
   removed, region_del() is called to remove the corresponding entry from the
   reservation map.  In this case, region_del is passed the range
   [page_idx, page_idx + 1).
+
 In every case, region_del() will return the number of pages removed from the
 reservation map.  In VERY rare cases, region_del() can fail.  This can only
 happen in the hole punch case where it has to split an existing file_region
@@ -403,9 +448,11 @@ outstanding (outstanding = (end - start) - region_count(resv, start, end)).
 Since the mapping is going away, the subpool and global reservation counts
 are decremented by the number of outstanding reservations.
 
+.. _resv_map_helpers:
 
 Reservation Map Helper Routines
--------------------------------
+===============================
+
 Several helper routines exist to query and modify the reservation maps.
 These routines are only interested with reservations for a specific huge
 page, so they just pass in an address instead of a range.  In addition,
@@ -414,32 +461,40 @@ or shared) and the location of the reservation map (inode or VMA) can be
 determined.  These routines simply call the underlying routines described
 in the section "Reservation Map Modifications".  However, they do take into
 account the 'opposite' meaning of reservation map entries for private and
-shared mappings and hide this detail from the caller.
+shared mappings and hide this detail from the caller::
+
+	long vma_needs_reservation(struct hstate *h,
+				   struct vm_area_struct *vma,
+				   unsigned long addr)
 
-long vma_needs_reservation(struct hstate *h,
-				struct vm_area_struct *vma, unsigned long addr)
 This routine calls region_chg() for the specified page.  If no reservation
-exists, 1 is returned.  If a reservation exists, 0 is returned.
+exists, 1 is returned.  If a reservation exists, 0 is returned::
+
+	long vma_commit_reservation(struct hstate *h,
+				    struct vm_area_struct *vma,
+				    unsigned long addr)
 
-long vma_commit_reservation(struct hstate *h,
-				struct vm_area_struct *vma, unsigned long addr)
 This calls region_add() for the specified page.  As in the case of region_chg
 and region_add, this routine is to be called after a previous call to
 vma_needs_reservation.  It will add a reservation entry for the page.  It
 returns 1 if the reservation was added and 0 if not.  The return value should
 be compared with the return value of the previous call to
 vma_needs_reservation.  An unexpected difference indicates the reservation
-map was modified between calls.
+map was modified between calls::
+
+	void vma_end_reservation(struct hstate *h,
+				 struct vm_area_struct *vma,
+				 unsigned long addr)
 
-void vma_end_reservation(struct hstate *h,
-				struct vm_area_struct *vma, unsigned long addr)
 This calls region_abort() for the specified page.  As in the case of region_chg
 and region_abort, this routine is to be called after a previous call to
 vma_needs_reservation.  It will abort/end the in progress reservation add
-operation.
+operation::
+
+	long vma_add_reservation(struct hstate *h,
+				 struct vm_area_struct *vma,
+				 unsigned long addr)
 
-long vma_add_reservation(struct hstate *h,
-				struct vm_area_struct *vma, unsigned long addr)
 This is a special wrapper routine to help facilitate reservation cleanup
 on error paths.  It is only called from the routine restore_reserve_on_error().
 This routine is used in conjunction with vma_needs_reservation in an attempt
@@ -453,8 +508,10 @@ be done on error paths.
 
 
 Reservation Cleanup in Error Paths
-----------------------------------
-As mentioned in the section "Reservation Map Helper Routines", reservation
+==================================
+
+As mentioned in the section
+:ref:`Reservation Map Helper Routines <resv_map_helpers>`, reservation
 map modifications are performed in two steps.  First vma_needs_reservation
 is called before a page is allocated.  If the allocation is successful,
 then vma_commit_reservation is called.  If not, vma_end_reservation is called.
@@ -494,13 +551,14 @@ so that a reservation will not be leaked when the huge page is freed.
 
 
 Reservations and Memory Policy
-------------------------------
+==============================
 Per-node huge page lists existed in struct hstate when git was first used
 to manage Linux code.  The concept of reservations was added some time later.
 When reservations were added, no attempt was made to take memory policy
 into account.  While cpusets are not exactly the same as memory policy, this
 comment in hugetlb_acct_memory sums up the interaction between reservations
-and cpusets/memory policy.
+and cpusets/memory policy::
+
 	/*
 	 * When cpuset is configured, it breaks the strict hugetlb page
 	 * reservation as the accounting is done on a global variable. Such
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 07/32] docs/vm: hugetlbpage.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/hugetlbpage.txt | 243 ++++++++++++++++++++++-----------------
 1 file changed, 139 insertions(+), 104 deletions(-)

diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index faf077d..3bb0d99 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -1,3 +1,11 @@
+.. _hugetlbpage:
+
+=============
+HugeTLB Pages
+=============
+
+Overview
+========
 
 The intent of this file is to give a brief summary of hugetlbpage support in
 the Linux kernel.  This support is built on top of multiple page size support
@@ -18,53 +26,59 @@ First the Linux kernel needs to be built with the CONFIG_HUGETLBFS
 automatically when CONFIG_HUGETLBFS is selected) configuration
 options.
 
-The /proc/meminfo file provides information about the total number of
+The ``/proc/meminfo`` file provides information about the total number of
 persistent hugetlb pages in the kernel's huge page pool.  It also displays
 default huge page size and information about the number of free, reserved
 and surplus huge pages in the pool of huge pages of default size.
 The huge page size is needed for generating the proper alignment and
 size of the arguments to system calls that map huge page regions.
 
-The output of "cat /proc/meminfo" will include lines like:
+The output of ``cat /proc/meminfo`` will include lines like::
 
-.....
-HugePages_Total: uuu
-HugePages_Free:  vvv
-HugePages_Rsvd:  www
-HugePages_Surp:  xxx
-Hugepagesize:    yyy kB
-Hugetlb:         zzz kB
+	HugePages_Total: uuu
+	HugePages_Free:  vvv
+	HugePages_Rsvd:  www
+	HugePages_Surp:  xxx
+	Hugepagesize:    yyy kB
+	Hugetlb:         zzz kB
 
 where:
-HugePages_Total is the size of the pool of huge pages.
-HugePages_Free  is the number of huge pages in the pool that are not yet
-                allocated.
-HugePages_Rsvd  is short for "reserved," and is the number of huge pages for
-                which a commitment to allocate from the pool has been made,
-                but no allocation has yet been made.  Reserved huge pages
-                guarantee that an application will be able to allocate a
-                huge page from the pool of huge pages at fault time.
-HugePages_Surp  is short for "surplus," and is the number of huge pages in
-                the pool above the value in /proc/sys/vm/nr_hugepages. The
-                maximum number of surplus huge pages is controlled by
-                /proc/sys/vm/nr_overcommit_hugepages.
-Hugepagesize    is the default hugepage size (in Kb).
-Hugetlb         is the total amount of memory (in kB), consumed by huge
-                pages of all sizes.
-                If huge pages of different sizes are in use, this number
-                will exceed HugePages_Total * Hugepagesize. To get more
-                detailed information, please, refer to
-                /sys/kernel/mm/hugepages (described below).
-
-
-/proc/filesystems should also show a filesystem of type "hugetlbfs" configured
-in the kernel.
-
-/proc/sys/vm/nr_hugepages indicates the current number of "persistent" huge
+
+HugePages_Total
+	is the size of the pool of huge pages.
+HugePages_Free
+	is the number of huge pages in the pool that are not yet
+        allocated.
+HugePages_Rsvd
+	is short for "reserved," and is the number of huge pages for
+        which a commitment to allocate from the pool has been made,
+        but no allocation has yet been made.  Reserved huge pages
+        guarantee that an application will be able to allocate a
+        huge page from the pool of huge pages at fault time.
+HugePages_Surp
+	is short for "surplus," and is the number of huge pages in
+        the pool above the value in ``/proc/sys/vm/nr_hugepages``. The
+        maximum number of surplus huge pages is controlled by
+        ``/proc/sys/vm/nr_overcommit_hugepages``.
+Hugepagesize
+	is the default hugepage size (in Kb).
+Hugetlb
+        is the total amount of memory (in kB), consumed by huge
+        pages of all sizes.
+        If huge pages of different sizes are in use, this number
+        will exceed HugePages_Total \* Hugepagesize. To get more
+        detailed information, please, refer to
+        ``/sys/kernel/mm/hugepages`` (described below).
+
+
+``/proc/filesystems`` should also show a filesystem of type "hugetlbfs"
+configured in the kernel.
+
+``/proc/sys/vm/nr_hugepages`` indicates the current number of "persistent" huge
 pages in the kernel's huge page pool.  "Persistent" huge pages will be
 returned to the huge page pool when freed by a task.  A user with root
 privileges can dynamically allocate more or free some persistent huge pages
-by increasing or decreasing the value of 'nr_hugepages'.
+by increasing or decreasing the value of ``nr_hugepages``.
 
 Pages that are used as huge pages are reserved inside the kernel and cannot
 be used for other purposes.  Huge pages cannot be swapped out under
@@ -86,10 +100,10 @@ with a huge page size selection parameter "hugepagesz=<size>".  <size> must
 be specified in bytes with optional scale suffix [kKmMgG].  The default huge
 page size may be selected with the "default_hugepagesz=<size>" boot parameter.
 
-When multiple huge page sizes are supported, /proc/sys/vm/nr_hugepages
+When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
 indicates the current number of pre-allocated huge pages of the default size.
 Thus, one can use the following command to dynamically allocate/deallocate
-default sized persistent huge pages:
+default sized persistent huge pages::
 
 	echo 20 > /proc/sys/vm/nr_hugepages
 
@@ -98,7 +112,7 @@ huge page pool to 20, allocating or freeing huge pages, as required.
 
 On a NUMA platform, the kernel will attempt to distribute the huge page pool
 over all the set of allowed nodes specified by the NUMA memory policy of the
-task that modifies nr_hugepages.  The default for the allowed nodes--when the
+task that modifies ``nr_hugepages``. The default for the allowed nodes--when the
 task has default memory policy--is all on-line nodes with memory.  Allowed
 nodes with insufficient available, contiguous memory for a huge page will be
 silently skipped when allocating persistent huge pages.  See the discussion
@@ -117,51 +131,52 @@ init files.  This will enable the kernel to allocate huge pages early in
 the boot process when the possibility of getting physical contiguous pages
 is still very high.  Administrators can verify the number of huge pages
 actually allocated by checking the sysctl or meminfo.  To check the per node
-distribution of huge pages in a NUMA system, use:
+distribution of huge pages in a NUMA system, use::
 
 	cat /sys/devices/system/node/node*/meminfo | fgrep Huge
 
-/proc/sys/vm/nr_overcommit_hugepages specifies how large the pool of
-huge pages can grow, if more huge pages than /proc/sys/vm/nr_hugepages are
+``/proc/sys/vm/nr_overcommit_hugepages`` specifies how large the pool of
+huge pages can grow, if more huge pages than ``/proc/sys/vm/nr_hugepages`` are
 requested by applications.  Writing any non-zero value into this file
 indicates that the hugetlb subsystem is allowed to try to obtain that
 number of "surplus" huge pages from the kernel's normal page pool, when the
 persistent huge page pool is exhausted. As these surplus huge pages become
 unused, they are freed back to the kernel's normal page pool.
 
-When increasing the huge page pool size via nr_hugepages, any existing surplus
-pages will first be promoted to persistent huge pages.  Then, additional
+When increasing the huge page pool size via ``nr_hugepages``, any existing
+surplus pages will first be promoted to persistent huge pages.  Then, additional
 huge pages will be allocated, if necessary and if possible, to fulfill
 the new persistent huge page pool size.
 
 The administrator may shrink the pool of persistent huge pages for
-the default huge page size by setting the nr_hugepages sysctl to a
+the default huge page size by setting the ``nr_hugepages`` sysctl to a
 smaller value.  The kernel will attempt to balance the freeing of huge pages
-across all nodes in the memory policy of the task modifying nr_hugepages.
+across all nodes in the memory policy of the task modifying ``nr_hugepages``.
 Any free huge pages on the selected nodes will be freed back to the kernel's
 normal page pool.
 
-Caveat: Shrinking the persistent huge page pool via nr_hugepages such that
+Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` such that
 it becomes less than the number of huge pages in use will convert the balance
 of the in-use huge pages to surplus huge pages.  This will occur even if
 the number of surplus pages it would exceed the overcommit value.  As long as
-this condition holds--that is, until nr_hugepages+nr_overcommit_hugepages is
+this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
 increased sufficiently, or the surplus huge pages go out of use and are freed--
 no more surplus huge pages will be allowed to be allocated.
 
 With support for multiple huge page pools at run-time available, much of
-the huge page userspace interface in /proc/sys/vm has been duplicated in sysfs.
-The /proc interfaces discussed above have been retained for backwards
-compatibility. The root huge page control directory in sysfs is:
+the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
+sysfs.
+The ``/proc`` interfaces discussed above have been retained for backwards
+compatibility. The root huge page control directory in sysfs is::
 
 	/sys/kernel/mm/hugepages
 
 For each huge page size supported by the running kernel, a subdirectory
-will exist, of the form:
+will exist, of the form::
 
 	hugepages-${size}kB
 
-Inside each of these directories, the same set of files will exist:
+Inside each of these directories, the same set of files will exist::
 
 	nr_hugepages
 	nr_hugepages_mempolicy
@@ -176,33 +191,33 @@ which function as described above for the default huge page-sized case.
 Interaction of Task Memory Policy with Huge Page Allocation/Freeing
 ===================================================================
 
-Whether huge pages are allocated and freed via the /proc interface or
-the /sysfs interface using the nr_hugepages_mempolicy attribute, the NUMA
-nodes from which huge pages are allocated or freed are controlled by the
-NUMA memory policy of the task that modifies the nr_hugepages_mempolicy
-sysctl or attribute.  When the nr_hugepages attribute is used, mempolicy
+Whether huge pages are allocated and freed via the ``/proc`` interface or
+the ``/sysfs`` interface using the ``nr_hugepages_mempolicy`` attribute, the
+NUMA nodes from which huge pages are allocated or freed are controlled by the
+NUMA memory policy of the task that modifies the ``nr_hugepages_mempolicy``
+sysctl or attribute.  When the ``nr_hugepages`` attribute is used, mempolicy
 is ignored.
 
 The recommended method to allocate or free huge pages to/from the kernel
-huge page pool, using the nr_hugepages example above, is:
+huge page pool, using the ``nr_hugepages`` example above, is::
 
     numactl --interleave <node-list> echo 20 \
 				>/proc/sys/vm/nr_hugepages_mempolicy
 
-or, more succinctly:
+or, more succinctly::
 
     numactl -m <node-list> echo 20 >/proc/sys/vm/nr_hugepages_mempolicy
 
-This will allocate or free abs(20 - nr_hugepages) to or from the nodes
+This will allocate or free ``abs(20 - nr_hugepages)`` to or from the nodes
 specified in <node-list>, depending on whether number of persistent huge pages
 is initially less than or greater than 20, respectively.  No huge pages will be
 allocated nor freed on any node not included in the specified <node-list>.
 
-When adjusting the persistent hugepage count via nr_hugepages_mempolicy, any
+When adjusting the persistent hugepage count via ``nr_hugepages_mempolicy``, any
 memory policy mode--bind, preferred, local or interleave--may be used.  The
 resulting effect on persistent huge page allocation is as follows:
 
-1) Regardless of mempolicy mode [see Documentation/vm/numa_memory_policy.txt],
+#. Regardless of mempolicy mode [see Documentation/vm/numa_memory_policy.txt],
    persistent huge pages will be distributed across the node or nodes
    specified in the mempolicy as if "interleave" had been specified.
    However, if a node in the policy does not contain sufficient contiguous
@@ -212,7 +227,7 @@ resulting effect on persistent huge page allocation is as follows:
    possibly, allocation of persistent huge pages on nodes not allowed by
    the task's memory policy.
 
-2) One or more nodes may be specified with the bind or interleave policy.
+#. One or more nodes may be specified with the bind or interleave policy.
    If more than one node is specified with the preferred policy, only the
    lowest numeric id will be used.  Local policy will select the node where
    the task is running at the time the nodes_allowed mask is constructed.
@@ -222,20 +237,20 @@ resulting effect on persistent huge page allocation is as follows:
    indeterminate.  Thus, local policy is not very useful for this purpose.
    Any of the other mempolicy modes may be used to specify a single node.
 
-3) The nodes allowed mask will be derived from any non-default task mempolicy,
+#. The nodes allowed mask will be derived from any non-default task mempolicy,
    whether this policy was set explicitly by the task itself or one of its
    ancestors, such as numactl.  This means that if the task is invoked from a
    shell with non-default policy, that policy will be used.  One can specify a
    node list of "all" with numactl --interleave or --membind [-m] to achieve
    interleaving over all nodes in the system or cpuset.
 
-4) Any task mempolicy specified--e.g., using numactl--will be constrained by
+#. Any task mempolicy specified--e.g., using numactl--will be constrained by
    the resource limits of any cpuset in which the task runs.  Thus, there will
    be no way for a task with non-default policy running in a cpuset with a
    subset of the system nodes to allocate huge pages outside the cpuset
    without first moving to a cpuset that contains all of the desired nodes.
 
-5) Boot-time huge page allocation attempts to distribute the requested number
+#. Boot-time huge page allocation attempts to distribute the requested number
    of huge pages over all on-lines nodes with memory.
 
 Per Node Hugepages Attributes
@@ -243,22 +258,22 @@ Per Node Hugepages Attributes
 
 A subset of the contents of the root huge page control directory in sysfs,
 described above, will be replicated under each the system device of each
-NUMA node with memory in:
+NUMA node with memory in::
 
 	/sys/devices/system/node/node[0-9]*/hugepages/
 
 Under this directory, the subdirectory for each supported huge page size
-contains the following attribute files:
+contains the following attribute files::
 
 	nr_hugepages
 	free_hugepages
 	surplus_hugepages
 
-The free_' and surplus_' attribute files are read-only.  They return the number
+The free\_' and surplus\_' attribute files are read-only.  They return the number
 of free and surplus [overcommitted] huge pages, respectively, on the parent
 node.
 
-The nr_hugepages attribute returns the total number of huge pages on the
+The ``nr_hugepages`` attribute returns the total number of huge pages on the
 specified node.  When this attribute is written, the number of persistent huge
 pages on the parent node will be adjusted to the specified value, if sufficient
 resources exist, regardless of the task's mempolicy or cpuset constraints.
@@ -273,37 +288,51 @@ Using Huge Pages
 
 If the user applications are going to request huge pages using mmap system
 call, then it is required that system administrator mount a file system of
-type hugetlbfs:
+type hugetlbfs::
 
   mount -t hugetlbfs \
 	-o uid=<value>,gid=<value>,mode=<value>,pagesize=<value>,size=<value>,\
 	min_size=<value>,nr_inodes=<value> none /mnt/huge
 
 This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
-/mnt/huge.  Any files created on /mnt/huge uses huge pages.  The uid and gid
-options sets the owner and group of the root of the file system.  By default
-the uid and gid of the current process are taken.  The mode option sets the
-mode of root of file system to value & 01777.  This value is given in octal.
-By default the value 0755 is picked. If the platform supports multiple huge
-page sizes, the pagesize option can be used to specify the huge page size and
-associated pool.  pagesize is specified in bytes.  If pagesize is not specified
-the platform's default huge page size and associated pool will be used. The
-size option sets the maximum value of memory (huge pages) allowed for that
-filesystem (/mnt/huge).  The size option can be specified in bytes, or as a
-percentage of the specified huge page pool (nr_hugepages).  The size is
-rounded down to HPAGE_SIZE boundary.  The min_size option sets the minimum
-value of memory (huge pages) allowed for the filesystem.  min_size can be
-specified in the same way as size, either bytes or a percentage of the
-huge page pool.  At mount time, the number of huge pages specified by
-min_size are reserved for use by the filesystem.  If there are not enough
-free huge pages available, the mount will fail.  As huge pages are allocated
-to the filesystem and freed, the reserve count is adjusted so that the sum
-of allocated and reserved huge pages is always at least min_size.  The option
-nr_inodes sets the maximum number of inodes that /mnt/huge can use.  If the
-size, min_size or nr_inodes option is not provided on command line then
-no limits are set.  For pagesize, size, min_size and nr_inodes options, you
-can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For example, size=2K
-has the same meaning as size=2048.
+``/mnt/huge``.  Any files created on ``/mnt/huge`` uses huge pages.
+
+The ``uid`` and ``gid`` options sets the owner and group of the root of the
+file system.  By default the ``uid`` and ``gid`` of the current process
+are taken.
+
+The ``mode`` option sets the mode of root of file system to value & 01777.
+This value is given in octal. By default the value 0755 is picked.
+
+If the platform supports multiple huge page sizes, the ``pagesize`` option can
+be used to specify the huge page size and associated pool. ``pagesize``
+is specified in bytes. If ``pagesize`` is not specified the platform's
+default huge page size and associated pool will be used.
+
+The ``size`` option sets the maximum value of memory (huge pages) allowed
+for that filesystem (``/mnt/huge``). The ``size`` option can be specified
+in bytes, or as a percentage of the specified huge page pool (``nr_hugepages``).
+The size is rounded down to HPAGE_SIZE boundary.
+
+The ``min_size`` option sets the minimum value of memory (huge pages) allowed
+for the filesystem. ``min_size`` can be specified in the same way as ``size``,
+either bytes or a percentage of the huge page pool.
+At mount time, the number of huge pages specified by ``min_size`` are reserved
+for use by the filesystem.
+If there are not enough free huge pages available, the mount will fail.
+As huge pages are allocated to the filesystem and freed, the reserve count
+is adjusted so that the sum of allocated and reserved huge pages is always
+at least ``min_size``.
+
+The option ``nr_inodes`` sets the maximum number of inodes that ``/mnt/huge``
+can use.
+
+If the ``size``, ``min_size`` or ``nr_inodes`` option is not provided on
+command line then no limits are set.
+
+For ``pagesize``, ``size``, ``min_size`` and ``nr_inodes`` options, you can
+use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo.
+For example, size=2K has the same meaning as size=2048.
 
 While read system calls are supported on files that reside on hugetlb
 file systems, write system calls are not.
@@ -313,12 +342,12 @@ used to change the file attributes on hugetlbfs.
 
 Also, it is important to note that no such mount command is required if
 applications are going to use only shmat/shmget system calls or mmap with
-MAP_HUGETLB.  For an example of how to use mmap with MAP_HUGETLB see map_hugetlb
-below.
+MAP_HUGETLB.  For an example of how to use mmap with MAP_HUGETLB see
+:ref:`map_hugetlb <map_hugetlb>` below.
 
 Users who wish to use hugetlb memory via shared memory segment should be a
 member of a supplementary group and system admin needs to configure that gid
-into /proc/sys/vm/hugetlb_shm_group.  It is possible for same or different
+into ``/proc/sys/vm/hugetlb_shm_group``.  It is possible for same or different
 applications to use any combination of mmaps and shm* calls, though the mount of
 filesystem will be required for using mmap calls without MAP_HUGETLB.
 
@@ -332,15 +361,21 @@ a hugetlb page and the length is smaller than the hugepage size.
 Examples
 ========
 
-1) map_hugetlb: see tools/testing/selftests/vm/map_hugetlb.c
+.. _map_hugetlb:
+
+``map_hugetlb``
+	see tools/testing/selftests/vm/map_hugetlb.c
+
+``hugepage-shm``
+	see tools/testing/selftests/vm/hugepage-shm.c
 
-2) hugepage-shm:  see tools/testing/selftests/vm/hugepage-shm.c
+``hugepage-mmap``
+	see tools/testing/selftests/vm/hugepage-mmap.c
 
-3) hugepage-mmap:  see tools/testing/selftests/vm/hugepage-mmap.c
+The `libhugetlbfs`_  library provides a wide range of userspace tools
+to help with huge page usability, environment setup, and control.
 
-4) The libhugetlbfs (https://github.com/libhugetlbfs/libhugetlbfs) library
-   provides a wide range of userspace tools to help with huge page usability,
-   environment setup, and control.
+.. _libhugetlbfs: https://github.com/libhugetlbfs/libhugetlbfs
 
 Kernel development regression testing
 =====================================
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 03/32] docs/vm: cleancache.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/cleancache.txt | 105 ++++++++++++++++++++++++----------------
 1 file changed, 62 insertions(+), 43 deletions(-)

diff --git a/Documentation/vm/cleancache.txt b/Documentation/vm/cleancache.txt
index e4b49df..68cba91 100644
--- a/Documentation/vm/cleancache.txt
+++ b/Documentation/vm/cleancache.txt
@@ -1,4 +1,11 @@
-MOTIVATION
+.. _cleancache:
+
+==========
+Cleancache
+==========
+
+Motivation
+==========
 
 Cleancache is a new optional feature provided by the VFS layer that
 potentially dramatically increases page cache effectiveness for
@@ -21,9 +28,10 @@ Transcendent memory "drivers" for cleancache are currently implemented
 in Xen (using hypervisor memory) and zcache (using in-kernel compressed
 memory) and other implementations are in development.
 
-FAQs are included below.
+:ref:`FAQs <faq>` are included below.
 
-IMPLEMENTATION OVERVIEW
+Implementation Overview
+=======================
 
 A cleancache "backend" that provides transcendent memory registers itself
 to the kernel's cleancache "frontend" by calling cleancache_register_ops,
@@ -80,22 +88,33 @@ different Linux threads are simultaneously putting and invalidating a page
 with the same handle, the results are indeterminate.  Callers must
 lock the page to ensure serial behavior.
 
-CLEANCACHE PERFORMANCE METRICS
+Cleancache Performance Metrics
+==============================
 
 If properly configured, monitoring of cleancache is done via debugfs in
-the /sys/kernel/debug/cleancache directory.  The effectiveness of cleancache
+the `/sys/kernel/debug/cleancache` directory.  The effectiveness of cleancache
 can be measured (across all filesystems) with:
 
-succ_gets	- number of gets that were successful
-failed_gets	- number of gets that failed
-puts		- number of puts attempted (all "succeed")
-invalidates	- number of invalidates attempted
+``succ_gets``
+	number of gets that were successful
+
+``failed_gets``
+	number of gets that failed
+
+``puts``
+	number of puts attempted (all "succeed")
+
+``invalidates``
+	number of invalidates attempted
 
 A backend implementation may provide additional metrics.
 
+.. _faq:
+
 FAQ
+===
 
-1) Where's the value? (Andrew Morton)
+* Where's the value? (Andrew Morton)
 
 Cleancache provides a significant performance benefit to many workloads
 in many environments with negligible overhead by improving the
@@ -137,8 +156,8 @@ device that stores pages of data in a compressed state.  And
 the proposed "RAMster" driver shares RAM across multiple physical
 systems.
 
-2) Why does cleancache have its sticky fingers so deep inside the
-   filesystems and VFS? (Andrew Morton and Christoph Hellwig)
+* Why does cleancache have its sticky fingers so deep inside the
+  filesystems and VFS? (Andrew Morton and Christoph Hellwig)
 
 The core hooks for cleancache in VFS are in most cases a single line
 and the minimum set are placed precisely where needed to maintain
@@ -168,9 +187,9 @@ filesystems in the future.
 The total impact of the hooks to existing fs and mm files is only
 about 40 lines added (not counting comments and blank lines).
 
-3) Why not make cleancache asynchronous and batched so it can
-   more easily interface with real devices with DMA instead
-   of copying each individual page? (Minchan Kim)
+* Why not make cleancache asynchronous and batched so it can more
+  easily interface with real devices with DMA instead of copying each
+  individual page? (Minchan Kim)
 
 The one-page-at-a-time copy semantics simplifies the implementation
 on both the frontend and backend and also allows the backend to
@@ -182,8 +201,8 @@ are avoided.  While the interface seems odd for a "real device"
 or for real kernel-addressable RAM, it makes perfect sense for
 transcendent memory.
 
-4) Why is non-shared cleancache "exclusive"?  And where is the
-   page "invalidated" after a "get"? (Minchan Kim)
+* Why is non-shared cleancache "exclusive"?  And where is the
+  page "invalidated" after a "get"? (Minchan Kim)
 
 The main reason is to free up space in transcendent memory and
 to avoid unnecessary cleancache_invalidate calls.  If you want inclusive,
@@ -193,7 +212,7 @@ be easily extended to add a "get_no_invalidate" call.
 
 The invalidate is done by the cleancache backend implementation.
 
-5) What's the performance impact?
+* What's the performance impact?
 
 Performance analysis has been presented at OLS'09 and LCA'10.
 Briefly, performance gains can be significant on most workloads,
@@ -206,7 +225,7 @@ single-core systems with slow memory-copy speeds, cleancache
 has little value, but in newer multicore machines, especially
 consolidated/virtualized machines, it has great value.
 
-6) How do I add cleancache support for filesystem X? (Boaz Harrash)
+* How do I add cleancache support for filesystem X? (Boaz Harrash)
 
 Filesystems that are well-behaved and conform to certain
 restrictions can utilize cleancache simply by making a call to
@@ -217,26 +236,26 @@ not enable the optional cleancache.
 
 Some points for a filesystem to consider:
 
-- The FS should be block-device-based (e.g. a ram-based FS such
-  as tmpfs should not enable cleancache)
-- To ensure coherency/correctness, the FS must ensure that all
-  file removal or truncation operations either go through VFS or
-  add hooks to do the equivalent cleancache "invalidate" operations
-- To ensure coherency/correctness, either inode numbers must
-  be unique across the lifetime of the on-disk file OR the
-  FS must provide an "encode_fh" function.
-- The FS must call the VFS superblock alloc and deactivate routines
-  or add hooks to do the equivalent cleancache calls done there.
-- To maximize performance, all pages fetched from the FS should
-  go through the do_mpag_readpage routine or the FS should add
-  hooks to do the equivalent (cf. btrfs)
-- Currently, the FS blocksize must be the same as PAGESIZE.  This
-  is not an architectural restriction, but no backends currently
-  support anything different.
-- A clustered FS should invoke the "shared_init_fs" cleancache
-  hook to get best performance for some backends.
-
-7) Why not use the KVA of the inode as the key? (Christoph Hellwig)
+  - The FS should be block-device-based (e.g. a ram-based FS such
+    as tmpfs should not enable cleancache)
+  - To ensure coherency/correctness, the FS must ensure that all
+    file removal or truncation operations either go through VFS or
+    add hooks to do the equivalent cleancache "invalidate" operations
+  - To ensure coherency/correctness, either inode numbers must
+    be unique across the lifetime of the on-disk file OR the
+    FS must provide an "encode_fh" function.
+  - The FS must call the VFS superblock alloc and deactivate routines
+    or add hooks to do the equivalent cleancache calls done there.
+  - To maximize performance, all pages fetched from the FS should
+    go through the do_mpag_readpage routine or the FS should add
+    hooks to do the equivalent (cf. btrfs)
+  - Currently, the FS blocksize must be the same as PAGESIZE.  This
+    is not an architectural restriction, but no backends currently
+    support anything different.
+  - A clustered FS should invoke the "shared_init_fs" cleancache
+    hook to get best performance for some backends.
+
+* Why not use the KVA of the inode as the key? (Christoph Hellwig)
 
 If cleancache would use the inode virtual address instead of
 inode/filehandle, the pool id could be eliminated.  But, this
@@ -251,7 +270,7 @@ of cleancache would be lost because the cache of pages in cleanache
 is potentially much larger than the kernel pagecache and is most
 useful if the pages survive inode cache removal.
 
-8) Why is a global variable required?
+* Why is a global variable required?
 
 The cleancache_enabled flag is checked in all of the frequently-used
 cleancache hooks.  The alternative is a function call to check a static
@@ -262,14 +281,14 @@ global variable allows cleancache to be enabled by default at compile
 time, but have insignificant performance impact when cleancache remains
 disabled at runtime.
 
-9) Does cleanache work with KVM?
+* Does cleanache work with KVM?
 
 The memory model of KVM is sufficiently different that a cleancache
 backend may have less value for KVM.  This remains to be tested,
 especially in an overcommitted system.
 
-10) Does cleancache work in userspace?  It sounds useful for
-   memory hungry caches like web browsers.  (Jamie Lokier)
+* Does cleancache work in userspace?  It sounds useful for
+  memory hungry caches like web browsers.  (Jamie Lokier)
 
 No plans yet, though we agree it sounds useful, at least for
 apps that bypass the page cache (e.g. O_DIRECT).
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 04/32] docs/vm: frontswap.txt: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/frontswap.txt | 59 ++++++++++++++++++++++++++----------------
 1 file changed, 37 insertions(+), 22 deletions(-)

diff --git a/Documentation/vm/frontswap.txt b/Documentation/vm/frontswap.txt
index c71a019..1979f43 100644
--- a/Documentation/vm/frontswap.txt
+++ b/Documentation/vm/frontswap.txt
@@ -1,13 +1,20 @@
+.. _frontswap:
+
+=========
+Frontswap
+=========
+
 Frontswap provides a "transcendent memory" interface for swap pages.
 In some environments, dramatic performance savings may be obtained because
 swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk.
 
-(Note, frontswap -- and cleancache (merged at 3.0) -- are the "frontends"
+(Note, frontswap -- and :ref:`cleancache` (merged at 3.0) -- are the "frontends"
 and the only necessary changes to the core kernel for transcendent memory;
 all other supporting code -- the "backends" -- is implemented as drivers.
-See the LWN.net article "Transcendent memory in a nutshell" for a detailed
-overview of frontswap and related kernel parts:
-https://lwn.net/Articles/454795/ )
+See the LWN.net article `Transcendent memory in a nutshell`_
+for a detailed overview of frontswap and related kernel parts)
+
+.. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/
 
 Frontswap is so named because it can be thought of as the opposite of
 a "backing" store for a swap device.  The storage is assumed to be
@@ -50,19 +57,27 @@ or the store fails AND the page is invalidated.  This ensures stale data may
 never be obtained from frontswap.
 
 If properly configured, monitoring of frontswap is done via debugfs in
-the /sys/kernel/debug/frontswap directory.  The effectiveness of
+the `/sys/kernel/debug/frontswap` directory.  The effectiveness of
 frontswap can be measured (across all swap devices) with:
 
-failed_stores	- how many store attempts have failed
-loads		- how many loads were attempted (all should succeed)
-succ_stores	- how many store attempts have succeeded
-invalidates	- how many invalidates were attempted
+``failed_stores``
+	how many store attempts have failed
+
+``loads``
+	how many loads were attempted (all should succeed)
+
+``succ_stores``
+	how many store attempts have succeeded
+
+``invalidates``
+	how many invalidates were attempted
 
 A backend implementation may provide additional metrics.
 
 FAQ
+===
 
-1) Where's the value?
+* Where's the value?
 
 When a workload starts swapping, performance falls through the floor.
 Frontswap significantly increases performance in many such workloads by
@@ -117,8 +132,8 @@ A KVM implementation is underway and has been RFC'ed to lkml.  And,
 using frontswap, investigation is also underway on the use of NVM as
 a memory extension technology.
 
-2) Sure there may be performance advantages in some situations, but
-   what's the space/time overhead of frontswap?
+* Sure there may be performance advantages in some situations, but
+  what's the space/time overhead of frontswap?
 
 If CONFIG_FRONTSWAP is disabled, every frontswap hook compiles into
 nothingness and the only overhead is a few extra bytes per swapon'ed
@@ -148,8 +163,8 @@ pressure that can potentially outweigh the other advantages.  A
 backend, such as zcache, must implement policies to carefully (but
 dynamically) manage memory limits to ensure this doesn't happen.
 
-3) OK, how about a quick overview of what this frontswap patch does
-   in terms that a kernel hacker can grok?
+* OK, how about a quick overview of what this frontswap patch does
+  in terms that a kernel hacker can grok?
 
 Let's assume that a frontswap "backend" has registered during
 kernel initialization; this registration indicates that this
@@ -188,9 +203,9 @@ and (potentially) a swap device write are replaced by a "frontswap backend
 store" and (possibly) a "frontswap backend loads", which are presumably much
 faster.
 
-4) Can't frontswap be configured as a "special" swap device that is
-   just higher priority than any real swap device (e.g. like zswap,
-   or maybe swap-over-nbd/NFS)?
+* Can't frontswap be configured as a "special" swap device that is
+  just higher priority than any real swap device (e.g. like zswap,
+  or maybe swap-over-nbd/NFS)?
 
 No.  First, the existing swap subsystem doesn't allow for any kind of
 swap hierarchy.  Perhaps it could be rewritten to accommodate a hierarchy,
@@ -240,9 +255,9 @@ installation, frontswap is useless.  Swapless portable devices
 can still use frontswap but a backend for such devices must configure
 some kind of "ghost" swap device and ensure that it is never used.
 
-5) Why this weird definition about "duplicate stores"?  If a page
-   has been previously successfully stored, can't it always be
-   successfully overwritten?
+* Why this weird definition about "duplicate stores"?  If a page
+  has been previously successfully stored, can't it always be
+  successfully overwritten?
 
 Nearly always it can, but no, sometimes it cannot.  Consider an example
 where data is compressed and the original 4K page has been compressed
@@ -254,7 +269,7 @@ the old data and ensure that it is no longer accessible.  Since the
 swap subsystem then writes the new data to the read swap device,
 this is the correct course of action to ensure coherency.
 
-6) What is frontswap_shrink for?
+* What is frontswap_shrink for?
 
 When the (non-frontswap) swap subsystem swaps out a page to a real
 swap device, that page is only taking up low-value pre-allocated disk
@@ -267,7 +282,7 @@ to "repatriate" pages sent to a remote machine back to the local machine;
 this is driven using the frontswap_shrink mechanism when memory pressure
 subsides.
 
-7) Why does the frontswap patch create the new include file swapfile.h?
+* Why does the frontswap patch create the new include file swapfile.h?
 
 The frontswap code depends on some swap-subsystem-internal data
 structures that have, over the years, moved back and forth between
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 02/32] docs/vm: balance: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/balance | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/vm/balance b/Documentation/vm/balance
index 9645954..6a1fadf 100644
--- a/Documentation/vm/balance
+++ b/Documentation/vm/balance
@@ -1,3 +1,9 @@
+.. _balance:
+
+================
+Memory Balancing
+================
+
 Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
 
 Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
@@ -62,11 +68,11 @@ for non-sleepable allocations. Second, the HIGHMEM zone is also balanced,
 so as to give a fighting chance for replace_with_highmem() to get a
 HIGHMEM page, as well as to ensure that HIGHMEM allocations do not
 fall back into regular zone. This also makes sure that HIGHMEM pages
-are not leaked (for example, in situations where a HIGHMEM page is in 
+are not leaked (for example, in situations where a HIGHMEM page is in
 the swapcache but is not being used by anyone)
 
 kswapd also needs to know about the zones it should balance. kswapd is
-primarily needed in a situation where balancing can not be done, 
+primarily needed in a situation where balancing can not be done,
 probably because all allocation requests are coming from intr context
 and all process contexts are sleeping. For 2.3, kswapd does not really
 need to balance the highmem zone, since intr context does not request
@@ -89,7 +95,8 @@ pages is below watermark[WMARK_LOW]; in which case zone_wake_kswapd is also set.
 
 
 (Good) Ideas that I have heard:
+
 1. Dynamic experience should influence balancing: number of failed requests
-for a zone can be tracked and fed into the balancing scheme (jalvo@mbay.net)
+   for a zone can be tracked and fed into the balancing scheme (jalvo@mbay.net)
 2. Implement a replace_with_highmem()-like replace_with_regular() to preserve
-dma pages. (lkd@tantalophile.demon.co.uk)
+   dma pages. (lkd@tantalophile.demon.co.uk)
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 01/32] docs/vm: active_mm.txt convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>

Just add a label for cross-referencing and indent the text to make it
``literal``

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/active_mm.txt | 174 +++++++++++++++++++++--------------------
 1 file changed, 91 insertions(+), 83 deletions(-)

diff --git a/Documentation/vm/active_mm.txt b/Documentation/vm/active_mm.txt
index dbf4581..c84471b 100644
--- a/Documentation/vm/active_mm.txt
+++ b/Documentation/vm/active_mm.txt
@@ -1,83 +1,91 @@
-List:       linux-kernel
-Subject:    Re: active_mm
-From:       Linus Torvalds <torvalds () transmeta ! com>
-Date:       1999-07-30 21:36:24
-
-Cc'd to linux-kernel, because I don't write explanations all that often,
-and when I do I feel better about more people reading them.
-
-On Fri, 30 Jul 1999, David Mosberger wrote:
->
-> Is there a brief description someplace on how "mm" vs. "active_mm" in
-> the task_struct are supposed to be used?  (My apologies if this was
-> discussed on the mailing lists---I just returned from vacation and
-> wasn't able to follow linux-kernel for a while).
-
-Basically, the new setup is:
-
- - we have "real address spaces" and "anonymous address spaces". The
-   difference is that an anonymous address space doesn't care about the
-   user-level page tables at all, so when we do a context switch into an
-   anonymous address space we just leave the previous address space
-   active.
-
-   The obvious use for a "anonymous address space" is any thread that
-   doesn't need any user mappings - all kernel threads basically fall into
-   this category, but even "real" threads can temporarily say that for
-   some amount of time they are not going to be interested in user space,
-   and that the scheduler might as well try to avoid wasting time on
-   switching the VM state around. Currently only the old-style bdflush
-   sync does that.
-
- - "tsk->mm" points to the "real address space". For an anonymous process,
-   tsk->mm will be NULL, for the logical reason that an anonymous process
-   really doesn't _have_ a real address space at all.
-
- - however, we obviously need to keep track of which address space we
-   "stole" for such an anonymous user. For that, we have "tsk->active_mm",
-   which shows what the currently active address space is.
-
-   The rule is that for a process with a real address space (ie tsk->mm is
-   non-NULL) the active_mm obviously always has to be the same as the real
-   one.
-
-   For a anonymous process, tsk->mm == NULL, and tsk->active_mm is the
-   "borrowed" mm while the anonymous process is running. When the
-   anonymous process gets scheduled away, the borrowed address space is
-   returned and cleared.
-
-To support all that, the "struct mm_struct" now has two counters: a
-"mm_users" counter that is how many "real address space users" there are,
-and a "mm_count" counter that is the number of "lazy" users (ie anonymous
-users) plus one if there are any real users.
-
-Usually there is at least one real user, but it could be that the real
-user exited on another CPU while a lazy user was still active, so you do
-actually get cases where you have a address space that is _only_ used by
-lazy users. That is often a short-lived state, because once that thread
-gets scheduled away in favour of a real thread, the "zombie" mm gets
-released because "mm_users" becomes zero.
-
-Also, a new rule is that _nobody_ ever has "init_mm" as a real MM any
-more. "init_mm" should be considered just a "lazy context when no other
-context is available", and in fact it is mainly used just at bootup when
-no real VM has yet been created. So code that used to check
-
-	if (current->mm == &init_mm)
-
-should generally just do
-
-	if (!current->mm)
-
-instead (which makes more sense anyway - the test is basically one of "do
-we have a user context", and is generally done by the page fault handler
-and things like that).
-
-Anyway, I put a pre-patch-2.3.13-1 on ftp.kernel.org just a moment ago,
-because it slightly changes the interfaces to accommodate the alpha (who
-would have thought it, but the alpha actually ends up having one of the
-ugliest context switch codes - unlike the other architectures where the MM
-and register state is separate, the alpha PALcode joins the two, and you
-need to switch both together).
-
-(From http://marc.info/?l=linux-kernel&m=93337278602211&w=2)
+.. _active_mm:
+
+=========
+Active MM
+=========
+
+::
+
+ List:       linux-kernel
+ Subject:    Re: active_mm
+ From:       Linus Torvalds <torvalds () transmeta ! com>
+ Date:       1999-07-30 21:36:24
+
+ Cc'd to linux-kernel, because I don't write explanations all that often,
+ and when I do I feel better about more people reading them.
+
+ On Fri, 30 Jul 1999, David Mosberger wrote:
+ >
+ > Is there a brief description someplace on how "mm" vs. "active_mm" in
+ > the task_struct are supposed to be used?  (My apologies if this was
+ > discussed on the mailing lists---I just returned from vacation and
+ > wasn't able to follow linux-kernel for a while).
+
+ Basically, the new setup is:
+
+  - we have "real address spaces" and "anonymous address spaces". The
+    difference is that an anonymous address space doesn't care about the
+    user-level page tables at all, so when we do a context switch into an
+    anonymous address space we just leave the previous address space
+    active.
+
+    The obvious use for a "anonymous address space" is any thread that
+    doesn't need any user mappings - all kernel threads basically fall into
+    this category, but even "real" threads can temporarily say that for
+    some amount of time they are not going to be interested in user space,
+    and that the scheduler might as well try to avoid wasting time on
+    switching the VM state around. Currently only the old-style bdflush
+    sync does that.
+
+  - "tsk->mm" points to the "real address space". For an anonymous process,
+    tsk->mm will be NULL, for the logical reason that an anonymous process
+    really doesn't _have_ a real address space at all.
+
+  - however, we obviously need to keep track of which address space we
+    "stole" for such an anonymous user. For that, we have "tsk->active_mm",
+    which shows what the currently active address space is.
+
+    The rule is that for a process with a real address space (ie tsk->mm is
+    non-NULL) the active_mm obviously always has to be the same as the real
+    one.
+
+    For a anonymous process, tsk->mm == NULL, and tsk->active_mm is the
+    "borrowed" mm while the anonymous process is running. When the
+    anonymous process gets scheduled away, the borrowed address space is
+    returned and cleared.
+
+ To support all that, the "struct mm_struct" now has two counters: a
+ "mm_users" counter that is how many "real address space users" there are,
+ and a "mm_count" counter that is the number of "lazy" users (ie anonymous
+ users) plus one if there are any real users.
+
+ Usually there is at least one real user, but it could be that the real
+ user exited on another CPU while a lazy user was still active, so you do
+ actually get cases where you have a address space that is _only_ used by
+ lazy users. That is often a short-lived state, because once that thread
+ gets scheduled away in favour of a real thread, the "zombie" mm gets
+ released because "mm_users" becomes zero.
+
+ Also, a new rule is that _nobody_ ever has "init_mm" as a real MM any
+ more. "init_mm" should be considered just a "lazy context when no other
+ context is available", and in fact it is mainly used just at bootup when
+ no real VM has yet been created. So code that used to check
+
+ 	if (current->mm == &init_mm)
+
+ should generally just do
+
+ 	if (!current->mm)
+
+ instead (which makes more sense anyway - the test is basically one of "do
+ we have a user context", and is generally done by the page fault handler
+ and things like that).
+
+ Anyway, I put a pre-patch-2.3.13-1 on ftp.kernel.org just a moment ago,
+ because it slightly changes the interfaces to accommodate the alpha (who
+ would have thought it, but the alpha actually ends up having one of the
+ ugliest context switch codes - unlike the other architectures where the MM
+ and register state is separate, the alpha PALcode joins the two, and you
+ need to switch both together).
+
+ (From http://marc.info/?l=linux-kernel&m=93337278602211&w=2)
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 00/32] docs/vm: convert to ReST format
From: Mike Rapoport @ 2018-03-21 19:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm, Mike Rapoport

Hi,

These patches convert files in Documentation/vm to ReST format, add an
initial index and link it to the top level documentation.

There are no contents changes in the documentation, except few spelling
fixes. The relatively large diffstat stems from the indentation and
paragraph wrapping changes.

I've tried to keep the formatting as consistent as possible, but I could
miss some places that needed markup and add some markup where it was not
necessary.


Mike Rapoport (32):
  docs/vm: active_mm.txt convert to ReST format
  docs/vm: balance: convert to ReST format
  docs/vm: cleancache.txt: convert to ReST format
  docs/vm: frontswap.txt: convert to ReST format
  docs/vm: highmem.txt: convert to ReST format
  docs/vm: hmm.txt: convert to ReST format
  docs/vm: hugetlbpage.txt: convert to ReST format
  docs/vm: hugetlbfs_reserv.txt: convert to ReST format
  docs/vm: hwpoison.txt: convert to ReST format
  docs/vm: idle_page_tracking.txt: convert to ReST format
  docs/vm: ksm.txt: convert to ReST format
  docs/vm: mmu_notifier.txt: convert to ReST format
  docs/vm: numa_memory_policy.txt: convert to ReST format
  docs/vm: overcommit-accounting: convert to ReST format
  docs/vm: page_frags convert to ReST format
  docs/vm: numa: convert to ReST format
  docs/vm: pagemap.txt: convert to ReST format
  docs/vm: page_migration: convert to ReST format
  docs/vm: page_owner: convert to ReST format
  docs/vm: remap_file_pages.txt: conert to ReST format
  docs/vm: slub.txt: convert to ReST format
  docs/vm: soft-dirty.txt: convert to ReST format
  docs/vm: split_page_table_lock: convert to ReST format
  docs/vm: swap_numa.txt: convert to ReST format
  docs/vm: transhuge.txt: convert to ReST format
  docs/vm: unevictable-lru.txt: convert to ReST format
  docs/vm: userfaultfd.txt: convert to ReST format
  docs/vm: z3fold.txt: convert to ReST format
  docs/vm: zsmalloc.txt: convert to ReST format
  docs/vm: zswap.txt: convert to ReST format
  docs/vm: rename documentation files to .rst
  docs/vm: add index.rst and link MM documentation to top level index

 Documentation/ABI/stable/sysfs-devices-node        |   2 +-
 .../ABI/testing/sysfs-kernel-mm-hugepages          |   2 +-
 Documentation/ABI/testing/sysfs-kernel-mm-ksm      |   2 +-
 Documentation/ABI/testing/sysfs-kernel-slab        |   4 +-
 Documentation/admin-guide/kernel-parameters.txt    |  12 +-
 Documentation/dev-tools/kasan.rst                  |   2 +-
 Documentation/filesystems/proc.txt                 |   4 +-
 Documentation/filesystems/tmpfs.txt                |   2 +-
 Documentation/index.rst                            |   3 +-
 Documentation/sysctl/vm.txt                        |   6 +-
 Documentation/vm/00-INDEX                          |  58 +--
 Documentation/vm/active_mm.rst                     |  91 ++++
 Documentation/vm/active_mm.txt                     |  83 ----
 Documentation/vm/{balance => balance.rst}          |  15 +-
 .../vm/{cleancache.txt => cleancache.rst}          | 105 +++--
 Documentation/vm/conf.py                           |  10 +
 Documentation/vm/{frontswap.txt => frontswap.rst}  |  59 ++-
 Documentation/vm/{highmem.txt => highmem.rst}      |  87 ++--
 Documentation/vm/{hmm.txt => hmm.rst}              |  66 ++-
 .../{hugetlbfs_reserv.txt => hugetlbfs_reserv.rst} | 212 +++++----
 .../vm/{hugetlbpage.txt => hugetlbpage.rst}        | 243 ++++++-----
 Documentation/vm/{hwpoison.txt => hwpoison.rst}    | 141 +++---
 ...le_page_tracking.txt => idle_page_tracking.rst} |  55 ++-
 Documentation/vm/index.rst                         |  56 +++
 Documentation/vm/ksm.rst                           | 183 ++++++++
 Documentation/vm/ksm.txt                           | 178 --------
 Documentation/vm/mmu_notifier.rst                  |  99 +++++
 Documentation/vm/mmu_notifier.txt                  |  93 ----
 Documentation/vm/{numa => numa.rst}                |   6 +-
 Documentation/vm/numa_memory_policy.rst            | 485 +++++++++++++++++++++
 Documentation/vm/numa_memory_policy.txt            | 452 -------------------
 Documentation/vm/overcommit-accounting             |  80 ----
 Documentation/vm/overcommit-accounting.rst         |  87 ++++
 Documentation/vm/{page_frags => page_frags.rst}    |   5 +-
 .../vm/{page_migration => page_migration.rst}      | 149 ++++---
 .../vm/{page_owner.txt => page_owner.rst}          |  34 +-
 Documentation/vm/{pagemap.txt => pagemap.rst}      | 170 ++++----
 .../{remap_file_pages.txt => remap_file_pages.rst} |   6 +
 Documentation/vm/slub.rst                          | 361 +++++++++++++++
 Documentation/vm/slub.txt                          | 342 ---------------
 .../vm/{soft-dirty.txt => soft-dirty.rst}          |  20 +-
 ...t_page_table_lock => split_page_table_lock.rst} |  12 +-
 Documentation/vm/{swap_numa.txt => swap_numa.rst}  |  55 ++-
 Documentation/vm/{transhuge.txt => transhuge.rst}  | 286 +++++++-----
 .../{unevictable-lru.txt => unevictable-lru.rst}   | 117 +++--
 .../vm/{userfaultfd.txt => userfaultfd.rst}        |  66 +--
 Documentation/vm/{z3fold.txt => z3fold.rst}        |   6 +-
 Documentation/vm/{zsmalloc.txt => zsmalloc.rst}    |  60 ++-
 Documentation/vm/{zswap.txt => zswap.rst}          |  71 +--
 MAINTAINERS                                        |   2 +-
 arch/alpha/Kconfig                                 |   2 +-
 arch/ia64/Kconfig                                  |   2 +-
 arch/mips/Kconfig                                  |   2 +-
 arch/powerpc/Kconfig                               |   2 +-
 fs/Kconfig                                         |   2 +-
 fs/dax.c                                           |   2 +-
 fs/proc/task_mmu.c                                 |   4 +-
 include/linux/hmm.h                                |   2 +-
 include/linux/memremap.h                           |   4 +-
 include/linux/mmu_notifier.h                       |   2 +-
 include/linux/sched/mm.h                           |   4 +-
 include/linux/swap.h                               |   2 +-
 mm/Kconfig                                         |   6 +-
 mm/cleancache.c                                    |   2 +-
 mm/frontswap.c                                     |   2 +-
 mm/hmm.c                                           |   2 +-
 mm/huge_memory.c                                   |   4 +-
 mm/hugetlb.c                                       |   4 +-
 mm/ksm.c                                           |   4 +-
 mm/mmap.c                                          |   2 +-
 mm/rmap.c                                          |   6 +-
 mm/util.c                                          |   2 +-
 72 files changed, 2604 insertions(+), 2205 deletions(-)
 create mode 100644 Documentation/vm/active_mm.rst
 delete mode 100644 Documentation/vm/active_mm.txt
 rename Documentation/vm/{balance => balance.rst} (96%)
 rename Documentation/vm/{cleancache.txt => cleancache.rst} (83%)
 create mode 100644 Documentation/vm/conf.py
 rename Documentation/vm/{frontswap.txt => frontswap.rst} (91%)
 rename Documentation/vm/{highmem.txt => highmem.rst} (64%)
 rename Documentation/vm/{hmm.txt => hmm.rst} (92%)
 rename Documentation/vm/{hugetlbfs_reserv.txt => hugetlbfs_reserv.rst} (87%)
 rename Documentation/vm/{hugetlbpage.txt => hugetlbpage.rst} (64%)
 rename Documentation/vm/{hwpoison.txt => hwpoison.rst} (60%)
 rename Documentation/vm/{idle_page_tracking.txt => idle_page_tracking.rst} (72%)
 create mode 100644 Documentation/vm/index.rst
 create mode 100644 Documentation/vm/ksm.rst
 delete mode 100644 Documentation/vm/ksm.txt
 create mode 100644 Documentation/vm/mmu_notifier.rst
 delete mode 100644 Documentation/vm/mmu_notifier.txt
 rename Documentation/vm/{numa => numa.rst} (99%)
 create mode 100644 Documentation/vm/numa_memory_policy.rst
 delete mode 100644 Documentation/vm/numa_memory_policy.txt
 delete mode 100644 Documentation/vm/overcommit-accounting
 create mode 100644 Documentation/vm/overcommit-accounting.rst
 rename Documentation/vm/{page_frags => page_frags.rst} (97%)
 rename Documentation/vm/{page_migration => page_migration.rst} (63%)
 rename Documentation/vm/{page_owner.txt => page_owner.rst} (86%)
 rename Documentation/vm/{pagemap.txt => pagemap.rst} (60%)
 rename Documentation/vm/{remap_file_pages.txt => remap_file_pages.rst} (92%)
 create mode 100644 Documentation/vm/slub.rst
 delete mode 100644 Documentation/vm/slub.txt
 rename Documentation/vm/{soft-dirty.txt => soft-dirty.rst} (67%)
 rename Documentation/vm/{split_page_table_lock => split_page_table_lock.rst} (95%)
 rename Documentation/vm/{swap_numa.txt => swap_numa.rst} (74%)
 rename Documentation/vm/{transhuge.txt => transhuge.rst} (74%)
 rename Documentation/vm/{unevictable-lru.txt => unevictable-lru.rst} (92%)
 rename Documentation/vm/{userfaultfd.txt => userfaultfd.rst} (89%)
 rename Documentation/vm/{z3fold.txt => z3fold.rst} (97%)
 rename Documentation/vm/{zsmalloc.txt => zsmalloc.rst} (71%)
 rename Documentation/vm/{zswap.txt => zswap.rst} (74%)

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] Input: alps - Update documentation for trackstick v3 format
From: Pali Rohár @ 2018-03-21 20:03 UTC (permalink / raw)
  To: Dmitry Torokhov, Jonathan Corbet, Masaki Ota
  Cc: linux-input, linux-doc, linux-kernel

Bits for M, R and L buttons are already processed in alps. Other newly
documented bits not yet.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
---
This is based on information which Masaki Ota provided to us.
---
 Documentation/input/devices/alps.rst | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/input/devices/alps.rst b/Documentation/input/devices/alps.rst
index 6779148e428c..b556d6bde5e1 100644
--- a/Documentation/input/devices/alps.rst
+++ b/Documentation/input/devices/alps.rst
@@ -192,10 +192,13 @@ The final v3 packet type is the trackstick packet::
  byte 0:    1    1   x7   y7    1    1    1    1
  byte 1:    0   x6   x5   x4   x3   x2   x1   x0
  byte 2:    0   y6   y5   y4   y3   y2   y1   y0
- byte 3:    0    1    0    0    1    0    0    0
- byte 4:    0   z4   z3   z2   z1   z0    ?    ?
+ byte 3:    0    1   TP   SW    1    M    R    L
+ byte 4:    0   z6   z5   z4   z3   z2   z1   z0
  byte 5:    0    0    1    1    1    1    1    1
 
+TP means Tap SW status when tap processing is enabled or Press status when press
+processing is enabled. SW means scroll up when 4 buttons are available.
+
 ALPS Absolute Mode - Protocol Version 4
 ---------------------------------------
 
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox