LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH v4 51/63] Documentation: x86: convert pti.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst            |  1 +
 Documentation/x86/{pti.txt => pti.rst} | 19 ++++++++++++++-----
 2 files changed, 15 insertions(+), 5 deletions(-)
 rename Documentation/x86/{pti.txt => pti.rst} (95%)

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index a0426ab156bd..1c675cef14d7 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -21,3 +21,4 @@ Linux x86 Support
    protection-keys
    intel_mpx
    amd-memory-encryption
+   pti
diff --git a/Documentation/x86/pti.txt b/Documentation/x86/pti.rst
similarity index 95%
rename from Documentation/x86/pti.txt
rename to Documentation/x86/pti.rst
index 5cd58439ad2d..44b98f99ca8a 100644
--- a/Documentation/x86/pti.txt
+++ b/Documentation/x86/pti.rst
@@ -1,9 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+Page Table Isolation (PTI)
+==========================
+
 Overview
 ========
 
-Page Table Isolation (pti, previously known as KAISER[1]) is a
+Page Table Isolation (pti, previously known as KAISER [1]_) is a
 countermeasure against attacks on the shared user/kernel address
-space such as the "Meltdown" approach[2].
+space such as the "Meltdown" approach [2]_.
 
 To mitigate this class of attacks, we create an independent set of
 page tables for use only when running userspace applications.  When
@@ -60,6 +66,7 @@ Protection against side-channel attacks is important.  But,
 this protection comes at a cost:
 
 1. Increased Memory Use
+
   a. Each process now needs an order-1 PGD instead of order-0.
      (Consumes an additional 4k per process).
   b. The 'cpu_entry_area' structure must be 2MB in size and 2MB
@@ -68,6 +75,7 @@ this protection comes at a cost:
      is decompressed, but no space in the kernel image itself.
 
 2. Runtime Cost
+
   a. CR3 manipulation to switch between the page table copies
      must be done at interrupt, syscall, and exception entry
      and exit (it can be skipped when the kernel is interrupted,
@@ -142,8 +150,9 @@ ideally doing all of these in parallel:
    interrupted, including nested NMIs.  Using "-c" boosts the rate of
    NMIs, and using two -c with separate counters encourages nested NMIs
    and less deterministic behavior.
+   ::
 
-	while true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done
+      while true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done
 
 4. Launch a KVM virtual machine.
 5. Run 32-bit binaries on systems supporting the SYSCALL instruction.
@@ -182,5 +191,5 @@ that are worth noting here.
    tended to be TLB invalidation issues.  Usually invalidating
    the wrong PCID, or otherwise missing an invalidation.
 
-1. https://gruss.cc/files/kaiser.pdf
-2. https://meltdownattack.com/meltdown.pdf
+.. [1] https://gruss.cc/files/kaiser.pdf
+.. [2] https://meltdownattack.com/meltdown.pdf
-- 
2.20.1
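
Not part of the patch: the "Increased Memory Use" item in pti.rst says each
process needs an order-1 PGD instead of order-0, costing an additional 4k per
process. A minimal C sketch of that arithmetic follows. The `demo_*` names and
the 4 KiB page size are assumptions made for illustration; they do not come
from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

/* Bytes in an order-N allocation, i.e. 2^N contiguous pages. */
size_t demo_order_bytes(unsigned int order)
{
	assert(order < 20);	/* keep the shift well inside size_t */
	return DEMO_PAGE_SIZE << order;
}

/* Extra PGD memory over nr_procs processes: order-1 minus order-0 each. */
size_t demo_pti_pgd_overhead(size_t nr_procs)
{
	return nr_procs * (demo_order_bytes(1) - demo_order_bytes(0));
}
```

With these assumptions, demo_pti_pgd_overhead(1) is the 4k per process quoted
in the document, and the total grows linearly with the process count.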



* [PATCH v4 50/63] Documentation: x86: convert amd-memory-encryption.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 ...ory-encryption.txt => amd-memory-encryption.rst} | 13 ++++++++++---
 Documentation/x86/index.rst                         |  1 +
 2 files changed, 11 insertions(+), 3 deletions(-)
 rename Documentation/x86/{amd-memory-encryption.txt => amd-memory-encryption.rst} (94%)

diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.rst
similarity index 94%
rename from Documentation/x86/amd-memory-encryption.txt
rename to Documentation/x86/amd-memory-encryption.rst
index afc41f544dab..c48d452d0718 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+AMD Memory Encryption
+=====================
+
 Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
 features found on AMD processors.
 
@@ -34,7 +40,7 @@ is operating in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware
 forces the memory encryption bit to 1.
 
 Support for SME and SEV can be determined through the CPUID instruction. The
-CPUID function 0x8000001f reports information related to SME:
+CPUID function 0x8000001f reports information related to SME::
 
 	0x8000001f[eax]:
 		Bit[0] indicates support for SME
@@ -48,14 +54,14 @@ CPUID function 0x8000001f reports information related to SME:
 			   addresses)
 
 If support for SME is present, MSR 0xc00100010 (MSR_K8_SYSCFG) can be used to
-determine if SME is enabled and/or to enable memory encryption:
+determine if SME is enabled and/or to enable memory encryption::
 
 	0xc0010010:
 		Bit[23]   0 = memory encryption features are disabled
 			  1 = memory encryption features are enabled
 
 If SEV is supported, MSR 0xc0010131 (MSR_AMD64_SEV) can be used to determine if
-SEV is active:
+SEV is active::
 
 	0xc0010131:
 		Bit[0]	  0 = memory encryption is not active
@@ -68,6 +74,7 @@ requirements for the system.  If this bit is not set upon Linux startup then
 Linux itself will not set it and memory encryption will not be possible.
 
 The state of SME in the Linux kernel can be documented as follows:
+
 	- Supported:
 	  The CPU supports SME (determined through CPUID instruction).
 
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 20091d3e5d97..a0426ab156bd 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -20,3 +20,4 @@ Linux x86 Support
    pat
    protection-keys
    intel_mpx
+   amd-memory-encryption
-- 
2.20.1
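
Not part of the patch: the literal blocks in amd-memory-encryption.rst name
three bits (CPUID 0x8000001f[eax] bit 0, MSR 0xc0010010 bit 23, MSR 0xc0010131
bit 0). A minimal C sketch of decoding them is given here. It only interprets
raw values that the caller has already read; how those values are obtained
(the CPUID instruction, an MSR read in the kernel) is left out on purpose, and
every `demo_*` / `DEMO_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as quoted in the document's literal blocks. */
#define DEMO_CPUID_SME_BIT	0	/* CPUID 0x8000001f[eax], Bit[0] */
#define DEMO_SYSCFG_ENC_BIT	23	/* MSR 0xc0010010, Bit[23] */
#define DEMO_SEV_ACTIVE_BIT	0	/* MSR 0xc0010131, Bit[0] */

/* Test a single bit of a raw register value. */
bool demo_bit_set(uint64_t value, unsigned int bit)
{
	assert(bit < 64);
	return (value >> bit) & 1;
}

/* "Bit[0] indicates support for SME" */
bool demo_sme_supported(uint32_t cpuid_eax)
{
	return demo_bit_set(cpuid_eax, DEMO_CPUID_SME_BIT);
}

/* "Bit[23] 1 = memory encryption features are enabled" */
bool demo_sme_enabled(uint64_t syscfg_msr)
{
	return demo_bit_set(syscfg_msr, DEMO_SYSCFG_ENC_BIT);
}

/* "Bit[0] 1 = memory encryption is active" (guest-side SEV check) */
bool demo_sev_active(uint64_t sev_msr)
{
	return demo_bit_set(sev_msr, DEMO_SEV_ACTIVE_BIT);
}
```

Keeping the decoding separate from the register access makes the helpers
usable with values captured elsewhere, for example from a debug dump.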



* [PATCH v4 49/63] Documentation: x86: convert intel_mpx.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst                   |   1 +
 .../x86/{intel_mpx.txt => intel_mpx.rst}      | 120 ++++++++++--------
 2 files changed, 65 insertions(+), 56 deletions(-)
 rename Documentation/x86/{intel_mpx.txt => intel_mpx.rst} (75%)

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 576628b121cc..20091d3e5d97 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -19,3 +19,4 @@ Linux x86 Support
    mtrr
    pat
    protection-keys
+   intel_mpx
diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.rst
similarity index 75%
rename from Documentation/x86/intel_mpx.txt
rename to Documentation/x86/intel_mpx.rst
index 85d0549ad846..387a640941a6 100644
--- a/Documentation/x86/intel_mpx.txt
+++ b/Documentation/x86/intel_mpx.rst
@@ -1,5 +1,11 @@
-1. Intel(R) MPX Overview
-========================
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Intel(R) Memory Protection Extensions (MPX)
+===========================================
+
+Intel(R) MPX Overview
+=====================
 
 Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
 introduced into Intel Architecture. Intel MPX provides hardware features
@@ -7,7 +13,7 @@ that can be used in conjunction with compiler changes to check memory
 references, for those references whose compile-time normal intentions are
 usurped at runtime due to buffer overflow or underflow.
 
-You can tell if your CPU supports MPX by looking in /proc/cpuinfo:
+You can tell if your CPU supports MPX by looking in /proc/cpuinfo::
 
 	cat /proc/cpuinfo  | grep ' mpx '
 
@@ -21,8 +27,8 @@ can be downloaded from
 http://software.intel.com/en-us/articles/intel-software-development-emulator
 
 
-2. How to get the advantage of MPX
-==================================
+How to get the advantage of MPX
+===============================
 
 For MPX to work, changes are required in the kernel, binutils and compiler.
 No source changes are required for applications, just a recompile.
@@ -84,14 +90,15 @@ Kernel MPX Code:
    is unmapped.
 
 
-3. How does MPX kernel code work
-================================
+How does MPX kernel code work
+=============================
 
 Handling #BR faults caused by MPX
 ---------------------------------
 
 When MPX is enabled, there are 2 new situations that can generate
 #BR faults.
+
   * new bounds tables (BT) need to be allocated to save bounds.
   * bounds violation caused by MPX instructions.
 
@@ -124,37 +131,37 @@ the kernel. It can theoretically be done completely from userspace. Here
 are a few ways this could be done. We don't think any of them are practical
 in the real-world, but here they are.
 
-Q: Can virtual space simply be reserved for the bounds tables so that we
-   never have to allocate them?
-A: MPX-enabled application will possibly create a lot of bounds tables in
-   process address space to save bounds information. These tables can take
-   up huge swaths of memory (as much as 80% of the memory on the system)
-   even if we clean them up aggressively. In the worst-case scenario, the
-   tables can be 4x the size of the data structure being tracked. IOW, a
-   1-page structure can require 4 bounds-table pages. An X-GB virtual
-   area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
-   If we were to preallocate them for the 128TB of user virtual address
-   space, we would need to reserve 512TB+2GB, which is larger than the
-   entire virtual address space today. This means they can not be reserved
-   ahead of time. Also, a single process's pre-populated bounds directory
-   consumes 2GB of virtual *AND* physical memory. IOW, it's completely
-   infeasible to prepopulate bounds directories.
-
-Q: Can we preallocate bounds table space at the same time memory is
-   allocated which might contain pointers that might eventually need
-   bounds tables?
-A: This would work if we could hook the site of each and every memory
-   allocation syscall. This can be done for small, constrained applications.
-   But, it isn't practical at a larger scale since a given app has no
-   way of controlling how all the parts of the app might allocate memory
-   (think libraries). The kernel is really the only place to intercept
-   these calls.
-
-Q: Could a bounds fault be handed to userspace and the tables allocated
-   there in a signal handler instead of in the kernel?
-A: mmap() is not on the list of safe async handler functions and even
-   if mmap() would work it still requires locking or nasty tricks to
-   keep track of the allocation state there.
+:Q: Can virtual space simply be reserved for the bounds tables so that we
+    never have to allocate them?
+:A: MPX-enabled application will possibly create a lot of bounds tables in
+    process address space to save bounds information. These tables can take
+    up huge swaths of memory (as much as 80% of the memory on the system)
+    even if we clean them up aggressively. In the worst-case scenario, the
+    tables can be 4x the size of the data structure being tracked. IOW, a
+    1-page structure can require 4 bounds-table pages. An X-GB virtual
+    area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
+    If we were to preallocate them for the 128TB of user virtual address
+    space, we would need to reserve 512TB+2GB, which is larger than the
+    entire virtual address space today. This means they can not be reserved
+    ahead of time. Also, a single process's pre-populated bounds directory
+    consumes 2GB of virtual *AND* physical memory. IOW, it's completely
+    infeasible to prepopulate bounds directories.
+
+:Q: Can we preallocate bounds table space at the same time memory is
+    allocated which might contain pointers that might eventually need
+    bounds tables?
+:A: This would work if we could hook the site of each and every memory
+    allocation syscall. This can be done for small, constrained applications.
+    But, it isn't practical at a larger scale since a given app has no
+    way of controlling how all the parts of the app might allocate memory
+    (think libraries). The kernel is really the only place to intercept
+    these calls.
+
+:Q: Could a bounds fault be handed to userspace and the tables allocated
+    there in a signal handler instead of in the kernel?
+:A: mmap() is not on the list of safe async handler functions and even
+    if mmap() would work it still requires locking or nasty tricks to
+    keep track of the allocation state there.
 
 Having ruled out all of the userspace-only approaches for managing
 bounds tables that we could think of, we create them on demand in
@@ -167,20 +174,20 @@ If a #BR is generated due to a bounds violation caused by MPX.
 We need to decode MPX instructions to get violation address and
 set this address into extended struct siginfo.
 
-The _sigfault field of struct siginfo is extended as follow:
-
-87		/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
-88		struct {
-89			void __user *_addr; /* faulting insn/memory ref. */
-90 #ifdef __ARCH_SI_TRAPNO
-91			int _trapno;	/* TRAP # which caused the signal */
-92 #endif
-93			short _addr_lsb; /* LSB of the reported address */
-94			struct {
-95				void __user *_lower;
-96				void __user *_upper;
-97			} _addr_bnd;
-98		} _sigfault;
+The _sigfault field of struct siginfo is extended as follow::
+
+  87		/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+  88		struct {
+  89			void __user *_addr; /* faulting insn/memory ref. */
+  90 #ifdef __ARCH_SI_TRAPNO
+  91			int _trapno;	/* TRAP # which caused the signal */
+  92 #endif
+  93			short _addr_lsb; /* LSB of the reported address */
+  94			struct {
+  95				void __user *_lower;
+  96				void __user *_upper;
+  97			} _addr_bnd;
+  98		} _sigfault;
 
 The '_addr' field refers to violation address, and new '_addr_and'
 field refers to the upper/lower bounds when a #BR is caused.
@@ -209,9 +216,10 @@ Adding new prctl commands
 
 Two new prctl commands are added to enable and disable MPX bounds tables
 management in kernel.
+::
 
-155	#define PR_MPX_ENABLE_MANAGEMENT	43
-156	#define PR_MPX_DISABLE_MANAGEMENT	44
+  155	#define PR_MPX_ENABLE_MANAGEMENT	43
+  156	#define PR_MPX_DISABLE_MANAGEMENT	44
 
 Runtime library in userspace is responsible for allocation of bounds
 directory. So kernel have to use XSAVE instruction to get the base
@@ -223,8 +231,8 @@ into struct mm_struct to be used in future during PR_MPX_ENABLE_MANAGEMENT
 command execution.
 
 
-4. Special rules
-================
+Special rules
+=============
 
 1) If userspace is requesting help from the kernel to do the management
 of bounds tables, it may not create or modify entries in the bounds directory.
-- 
2.20.1
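
Not part of the patch: the first Q/A in intel_mpx.rst argues that an X-GB
virtual area needs 4*X GB of bounds tables plus 2GB for the bounds directory,
so 128TB of user address space would need 512TB+2GB. A minimal C sketch of
that worst-case arithmetic follows. The `demo_*` / `DEMO_*` names are
hypothetical and the function only models the figures quoted in the text.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_GB (1ULL << 30)
#define DEMO_TB (1ULL << 40)

/*
 * Worst case from the text: bounds tables are 4x the tracked area,
 * plus a fixed 2GB bounds directory.
 */
uint64_t demo_mpx_reserve_bytes(uint64_t tracked_bytes)
{
	/* Reject inputs whose result would overflow 64 bits. */
	assert(tracked_bytes <= (UINT64_MAX - 2 * DEMO_GB) / 4);
	return 4 * tracked_bytes + 2 * DEMO_GB;
}
```

demo_mpx_reserve_bytes(128 * DEMO_TB) evaluates to 512TB+2GB, the figure the
document gives. Assuming 48-bit virtual addresses (256TB in total, a number
the text itself does not state), that is about twice the whole address space,
which is why preallocation is ruled out.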



* [PATCH v4 48/63] Documentation: x86: convert protection-keys.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst                   |  1 +
 ...rotection-keys.txt => protection-keys.rst} | 33 ++++++++++++-------
 2 files changed, 22 insertions(+), 12 deletions(-)
 rename Documentation/x86/{protection-keys.txt => protection-keys.rst} (83%)

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index e06b5c0ea883..576628b121cc 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -18,3 +18,4 @@ Linux x86 Support
    tlb
    mtrr
    pat
+   protection-keys
diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.rst
similarity index 83%
rename from Documentation/x86/protection-keys.txt
rename to Documentation/x86/protection-keys.rst
index ecb0d2dadfb7..49d9833af871 100644
--- a/Documentation/x86/protection-keys.txt
+++ b/Documentation/x86/protection-keys.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Memory Protection Keys
+======================
+
 Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
 which is found on Intel's Skylake "Scalable Processor" Server CPUs.
 It will be avalable in future non-server parts.
@@ -23,9 +29,10 @@ even though there is theoretically space in the PAE PTEs.  These
 permissions are enforced on data access only and have no effect on
 instruction fetches.
 
-=========================== Syscalls ===========================
+Syscalls
+========
 
-There are 3 system calls which directly interact with pkeys:
+There are 3 system calls which directly interact with pkeys::
 
 	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
 	int pkey_free(int pkey);
@@ -37,6 +44,7 @@ pkey_alloc().  An application calls the WRPKRU instruction
 directly in order to change access permissions to memory covered
 with a key.  In this example WRPKRU is wrapped by a C function
 called pkey_set().
+::
 
 	int real_prot = PROT_READ|PROT_WRITE;
 	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
@@ -45,43 +53,44 @@ called pkey_set().
 	... application runs here
 
 Now, if the application needs to update the data at 'ptr', it can
-gain access, do the update, then remove its write access:
+gain access, do the update, then remove its write access::
 
 	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
 	*ptr = foo; // assign something
 	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
 
 Now when it frees the memory, it will also free the pkey since it
-is no longer in use:
+is no longer in use::
 
 	munmap(ptr, PAGE_SIZE);
 	pkey_free(pkey);
 
-(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
- An example implementation can be found in
- tools/testing/selftests/x86/protection_keys.c)
+.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
+          An example implementation can be found in
+          tools/testing/selftests/x86/protection_keys.c.
 
-=========================== Behavior ===========================
+Behavior
+========
 
 The kernel attempts to make protection keys consistent with the
-behavior of a plain mprotect().  For instance if you do this:
+behavior of a plain mprotect().  For instance if you do this::
 
 	mprotect(ptr, size, PROT_NONE);
 	something(ptr);
 
-you can expect the same effects with protection keys when doing this:
+you can expect the same effects with protection keys when doing this::
 
 	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
 	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
 	something(ptr);
 
 That should be true whether something() is a direct access to 'ptr'
-like:
+like::
 
 	*ptr = foo;
 
 or when the kernel does the access on the application's behalf like
-with a read():
+with a read()::
 
 	read(fd, ptr, 1);
 
-- 
2.20.1
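
Not part of the patch: protection-keys.rst notes that pkey_set() wraps the
RDPKRU and WRPKRU instructions. A minimal C sketch of the bit update such a
wrapper would perform between the read and the write is given here. It assumes
the architectural PKRU layout of 16 keys with two bits each (access-disable in
the low bit, write-disable in the high bit). The `demo_*` / `DEMO_*` names are
hypothetical; this is not the selftest's implementation, and the instructions
themselves are deliberately left out so the code runs on any machine.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PKEY_DISABLE_ACCESS	0x1u
#define DEMO_PKEY_DISABLE_WRITE		0x2u
#define DEMO_NR_PKEYS			16
#define DEMO_PKRU_BITS_PER_PKEY		2

/*
 * Return a copy of 'pkru' in which the two permission bits belonging
 * to 'pkey' are replaced by 'rights'. All other keys are untouched.
 */
uint32_t demo_pkru_update(uint32_t pkru, int pkey, uint32_t rights)
{
	unsigned int shift;

	assert(pkey >= 0 && pkey < DEMO_NR_PKEYS);
	assert((rights & ~(DEMO_PKEY_DISABLE_ACCESS |
			   DEMO_PKEY_DISABLE_WRITE)) == 0);

	shift = (unsigned int)pkey * DEMO_PKRU_BITS_PER_PKEY;
	pkru &= ~(0x3u << shift);	/* clear the old bits for this key */
	pkru |= rights << shift;	/* install the new ones */
	return pkru;
}
```

In the document's example, pkey_set(pkey, 0) corresponds to clearing both
bits for the key and pkey_set(pkey, PKEY_DISABLE_WRITE) to setting only the
write-disable bit again.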



* [PATCH v4 47/63] Documentation: x86: convert pat.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst |   1 +
 Documentation/x86/pat.rst   | 235 ++++++++++++++++++++++++++++++++++++
 Documentation/x86/pat.txt   | 230 -----------------------------------
 3 files changed, 236 insertions(+), 230 deletions(-)
 create mode 100644 Documentation/x86/pat.rst
 delete mode 100644 Documentation/x86/pat.txt

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index d805962a7238..e06b5c0ea883 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -17,3 +17,4 @@ Linux x86 Support
    zero-page
    tlb
    mtrr
+   pat
diff --git a/Documentation/x86/pat.rst b/Documentation/x86/pat.rst
new file mode 100644
index 000000000000..bf09cab2e0bf
--- /dev/null
+++ b/Documentation/x86/pat.rst
@@ -0,0 +1,235 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+PAT (Page Attribute Table)
+==========================
+
+x86 Page Attribute Table (PAT) allows for setting the memory attribute at the
+page level granularity. PAT is complementary to the MTRR settings which allows
+for setting of memory types over physical address ranges. However, PAT is
+more flexible than MTRR due to its capability to set attributes at page level
+and also due to the fact that there are no hardware limitations on number of
+such attribute settings allowed. Added flexibility comes with guidelines for
+not having memory type aliasing for the same physical memory with multiple
+virtual addresses.
+
+PAT allows for different types of memory attributes. The most commonly used
+ones that will be supported at this time are Write-back, Uncached,
+Write-combined, Write-through and Uncached Minus.
+
+
+PAT APIs
+========
+
+There are many different APIs in the kernel that allows setting of memory
+attributes at the page level. In order to avoid aliasing, these interfaces
+should be used thoughtfully. Below is a table of interfaces available,
+their intended usage and their memory attribute relationships. Internally,
+these APIs use a reserve_memtype()/free_memtype() interface on the physical
+address range to avoid any aliasing.
+::
+
+  -------------------------------------------------------------------
+  API                    |    RAM   |  ACPI,...  |  Reserved/Holes  |
+  -----------------------|----------|------------|------------------|
+                         |          |            |                  |
+  ioremap                |    --    |    UC-     |       UC-        |
+                         |          |            |                  |
+  ioremap_cache          |    --    |    WB      |       WB         |
+                         |          |            |                  |
+  ioremap_uc             |    --    |    UC      |       UC         |
+                         |          |            |                  |
+  ioremap_nocache        |    --    |    UC-     |       UC-        |
+                         |          |            |                  |
+  ioremap_wc             |    --    |    --      |       WC         |
+                         |          |            |                  |
+  ioremap_wt             |    --    |    --      |       WT         |
+                         |          |            |                  |
+  set_memory_uc          |    UC-   |    --      |       --         |
+  set_memory_wb          |          |            |                  |
+                         |          |            |                  |
+  set_memory_wc          |    WC    |    --      |       --         |
+  set_memory_wb          |          |            |                  |
+                         |          |            |                  |
+  set_memory_wt          |    WT    |    --      |       --         |
+  set_memory_wb          |          |            |                  |
+                         |          |            |                  |
+  pci sysfs resource     |    --    |    --      |       UC-        |
+                         |          |            |                  |
+  pci sysfs resource_wc  |    --    |    --      |       WC         |
+  is IORESOURCE_PREFETCH |          |            |                  |
+                         |          |            |                  |
+  pci proc               |    --    |    --      |       UC-        |
+  !PCIIOC_WRITE_COMBINE  |          |            |                  |
+                         |          |            |                  |
+  pci proc               |    --    |    --      |       WC         |
+  PCIIOC_WRITE_COMBINE   |          |            |                  |
+                         |          |            |                  |
+  /dev/mem               |    --    |  WB/WC/UC- |    WB/WC/UC-     |
+  read-write             |          |            |                  |
+                         |          |            |                  |
+  /dev/mem               |    --    |    UC-     |       UC-        |
+  mmap SYNC flag         |          |            |                  |
+                         |          |            |                  |
+  /dev/mem               |    --    |  WB/WC/UC- |    WB/WC/UC-     |
+  mmap !SYNC flag        |          |(from exist-|  (from exist-    |
+  and                    |          |  ing alias)|    ing alias)    |
+  any alias to this area |          |            |                  |
+                         |          |            |                  |
+  /dev/mem               |    --    |    WB      |       WB         |
+  mmap !SYNC flag        |          |            |                  |
+  no alias to this area  |          |            |                  |
+  and                    |          |            |                  |
+  MTRR says WB           |          |            |                  |
+                         |          |            |                  |
+  /dev/mem               |    --    |    --      |       UC-        |
+  mmap !SYNC flag        |          |            |                  |
+  no alias to this area  |          |            |                  |
+  and                    |          |            |                  |
+  MTRR says !WB          |          |            |                  |
+                         |          |            |                  |
+  -------------------------------------------------------------------
+
+Advanced APIs for drivers
+=========================
+
+A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
+vmf_insert_pfn.
+
+Drivers wanting to export some pages to userspace do it by using mmap
+interface and a combination of:
+
+  1) pgprot_noncached()
+  2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()
+
+With PAT support, a new API pgprot_writecombine is being added. So, drivers can
+continue to use the above sequence, with either pgprot_noncached() or
+pgprot_writecombine() in step 1, followed by step 2.
+
+In addition, step 2 internally tracks the region as UC or WC in memtype
+list in order to ensure no conflicting mapping.
+
+Note that this set of APIs only works with IO (non RAM) regions. If driver
+wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
+as step 0 above and also track the usage of those pages and use set_memory_wb()
+before the page is freed to free pool.
+
+MTRR effects on PAT / non-PAT systems
+=====================================
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
+is made, should already have been ioremapped with WC attributes or PAT entries,
+this can be done by using ioremap_wc() / set_memory_wc().  Devices which
+combine areas of IO memory desired to remain uncacheable with areas where
+write-combining is desirable should consider use of ioremap_uc() followed by
+set_memory_wc() to white-list effective write-combined areas.  Such use is
+nevertheless discouraged as the effective memory type is considered
+implementation defined, yet this strategy can be used as last resort on devices
+with size-constrained regions where otherwise MTRR write-combining would
+otherwise not be effective.
+::
+
+  ----------------------------------------------------------------------
+  MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
+  ----------------------------------------------------------------------
+                                                    Non-PAT |  PAT
+       PAT
+       |PCD
+       ||PWT
+       |||
+  WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
+  WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
+  WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
+  WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
+  ----------------------------------------------------------------------
+
+(*) denotes implementation defined and is discouraged
+
+.. note:: -- in the above table mean "Not suggested usage for the API". Some
+  of the --'s are strictly enforced by the kernel. Some others are not really
+  enforced today, but may be enforced in future.
+
+For ioremap and pci access through /sys or /proc - The actual type returned
+can be more restrictive, in case of any existing aliasing for that address.
+For example: If there is an existing uncached mapping, a new ioremap_wc can
+return uncached mapping in place of write-combine requested.
+
+set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver
+will first make a region uc, wc or wt and switch it back to wb after use.
+
+Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
+interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
+
+Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
+types.
+
+Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.
+
+
+PAT debugging
+=============
+
+With CONFIG_DEBUG_FS enabled, PAT memtype list can be examined by::
+
+  # mount -t debugfs debugfs /sys/kernel/debug
+  # cat /sys/kernel/debug/x86/pat_memtype_list
+  PAT memtype list:
+  uncached-minus @ 0x7fadf000-0x7fae0000
+  uncached-minus @ 0x7fb19000-0x7fb1a000
+  uncached-minus @ 0x7fb1a000-0x7fb1b000
+  uncached-minus @ 0x7fb1b000-0x7fb1c000
+  uncached-minus @ 0x7fb1c000-0x7fb1d000
+  uncached-minus @ 0x7fb1d000-0x7fb1e000
+  uncached-minus @ 0x7fb1e000-0x7fb25000
+  uncached-minus @ 0x7fb25000-0x7fb26000
+  uncached-minus @ 0x7fb26000-0x7fb27000
+  uncached-minus @ 0x7fb27000-0x7fb28000
+  uncached-minus @ 0x7fb28000-0x7fb2e000
+  uncached-minus @ 0x7fb2e000-0x7fb2f000
+  uncached-minus @ 0x7fb2f000-0x7fb30000
+  uncached-minus @ 0x7fb31000-0x7fb32000
+  uncached-minus @ 0x80000000-0x90000000
+
+This list shows physical address ranges and various PAT settings used to
+access those physical address ranges.
+
+Another, more verbose way of getting PAT related debug messages is with the
+"debugpat" boot parameter. With this parameter, various debug messages are
+printed to the dmesg log.
+
+PAT Initialization
+==================
+
+The following table describes how PAT is initialized under various
+configurations. The PAT MSR must be updated by Linux in order to support WC
+and WT attributes. Otherwise, the PAT MSR has the value programmed in it
+by the firmware. Note, Xen enables WC attribute in the PAT MSR for guests.
+::
+
+  MTRR PAT   Call Sequence               PAT State  PAT MSR
+  =========================================================
+  E    E     MTRR -> PAT init            Enabled    OS
+  E    D     MTRR -> PAT init            Disabled    -
+  D    E     MTRR -> PAT disable         Disabled   BIOS
+  D    D     MTRR -> PAT disable         Disabled    -
+  -    np/E  PAT  -> PAT disable         Disabled   BIOS
+  -    np/D  PAT  -> PAT disable         Disabled    -
+  E    !P/E  MTRR -> PAT init            Disabled   BIOS
+  D    !P/E  MTRR -> PAT disable         Disabled   BIOS
+  !M   !P/E  MTRR stub -> PAT disable    Disabled   BIOS
+
+  Legend
+  ------------------------------------------------
+  E         Feature enabled in CPU
+  D         Feature disabled/unsupported in CPU
+  np        "nopat" boot option specified
+  !P        CONFIG_X86_PAT option unset
+  !M        CONFIG_MTRR option unset
+  Enabled   PAT state set to enabled
+  Disabled  PAT state set to disabled
+  OS        PAT initializes PAT MSR with OS setting
+  BIOS      PAT keeps PAT MSR with BIOS setting
+
diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
deleted file mode 100644
index 481d8d8536ac..000000000000
--- a/Documentation/x86/pat.txt
+++ /dev/null
@@ -1,230 +0,0 @@
-
-PAT (Page Attribute Table)
-
-x86 Page Attribute Table (PAT) allows for setting the memory attribute at the
-page level granularity. PAT is complementary to the MTRR settings which allows
-for setting of memory types over physical address ranges. However, PAT is
-more flexible than MTRR due to its capability to set attributes at page level
-and also due to the fact that there are no hardware limitations on number of
-such attribute settings allowed. Added flexibility comes with guidelines for
-not having memory type aliasing for the same physical memory with multiple
-virtual addresses.
-
-PAT allows for different types of memory attributes. The most commonly used
-ones that will be supported at this time are Write-back, Uncached,
-Write-combined, Write-through and Uncached Minus.
-
-
-PAT APIs
---------
-
-There are many different APIs in the kernel that allows setting of memory
-attributes at the page level. In order to avoid aliasing, these interfaces
-should be used thoughtfully. Below is a table of interfaces available,
-their intended usage and their memory attribute relationships. Internally,
-these APIs use a reserve_memtype()/free_memtype() interface on the physical
-address range to avoid any aliasing.
-
-
--------------------------------------------------------------------
-API                    |    RAM   |  ACPI,...  |  Reserved/Holes  |
------------------------|----------|------------|------------------|
-                       |          |            |                  |
-ioremap                |    --    |    UC-     |       UC-        |
-                       |          |            |                  |
-ioremap_cache          |    --    |    WB      |       WB         |
-                       |          |            |                  |
-ioremap_uc             |    --    |    UC      |       UC         |
-                       |          |            |                  |
-ioremap_nocache        |    --    |    UC-     |       UC-        |
-                       |          |            |                  |
-ioremap_wc             |    --    |    --      |       WC         |
-                       |          |            |                  |
-ioremap_wt             |    --    |    --      |       WT         |
-                       |          |            |                  |
-set_memory_uc          |    UC-   |    --      |       --         |
- set_memory_wb         |          |            |                  |
-                       |          |            |                  |
-set_memory_wc          |    WC    |    --      |       --         |
- set_memory_wb         |          |            |                  |
-                       |          |            |                  |
-set_memory_wt          |    WT    |    --      |       --         |
- set_memory_wb         |          |            |                  |
-                       |          |            |                  |
-pci sysfs resource     |    --    |    --      |       UC-        |
-                       |          |            |                  |
-pci sysfs resource_wc  |    --    |    --      |       WC         |
- is IORESOURCE_PREFETCH|          |            |                  |
-                       |          |            |                  |
-pci proc               |    --    |    --      |       UC-        |
- !PCIIOC_WRITE_COMBINE |          |            |                  |
-                       |          |            |                  |
-pci proc               |    --    |    --      |       WC         |
- PCIIOC_WRITE_COMBINE  |          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |  WB/WC/UC- |    WB/WC/UC-     |
- read-write            |          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |    UC-     |       UC-        |
- mmap SYNC flag        |          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |  WB/WC/UC- |    WB/WC/UC-     |
- mmap !SYNC flag       |          |(from exist-|  (from exist-    |
- and                   |          |  ing alias)|    ing alias)    |
- any alias to this area|          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |    WB      |       WB         |
- mmap !SYNC flag       |          |            |                  |
- no alias to this area |          |            |                  |
- and                   |          |            |                  |
- MTRR says WB          |          |            |                  |
-                       |          |            |                  |
-/dev/mem               |    --    |    --      |       UC-        |
- mmap !SYNC flag       |          |            |                  |
- no alias to this area |          |            |                  |
- and                   |          |            |                  |
- MTRR says !WB         |          |            |                  |
-                       |          |            |                  |
--------------------------------------------------------------------
-
-Advanced APIs for drivers
--------------------------
-A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
-vmf_insert_pfn
-
-Drivers wanting to export some pages to userspace do it by using mmap
-interface and a combination of
-1) pgprot_noncached()
-2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()
-
-With PAT support, a new API pgprot_writecombine is being added. So, drivers can
-continue to use the above sequence, with either pgprot_noncached() or
-pgprot_writecombine() in step 1, followed by step 2.
-
-In addition, step 2 internally tracks the region as UC or WC in memtype
-list in order to ensure no conflicting mapping.
-
-Note that this set of APIs only works with IO (non RAM) regions. If driver
-wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
-as step 0 above and also track the usage of those pages and use set_memory_wb()
-before the page is freed to free pool.
-
-MTRR effects on PAT / non-PAT systems
--------------------------------------
-
-The following table provides the effects of using write-combining MTRRs when
-using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
-mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
-be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
-is made, should already have been ioremapped with WC attributes or PAT entries,
-this can be done by using ioremap_wc() / set_memory_wc().  Devices which
-combine areas of IO memory desired to remain uncacheable with areas where
-write-combining is desirable should consider use of ioremap_uc() followed by
-set_memory_wc() to white-list effective write-combined areas.  Such use is
-nevertheless discouraged as the effective memory type is considered
-implementation defined, yet this strategy can be used as last resort on devices
-with size-constrained regions where otherwise MTRR write-combining would
-otherwise not be effective.
-
-----------------------------------------------------------------------
-MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
-----------------------------------------------------------------------
-                                                  Non-PAT |  PAT
-     PAT
-     |PCD
-     ||PWT
-     |||
-WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
-WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
-WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
-WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
-----------------------------------------------------------------------
-
-(*) denotes implementation defined and is discouraged
-
-Notes:
-
--- in the above table mean "Not suggested usage for the API". Some of the --'s
-are strictly enforced by the kernel. Some others are not really enforced
-today, but may be enforced in future.
-
-For ioremap and pci access through /sys or /proc - The actual type returned
-can be more restrictive, in case of any existing aliasing for that address.
-For example: If there is an existing uncached mapping, a new ioremap_wc can
-return uncached mapping in place of write-combine requested.
-
-set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver
-will first make a region uc, wc or wt and switch it back to wb after use.
-
-Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
-interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
-
-Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
-types.
-
-Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.
-
-
-PAT debugging
--------------
-
-With CONFIG_DEBUG_FS enabled, PAT memtype list can be examined by
-
-# mount -t debugfs debugfs /sys/kernel/debug
-# cat /sys/kernel/debug/x86/pat_memtype_list
-PAT memtype list:
-uncached-minus @ 0x7fadf000-0x7fae0000
-uncached-minus @ 0x7fb19000-0x7fb1a000
-uncached-minus @ 0x7fb1a000-0x7fb1b000
-uncached-minus @ 0x7fb1b000-0x7fb1c000
-uncached-minus @ 0x7fb1c000-0x7fb1d000
-uncached-minus @ 0x7fb1d000-0x7fb1e000
-uncached-minus @ 0x7fb1e000-0x7fb25000
-uncached-minus @ 0x7fb25000-0x7fb26000
-uncached-minus @ 0x7fb26000-0x7fb27000
-uncached-minus @ 0x7fb27000-0x7fb28000
-uncached-minus @ 0x7fb28000-0x7fb2e000
-uncached-minus @ 0x7fb2e000-0x7fb2f000
-uncached-minus @ 0x7fb2f000-0x7fb30000
-uncached-minus @ 0x7fb31000-0x7fb32000
-uncached-minus @ 0x80000000-0x90000000
-
-This list shows physical address ranges and various PAT settings used to
-access those physical address ranges.
-
-Another, more verbose way of getting PAT related debug messages is with
-"debugpat" boot parameter. With this parameter, various debug messages are
-printed to dmesg log.
-
-PAT Initialization
-------------------
-
-The following table describes how PAT is initialized under various
-configurations. The PAT MSR must be updated by Linux in order to support WC
-and WT attributes. Otherwise, the PAT MSR has the value programmed in it
-by the firmware. Note, Xen enables WC attribute in the PAT MSR for guests.
-
- MTRR PAT   Call Sequence               PAT State  PAT MSR
- =========================================================
- E    E     MTRR -> PAT init            Enabled    OS
- E    D     MTRR -> PAT init            Disabled    -
- D    E     MTRR -> PAT disable         Disabled   BIOS
- D    D     MTRR -> PAT disable         Disabled    -
- -    np/E  PAT  -> PAT disable         Disabled   BIOS
- -    np/D  PAT  -> PAT disable         Disabled    -
- E    !P/E  MTRR -> PAT init            Disabled   BIOS
- D    !P/E  MTRR -> PAT disable         Disabled   BIOS
- !M   !P/E  MTRR stub -> PAT disable    Disabled   BIOS
-
- Legend
- ------------------------------------------------
- E         Feature enabled in CPU
- D	   Feature disabled/unsupported in CPU
- np	   "nopat" boot option specified
- !P	   CONFIG_X86_PAT option unset
- !M	   CONFIG_MTRR option unset
- Enabled   PAT state set to enabled
- Disabled  PAT state set to disabled
- OS        PAT initializes PAT MSR with OS setting
- BIOS      PAT keeps PAT MSR with BIOS setting
-
-- 
2.20.1


* [PATCH v4 46/63] Documentation: x86: convert mtrr.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst |   1 +
 Documentation/x86/mtrr.rst  | 350 ++++++++++++++++++++++++++++++++++++
 Documentation/x86/mtrr.txt  | 329 ---------------------------------
 3 files changed, 351 insertions(+), 329 deletions(-)
 create mode 100644 Documentation/x86/mtrr.rst
 delete mode 100644 Documentation/x86/mtrr.txt
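
Note for readers of the converted document (not part of the patch): the
size arithmetic and the /proc/mtrr write string described in mtrr.rst can
be sketched as a small shell helper. The function name mtrr_request and
its kB-based size argument are assumptions made for illustration; they do
not come from the kernel or from any existing tool.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not part of this patch or of any
# kernel tool): build the request string that mtrr.rst writes to /proc/mtrr.
#   $1 = base address, e.g. the "Linear FB @" value from the X server log
#   $2 = region size in kB, e.g. the "videoram:" value from the X server log
#   $3 = MTRR type name as accepted by /proc/mtrr
mtrr_request() {
    # 1 kB is 1024 bytes, so 4096k becomes 0x400000 bytes.
    printf 'base=0x%x size=0x%x type=%s\n' "$1" "$(( $2 * 1024 ))" "$3"
}

# This only prints the request. Applying it would mean redirecting the
# output to /proc/mtrr as root, as shown in the document.
mtrr_request 0xf8000000 4096 write-combining
```

With the values from the document's example, this prints
"base=0xf8000000 size=0x400000 type=write-combining", which matches the
string written to /proc/mtrr there.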

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index fd54b859db9b..d805962a7238 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -16,3 +16,4 @@ Linux x86 Support
    earlyprintk
    zero-page
    tlb
+   mtrr
diff --git a/Documentation/x86/mtrr.rst b/Documentation/x86/mtrr.rst
new file mode 100644
index 000000000000..72da61022861
--- /dev/null
+++ b/Documentation/x86/mtrr.rst
@@ -0,0 +1,350 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+MTRR (Memory Type Range Register) control
+=========================================
+
+:Authors: - Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
+          - Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
+
+
+Phasing out MTRR use
+====================
+
+MTRR use is replaced on modern x86 hardware with PAT. Direct MTRR use by
+drivers on Linux is now completely phased out; device drivers should use
+arch_phys_wc_add() in combination with ioremap_wc(). This makes MTRR effective
+on non-PAT systems and is a no-op, but equally effective, on PAT systems.
+
+Even if Linux does not use MTRRs directly, some x86 platform firmware may still
+set up MTRRs early before booting the OS. They do this as some platform
+firmware may still have implemented access to MTRRs which would be controlled
+and handled by the platform firmware directly. An example of platform use of
+MTRRs is through the use of SMI handlers. One case could be fan control, where
+the platform code would need uncachable access to some of its fan control
+registers. Such platform access does not need any Operating System MTRR code in
+place other than mtrr_type_lookup() to ensure any OS specific mapping requests
+are aligned with platform MTRR setup. If MTRRs are only set up by the platform
+firmware code, though, and the OS does not make any specific MTRR mapping
+requests, mtrr_type_lookup() should always return MTRR_TYPE_INVALID.
+
+For details refer to :doc:`pat`.
+
+On Intel P6 family processors (Pentium Pro, Pentium II and later)
+the Memory Type Range Registers (MTRRs) may be used to control
+processor access to memory ranges. This is most useful when you have
+a video (VGA) card on a PCI or AGP bus. Enabling write-combining
+allows bus write transfers to be combined into a larger transfer
+before bursting over the PCI/AGP bus. This can increase performance
+of image write operations 2.5 times or more.
+
+The Cyrix 6x86, 6x86MX and M II processors have Address Range
+Registers (ARRs) which provide a similar functionality to MTRRs. For
+these, the ARRs are used to emulate the MTRRs.
+
+The AMD K6-2 (stepping 8 and above) and K6-3 processors have two
+MTRRs. These are supported.  The AMD Athlon family provides 8 Intel
+style MTRRs.
+
+The Centaur C6 (WinChip) has 8 MCRs, allowing write-combining. These
+are supported.
+
+The VIA Cyrix III and VIA C3 CPUs offer 8 Intel style MTRRs.
+
+The CONFIG_MTRR option creates a /proc/mtrr file which may be used
+to manipulate your MTRRs. Typically the X server should use
+this. This should have a reasonably generic interface so that
+similar control registers on other processors can be easily
+supported.
+
+There are two interfaces to /proc/mtrr: one is an ASCII interface
+which allows you to read and write. The other is an ioctl()
+interface. The ASCII interface is meant for administration. The
+ioctl() interface is meant for C programs (i.e. the X server). The
+interfaces are described below, with sample commands and C code.
+
+Reading MTRRs from the shell::
+
+  % cat /proc/mtrr
+  reg00: base=0x00000000 (   0MB), size= 128MB: write-back, count=1
+  reg01: base=0x08000000 ( 128MB), size=  64MB: write-back, count=1
+
+Creating MTRRs from the C-shell::
+
+  # echo "base=0xf8000000 size=0x400000 type=write-combining" >! /proc/mtrr
+
+or if you use bash::
+
+  # echo "base=0xf8000000 size=0x400000 type=write-combining" >| /proc/mtrr
+
+And the result thereof::
+
+  % cat /proc/mtrr
+  reg00: base=0x00000000 (   0MB), size= 128MB: write-back, count=1
+  reg01: base=0x08000000 ( 128MB), size=  64MB: write-back, count=1
+  reg02: base=0xf8000000 (3968MB), size=   4MB: write-combining, count=1
+
+This is for video RAM at base address 0xf8000000 and size 4 megabytes. To
+find out your base address, you need to look at the output of your X
+server, which tells you where the linear framebuffer address is. A
+typical line that you may get is::
+
+  (--) S3: PCI: 968 rev 0, Linear FB @ 0xf8000000
+
+Note that you should only use the value from the X server, as it may
+move the framebuffer base address, so the only value you can trust is
+that reported by the X server.
+
+To find out the size of your framebuffer (what, you don't actually
+know?), the following line will tell you::
+
+  (--) S3: videoram:  4096k
+
+That's 4 megabytes, which is 0x400000 bytes (in hexadecimal).
+A patch is being written for XFree86 which will make this automatic:
+in other words the X server will manipulate /proc/mtrr using the
+ioctl() interface, so users won't have to do anything. If you use a
+commercial X server, lobby your vendor to add support for MTRRs.
+
+
+Creating overlapping MTRRs
+==========================
+::
+
+  % echo "base=0xfb000000 size=0x1000000 type=write-combining" >/proc/mtrr
+  % echo "base=0xfb000000 size=0x1000 type=uncachable" >/proc/mtrr
+
+And the results::
+
+  % cat /proc/mtrr
+  reg00: base=0x00000000 (   0MB), size=  64MB: write-back, count=1
+  reg01: base=0xfb000000 (4016MB), size=  16MB: write-combining, count=1
+  reg02: base=0xfb000000 (4016MB), size=   4kB: uncachable, count=1
+
+Some cards (especially Voodoo Graphics boards) need this 4 kB area
+excluded from the beginning of the region because it is used for
+registers.
+
+NOTE: You can only create a type=uncachable region if the first
+region that you created is type=write-combining.
+
+
+Removing MTRRs from the C-shell
+===============================
+::
+
+  % echo "disable=2" >! /proc/mtrr
+
+or using bash::
+
+  % echo "disable=2" >| /proc/mtrr
+
+
+Reading MTRRs from a C program using ioctl()'s
+==============================================
+::
+
+  /*  mtrr-show.c
+
+      Source file for mtrr-show (example program to show MTRRs using ioctl()'s)
+
+      Copyright (C) 1997-1998  Richard Gooch
+
+      This program is free software; you can redistribute it and/or modify
+      it under the terms of the GNU General Public License as published by
+      the Free Software Foundation; either version 2 of the License, or
+      (at your option) any later version.
+
+      This program is distributed in the hope that it will be useful,
+      but WITHOUT ANY WARRANTY; without even the implied warranty of
+      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+      GNU General Public License for more details.
+
+      You should have received a copy of the GNU General Public License
+      along with this program; if not, write to the Free Software
+      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+      Richard Gooch may be reached by email at  rgooch@atnf.csiro.au
+      The postal address is:
+        Richard Gooch, c/o ATNF, P. O. Box 76, Epping, N.S.W., 2121, Australia.
+  */
+
+  /*
+      This program will use an ioctl() on /proc/mtrr to show the current MTRR
+      settings. This is an alternative to reading /proc/mtrr.
+
+
+      Written by      Richard Gooch   17-DEC-1997
+
+      Last updated by Richard Gooch   2-MAY-1998
+
+
+  */
+  #include <stdio.h>
+  #include <stdlib.h>
+  #include <string.h>
+  #include <sys/types.h>
+  #include <sys/stat.h>
+  #include <fcntl.h>
+  #include <sys/ioctl.h>
+  #include <errno.h>
+  #include <asm/mtrr.h>
+
+  #define TRUE 1
+  #define FALSE 0
+  #define ERRSTRING strerror (errno)
+
+  static char *mtrr_strings[MTRR_NUM_TYPES] =
+  {
+      "uncachable",               /* 0 */
+      "write-combining",          /* 1 */
+      "?",                        /* 2 */
+      "?",                        /* 3 */
+      "write-through",            /* 4 */
+      "write-protect",            /* 5 */
+      "write-back",               /* 6 */
+  };
+
+  int main ()
+  {
+      int fd;
+      struct mtrr_gentry gentry;
+
+      if ( ( fd = open ("/proc/mtrr", O_RDONLY, 0) ) == -1 )
+      {
+          if (errno == ENOENT)
+          {
+              fputs ("/proc/mtrr not found: not supported or you don't have a PPro?\n",
+                     stderr);
+              exit (1);
+          }
+          fprintf (stderr, "Error opening /proc/mtrr\t%s\n", ERRSTRING);
+          exit (2);
+      }
+      for (gentry.regnum = 0; ioctl (fd, MTRRIOC_GET_ENTRY, &gentry) == 0;
+           ++gentry.regnum)
+      {
+          if (gentry.size < 1)
+          {
+              fprintf (stderr, "Register: %u disabled\n", gentry.regnum);
+              continue;
+          }
+          fprintf (stderr, "Register: %u base: 0x%lx size: 0x%lx type: %s\n",
+                   gentry.regnum, gentry.base, gentry.size,
+                   mtrr_strings[gentry.type]);
+      }
+      if (errno == EINVAL) exit (0);
+      fprintf (stderr, "Error doing ioctl(2) on /proc/mtrr\t%s\n", ERRSTRING);
+      exit (3);
+  }   /*  End Function main  */
+
+
+Creating MTRRs from a C programme using ioctl()'s
+=================================================
+::
+
+  /*  mtrr-add.c
+
+      Source file for mtrr-add (example programme to add an MTRR using ioctl())
+
+      Copyright (C) 1997-1998  Richard Gooch
+
+      This program is free software; you can redistribute it and/or modify
+      it under the terms of the GNU General Public License as published by
+      the Free Software Foundation; either version 2 of the License, or
+      (at your option) any later version.
+
+      This program is distributed in the hope that it will be useful,
+      but WITHOUT ANY WARRANTY; without even the implied warranty of
+      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+      GNU General Public License for more details.
+
+      You should have received a copy of the GNU General Public License
+      along with this program; if not, write to the Free Software
+      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+      Richard Gooch may be reached by email at  rgooch@atnf.csiro.au
+      The postal address is:
+        Richard Gooch, c/o ATNF, P. O. Box 76, Epping, N.S.W., 2121, Australia.
+  */
+
+  /*
+      This programme will use an ioctl() on /proc/mtrr to add an entry. The first
+      available mtrr is used. This is an alternative to writing /proc/mtrr.
+
+
+      Written by      Richard Gooch   17-DEC-1997
+
+      Last updated by Richard Gooch   2-MAY-1998
+
+
+  */
+  #include <stdio.h>
+  #include <string.h>
+  #include <stdlib.h>
+  #include <unistd.h>
+  #include <sys/types.h>
+  #include <sys/stat.h>
+  #include <fcntl.h>
+  #include <sys/ioctl.h>
+  #include <errno.h>
+  #include <asm/mtrr.h>
+
+  #define TRUE 1
+  #define FALSE 0
+  #define ERRSTRING strerror (errno)
+
+  static char *mtrr_strings[MTRR_NUM_TYPES] =
+  {
+      "uncachable",               /* 0 */
+      "write-combining",          /* 1 */
+      "?",                        /* 2 */
+      "?",                        /* 3 */
+      "write-through",            /* 4 */
+      "write-protect",            /* 5 */
+      "write-back",               /* 6 */
+  };
+
+  int main (int argc, char **argv)
+  {
+      int fd;
+      struct mtrr_sentry sentry;
+
+      if (argc != 4)
+      {
+          fprintf (stderr, "Usage:\tmtrr-add base size type\n");
+          exit (1);
+      }
+      sentry.base = strtoul (argv[1], NULL, 0);
+      sentry.size = strtoul (argv[2], NULL, 0);
+      for (sentry.type = 0; sentry.type < MTRR_NUM_TYPES; ++sentry.type)
+      {
+          if (strcmp (argv[3], mtrr_strings[sentry.type]) == 0) break;
+      }
+      if (sentry.type >= MTRR_NUM_TYPES)
+      {
+          fprintf (stderr, "Illegal type: \"%s\"\n", argv[3]);
+          exit (2);
+      }
+      if ( ( fd = open ("/proc/mtrr", O_WRONLY, 0) ) == -1 )
+      {
+          if (errno == ENOENT)
+          {
+              fputs ("/proc/mtrr not found: not supported or you don't have a PPro?\n",
+                     stderr);
+              exit (3);
+          }
+          fprintf (stderr, "Error opening /proc/mtrr\t%s\n", ERRSTRING);
+          exit (4);
+      }
+      if (ioctl (fd, MTRRIOC_ADD_ENTRY, &sentry) == -1)
+      {
+          fprintf (stderr, "Error doing ioctl(2) on /proc/mtrr\t%s\n", ERRSTRING);
+          exit (5);
+      }
+      fprintf (stderr, "Sleeping for 5 seconds so you can see the new entry\n");
+      sleep (5);
+      close (fd);
+      fputs ("I've just closed /proc/mtrr so now the new entry should be gone\n",
+             stderr);
+  }   /*  End Function main  */
diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
deleted file mode 100644
index dc3e703913ac..000000000000
--- a/Documentation/x86/mtrr.txt
+++ /dev/null
@@ -1,329 +0,0 @@
-MTRR (Memory Type Range Register) control
-
-Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
-Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
-
-===============================================================================
-Phasing out MTRR use
-
-MTRR use is replaced on modern x86 hardware with PAT. Direct MTRR use by
-drivers on Linux is now completely phased out, device drivers should use
-arch_phys_wc_add() in combination with ioremap_wc() to make MTRR effective on
-non-PAT systems while a no-op but equally effective on PAT enabled systems.
-
-Even if Linux does not use MTRRs directly, some x86 platform firmware may still
-set up MTRRs early before booting the OS. They do this as some platform
-firmware may still have implemented access to MTRRs which would be controlled
-and handled by the platform firmware directly. An example of platform use of
-MTRRs is through the use of SMI handlers, one case could be for fan control,
-the platform code would need uncachable access to some of its fan control
-registers. Such platform access does not need any Operating System MTRR code in
-place other than mtrr_type_lookup() to ensure any OS specific mapping requests
-are aligned with platform MTRR setup. If MTRRs are only set up by the platform
-firmware code though and the OS does not make any specific MTRR mapping
-requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID.
-
-For details refer to Documentation/x86/pat.txt.
-
-===============================================================================
-
-  On Intel P6 family processors (Pentium Pro, Pentium II and later)
-  the Memory Type Range Registers (MTRRs) may be used to control
-  processor access to memory ranges. This is most useful when you have
-  a video (VGA) card on a PCI or AGP bus. Enabling write-combining
-  allows bus write transfers to be combined into a larger transfer
-  before bursting over the PCI/AGP bus. This can increase performance
-  of image write operations 2.5 times or more.
-
-  The Cyrix 6x86, 6x86MX and M II processors have Address Range
-  Registers (ARRs) which provide a similar functionality to MTRRs. For
-  these, the ARRs are used to emulate the MTRRs.
-
-  The AMD K6-2 (stepping 8 and above) and K6-3 processors have two
-  MTRRs. These are supported.  The AMD Athlon family provide 8 Intel
-  style MTRRs.
-
-  The Centaur C6 (WinChip) has 8 MCRs, allowing write-combining. These
-  are supported.
-
-  The VIA Cyrix III and VIA C3 CPUs offer 8 Intel style MTRRs.
-
-  The CONFIG_MTRR option creates a /proc/mtrr file which may be used
-  to manipulate your MTRRs. Typically the X server should use
-  this. This should have a reasonably generic interface so that
-  similar control registers on other processors can be easily
-  supported.
-
-
-There are two interfaces to /proc/mtrr: one is an ASCII interface
-which allows you to read and write. The other is an ioctl()
-interface. The ASCII interface is meant for administration. The
-ioctl() interface is meant for C programs (i.e. the X server). The
-interfaces are described below, with sample commands and C code.
-
-===============================================================================
-Reading MTRRs from the shell:
-
-% cat /proc/mtrr
-reg00: base=0x00000000 (   0MB), size= 128MB: write-back, count=1
-reg01: base=0x08000000 ( 128MB), size=  64MB: write-back, count=1
-===============================================================================
-Creating MTRRs from the C-shell:
-# echo "base=0xf8000000 size=0x400000 type=write-combining" >! /proc/mtrr
-or if you use bash:
-# echo "base=0xf8000000 size=0x400000 type=write-combining" >| /proc/mtrr
-
-And the result thereof:
-% cat /proc/mtrr
-reg00: base=0x00000000 (   0MB), size= 128MB: write-back, count=1
-reg01: base=0x08000000 ( 128MB), size=  64MB: write-back, count=1
-reg02: base=0xf8000000 (3968MB), size=   4MB: write-combining, count=1
-
-This is for video RAM at base address 0xf8000000 and size 4 megabytes. To
-find out your base address, you need to look at the output of your X
-server, which tells you where the linear framebuffer address is. A
-typical line that you may get is:
-
-(--) S3: PCI: 968 rev 0, Linear FB @ 0xf8000000
-
-Note that you should only use the value from the X server, as it may
-move the framebuffer base address, so the only value you can trust is
-that reported by the X server.
-
-To find out the size of your framebuffer (what, you don't actually
-know?), the following line will tell you:
-
-(--) S3: videoram:  4096k
-
-That's 4 megabytes, which is 0x400000 bytes (in hexadecimal).
-A patch is being written for XFree86 which will make this automatic:
-in other words the X server will manipulate /proc/mtrr using the
-ioctl() interface, so users won't have to do anything. If you use a
-commercial X server, lobby your vendor to add support for MTRRs.
-===============================================================================
-Creating overlapping MTRRs:
-
-%echo "base=0xfb000000 size=0x1000000 type=write-combining" >/proc/mtrr
-%echo "base=0xfb000000 size=0x1000 type=uncachable" >/proc/mtrr
-
-And the results: cat /proc/mtrr
-reg00: base=0x00000000 (   0MB), size=  64MB: write-back, count=1
-reg01: base=0xfb000000 (4016MB), size=  16MB: write-combining, count=1
-reg02: base=0xfb000000 (4016MB), size=   4kB: uncachable, count=1
-
-Some cards (especially Voodoo Graphics boards) need this 4 kB area
-excluded from the beginning of the region because it is used for
-registers.
-
-NOTE: You can only create type=uncachable region, if the first
-region that you created is type=write-combining.
-===============================================================================
-Removing MTRRs from the C-shell:
-% echo "disable=2" >! /proc/mtrr
-or using bash:
-% echo "disable=2" >| /proc/mtrr
-===============================================================================
-Reading MTRRs from a C program using ioctl()'s:
-
-/*  mtrr-show.c
-
-    Source file for mtrr-show (example program to show MTRRs using ioctl()'s)
-
-    Copyright (C) 1997-1998  Richard Gooch
-
-    This program is free software; you can redistribute it and/or modify
-    it under the terms of the GNU General Public License as published by
-    the Free Software Foundation; either version 2 of the License, or
-    (at your option) any later version.
-
-    This program is distributed in the hope that it will be useful,
-    but WITHOUT ANY WARRANTY; without even the implied warranty of
-    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-    GNU General Public License for more details.
-
-    You should have received a copy of the GNU General Public License
-    along with this program; if not, write to the Free Software
-    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-
-    Richard Gooch may be reached by email at  rgooch@atnf.csiro.au
-    The postal address is:
-      Richard Gooch, c/o ATNF, P. O. Box 76, Epping, N.S.W., 2121, Australia.
-*/
-
-/*
-    This program will use an ioctl() on /proc/mtrr to show the current MTRR
-    settings. This is an alternative to reading /proc/mtrr.
-
-
-    Written by      Richard Gooch   17-DEC-1997
-
-    Last updated by Richard Gooch   2-MAY-1998
-
-
-*/
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <sys/ioctl.h>
-#include <errno.h>
-#include <asm/mtrr.h>
-
-#define TRUE 1
-#define FALSE 0
-#define ERRSTRING strerror (errno)
-
-static char *mtrr_strings[MTRR_NUM_TYPES] =
-{
-    "uncachable",               /* 0 */
-    "write-combining",          /* 1 */
-    "?",                        /* 2 */
-    "?",                        /* 3 */
-    "write-through",            /* 4 */
-    "write-protect",            /* 5 */
-    "write-back",               /* 6 */
-};
-
-int main ()
-{
-    int fd;
-    struct mtrr_gentry gentry;
-
-    if ( ( fd = open ("/proc/mtrr", O_RDONLY, 0) ) == -1 )
-    {
-	if (errno == ENOENT)
-	{
-	    fputs ("/proc/mtrr not found: not supported or you don't have a PPro?\n",
-		   stderr);
-	    exit (1);
-	}
-	fprintf (stderr, "Error opening /proc/mtrr\t%s\n", ERRSTRING);
-	exit (2);
-    }
-    for (gentry.regnum = 0; ioctl (fd, MTRRIOC_GET_ENTRY, &gentry) == 0;
-	 ++gentry.regnum)
-    {
-	if (gentry.size < 1)
-	{
-	    fprintf (stderr, "Register: %u disabled\n", gentry.regnum);
-	    continue;
-	}
-	fprintf (stderr, "Register: %u base: 0x%lx size: 0x%lx type: %s\n",
-		 gentry.regnum, gentry.base, gentry.size,
-		 mtrr_strings[gentry.type]);
-    }
-    if (errno == EINVAL) exit (0);
-    fprintf (stderr, "Error doing ioctl(2) on /dev/mtrr\t%s\n", ERRSTRING);
-    exit (3);
-}   /*  End Function main  */
-===============================================================================
-Creating MTRRs from a C programme using ioctl()'s:
-
-/*  mtrr-add.c
-
-    Source file for mtrr-add (example programme to add an MTRRs using ioctl())
-
-    Copyright (C) 1997-1998  Richard Gooch
-
-    This program is free software; you can redistribute it and/or modify
-    it under the terms of the GNU General Public License as published by
-    the Free Software Foundation; either version 2 of the License, or
-    (at your option) any later version.
-
-    This program is distributed in the hope that it will be useful,
-    but WITHOUT ANY WARRANTY; without even the implied warranty of
-    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-    GNU General Public License for more details.
-
-    You should have received a copy of the GNU General Public License
-    along with this program; if not, write to the Free Software
-    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-
-    Richard Gooch may be reached by email at  rgooch@atnf.csiro.au
-    The postal address is:
-      Richard Gooch, c/o ATNF, P. O. Box 76, Epping, N.S.W., 2121, Australia.
-*/
-
-/*
-    This programme will use an ioctl() on /proc/mtrr to add an entry. The first
-    available mtrr is used. This is an alternative to writing /proc/mtrr.
-
-
-    Written by      Richard Gooch   17-DEC-1997
-
-    Last updated by Richard Gooch   2-MAY-1998
-
-
-*/
-#include <stdio.h>
-#include <string.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <sys/ioctl.h>
-#include <errno.h>
-#include <asm/mtrr.h>
-
-#define TRUE 1
-#define FALSE 0
-#define ERRSTRING strerror (errno)
-
-static char *mtrr_strings[MTRR_NUM_TYPES] =
-{
-    "uncachable",               /* 0 */
-    "write-combining",          /* 1 */
-    "?",                        /* 2 */
-    "?",                        /* 3 */
-    "write-through",            /* 4 */
-    "write-protect",            /* 5 */
-    "write-back",               /* 6 */
-};
-
-int main (int argc, char **argv)
-{
-    int fd;
-    struct mtrr_sentry sentry;
-
-    if (argc != 4)
-    {
-	fprintf (stderr, "Usage:\tmtrr-add base size type\n");
-	exit (1);
-    }
-    sentry.base = strtoul (argv[1], NULL, 0);
-    sentry.size = strtoul (argv[2], NULL, 0);
-    for (sentry.type = 0; sentry.type < MTRR_NUM_TYPES; ++sentry.type)
-    {
-	if (strcmp (argv[3], mtrr_strings[sentry.type]) == 0) break;
-    }
-    if (sentry.type >= MTRR_NUM_TYPES)
-    {
-	fprintf (stderr, "Illegal type: \"%s\"\n", argv[3]);
-	exit (2);
-    }
-    if ( ( fd = open ("/proc/mtrr", O_WRONLY, 0) ) == -1 )
-    {
-	if (errno == ENOENT)
-	{
-	    fputs ("/proc/mtrr not found: not supported or you don't have a PPro?\n",
-		   stderr);
-	    exit (3);
-	}
-	fprintf (stderr, "Error opening /proc/mtrr\t%s\n", ERRSTRING);
-	exit (4);
-    }
-    if (ioctl (fd, MTRRIOC_ADD_ENTRY, &sentry) == -1)
-    {
-	fprintf (stderr, "Error doing ioctl(2) on /dev/mtrr\t%s\n", ERRSTRING);
-	exit (5);
-    }
-    fprintf (stderr, "Sleeping for 5 seconds so you can see the new entry\n");
-    sleep (5);
-    close (fd);
-    fputs ("I've just closed /proc/mtrr so now the new entry should be gone\n",
-	   stderr);
-}   /*  End Function main  */
-===============================================================================
-- 
2.20.1



* [PATCH v4 45/63] Documentation: x86: convert tlb.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst            |  1 +
 Documentation/x86/{tlb.txt => tlb.rst} | 30 ++++++++++++++++----------
 2 files changed, 20 insertions(+), 11 deletions(-)
 rename Documentation/x86/{tlb.txt => tlb.rst} (81%)
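
Note (not part of the patch itself): the conversion below turns the plain
"[1]" marker into an rST footnote reference "[1]_" and the trailing "1."
item into a ".. [1]" footnote target. A minimal sketch of how that pairing
could be checked is given here. The helper name unmatched_footnotes and the
sample text are hypothetical, and the regular expressions only cover the
numeric footnote form used in this file.

```python
import re


def unmatched_footnotes(rst_text):
    """Return (refs_without_target, targets_without_ref) as sorted lists."""
    # Footnote targets look like ".. [1] text" at the start of a line.
    targets = set(re.findall(r"^\.\. \[(\d+)\]", rst_text, flags=re.MULTILINE))
    # Footnote references look like "[1]_" in running text.
    refs = set(re.findall(r"\[(\d+)\]_", rst_text))
    return sorted(refs - targets), sorted(targets - refs)


# Hypothetical excerpt shaped like the converted tlb.rst.
sample = (
    "guaranteed to flush a full 2MB [1]_, hugetlbfs always uses the full\n"
    "flushes.\n"
    "\n"
    ".. [1] A footnote in Intel's SDM\n"
)

print(unmatched_footnotes(sample))  # prints ([], [])
```

Two empty lists mean every reference has a target and every target is
referenced; Sphinx itself reports the same kind of mismatch as a warning
at build time.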

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 9a0b5f38ef6b..fd54b859db9b 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -15,3 +15,4 @@ Linux x86 Support
    entry_64
    earlyprintk
    zero-page
+   tlb
diff --git a/Documentation/x86/tlb.txt b/Documentation/x86/tlb.rst
similarity index 81%
rename from Documentation/x86/tlb.txt
rename to Documentation/x86/tlb.rst
index 6a0607b99ed8..82ec58ae63a8 100644
--- a/Documentation/x86/tlb.txt
+++ b/Documentation/x86/tlb.rst
@@ -1,5 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+The TLB
+=======
+
 When the kernel unmaps or modified the attributes of a range of
 memory, it has two choices:
+
  1. Flush the entire TLB with a two-instruction sequence.  This is
     a quick operation, but it causes collateral damage: TLB entries
     from areas other than the one we are trying to flush will be
@@ -10,6 +17,7 @@ memory, it has two choices:
     damage to other TLB entries.
 
 Which method to do depends on a few things:
+
  1. The size of the flush being performed.  A flush of the entire
     address space is obviously better performed by flushing the
     entire TLB than doing 2^48/PAGE_SIZE individual flushes.
@@ -33,7 +41,7 @@ well.  There is essentially no "right" point to choose.
 You may be doing too many individual invalidations if you see the
 invlpg instruction (or instructions _near_ it) show up high in
 profiles.  If you believe that individual invalidations being
-called too often, you can lower the tunable:
+called too often, you can lower the tunable::
 
 	/sys/kernel/debug/x86/tlb_single_page_flush_ceiling
 
@@ -43,7 +51,7 @@ Setting it to 1 is a very conservative setting and it should
 never need to be 0 under normal circumstances.
 
 Despite the fact that a single individual flush on x86 is
-guaranteed to flush a full 2MB [1], hugetlbfs always uses the full
+guaranteed to flush a full 2MB [1]_, hugetlbfs always uses the full
 flushes.  THP is treated exactly the same as normal memory.
 
 You might see invlpg inside of flush_tlb_mm_range() show up in
@@ -54,15 +62,15 @@ Essentially, you are balancing the cycles you spend doing invlpg
 with the cycles that you spend refilling the TLB later.
 
 You can measure how expensive TLB refills are by using
-performance counters and 'perf stat', like this:
+performance counters and 'perf stat', like this::
 
-perf stat -e
-	cpu/event=0x8,umask=0x84,name=dtlb_load_misses_walk_duration/,
-	cpu/event=0x8,umask=0x82,name=dtlb_load_misses_walk_completed/,
-	cpu/event=0x49,umask=0x4,name=dtlb_store_misses_walk_duration/,
-	cpu/event=0x49,umask=0x2,name=dtlb_store_misses_walk_completed/,
-	cpu/event=0x85,umask=0x4,name=itlb_misses_walk_duration/,
-	cpu/event=0x85,umask=0x2,name=itlb_misses_walk_completed/
+  perf stat -e
+    cpu/event=0x8,umask=0x84,name=dtlb_load_misses_walk_duration/,
+    cpu/event=0x8,umask=0x82,name=dtlb_load_misses_walk_completed/,
+    cpu/event=0x49,umask=0x4,name=dtlb_store_misses_walk_duration/,
+    cpu/event=0x49,umask=0x2,name=dtlb_store_misses_walk_completed/,
+    cpu/event=0x85,umask=0x4,name=itlb_misses_walk_duration/,
+    cpu/event=0x85,umask=0x2,name=itlb_misses_walk_completed/
 
 That works on an IvyBridge-era CPU (i5-3320M).  Different CPUs
 may have differently-named counters, but they should at least
@@ -70,6 +78,6 @@ be there in some form.  You can use pmu-tools 'ocperf list'
 (https://github.com/andikleen/pmu-tools) to find the right
 counters for a given CPU.
 
-1. A footnote in Intel's SDM "4.10.4.2 Recommended Invalidation"
+.. [1] A footnote in Intel's SDM "4.10.4.2 Recommended Invalidation"
    says: "One execution of INVLPG is sufficient even for a page
    with size greater than 4 KBytes."
-- 
2.20.1



* [PATCH v4 44/63] Documentation: x86: convert zero-page.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst     |  1 +
 Documentation/x86/zero-page.rst | 47 +++++++++++++++++++++++++++++++++
 Documentation/x86/zero-page.txt | 40 ----------------------------
 3 files changed, 48 insertions(+), 40 deletions(-)
 create mode 100644 Documentation/x86/zero-page.rst
 delete mode 100644 Documentation/x86/zero-page.txt
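
Note (not part of the patch itself): each row of the table in this file
starts with an "Offset/Size" pair. Values such as "A00" and "1EC" imply
that both numbers are hexadecimal. A minimal sketch of reading one row
under that assumption follows; the helper name parse_row is hypothetical,
and it assumes the first three whitespace-separated fields are the
offset/size pair, the protocol and the field name.

```python
def parse_row(row):
    """Split one table row into (name, offset, size), reading hex values."""
    fields = row.split()
    offset_hex, size_hex = fields[0].split("/")
    return fields[2], int(offset_hex, 16), int(size_hex, 16)


# Row copied from the table in this patch.
name, offset, size = parse_row("2D0/A00\tALL\te820_table\tE820 memory map table")

# The field ends at offset + size.
print(name, hex(offset), size, hex(offset + size))
# prints: e820_table 0x2d0 2560 0xcd0
```

Continuation rows, such as the "(array of struct e820_entry)" line, do not
start with an offset and would need to be skipped before calling the helper.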

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 7b8388ebd43d..9a0b5f38ef6b 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -14,3 +14,4 @@ Linux x86 Support
    kernel-stacks
    entry_64
    earlyprintk
+   zero-page
diff --git a/Documentation/x86/zero-page.rst b/Documentation/x86/zero-page.rst
new file mode 100644
index 000000000000..deedbc84454d
--- /dev/null
+++ b/Documentation/x86/zero-page.rst
@@ -0,0 +1,47 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========
+Zero Page
+=========
+
+The additional fields in struct boot_params are part of the 32-bit boot
+protocol of the kernel. These should be filled in by the bootloader or the
+16-bit real-mode setup code of the kernel. They are mainly referenced and
+set in::
+
+  arch/x86/include/uapi/asm/bootparam.h
+
+::
+
+	Offset	Proto	Name		Meaning
+	/Size
+
+	000/040	ALL	screen_info	Text mode or frame buffer information
+					(struct screen_info)
+	040/014	ALL	apm_bios_info	APM BIOS information (struct apm_bios_info)
+	058/008	ALL	tboot_addr      Physical address of tboot shared page
+	060/010	ALL	ist_info	Intel SpeedStep (IST) BIOS support information
+					(struct ist_info)
+	080/010	ALL	hd0_info	hd0 disk parameter, OBSOLETE!!
+	090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
+	0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table),
+					OBSOLETE!!
+	0B0/010	ALL	olpc_ofw_header	OLPC's OpenFirmware CIF and friends
+	0C0/004	ALL	ext_ramdisk_image ramdisk_image high 32bits
+	0C4/004	ALL	ext_ramdisk_size  ramdisk_size high 32bits
+	0C8/004	ALL	ext_cmd_line_ptr  cmd_line_ptr high 32bits
+	140/080	ALL	edid_info	Video mode setup (struct edid_info)
+	1C0/020	ALL	efi_info	EFI 32 information (struct efi_info)
+	1E0/004	ALL	alt_mem_k	Alternative mem check, in KB
+	1E4/004	ALL	scratch		Scratch field for the kernel setup code
+	1E8/001	ALL	e820_entries	Number of entries in e820_table (below)
+	1E9/001	ALL	eddbuf_entries	Number of entries in eddbuf (below)
+	1EA/001	ALL	edd_mbr_sig_buf_entries	Number of entries in edd_mbr_sig_buffer
+					(below)
+	1EB/001	ALL     kbd_status      Numlock is enabled
+	1EC/001	ALL     secure_boot	Secure boot is enabled in the firmware
+	1EF/001	ALL	sentinel	Used to detect broken bootloaders
+	290/040	ALL	edd_mbr_sig_buffer EDD MBR signatures
+	2D0/A00	ALL	e820_table	E820 memory map table
+					(array of struct e820_entry)
+	D00/1EC	ALL	eddbuf		EDD data (array of struct edd_info)
diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
deleted file mode 100644
index 68aed077f7b6..000000000000
--- a/Documentation/x86/zero-page.txt
+++ /dev/null
@@ -1,40 +0,0 @@
-The additional fields in struct boot_params as a part of 32-bit boot
-protocol of kernel. These should be filled by bootloader or 16-bit
-real-mode setup code of the kernel. References/settings to it mainly
-are in:
-
-  arch/x86/include/uapi/asm/bootparam.h
-
-
-Offset	Proto	Name		Meaning
-/Size
-
-000/040	ALL	screen_info	Text mode or frame buffer information
-				(struct screen_info)
-040/014	ALL	apm_bios_info	APM BIOS information (struct apm_bios_info)
-058/008	ALL	tboot_addr      Physical address of tboot shared page
-060/010	ALL	ist_info	Intel SpeedStep (IST) BIOS support information
-				(struct ist_info)
-080/010	ALL	hd0_info	hd0 disk parameter, OBSOLETE!!
-090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
-0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table),
-				OBSOLETE!!
-0B0/010	ALL	olpc_ofw_header	OLPC's OpenFirmware CIF and friends
-0C0/004	ALL	ext_ramdisk_image ramdisk_image high 32bits
-0C4/004	ALL	ext_ramdisk_size  ramdisk_size high 32bits
-0C8/004	ALL	ext_cmd_line_ptr  cmd_line_ptr high 32bits
-140/080	ALL	edid_info	Video mode setup (struct edid_info)
-1C0/020	ALL	efi_info	EFI 32 information (struct efi_info)
-1E0/004	ALL	alt_mem_k	Alternative mem check, in KB
-1E4/004	ALL	scratch		Scratch field for the kernel setup code
-1E8/001	ALL	e820_entries	Number of entries in e820_table (below)
-1E9/001	ALL	eddbuf_entries	Number of entries in eddbuf (below)
-1EA/001	ALL	edd_mbr_sig_buf_entries	Number of entries in edd_mbr_sig_buffer
-				(below)
-1EB/001	ALL     kbd_status      Numlock is enabled
-1EC/001	ALL     secure_boot	Secure boot is enabled in the firmware
-1EF/001	ALL	sentinel	Used to detect broken bootloaders
-290/040	ALL	edd_mbr_sig_buffer EDD MBR signatures
-2D0/A00	ALL	e820_table	E820 memory map table
-				(array of struct e820_entry)
-D00/1EC	ALL	eddbuf		EDD data (array of struct edd_info)
-- 
2.20.1



* [PATCH v4 43/63] Documentation: x86: convert earlyprintk.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/earlyprintk.rst | 148 ++++++++++++++++++++++++++++++
 Documentation/x86/earlyprintk.txt | 141 ----------------------------
 Documentation/x86/index.rst       |   1 +
 3 files changed, 149 insertions(+), 141 deletions(-)
 create mode 100644 Documentation/x86/earlyprintk.rst
 delete mode 100644 Documentation/x86/earlyprintk.txt
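
Note (not part of the patch itself): the document asks the reader to look
for a 'Debug port' capability in the lspci -vvv output. A minimal sketch of
doing that check on captured output follows. The helper name
find_debug_port is hypothetical, the pattern is based only on the sample
line quoted in the document, and the offset is assumed to be hexadecimal.

```python
import re


def find_debug_port(lspci_text):
    """Return (bar, offset) from the first 'Debug port' capability, or None."""
    match = re.search(r"Debug port: BAR=(\d+) offset=([0-9a-fA-F]+)", lspci_text)
    if match is None:
        return None
    # BAR is a small decimal index; offset is read as hexadecimal (assumption).
    return int(match.group(1)), int(match.group(2), 16)


# Capability line copied from the lspci sample in this document.
sample = "Capabilities: [58] Debug port: BAR=1 offset=00a0\n"

print(find_debug_port(sample))  # prints (1, 160)
```

A result of None corresponds to the case the document describes, where no
debug port capability is listed and the USB debug key probably cannot be
used.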

diff --git a/Documentation/x86/earlyprintk.rst b/Documentation/x86/earlyprintk.rst
new file mode 100644
index 000000000000..519402451f9c
--- /dev/null
+++ b/Documentation/x86/earlyprintk.rst
@@ -0,0 +1,148 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Early Printk
+============
+
+Mini-HOWTO for using the earlyprintk=dbgp boot option with a
+USB2 Debug port key and a debug cable, on x86 systems.
+
+You need two computers, the 'USB debug key' special gadget,
+and two USB cables, connected like this::
+
+  [host/target] <-------> [USB debug key] <-------> [client/console]
+
+There are a number of specific hardware requirements
+====================================================
+
+ a) Host/target system needs to have USB debug port capability.
+
+  You can check this capability by looking at a 'Debug port' bit in
+  the lspci -vvv output::
+
+    # lspci -vvv
+    ...
+    00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03) (prog-if 20 [EHCI])
+            Subsystem: Lenovo ThinkPad T61
+            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
+            Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+            Latency: 0
+            Interrupt: pin D routed to IRQ 19
+            Region 0: Memory at fe227000 (32-bit, non-prefetchable) [size=1K]
+            Capabilities: [50] Power Management version 2
+                    Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
+                    Status: D0 PME-Enable- DSel=0 DScale=0 PME+
+            Capabilities: [58] Debug port: BAR=1 offset=00a0
+                                ^^^^^^^^^^^ <==================== [ HERE ]
+            Kernel driver in use: ehci_hcd
+            Kernel modules: ehci-hcd
+    ...
+
+  If your system does not list a debug port capability then you probably
+  won't be able to use the USB debug key.
+
+ b) You also need a NetChip USB debug cable/key:
+
+        http://www.plxtech.com/products/NET2000/NET20DC/default.asp
+
+    This is a small blue plastic connector with two USB connections;
+    it draws power from its USB connections.
+
+ c) You need a second client/console system with a high speed USB 2.0 port.
+
+ d) The NetChip device must be plugged directly into the physical
+    debug port on the "host/target" system.  You cannot use a USB hub in
+    between the physical debug port and the "host/target" system.
+
+    The EHCI debug controller is bound to a specific physical USB
+    port and the NetChip device will only work as an early printk
+    device in this port.  The EHCI host controllers are electrically
+    wired such that the EHCI debug controller is hooked up to the
+    first physical port and there is no way to change this via software.
+    You can find the physical port through experimentation by trying
+    each physical port on the system and rebooting.  Or you can try
+    and use lsusb or look at the kernel info messages emitted by the
+    usb stack when you plug a usb device into various ports on the
+    "host/target" system.
+
+    Some hardware vendors do not expose the usb debug port with a
+    physical connector and if you find such a device send a complaint
+    to the hardware vendor, because there is no reason not to wire
+    this port into one of the physically accessible ports.
+
+ e) Note also that many versions of the NetChip
+    device require the "client/console" system to be plugged into the
+    right-hand side of the device (with the product logo facing up and
+    readable left to right).  The reason is that the 5 volt
+    power supply is taken from only one side of the device and it
+    must be the side that does not get rebooted.
+
+Software requirements
+=====================
+
+ a) On the host/target system:
+
+    You need to enable the following kernel config option::
+
+      CONFIG_EARLY_PRINTK_DBGP=y
+
+    And you need to add this to the boot command line: "earlyprintk=dbgp".
+
+    .. note:: If you are using Grub, append it to the 'kernel' line in
+     /etc/grub.conf.  If you are using Grub2 on a BIOS firmware system,
+     append it to the 'linux' line in /boot/grub2/grub.cfg. If you are
+     using Grub2 on an EFI firmware system, append it to the 'linux'
+     or 'linuxefi' line in /boot/grub2/grub.cfg or
+     /boot/efi/EFI/<distro>/grub.cfg.
+
+    On systems with more than one EHCI debug controller you must
+    specify the correct EHCI debug controller number.  The ordering
+    comes from the PCI bus enumeration of the EHCI controllers.  The
+    default with no number argument is "0" or the first EHCI debug
+    controller.  To use the second EHCI debug controller, you would
+    use the command line: "earlyprintk=dbgp1"
+
+    NOTE: normally earlyprintk console gets turned off once the
+    regular console is alive - use "earlyprintk=dbgp,keep" to keep
+    this channel open beyond early bootup. This can be useful for
+    debugging crashes under Xorg, etc.
+
+ b) On the client/console system:
+
+    You should enable the following kernel config option::
+
+      CONFIG_USB_SERIAL_DEBUG=y
+
+    On the next bootup with the modified kernel you should
+    get a /dev/ttyUSBx device(s).
+
+    Now this channel of kernel messages is ready to be used: start
+    your favorite terminal emulator (minicom, etc.) and set
+    it up to use /dev/ttyUSB0 - or use a raw 'cat /dev/ttyUSBx' to
+    see the raw output.
+
+ c) On Nvidia Southbridge based systems: the kernel will try to probe
+    and find out which port has a debug device connected.
+
+Testing that it works fine
+==========================
+
+  You can test the output by using earlyprintk=dbgp,keep and provoking
+  kernel messages on the host/target system. You can provoke a harmless
+  kernel message by for example doing::
+
+    echo h > /proc/sysrq-trigger
+
+  On the host/target system you should see this help line in "dmesg" output::
+
+    SysRq : HELP : loglevel(0-9) reBoot Crashdump terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount show-blocked-tasks(W) dump-ftrace-buffer(Z)
+
+  On the client/console system do::
+
+    cat /dev/ttyUSB0
+
+  And you should see the help line above displayed shortly after you've
+  provoked it on the host system.
+
+If it does not work then please ask about it on the linux-kernel@vger.kernel.org
+mailing list or contact the x86 maintainers.
diff --git a/Documentation/x86/earlyprintk.txt b/Documentation/x86/earlyprintk.txt
deleted file mode 100644
index 46933e06c972..000000000000
--- a/Documentation/x86/earlyprintk.txt
+++ /dev/null
@@ -1,141 +0,0 @@
-
-Mini-HOWTO for using the earlyprintk=dbgp boot option with a
-USB2 Debug port key and a debug cable, on x86 systems.
-
-You need two computers, the 'USB debug key' special gadget and
-and two USB cables, connected like this:
-
-  [host/target] <-------> [USB debug key] <-------> [client/console]
-
-1. There are a number of specific hardware requirements:
-
- a.) Host/target system needs to have USB debug port capability.
-
- You can check this capability by looking at a 'Debug port' bit in
- the lspci -vvv output:
-
- # lspci -vvv
- ...
- 00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03) (prog-if 20 [EHCI])
-         Subsystem: Lenovo ThinkPad T61
-         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
-         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-         Latency: 0
-         Interrupt: pin D routed to IRQ 19
-         Region 0: Memory at fe227000 (32-bit, non-prefetchable) [size=1K]
-         Capabilities: [50] Power Management version 2
-                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
-                 Status: D0 PME-Enable- DSel=0 DScale=0 PME+
-         Capabilities: [58] Debug port: BAR=1 offset=00a0
-                            ^^^^^^^^^^^ <==================== [ HERE ]
-	 Kernel driver in use: ehci_hcd
-         Kernel modules: ehci-hcd
- ...
-
-( If your system does not list a debug port capability then you probably
-  won't be able to use the USB debug key. )
-
- b.) You also need a NetChip USB debug cable/key:
-
-        http://www.plxtech.com/products/NET2000/NET20DC/default.asp
-
-     This is a small blue plastic connector with two USB connections;
-     it draws power from its USB connections.
-
- c.) You need a second client/console system with a high speed USB 2.0
-     port.
-
- d.) The NetChip device must be plugged directly into the physical
-     debug port on the "host/target" system.  You cannot use a USB hub in
-     between the physical debug port and the "host/target" system.
-
-     The EHCI debug controller is bound to a specific physical USB
-     port and the NetChip device will only work as an early printk
-     device in this port.  The EHCI host controllers are electrically
-     wired such that the EHCI debug controller is hooked up to the
-     first physical port and there is no way to change this via software.
-     You can find the physical port through experimentation by trying
-     each physical port on the system and rebooting.  Or you can try
-     and use lsusb or look at the kernel info messages emitted by the
-     usb stack when you plug a usb device into various ports on the
-     "host/target" system.
-
-     Some hardware vendors do not expose the usb debug port with a
-     physical connector and if you find such a device send a complaint
-     to the hardware vendor, because there is no reason not to wire
-     this port into one of the physically accessible ports.
-
- e.) It is also important to note, that many versions of the NetChip
-     device require the "client/console" system to be plugged into the
-     right hand side of the device (with the product logo facing up and
-     readable left to right).  The reason being is that the 5 volt
-     power supply is taken from only one side of the device and it
-     must be the side that does not get rebooted.
-
-2. Software requirements:
-
- a.) On the host/target system:
-
-    You need to enable the following kernel config option:
-
-      CONFIG_EARLY_PRINTK_DBGP=y
-
-    And you need to add the boot command line: "earlyprintk=dbgp".
-
-    (If you are using Grub, append it to the 'kernel' line in
-     /etc/grub.conf.  If you are using Grub2 on a BIOS firmware system,
-     append it to the 'linux' line in /boot/grub2/grub.cfg. If you are
-     using Grub2 on an EFI firmware system, append it to the 'linux'
-     or 'linuxefi' line in /boot/grub2/grub.cfg or
-     /boot/efi/EFI/<distro>/grub.cfg.)
-
-    On systems with more than one EHCI debug controller you must
-    specify the correct EHCI debug controller number.  The ordering
-    comes from the PCI bus enumeration of the EHCI controllers.  The
-    default with no number argument is "0" or the first EHCI debug
-    controller.  To use the second EHCI debug controller, you would
-    use the command line: "earlyprintk=dbgp1"
-
-    NOTE: normally earlyprintk console gets turned off once the
-    regular console is alive - use "earlyprintk=dbgp,keep" to keep
-    this channel open beyond early bootup. This can be useful for
-    debugging crashes under Xorg, etc.
-
- b.) On the client/console system:
-
-    You should enable the following kernel config option:
-
-      CONFIG_USB_SERIAL_DEBUG=y
-
-    On the next bootup with the modified kernel you should
-    get a /dev/ttyUSBx device(s).
-
-    Now this channel of kernel messages is ready to be used: start
-    your favorite terminal emulator (minicom, etc.) and set
-    it up to use /dev/ttyUSB0 - or use a raw 'cat /dev/ttyUSBx' to
-    see the raw output.
-
- c.) On Nvidia Southbridge based systems: the kernel will try to probe
-     and find out which port has a debug device connected.
-
-3. Testing that it works fine:
-
-   You can test the output by using earlyprintk=dbgp,keep and provoking
-   kernel messages on the host/target system. You can provoke a harmless
-   kernel message by for example doing:
-
-     echo h > /proc/sysrq-trigger
-
-   On the host/target system you should see this help line in "dmesg" output:
-
-     SysRq : HELP : loglevel(0-9) reBoot Crashdump terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount show-blocked-tasks(W) dump-ftrace-buffer(Z)
-
-   On the client/console system do:
-
-       cat /dev/ttyUSB0
-
-   And you should see the help line above displayed shortly after you've
-   provoked it on the host system.
-
-If it does not work then please ask about it on the linux-kernel@vger.kernel.org
-mailing list or contact the x86 maintainers.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 8a666c5abc85..7b8388ebd43d 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -13,3 +13,4 @@ Linux x86 Support
    exception-tables
    kernel-stacks
    entry_64
+   earlyprintk
-- 
2.20.1



* [PATCH v4 42/63] Documentation: x86: convert entry_64.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/{entry_64.txt => entry_64.rst} | 12 +++++++++---
 Documentation/x86/index.rst                      |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)
 rename Documentation/x86/{entry_64.txt => entry_64.rst} (95%)

diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.rst
similarity index 95%
rename from Documentation/x86/entry_64.txt
rename to Documentation/x86/entry_64.rst
index c1df8eba9dfd..a48b3f6ebbe8 100644
--- a/Documentation/x86/entry_64.txt
+++ b/Documentation/x86/entry_64.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+Kernel Entries
+==============
+
 This file documents some of the kernel entries in
 arch/x86/entry/entry_64.S.  A lot of this explanation is adapted from
 an email from Ingo Molnar:
@@ -59,7 +65,7 @@ Now, there's a secondary complication: there's a cheap way to test
 which mode the CPU is in and an expensive way.
 
 The cheap way is to pick this info off the entry frame on the kernel
-stack, from the CS of the ptregs area of the kernel stack:
+stack, from the CS of the ptregs area of the kernel stack::
 
 	xorl %ebx,%ebx
 	testl $3,CS+8(%rsp)
@@ -67,7 +73,7 @@ stack, from the CS of the ptregs area of the kernel stack:
 	SWAPGS
 
 The expensive (paranoid) way is to read back the MSR_GS_BASE value
-(which is what SWAPGS modifies):
+(which is what SWAPGS modifies)::
 
 	movl $1,%ebx
 	movl $MSR_GS_BASE,%ecx
@@ -76,7 +82,7 @@ The expensive (paranoid) way is to read back the MSR_GS_BASE value
 	js 1f   /* negative -> in kernel */
 	SWAPGS
 	xorl %ebx,%ebx
-1:	ret
+  1:	ret
 
 If we are at an interrupt or user-trap/gate-alike boundary then we can
 use the faster check: the stack will be a reliable indicator of
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 489f4f4179c4..8a666c5abc85 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -12,3 +12,4 @@ Linux x86 Support
    topology
    exception-tables
    kernel-stacks
+   entry_64
-- 
2.20.1



* [PATCH v4 41/63] Documentation: x86: convert kernel-stacks to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst                   |  1 +
 .../x86/{kernel-stacks => kernel-stacks.rst}  | 20 ++++++++++++-------
 2 files changed, 14 insertions(+), 7 deletions(-)
 rename Documentation/x86/{kernel-stacks => kernel-stacks.rst} (92%)

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index c0bfd0bd6000..489f4f4179c4 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -11,3 +11,4 @@ Linux x86 Support
    boot
    topology
    exception-tables
+   kernel-stacks
diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks.rst
similarity index 92%
rename from Documentation/x86/kernel-stacks
rename to Documentation/x86/kernel-stacks.rst
index 9a0aa4d3a866..3e6bf5940db0 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks.rst
@@ -1,5 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Kernel Stacks
+=============
+
 Kernel stacks on x86-64 bit
----------------------------
+===========================
 
 Most of the text from Keith Owens, hacked by AK
 
@@ -57,7 +63,7 @@ IST events with the same code to be nested.  However in most cases, the
 stack size allocated to an IST assumes no nesting for the same code.
 If that assumption is ever broken then the stacks will become corrupt.
 
-The currently assigned IST stacks are :-
+The currently assigned IST stacks are:
 
 * DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
 
@@ -98,7 +104,7 @@ For more details see the Intel IA32 or AMD AMD64 architecture manuals.
 
 
 Printing backtraces on x86
---------------------------
+==========================
 
 The question about the '?' preceding function names in an x86 stacktrace
 keeps popping up, here's an indepth explanation. It helps if the reader
@@ -108,7 +114,7 @@ arch/x86/kernel/dumpstack.c.
 Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
 
 We always scan the full kernel stack for return addresses stored on
-the kernel stack(s) [*], from stack top to stack bottom, and print out
+the kernel stack(s) [1]_, from stack top to stack bottom, and print out
 anything that 'looks like' a kernel text address.
 
 If it fits into the frame pointer chain, we print it without a question
@@ -136,6 +142,6 @@ that look like kernel text addresses, so if debug information is wrong,
 we still print out the real call chain as well - just with more question
 marks than ideal.
 
-[*] For things like IRQ and IST stacks, we also scan those stacks, in
-    the right order, and try to cross from one stack into another
-    reconstructing the call chain. This works most of the time.
+.. [1] For things like IRQ and IST stacks, we also scan those stacks, in
+       the right order, and try to cross from one stack into another
+       reconstructing the call chain. This works most of the time.
-- 
2.20.1



* [PATCH v4 40/63] Documentation: x86: convert exception-tables.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 ...eption-tables.txt => exception-tables.rst} | 231 ++++++++++--------
 Documentation/x86/index.rst                   |   1 +
 2 files changed, 126 insertions(+), 106 deletions(-)
 rename Documentation/x86/{exception-tables.txt => exception-tables.rst} (67%)

diff --git a/Documentation/x86/exception-tables.txt b/Documentation/x86/exception-tables.rst
similarity index 67%
rename from Documentation/x86/exception-tables.txt
rename to Documentation/x86/exception-tables.rst
index e396bcd8d830..2ffb096c8b58 100644
--- a/Documentation/x86/exception-tables.txt
+++ b/Documentation/x86/exception-tables.rst
@@ -1,5 +1,10 @@
-     Kernel level exception handling in Linux
-  Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com>
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Kernel level exception handling
+===============================
+
+Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com>
 
 When a process runs in kernel mode, it often has to access user
 mode memory whose address has been passed by an untrusted program.
@@ -25,9 +30,9 @@ How does this work?
 
 Whenever the kernel tries to access an address that is currently not
 accessible, the CPU generates a page fault exception and calls the
-page fault handler
+page fault handler::
 
-void do_page_fault(struct pt_regs *regs, unsigned long error_code)
+  void do_page_fault(struct pt_regs *regs, unsigned long error_code)
 
 in arch/x86/mm/fault.c. The parameters on the stack are set up by
 the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter
@@ -57,73 +62,74 @@ as an example. The definition is somewhat hard to follow, so let's peek at
 the code generated by the preprocessor and the compiler. I selected
 the get_user call in drivers/char/sysrq.c for a detailed examination.
 
-The original code in sysrq.c line 587:
+The original code in sysrq.c line 587::
+
         get_user(c, buf);
 
-The preprocessor output (edited to become somewhat readable):
-
-(
-  {
-    long __gu_err = - 14 , __gu_val = 0;
-    const __typeof__(*( (  buf ) )) *__gu_addr = ((buf));
-    if (((((0 + current_set[0])->tss.segment) == 0x18 )  ||
-       (((sizeof(*(buf))) <= 0xC0000000UL) &&
-       ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
-      do {
-        __gu_err  = 0;
-        switch ((sizeof(*(buf)))) {
-          case 1:
-            __asm__ __volatile__(
-              "1:      mov" "b" " %2,%" "b" "1\n"
-              "2:\n"
-              ".section .fixup,\"ax\"\n"
-              "3:      movl %3,%0\n"
-              "        xor" "b" " %" "b" "1,%" "b" "1\n"
-              "        jmp 2b\n"
-              ".section __ex_table,\"a\"\n"
-              "        .align 4\n"
-              "        .long 1b,3b\n"
-              ".text"        : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  )) ;
-              break;
-          case 2:
-            __asm__ __volatile__(
-              "1:      mov" "w" " %2,%" "w" "1\n"
-              "2:\n"
-              ".section .fixup,\"ax\"\n"
-              "3:      movl %3,%0\n"
-              "        xor" "w" " %" "w" "1,%" "w" "1\n"
-              "        jmp 2b\n"
-              ".section __ex_table,\"a\"\n"
-              "        .align 4\n"
-              "        .long 1b,3b\n"
-              ".text"        : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  ));
-              break;
-          case 4:
-            __asm__ __volatile__(
-              "1:      mov" "l" " %2,%" "" "1\n"
-              "2:\n"
-              ".section .fixup,\"ax\"\n"
-              "3:      movl %3,%0\n"
-              "        xor" "l" " %" "" "1,%" "" "1\n"
-              "        jmp 2b\n"
-              ".section __ex_table,\"a\"\n"
-              "        .align 4\n"        "        .long 1b,3b\n"
-              ".text"        : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(__gu_err));
-              break;
-          default:
-            (__gu_val) = __get_user_bad();
-        }
-      } while (0) ;
-    ((c)) = (__typeof__(*((buf))))__gu_val;
-    __gu_err;
-  }
-);
+The preprocessor output (edited to become somewhat readable)::
+
+  (
+    {
+      long __gu_err = - 14 , __gu_val = 0;
+      const __typeof__(*( (  buf ) )) *__gu_addr = ((buf));
+      if (((((0 + current_set[0])->tss.segment) == 0x18 )  ||
+        (((sizeof(*(buf))) <= 0xC0000000UL) &&
+        ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
+        do {
+          __gu_err  = 0;
+          switch ((sizeof(*(buf)))) {
+            case 1:
+              __asm__ __volatile__(
+                "1:      mov" "b" " %2,%" "b" "1\n"
+                "2:\n"
+                ".section .fixup,\"ax\"\n"
+                "3:      movl %3,%0\n"
+                "        xor" "b" " %" "b" "1,%" "b" "1\n"
+                "        jmp 2b\n"
+                ".section __ex_table,\"a\"\n"
+                "        .align 4\n"
+                "        .long 1b,3b\n"
+                ".text"        : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
+                              (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  )) ;
+                break;
+            case 2:
+              __asm__ __volatile__(
+                "1:      mov" "w" " %2,%" "w" "1\n"
+                "2:\n"
+                ".section .fixup,\"ax\"\n"
+                "3:      movl %3,%0\n"
+                "        xor" "w" " %" "w" "1,%" "w" "1\n"
+                "        jmp 2b\n"
+                ".section __ex_table,\"a\"\n"
+                "        .align 4\n"
+                "        .long 1b,3b\n"
+                ".text"        : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
+                              (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  ));
+                break;
+            case 4:
+              __asm__ __volatile__(
+                "1:      mov" "l" " %2,%" "" "1\n"
+                "2:\n"
+                ".section .fixup,\"ax\"\n"
+                "3:      movl %3,%0\n"
+                "        xor" "l" " %" "" "1,%" "" "1\n"
+                "        jmp 2b\n"
+                ".section __ex_table,\"a\"\n"
+                "        .align 4\n"        "        .long 1b,3b\n"
+                ".text"        : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
+                              (   __gu_addr   )) ), "i"(- 14 ), "0"(__gu_err));
+                break;
+            default:
+              (__gu_val) = __get_user_bad();
+          }
+        } while (0) ;
+      ((c)) = (__typeof__(*((buf))))__gu_val;
+      __gu_err;
+    }
+  );
 
 WOW! Black GCC/assembly magic. This is impossible to follow, so let's
-see what code gcc generates:
+see what code gcc generates::
 
  >         xorl %edx,%edx
  >         movl current_set,%eax
@@ -154,7 +160,7 @@ understand. Can we? The actual user access is quite obvious. Thanks
 to the unified address space we can just access the address in user
 memory. But what does the .section stuff do?????
 
-To understand this we have to look at the final kernel:
+To understand this we have to look at the final kernel::
 
  > objdump --section-headers vmlinux
  >
@@ -181,7 +187,7 @@ To understand this we have to look at the final kernel:
 
 There are obviously 2 non standard ELF sections in the generated object
 file. But first we want to find out what happened to our code in the
-final kernel executable:
+final kernel executable::
 
  > objdump --disassemble --section=.text vmlinux
  >
@@ -199,7 +205,7 @@ final kernel executable:
 The whole user memory access is reduced to 10 x86 machine instructions.
 The instructions bracketed in the .section directives are no longer
 in the normal execution path. They are located in a different section
-of the executable file:
+of the executable file::
 
  > objdump --disassemble --section=.fixup vmlinux
  >
@@ -207,14 +213,15 @@ of the executable file:
  > c0199ffa <.fixup+10ba> xorb   %dl,%dl
  > c0199ffc <.fixup+10bc> jmp    c017e7a7 <do_con_write+e3>
 
-And finally:
+And finally::
+
  > objdump --full-contents --section=__ex_table vmlinux
  >
  >  c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0  ................
  >  c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0  ................
  >  c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0  ................
 
-or in human readable byte order:
+or in human readable byte order::
 
  >  c01aa7c4 c017c093 c0199fe0 c017c097 c017c099  ................
  >  c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5  ................
@@ -222,18 +229,22 @@ or in human readable byte order:
                                this is the interesting part!
  >  c01aa7e4 c0180a08 c019a001 c0180a0a c019a004  ................
 
-What happened? The assembly directives
+What happened? The assembly directives::
 
-.section .fixup,"ax"
-.section __ex_table,"a"
+  .section .fixup,"ax"
+  .section __ex_table,"a"
 
 told the assembler to move the following code to the specified
-sections in the ELF object file. So the instructions
-3:      movl $-14,%eax
-        xorb %dl,%dl
-        jmp 2b
-ended up in the .fixup section of the object file and the addresses
+sections in the ELF object file. So the instructions::
+
+  3:      movl $-14,%eax
+          xorb %dl,%dl
+          jmp 2b
+
+ended up in the .fixup section of the object file and the addresses::
+
         .long 1b,3b
+
 ended up in the __ex_table section of the object file. 1b and 3b
 are local labels. The local label 1b (1b stands for next label 1
 backward) is the address of the instruction that might fault, i.e.
@@ -246,35 +257,39 @@ the fault, in our case the actual value is c0199ff5:
 the original assembly code: > 3:      movl $-14,%eax
 and linked in vmlinux     : > c0199ff5 <.fixup+10b5> movl   $0xfffffff2,%eax
 
-The assembly code
+The assembly code::
+
  > .section __ex_table,"a"
  >         .align 4
  >         .long 1b,3b
 
-becomes the value pair
+becomes the value pair::
+
  >  c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5  ................
                                ^this is ^this is
                                1b       3b
+
 c017e7a5,c0199ff5 in the exception table of the kernel.
 
 So, what actually happens if a fault from kernel mode with no suitable
 vma occurs?
 
-1.) access to invalid address:
- > c017e7a5 <do_con_write+e1> movb   (%ebx),%dl
-2.) MMU generates exception
-3.) CPU calls do_page_fault
-4.) do page fault calls search_exception_table (regs->eip == c017e7a5);
-5.) search_exception_table looks up the address c017e7a5 in the
-    exception table (i.e. the contents of the ELF section __ex_table)
-    and returns the address of the associated fault handle code c0199ff5.
-6.) do_page_fault modifies its own return address to point to the fault
-    handle code and returns.
-7.) execution continues in the fault handling code.
-8.) 8a) EAX becomes -EFAULT (== -14)
-    8b) DL  becomes zero (the value we "read" from user space)
-    8c) execution continues at local label 2 (address of the
-        instruction immediately after the faulting user access).
+#. access to invalid address::
+
+    > c017e7a5 <do_con_write+e1> movb   (%ebx),%dl
+#. MMU generates exception
+#. CPU calls do_page_fault
+#. do_page_fault calls search_exception_table (regs->eip == c017e7a5);
+#. search_exception_table looks up the address c017e7a5 in the
+   exception table (i.e. the contents of the ELF section __ex_table)
+   and returns the address of the associated fault handle code c0199ff5.
+#. do_page_fault modifies its own return address to point to the fault
+   handle code and returns.
+#. execution continues in the fault handling code.
+#. a) EAX becomes -EFAULT (== -14)
+   b) DL  becomes zero (the value we "read" from user space)
+   c) execution continues at local label 2 (address of the
+      instruction immediately after the faulting user access).
 
 The steps 8a to 8c in a certain way emulate the faulting instruction.
 
@@ -295,14 +310,15 @@ Things changed when 64-bit support was added to x86 Linux. Rather than
 double the size of the exception table by expanding the two entries
 from 32-bits to 64 bits, a clever trick was used to store addresses
 as relative offsets from the table itself. The assembly code changed
-from:
-	.long 1b,3b
-to:
-        .long (from) - .
-        .long (to) - .
+from::
+
+    .long 1b,3b
+  to:
+          .long (from) - .
+          .long (to) - .
 
 and the C-code that uses these values converts back to absolute addresses
-like this:
+like this::
 
 	ex_insn_addr(const struct exception_table_entry *x)
 	{
@@ -313,15 +329,18 @@ In v4.6 the exception table entry was expanded with a new field "handler".
 This is also 32-bits wide and contains a third relative function
 pointer which points to one of:
 
-1) int ex_handler_default(const struct exception_table_entry *fixup)
+1) ``int ex_handler_default(const struct exception_table_entry *fixup)``
    This is legacy case that just jumps to the fixup code
-2) int ex_handler_fault(const struct exception_table_entry *fixup)
+
+2) ``int ex_handler_fault(const struct exception_table_entry *fixup)``
    This case provides the fault number of the trap that occurred at
    entry->insn. It is used to distinguish page faults from machine
    check.
-3) int ex_handler_ext(const struct exception_table_entry *fixup)
+
+3) ``int ex_handler_ext(const struct exception_table_entry *fixup)``
    This case is used for uaccess_err ... we need to set a flag
    in the task structure. Before the handler functions existed this
    case was handled by adding a large offset to the fixup to tag
    it as special.
+
 More functions can easily be added.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 2033791e53bc..c0bfd0bd6000 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -10,3 +10,4 @@ Linux x86 Support
 
    boot
    topology
+   exception-tables
-- 
2.20.1
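
Note for readers of the converted exception-tables text: the relative-offset
encoding described near the end of that document can be modelled in user
space. The following is a minimal sketch under stated assumptions, not kernel
code. The names rel_entry, rel_store, rel_load and find_fixup are hypothetical,
the linear search is a simplification chosen for brevity, and every target is
assumed to lie within 2 GiB of the table so that the offset fits in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one table entry: two self-relative 32-bit offsets. */
struct rel_entry {
	int32_t insn;	/* offset from &insn to the instruction that may fault */
	int32_t fixup;	/* offset from &fixup to the fixup code */
};

/* Store target as a signed offset from the field's own address. */
static void rel_store(int32_t *field, uintptr_t target)
{
	int64_t delta = (int64_t)target - (int64_t)(uintptr_t)field;

	/* Assumption of this sketch: target is within +/-2 GiB of the field. */
	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	*field = (int32_t)delta;
}

/* Recover the absolute address: field address plus stored offset. */
static uintptr_t rel_load(const int32_t *field)
{
	return (uintptr_t)((int64_t)(uintptr_t)field + *field);
}

/* Linear lookup (for brevity): fixup address for fault_addr, or 0 if none. */
static uintptr_t find_fixup(const struct rel_entry *tbl, size_t n,
			    uintptr_t fault_addr)
{
	for (size_t i = 0; i < n; i++) {
		if (rel_load(&tbl[i].insn) == fault_addr)
			return rel_load(&tbl[i].fixup);
	}
	return 0;
}
```

In this model an entry occupies 8 bytes regardless of pointer width, which is
the saving the text describes: rel_store(&tbl[0].insn, addr) followed by
rel_load(&tbl[0].insn) yields addr again, because the offset is always
interpreted relative to the field that holds it.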



* [PATCH v4 39/63] Documentation: x86: convert topology.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/index.rst    |   1 +
 Documentation/x86/topology.rst | 228 +++++++++++++++++++++++++++++++++
 Documentation/x86/topology.txt | 217 -------------------------------
 3 files changed, 229 insertions(+), 217 deletions(-)
 create mode 100644 Documentation/x86/topology.rst
 delete mode 100644 Documentation/x86/topology.txt

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 8f08caf4fbbb..2033791e53bc 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -9,3 +9,4 @@ Linux x86 Support
    :numbered:
 
    boot
+   topology
diff --git a/Documentation/x86/topology.rst b/Documentation/x86/topology.rst
new file mode 100644
index 000000000000..1df5f56f4882
--- /dev/null
+++ b/Documentation/x86/topology.rst
@@ -0,0 +1,228 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+x86 Topology
+============
+
+This documents and clarifies the main aspects of x86 topology modelling and
+representation in the kernel. Update/change when doing changes to the
+respective code.
+
+The architecture-agnostic topology definitions are in
+Documentation/cputopology.txt. This file holds x86-specific
+differences/specialities which do not necessarily apply to the generic
+definitions. Thus, the way to read up on Linux topology on x86 is to start
+with the generic one and look at this one in parallel for the x86 specifics.
+
+Needless to say, code should use the generic functions - this file is *only*
+here to *document* the inner workings of x86 topology.
+
+Started by Thomas Gleixner <tglx@linutronix.de> and Borislav Petkov <bp@alien8.de>.
+
+The main aim of the topology facilities is to present adequate interfaces to
+code which needs to know/query/use the structure of the running system wrt
+threads, cores, packages, etc.
+
+The kernel does not care about the concept of physical sockets because a
+socket has no relevance to software. It's an electromechanical component. In
+the past a socket always contained a single package (see below), but with the
+advent of Multi Chip Modules (MCM) a socket can hold more than one package. So
+there might still be references to sockets in the code, but they are of
+historical nature and should be cleaned up.
+
+The topology of a system is described in the units of:
+
+    - packages
+    - cores
+    - threads
+
+Package
+=======
+
+Packages contain a number of cores plus shared resources, e.g. DRAM
+controller, shared caches etc.
+
+AMD nomenclature for package is 'Node'.
+
+Package-related topology information in the kernel:
+
+  - cpuinfo_x86.x86_max_cores:
+
+    The number of cores in a package. This information is retrieved via CPUID.
+
+  - cpuinfo_x86.phys_proc_id:
+
+    The physical ID of the package. This information is retrieved via CPUID
+    and deduced from the APIC IDs of the cores in the package.
+
+  - cpuinfo_x86.logical_id:
+
+    The logical ID of the package. As we do not trust BIOSes to enumerate the
+    packages in a consistent way, we introduced the concept of logical package
+    ID so we can sanely calculate the number of maximum possible packages in
+    the system and have the packages enumerated linearly.
+
+  - topology_max_packages():
+
+    The maximum possible number of packages in the system. Helpful for per
+    package facilities to preallocate per package information.
+
+  - cpu_llc_id:
+
+    A per-CPU variable containing:
+
+    - On Intel, the first APIC ID of the list of CPUs sharing the Last Level
+      Cache.
+
+    - On AMD, the Node ID or Core Complex ID containing the Last Level
+      Cache. In general, it is a number identifying an LLC uniquely on the
+      system.
+
+Cores
+=====
+
+A core consists of 1 or more threads. It does not matter whether the threads
+are SMT- or CMT-type threads.
+
+AMD's nomenclature for a CMT core is "Compute Unit". The kernel always uses
+"core".
+
+Core-related topology information in the kernel:
+
+  - smp_num_siblings:
+
+    The number of threads in a core. The number of threads in a package can be
+    calculated by::
+
+      threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings
+
+
+Threads
+=======
+
+A thread is a single scheduling unit. It is the equivalent of a logical Linux
+CPU.
+
+AMD's nomenclature for CMT threads is "Compute Unit Core". The kernel always
+uses "thread".
+
+Thread-related topology information in the kernel:
+
+  - topology_core_cpumask():
+
+    The cpumask contains all online threads in the package to which a thread
+    belongs.
+
+    The number of online threads is also printed in /proc/cpuinfo "siblings."
+
+  - topology_sibling_cpumask():
+
+    The cpumask contains all online threads in the core to which a thread
+    belongs.
+
+  - topology_logical_package_id():
+
+    The logical package ID to which a thread belongs.
+
+  - topology_physical_package_id():
+
+    The physical package ID to which a thread belongs.
+
+  - topology_core_id():
+
+    The ID of the core to which a thread belongs. It is also printed in /proc/cpuinfo
+    "core_id."
+
+
+
+System topology examples
+========================
+
+.. note:: The alternative Linux CPU enumeration depends on how the BIOS
+  enumerates the threads. Many BIOSes enumerate all threads 0 first and
+  then all threads 1. That has the "advantage" that the logical Linux CPU
+  numbers of threads 0 stay the same whether threads are enabled or not.
+  That's merely an implementation detail and has no practical impact.
+
+1) Single Package, Single Core
+::
+
+   [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
+
+2) Single Package, Dual Core
+
+  a) One thread per core
+  ::
+
+    [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
+                -> [core 1] -> [thread 0] -> Linux CPU 1
+
+  b) Two threads per core
+  ::
+
+    [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
+                            -> [thread 1] -> Linux CPU 1
+                -> [core 1] -> [thread 0] -> Linux CPU 2
+                            -> [thread 1] -> Linux CPU 3
+
+  Alternative enumeration::
+
+    [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
+                            -> [thread 1] -> Linux CPU 2
+                -> [core 1] -> [thread 0] -> Linux CPU 1
+                            -> [thread 1] -> Linux CPU 3
+
+  AMD nomenclature for CMT systems::
+
+    [node 0] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 0
+                                 -> [Compute Unit Core 1] -> Linux CPU 1
+             -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 2
+                                 -> [Compute Unit Core 1] -> Linux CPU 3
+
+4) Dual Package, Dual Core
+
+  a) One thread per core
+  ::
+
+    [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
+                -> [core 1] -> [thread 0] -> Linux CPU 1
+
+    [package 1] -> [core 0] -> [thread 0] -> Linux CPU 2
+                -> [core 1] -> [thread 0] -> Linux CPU 3
+
+  b) Two threads per core
+  ::
+
+    [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
+                            -> [thread 1] -> Linux CPU 1
+                -> [core 1] -> [thread 0] -> Linux CPU 2
+                            -> [thread 1] -> Linux CPU 3
+
+    [package 1] -> [core 0] -> [thread 0] -> Linux CPU 4
+                            -> [thread 1] -> Linux CPU 5
+                -> [core 1] -> [thread 0] -> Linux CPU 6
+                            -> [thread 1] -> Linux CPU 7
+
+  Alternative enumeration::
+
+    [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
+                            -> [thread 1] -> Linux CPU 4
+                -> [core 1] -> [thread 0] -> Linux CPU 1
+                            -> [thread 1] -> Linux CPU 5
+
+    [package 1] -> [core 0] -> [thread 0] -> Linux CPU 2
+                            -> [thread 1] -> Linux CPU 6
+                -> [core 1] -> [thread 0] -> Linux CPU 3
+                            -> [thread 1] -> Linux CPU 7
+
+  AMD nomenclature for CMT systems::
+
+    [node 0] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 0
+                                 -> [Compute Unit Core 1] -> Linux CPU 1
+             -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 2
+                                 -> [Compute Unit Core 1] -> Linux CPU 3
+
+    [node 1] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 4
+                                 -> [Compute Unit Core 1] -> Linux CPU 5
+             -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 6
+                                 -> [Compute Unit Core 1] -> Linux CPU 7
diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
deleted file mode 100644
index 2953e3ec9a02..000000000000
--- a/Documentation/x86/topology.txt
+++ /dev/null
@@ -1,217 +0,0 @@
-x86 Topology
-============
-
-This documents and clarifies the main aspects of x86 topology modelling and
-representation in the kernel. Update/change when doing changes to the
-respective code.
-
-The architecture-agnostic topology definitions are in
-Documentation/cputopology.txt. This file holds x86-specific
-differences/specialities which must not necessarily apply to the generic
-definitions. Thus, the way to read up on Linux topology on x86 is to start
-with the generic one and look at this one in parallel for the x86 specifics.
-
-Needless to say, code should use the generic functions - this file is *only*
-here to *document* the inner workings of x86 topology.
-
-Started by Thomas Gleixner <tglx@linutronix.de> and Borislav Petkov <bp@alien8.de>.
-
-The main aim of the topology facilities is to present adequate interfaces to
-code which needs to know/query/use the structure of the running system wrt
-threads, cores, packages, etc.
-
-The kernel does not care about the concept of physical sockets because a
-socket has no relevance to software. It's an electromechanical component. In
-the past a socket always contained a single package (see below), but with the
-advent of Multi Chip Modules (MCM) a socket can hold more than one package. So
-there might be still references to sockets in the code, but they are of
-historical nature and should be cleaned up.
-
-The topology of a system is described in the units of:
-
-    - packages
-    - cores
-    - threads
-
-* Package:
-
-  Packages contain a number of cores plus shared resources, e.g. DRAM
-  controller, shared caches etc.
-
-  AMD nomenclature for package is 'Node'.
-
-  Package-related topology information in the kernel:
-
-  - cpuinfo_x86.x86_max_cores:
-
-    The number of cores in a package. This information is retrieved via CPUID.
-
-  - cpuinfo_x86.phys_proc_id:
-
-    The physical ID of the package. This information is retrieved via CPUID
-    and deduced from the APIC IDs of the cores in the package.
-
-  - cpuinfo_x86.logical_id:
-
-    The logical ID of the package. As we do not trust BIOSes to enumerate the
-    packages in a consistent way, we introduced the concept of logical package
-    ID so we can sanely calculate the number of maximum possible packages in
-    the system and have the packages enumerated linearly.
-
-  - topology_max_packages():
-
-    The maximum possible number of packages in the system. Helpful for per
-    package facilities to preallocate per package information.
-
-  - cpu_llc_id:
-
-    A per-CPU variable containing:
-    - On Intel, the first APIC ID of the list of CPUs sharing the Last Level
-    Cache
-
-    - On AMD, the Node ID or Core Complex ID containing the Last Level
-    Cache. In general, it is a number identifying an LLC uniquely on the
-    system.
-
-* Cores:
-
-  A core consists of 1 or more threads. It does not matter whether the threads
-  are SMT- or CMT-type threads.
-
-  AMDs nomenclature for a CMT core is "Compute Unit". The kernel always uses
-  "core".
-
-  Core-related topology information in the kernel:
-
-  - smp_num_siblings:
-
-    The number of threads in a core. The number of threads in a package can be
-    calculated by:
-
-	threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings
-
-
-* Threads:
-
-  A thread is a single scheduling unit. It's the equivalent to a logical Linux
-  CPU.
-
-  AMDs nomenclature for CMT threads is "Compute Unit Core". The kernel always
-  uses "thread".
-
-  Thread-related topology information in the kernel:
-
-  - topology_core_cpumask():
-
-    The cpumask contains all online threads in the package to which a thread
-    belongs.
-
-    The number of online threads is also printed in /proc/cpuinfo "siblings."
-
-  - topology_sibling_cpumask():
-
-    The cpumask contains all online threads in the core to which a thread
-    belongs.
-
-   - topology_logical_package_id():
-
-    The logical package ID to which a thread belongs.
-
-   - topology_physical_package_id():
-
-    The physical package ID to which a thread belongs.
-
-   - topology_core_id();
-
-    The ID of the core to which a thread belongs. It is also printed in /proc/cpuinfo
-    "core_id."
-
-
-
-System topology examples
-
-Note:
-
-The alternative Linux CPU enumeration depends on how the BIOS enumerates the
-threads. Many BIOSes enumerate all threads 0 first and then all threads 1.
-That has the "advantage" that the logical Linux CPU numbers of threads 0 stay
-the same whether threads are enabled or not. That's merely an implementation
-detail and has no practical impact.
-
-1) Single Package, Single Core
-
-   [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
-
-2) Single Package, Dual Core
-
-   a) One thread per core
-
-	[package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
-		    -> [core 1] -> [thread 0] -> Linux CPU 1
-
-   b) Two threads per core
-
-	[package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
-				-> [thread 1] -> Linux CPU 1
-		    -> [core 1] -> [thread 0] -> Linux CPU 2
-				-> [thread 1] -> Linux CPU 3
-
-      Alternative enumeration:
-
-	[package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
-				-> [thread 1] -> Linux CPU 2
-		    -> [core 1] -> [thread 0] -> Linux CPU 1
-				-> [thread 1] -> Linux CPU 3
-
-      AMD nomenclature for CMT systems:
-
-	[node 0] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 0
-				     -> [Compute Unit Core 1] -> Linux CPU 1
-		 -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 2
-				     -> [Compute Unit Core 1] -> Linux CPU 3
-
-4) Dual Package, Dual Core
-
-   a) One thread per core
-
-	[package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
-		    -> [core 1] -> [thread 0] -> Linux CPU 1
-
-	[package 1] -> [core 0] -> [thread 0] -> Linux CPU 2
-		    -> [core 1] -> [thread 0] -> Linux CPU 3
-
-   b) Two threads per core
-
-	[package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
-				-> [thread 1] -> Linux CPU 1
-		    -> [core 1] -> [thread 0] -> Linux CPU 2
-				-> [thread 1] -> Linux CPU 3
-
-	[package 1] -> [core 0] -> [thread 0] -> Linux CPU 4
-				-> [thread 1] -> Linux CPU 5
-		    -> [core 1] -> [thread 0] -> Linux CPU 6
-				-> [thread 1] -> Linux CPU 7
-
-      Alternative enumeration:
-
-	[package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
-				-> [thread 1] -> Linux CPU 4
-		    -> [core 1] -> [thread 0] -> Linux CPU 1
-				-> [thread 1] -> Linux CPU 5
-
-	[package 1] -> [core 0] -> [thread 0] -> Linux CPU 2
-				-> [thread 1] -> Linux CPU 6
-		    -> [core 1] -> [thread 0] -> Linux CPU 3
-				-> [thread 1] -> Linux CPU 7
-
-      AMD nomenclature for CMT systems:
-
-	[node 0] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 0
-				     -> [Compute Unit Core 1] -> Linux CPU 1
-		 -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 2
-				     -> [Compute Unit Core 1] -> Linux CPU 3
-
-	[node 1] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 4
-				     -> [Compute Unit Core 1] -> Linux CPU 5
-		 -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 6
-				     -> [Compute Unit Core 1] -> Linux CPU 7
-- 
2.20.1



* [PATCH v4 38/63] Documentation: x86: convert boot.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/x86/boot.rst  | 1205 +++++++++++++++++++++++++++++++++++
 Documentation/x86/boot.txt  | 1130 --------------------------------
 Documentation/x86/index.rst |    2 +
 3 files changed, 1207 insertions(+), 1130 deletions(-)
 create mode 100644 Documentation/x86/boot.rst
 delete mode 100644 Documentation/x86/boot.txt

diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
new file mode 100644
index 000000000000..9f55e832bc47
--- /dev/null
+++ b/Documentation/x86/boot.rst
@@ -0,0 +1,1205 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+The Linux/x86 Boot Protocol
+===========================
+
+On the x86 platform, the Linux kernel uses a rather complicated boot
+convention.  This has evolved partially due to historical aspects, as
+well as the desire in the early days to have the kernel itself be a
+bootable image, the complicated PC memory model and due to changed
+expectations in the PC industry caused by the effective demise of
+real-mode DOS as a mainstream operating system.
+
+Currently, the following versions of the Linux/x86 boot protocol exist.
+
+Old kernels:
+  zImage/Image support only.  Some very early kernels
+  may not even support a command line.
+
+Protocol 2.00:
+  (Kernel 1.3.73) Added bzImage and initrd support, as
+  well as a formalized way to communicate between the
+  boot loader and the kernel.  setup.S made relocatable,
+  although the traditional setup area is still assumed writable.
+
+Protocol 2.01:
+  (Kernel 1.3.76) Added a heap overrun warning.
+
+Protocol 2.02:
+  (Kernel 2.4.0-test3-pre3) New command line protocol.
+  Lower the conventional memory ceiling.  No overwrite
+  of the traditional setup area, thus making booting
+  safe for systems which use the EBDA from SMM or 32-bit
+  BIOS entry points.  zImage deprecated but still supported.
+
+Protocol 2.03:
+  (Kernel 2.4.18-pre1) Explicitly makes the highest possible
+  initrd address available to the bootloader.
+
+Protocol 2.04:
+  (Kernel 2.6.14) Extend the syssize field to four bytes.
+
+Protocol 2.05:
+  (Kernel 2.6.20) Make protected mode kernel relocatable.
+  Introduce relocatable_kernel and kernel_alignment fields.
+
+Protocol 2.06:
+  (Kernel 2.6.22) Added a field that contains the size of
+  the boot command line.
+
+Protocol 2.07:
+  (Kernel 2.6.24) Added paravirtualised boot protocol.
+  Introduced hardware_subarch and hardware_subarch_data
+  and KEEP_SEGMENTS flag in load_flags.
+
+Protocol 2.08:
+  (Kernel 2.6.26) Added crc32 checksum and ELF format
+  payload. Introduced payload_offset and payload_length
+  fields to aid in locating the payload.
+
+Protocol 2.09:
+  (Kernel 2.6.26) Added a field holding a 64-bit physical
+  pointer to a singly linked list of struct setup_data.
+
+Protocol 2.10:
+  (Kernel 2.6.31) Added a protocol for relaxed alignment
+  beyond kernel_alignment, and new init_size and
+  pref_address fields.  Added extended boot loader IDs.
+
+Protocol 2.11:
+  (Kernel 3.6) Added a field for offset of EFI handover
+  protocol entry point.
+
+Protocol 2.12:
+  (Kernel 3.8) Added the xloadflags field and extension fields
+  to struct boot_params for loading bzImage and ramdisk
+  above 4G in 64-bit mode.
+
+MEMORY LAYOUT
+=============
+
+The traditional memory map for the kernel loader, used for Image or
+zImage kernels, typically looks like::
+
+            |                        |
+  0A0000    +------------------------+
+            |  Reserved for BIOS     |  Do not use.  Reserved for BIOS EBDA.
+  09A000    +------------------------+
+            |  Command line          |
+            |  Stack/heap            |  For use by the kernel real-mode code.
+  098000    +------------------------+
+            |  Kernel setup          |  The kernel real-mode code.
+  090200    +------------------------+
+            |  Kernel boot sector    |  The kernel legacy boot sector.
+  090000    +------------------------+
+            |  Protected-mode kernel |  The bulk of the kernel image.
+  010000    +------------------------+
+            |  Boot loader           |  <- Boot sector entry point 0000:7C00
+  001000    +------------------------+
+            |  Reserved for MBR/BIOS |
+  000800    +------------------------+
+            |  Typically used by MBR |
+  000600    +------------------------+
+            |  BIOS use only         |
+  000000    +------------------------+
+
+
+When using bzImage, the protected-mode kernel was relocated to
+0x100000 ("high memory"), and the kernel real-mode block (boot sector,
+setup, and stack/heap) was made relocatable to any address between
+0x10000 and end of low memory. Unfortunately, in protocols 2.00 and
+2.01 the 0x90000+ memory range is still used internally by the kernel;
+the 2.02 protocol resolves that problem.
+
+It is desirable to keep the "memory ceiling" -- the highest point in
+low memory touched by the boot loader -- as low as possible, since
+some newer BIOSes have begun to allocate some rather large amounts of
+memory, called the Extended BIOS Data Area, near the top of low
+memory.  The boot loader should use the "INT 12h" BIOS call to verify
+how much low memory is available.
+
+Unfortunately, if INT 12h reports that the amount of memory is too
+low, there is usually nothing the boot loader can do but to report an
+error to the user.  The boot loader should therefore be designed to
+take up as little space in low memory as it reasonably can.  For
+zImage or old bzImage kernels, which need data written into the
+0x90000 segment, the boot loader should make sure not to use memory
+above the 0x9A000 point; too many BIOSes will break above that point.
+
+For a modern bzImage kernel with boot protocol version >= 2.02, a
+memory layout like the following is suggested::
+
+            ~                        ~
+            |  Protected-mode kernel |
+  100000    +------------------------+
+            |  I/O memory hole       |
+  0A0000    +------------------------+
+            |  Reserved for BIOS     |  Leave as much as possible unused
+            ~                        ~
+            |  Command line          |  (Can also be below the X+10000 mark)
+  X+10000   +------------------------+
+            |  Stack/heap            |  For use by the kernel real-mode code.
+  X+08000   +------------------------+
+            |  Kernel setup          |  The kernel real-mode code.
+            |  Kernel boot sector    |  The kernel legacy boot sector.
+  X         +------------------------+
+            |  Boot loader           |  <- Boot sector entry point 0000:7C00
+  001000    +------------------------+
+            |  Reserved for MBR/BIOS |
+  000800    +------------------------+
+            |  Typically used by MBR |
+  000600    +------------------------+
+            |  BIOS use only         |
+  000000    +------------------------+
+
+... where the address X is as low as the design of the boot loader
+permits.
+
+
+THE REAL-MODE KERNEL HEADER
+===========================
+
+In the following text, and anywhere in the kernel boot sequence, "a
+sector" refers to 512 bytes.  It is independent of the actual sector
+size of the underlying medium.
+
+The first step in loading a Linux kernel should be to load the
+real-mode code (boot sector and setup code) and then examine the
+following header at offset 0x01f1.  The real-mode code can total up to
+32K, although the boot loader may choose to load only the first two
+sectors (1K) and then examine the bootup sector size.
+
+The header looks like::
+
+  Offset	Proto	Name		Meaning
+  /Size
+
+  01F1/1	ALL(1	setup_sects	The size of the setup in sectors
+  01F2/2	ALL	root_flags	If set, the root is mounted readonly
+  01F4/4	2.04+(2	syssize		The size of the 32-bit code in 16-byte paras
+  01F8/2	ALL	ram_size	DO NOT USE - for bootsect.S use only
+  01FA/2	ALL	vid_mode	Video mode control
+  01FC/2	ALL	root_dev	Default root device number
+  01FE/2	ALL	boot_flag	0xAA55 magic number
+  0200/2	2.00+	jump		Jump instruction
+  0202/4	2.00+	header		Magic signature "HdrS"
+  0206/2	2.00+	version		Boot protocol version supported
+  0208/4	2.00+	realmode_swtch	Boot loader hook (see below)
+  020C/2	2.00+	start_sys_seg	The load-low segment (0x1000) (obsolete)
+  020E/2	2.00+	kernel_version	Pointer to kernel version string
+  0210/1	2.00+	type_of_loader	Boot loader identifier
+  0211/1	2.00+	loadflags	Boot protocol option flags
+  0212/2	2.00+	setup_move_size	Move to high memory size (used with hooks)
+  0214/4	2.00+	code32_start	Boot loader hook (see below)
+  0218/4	2.00+	ramdisk_image	initrd load address (set by boot loader)
+  021C/4	2.00+	ramdisk_size	initrd size (set by boot loader)
+  0220/4	2.00+	bootsect_kludge	DO NOT USE - for bootsect.S use only
+  0224/2	2.01+	heap_end_ptr	Free memory after setup end
+  0226/1	2.02+(3	ext_loader_ver	Extended boot loader version
+  0227/1	2.02+(3	ext_loader_type	Extended boot loader ID
+  0228/4	2.02+	cmd_line_ptr	32-bit pointer to the kernel command line
+  022C/4	2.03+	initrd_addr_max	Highest legal initrd address
+  0230/4	2.05+	kernel_alignment Physical addr alignment required for kernel
+  0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
+  0235/1	2.10+	min_alignment	Minimum alignment, as a power of two
+  0236/2	2.12+	xloadflags	Boot protocol option flags
+  0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
+  023C/4	2.07+	hardware_subarch Hardware subarchitecture
+  0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
+  0248/4	2.08+	payload_offset	Offset of kernel payload
+  024C/4	2.08+	payload_length	Length of kernel payload
+  0250/8	2.09+	setup_data	64-bit physical pointer to linked list
+          of struct setup_data
+  0258/8	2.10+	pref_address	Preferred loading address
+  0260/4	2.10+	init_size	Linear memory required during initialization
+  0264/4	2.11+	handover_offset	Offset of handover entry point
+
+(1) For backwards compatibility, if the setup_sects field contains 0, the
+    real value is 4.
+
+(2) For boot protocol prior to 2.04, the upper two bytes of the syssize
+    field are unusable, which means the size of a bzImage kernel
+    cannot be determined.
+
+(3) Ignored, but safe to set, for boot protocols 2.02-2.09.
+
+If the "HdrS" (0x53726448) magic number is not found at offset 0x202,
+the boot protocol version is "old".  When loading an old kernel,
+assume the following parameters::
+
+	Image type = zImage
+	initrd not supported
+	Real-mode kernel must be located at 0x90000.
+
+Otherwise, the "version" field contains the protocol version,
+e.g. protocol version 2.01 will contain 0x0201 in this field.  When
+setting fields in the header, you must make sure only to set fields
+supported by the protocol version in use.
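+
+The check described above can be sketched in C as follows.  This is a
+minimal illustration, not code taken from any boot loader: the function
+name is hypothetical, and buf is assumed to hold at least the first
+0x208 bytes of the kernel image::
+
+  #include <stdint.h>
+  #include <string.h>
+
+  /* Hypothetical helper: returns e.g. 0x0201, or 0 for an "old" kernel. */
+  static uint16_t boot_proto_version(const uint8_t *buf)
+  {
+          /* The "HdrS" magic lives at offset 0x202. */
+          if (memcmp(buf + 0x202, "HdrS", 4) != 0)
+                  return 0;
+          /* version is a little-endian 16-bit field at offset 0x206. */
+          return (uint16_t)(buf[0x206] | (buf[0x207] << 8));
+  }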
+
+
+DETAILS OF HEADER FIELDS
+========================
+
+Some fields carry information from the kernel to the bootloader
+("read"), some are expected to be filled out by the bootloader
+("write"), and some are expected to be read and modified by the
+bootloader ("modify").
+
+All general purpose boot loaders should write the fields marked
+(obligatory).  Boot loaders that want to load the kernel at a
+nonstandard address should fill in the fields marked (reloc); other
+boot loaders can ignore those fields.
+
+The byte order of all fields is little-endian (this is x86, after all).
+::
+
+  Field name:	setup_sects
+  Type:		read
+  Offset/size:	0x1f1/1
+  Protocol:	ALL
+
+The size of the setup code in 512-byte sectors.  If this field is
+0, the real value is 4.  The real-mode code consists of the boot
+sector (always one 512-byte sector) plus the setup code.
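+
+As a sketch of how a loader might apply this rule (the helper name is
+an assumption made for illustration only)::
+
+  #include <stdint.h>
+
+  /* Bytes of real-mode code: one boot sector plus the setup sectors. */
+  static uint32_t realmode_size(uint8_t setup_sects)
+  {
+          if (setup_sects == 0)
+                  setup_sects = 4;        /* backwards compatibility */
+          return ((uint32_t)setup_sects + 1) * 512;
+  }
+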
+::
+
+  Field name:	 root_flags
+  Type:		 modify (optional)
+  Offset/size:	 0x1f2/2
+  Protocol:	 ALL
+
+If this field is nonzero, the root defaults to readonly.  The use of
+this field is deprecated; use the "ro" or "rw" options on the
+command line instead.
+::
+
+  Field name:	syssize
+  Type:		read
+  Offset/size:	0x1f4/4 (protocol 2.04+) 0x1f4/2 (protocol ALL)
+  Protocol:	2.04+
+
+The size of the protected-mode code in units of 16-byte paragraphs.
+For protocol versions older than 2.04 this field is only two bytes
+wide, and therefore cannot be trusted for the size of a kernel if
+the LOAD_HIGH flag is set.
+::
+
+  Field name:	ram_size
+  Type:		kernel internal
+  Offset/size:	0x1f8/2
+  Protocol:	ALL
+
+This field is obsolete.
+::
+
+  Field name:	vid_mode
+  Type:		modify (obligatory)
+  Offset/size:	0x1fa/2
+
+Please see the section on SPECIAL COMMAND LINE OPTIONS.
+::
+
+  Field name:	root_dev
+  Type:		modify (optional)
+  Offset/size:	0x1fc/2
+  Protocol:	ALL
+
+The default root device number.  The use of this field is
+deprecated; use the "root=" option on the command line instead.
+::
+
+  Field name:	boot_flag
+  Type:		read
+  Offset/size:	0x1fe/2
+  Protocol:	ALL
+
+Contains 0xAA55.  This is the closest thing old Linux kernels have
+to a magic number.
+::
+
+  Field name:	jump
+  Type:		read
+  Offset/size:	0x200/2
+  Protocol:	2.00+
+
+Contains an x86 jump instruction, 0xEB followed by a signed offset
+relative to byte 0x202.  This can be used to determine the size of
+the header.
+::
+
+  Field name:	header
+  Type:		read
+  Offset/size:	0x202/4
+  Protocol:	2.00+
+
+Contains the magic number "HdrS" (0x53726448).
+::
+
+  Field name:	version
+  Type:		read
+  Offset/size:	0x206/2
+  Protocol:	2.00+
+
+Contains the boot protocol version, in (major << 8)+minor format,
+e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version
+10.17.
+::
+
+  Field name:	realmode_swtch
+  Type:		modify (optional)
+  Offset/size:	0x208/4
+  Protocol:	2.00+
+
+Boot loader hook (see ADVANCED BOOT LOADER HOOKS below.)
+::
+
+  Field name:	start_sys_seg
+  Type:		read
+  Offset/size:	0x20c/2
+  Protocol:	2.00+
+
+The load low segment (0x1000).  Obsolete.
+::
+
+  Field name:	kernel_version
+  Type:		read
+  Offset/size:	0x20e/2
+  Protocol:	2.00+
+
+If set to a nonzero value, contains a pointer to a NUL-terminated
+human-readable kernel version number string, less 0x200.  This can
+be used to display the kernel version to the user.  This value
+should be less than (0x200*setup_sects).
+
+For example, if this value is set to 0x1c00, the kernel version
+number string can be found at offset 0x1e00 in the kernel file.
+This is a valid value if and only if the "setup_sects" field
+contains the value 15 or higher, as::
+
+	0x1c00  < 15*0x200 (= 0x1e00) but
+	0x1c00 >= 14*0x200 (= 0x1c00)
+
+	0x1c00 >> 9 = 14, so the minimum value for setup_sects is 15.
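+
+The same validity check can be written as a small C helper.  The name
+is made up for this sketch, and setup_sects is assumed to have been
+normalized already (a stored 0 replaced by 4)::
+
+  #include <stdint.h>
+
+  /*
+   * Hypothetical helper: returns the file offset of the version
+   * string, or 0 if the field is unset or points past the setup code.
+   */
+  static uint32_t kernel_version_offset(uint16_t kernel_version,
+                                        uint8_t setup_sects)
+  {
+          if (kernel_version == 0 ||
+              kernel_version >= 0x200u * setup_sects)
+                  return 0;
+          return (uint32_t)kernel_version + 0x200;
+  }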
+
+::
+
+  Field name:	type_of_loader
+  Type:		write (obligatory)
+  Offset/size:	0x210/1
+  Protocol:	2.00+
+
+If your boot loader has an assigned id (see table below), enter
+0xTV here, where T is an identifier for the boot loader and V is
+a version number.  Otherwise, enter 0xFF here.
+
+For boot loader IDs above T = 0xD, write T = 0xE to this field and
+write the extended ID minus 0x10 to the ext_loader_type field.
+Similarly, the ext_loader_ver field can be used to provide more than
+four bits for the bootloader version.
+
+For example, for T = 0x15, V = 0x234, write::
+
+  type_of_loader  <- 0xE4
+  ext_loader_type <- 0x05
+  ext_loader_ver  <- 0x23
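+
+A possible C sketch of this encoding, assuming a hypothetical struct
+that mirrors only the three header fields involved, and assuming the
+ID t is either at most 0xD or at least 0x10::
+
+  #include <stdint.h>
+
+  struct loader_id {              /* illustrative, not a kernel type */
+          uint8_t type_of_loader;
+          uint8_t ext_loader_type;
+          uint8_t ext_loader_ver;
+  };
+
+  static struct loader_id encode_loader_id(unsigned int t, unsigned int v)
+  {
+          struct loader_id id = { 0, 0, 0 };
+
+          if (t > 0xD) {
+                  id.type_of_loader = 0xE0;       /* T = 0xE: extended */
+                  id.ext_loader_type = (uint8_t)(t - 0x10);
+          } else {
+                  id.type_of_loader = (uint8_t)(t << 4);
+          }
+          id.type_of_loader |= v & 0x0f;          /* low four version bits */
+          id.ext_loader_ver = (uint8_t)(v >> 4);  /* remaining version bits */
+          return id;
+  }
+
+With t = 0x15 and v = 0x234 this yields 0xE4, 0x05 and 0x23, matching
+the example above.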
+
+Assigned boot loader ids (hexadecimal)::
+
+	0  LILO			(0x00 reserved for pre-2.00 bootloader)
+	1  Loadlin
+	2  bootsect-loader	(0x20, all other values reserved)
+	3  Syslinux
+	4  Etherboot/gPXE/iPXE
+	5  ELILO
+	7  GRUB
+	8  U-Boot
+	9  Xen
+	A  Gujin
+	B  Qemu
+	C  Arcturus Networks uCbootloader
+	D  kexec-tools
+	E  Extended		(see ext_loader_type)
+	F  Special		(0xFF = undefined)
+       10  Reserved
+       11  Minimal Linux Bootloader <http://sebastian-plotz.blogspot.de>
+       12  OVMF UEFI virtualization stack
+
+Please contact <hpa@zytor.com> if you need a bootloader ID value assigned.
+::
+
+  Field name:	loadflags
+  Type:		modify (obligatory)
+  Offset/size:	0x211/1
+  Protocol:	2.00+
+
+This field is a bitmask.
+::
+
+  Bit 0 (read):	LOADED_HIGH
+	- If 0, the protected-mode code is loaded at 0x10000.
+	- If 1, the protected-mode code is loaded at 0x100000.
+
+  Bit 1 (kernel internal): KASLR_FLAG
+	- Used internally by the compressed kernel to communicate
+	  KASLR status to kernel proper.
+	  If 1, KASLR enabled.
+	  If 0, KASLR disabled.
+
+  Bit 5 (write): QUIET_FLAG
+	- If 0, print early messages.
+	- If 1, suppress early messages.
+		This requests that the kernel (decompressor and early
+		kernel) not write early messages that require
+		accessing the display hardware directly.
+
+  Bit 6 (write): KEEP_SEGMENTS
+	Protocol: 2.07+
+	- If 0, reload the segment registers in the 32bit entry point.
+	- If 1, do not reload the segment registers in the 32bit entry point.
+		Assume that %cs %ds %ss %es are all set to flat segments with
+		a base of 0 (or the equivalent for their environment).
+
+  Bit 7 (write): CAN_USE_HEAP
+	Set this bit to 1 to indicate that the value entered in the
+	heap_end_ptr is valid.  If this bit is clear, some setup code
+	functionality will be disabled.
+
+::
+
+  Field name:	setup_move_size
+  Type:		modify (obligatory)
+  Offset/size:	0x212/2
+  Protocol:	2.00-2.01
+
+When using protocol 2.00 or 2.01, if the real mode kernel is not
+loaded at 0x90000, it gets moved there later in the loading
+sequence.  Fill in this field if you want additional data (such as
+the kernel command line) moved in addition to the real-mode kernel
+itself.
+
+The unit is bytes starting with the beginning of the boot sector.
+  
+This field can be ignored when the protocol is 2.02 or higher, or
+if the real-mode code is loaded at 0x90000.
+::
+
+  Field name:	code32_start
+  Type:		modify (optional, reloc)
+  Offset/size:	0x214/4
+  Protocol:	2.00+
+
+The address to jump to in protected mode.  This defaults to the load
+address of the kernel, and can be used by the boot loader to
+determine the proper load address.
+
+This field can be modified for two purposes:
+
+  1. as a boot loader hook (see ADVANCED BOOT LOADER HOOKS below.)
+
+  2. if a bootloader which does not install a hook loads a
+     relocatable kernel at a nonstandard address it will have to modify
+     this field to point to the load address.
+
+::
+
+  Field name:	ramdisk_image
+  Type:		write (obligatory)
+  Offset/size:	0x218/4
+  Protocol:	2.00+
+
+The 32-bit linear address of the initial ramdisk or ramfs.  Leave at
+zero if there is no initial ramdisk/ramfs.
+::
+
+  Field name:	ramdisk_size
+  Type:		write (obligatory)
+  Offset/size:	0x21c/4
+  Protocol:	2.00+
+
+Size of the initial ramdisk or ramfs.  Leave at zero if there is no
+initial ramdisk/ramfs.
+::
+
+  Field name:	bootsect_kludge
+  Type:		kernel internal
+  Offset/size:	0x220/4
+  Protocol:	2.00+
+
+This field is obsolete.
+::
+
+  Field name:	heap_end_ptr
+  Type:		write (obligatory)
+  Offset/size:	0x224/2
+  Protocol:	2.01+
+
+Set this field to the offset (from the beginning of the real-mode
+code) of the end of the setup stack/heap, minus 0x0200.
+::
+
+  Field name:	ext_loader_ver
+  Type:		write (optional)
+  Offset/size:	0x226/1
+  Protocol:	2.02+
+
+This field is used as an extension of the version number in the
+type_of_loader field.  The total version number is considered to be
+(type_of_loader & 0x0f) + (ext_loader_ver << 4).
+
+The use of this field is boot loader specific.  If not written, it
+is zero.
+
+Kernels prior to 2.6.31 did not recognize this field, but it is safe
+to write for protocol version 2.02 or higher.
+::
+
+  Field name:	ext_loader_type
+  Type:		write (obligatory if (type_of_loader & 0xf0) == 0xe0)
+  Offset/size:	0x227/1
+  Protocol:	2.02+
+
+This field is used as an extension of the type number in the
+type_of_loader field.  If the type in type_of_loader is 0xE, then
+the actual type is (ext_loader_type + 0x10).
+
+This field is ignored if the type in type_of_loader is not 0xE.
+
+Kernels prior to 2.6.31 did not recognize this field, but it is safe
+to write for protocol version 2.02 or higher.
+::
+
+  Field name:	cmd_line_ptr
+  Type:		write (obligatory)
+  Offset/size:	0x228/4
+  Protocol:	2.02+
+
+Set this field to the linear address of the kernel command line.
+The kernel command line can be located anywhere between the end of
+the setup heap and 0xA0000; it does not have to be located in the
+same 64K segment as the real-mode code itself.
+
+Fill in this field even if your boot loader does not support a
+command line, in which case you can point this to an empty string
+(or better yet, to the string "auto".)  If this field is left at
+zero, the kernel will assume that your boot loader does not support
+the 2.02+ protocol.
+::
+
+  Field name:	initrd_addr_max
+  Type:		read
+  Offset/size:	0x22c/4
+  Protocol:	2.03+
+
+The maximum address that may be occupied by the initial
+ramdisk/ramfs contents.  For boot protocols 2.02 or earlier, this
+field is not present, and the maximum address is 0x37FFFFFF.  (This
+address is defined as the address of the highest safe byte, so if
+your ramdisk is exactly 131072 bytes long and this field is
+0x37FFFFFF, you can start your ramdisk at 0x37FE0000.)
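+
+A minimal sketch of that calculation (the function name is hypothetical,
+and the size is assumed to be nonzero and no larger than
+initrd_addr_max + 1)::
+
+  #include <stdint.h>
+
+  /* Highest start address at which an initrd of the given size fits. */
+  static uint32_t initrd_highest_start(uint32_t initrd_addr_max,
+                                       uint32_t size)
+  {
+          return initrd_addr_max - size + 1;
+  }
+
+For initrd_addr_max = 0x37FFFFFF and size = 131072 (0x20000) this gives
+0x37FE0000.
+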
+::
+
+  Field name:	kernel_alignment
+  Type:		read/modify (reloc)
+  Offset/size:	0x230/4
+  Protocol:	2.05+ (read), 2.10+ (modify)
+
+Alignment unit required by the kernel (if relocatable_kernel is
+true.)  A relocatable kernel that is loaded at an alignment
+incompatible with the value in this field will be realigned during
+kernel initialization.
+
+Starting with protocol version 2.10, this reflects the kernel
+alignment preferred for optimal performance; it is possible for the
+loader to modify this field to permit a lesser alignment.  See the
+min_alignment and pref_address fields below.
+::
+
+  Field name:	relocatable_kernel
+  Type:		read (reloc)
+  Offset/size:	0x234/1
+  Protocol:	2.05+
+
+If this field is nonzero, the protected-mode part of the kernel can
+be loaded at any address that satisfies the kernel_alignment field.
+After loading, the boot loader must set the code32_start field to
+point to the loaded code, or to a boot loader hook.
+::
+
+  Field name:	min_alignment
+  Type:		read (reloc)
+  Offset/size:	0x235/1
+  Protocol:	2.10+
+
+This field, if nonzero, indicates as a power of two the minimum
+alignment required, as opposed to preferred, by the kernel to boot.
+If a boot loader makes use of this field, it should update the
+kernel_alignment field with the alignment unit desired; typically::
+
+	kernel_alignment = 1 << min_alignment
+
+There may be a considerable performance cost with an excessively
+misaligned kernel.  Therefore, a loader should typically try each
+power-of-two alignment from kernel_alignment down to this alignment.
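+
+One way to express that search, as a sketch only: try_load_at_alignment()
+is a hypothetical callback standing in for whatever placement logic the
+loader uses, kernel_alignment is assumed to be a power of two, and
+min_alignment is assumed to be less than 32::
+
+  #include <stdbool.h>
+  #include <stdint.h>
+
+  /* Returns the alignment that worked, or 0 if none did. */
+  static uint32_t pick_alignment(uint32_t kernel_alignment,
+                                 uint8_t min_alignment,
+                                 bool (*try_load_at_alignment)(uint32_t))
+  {
+          uint32_t min = (uint32_t)1 << min_alignment;
+          uint32_t align;
+
+          /* Start at the preferred alignment and halve until it fits. */
+          for (align = kernel_alignment; align >= min; align >>= 1)
+                  if (try_load_at_alignment(align))
+                          return align;   /* value for kernel_alignment */
+          return 0;
+  }
+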
+::
+
+  Field name:     xloadflags
+  Type:           read
+  Offset/size:    0x236/2
+  Protocol:       2.12+
+
+This field is a bitmask.
+::
+
+  Bit 0 (read):	XLF_KERNEL_64
+	- If 1, this kernel has the legacy 64-bit entry point at 0x200.
+
+  Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
+        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
+
+  Bit 2 (read):	XLF_EFI_HANDOVER_32
+	- If 1, the kernel supports the 32-bit EFI handoff entry point
+          given at handover_offset.
+
+  Bit 3 (read): XLF_EFI_HANDOVER_64
+	- If 1, the kernel supports the 64-bit EFI handoff entry point
+          given at handover_offset + 0x200.
+
+  Bit 4 (read): XLF_EFI_KEXEC
+	- If 1, the kernel supports kexec EFI boot with EFI runtime support.
+
+::
+
+  Field name:	cmdline_size
+  Type:		read
+  Offset/size:	0x238/4
+  Protocol:	2.06+
+
+The maximum size of the command line without the terminating
+zero. This means that the command line can contain at most
+cmdline_size characters. With protocol version 2.05 and earlier, the
+maximum size was 255.
+::
+
+  Field name:	hardware_subarch
+  Type:		write (optional, defaults to x86/PC)
+  Offset/size:	0x23c/4
+  Protocol:	2.07+
+
+In a paravirtualized environment the low-level architectural pieces
+of the hardware, such as interrupt handling, page table handling, and
+access to process control registers, need to be handled differently.
+
+This field allows the bootloader to inform the kernel that it is running
+in one of those environments.
+::
+
+  0x00000000	The default x86/PC environment
+  0x00000001	lguest
+  0x00000002	Xen
+  0x00000003	Moorestown MID
+  0x00000004	CE4100 TV Platform
+
+::
+
+  Field name:	hardware_subarch_data
+  Type:		write (subarch-dependent)
+  Offset/size:	0x240/8
+  Protocol:	2.07+
+
+A pointer to data that is specific to the hardware subarch.
+This field is currently unused for the default x86/PC environment;
+do not modify it.
+::
+
+  Field name:	payload_offset
+  Type:		read
+  Offset/size:	0x248/4
+  Protocol:	2.08+
+
+If non-zero then this field contains the offset from the beginning
+of the protected-mode code to the payload.
+
+The payload may be compressed. The format of both the compressed and
+uncompressed data should be determined using the standard magic
+numbers.  The currently supported compression formats are gzip
+(magic numbers 1F 8B or 1F 9E), bzip2 (magic number 42 5A), LZMA
+(magic number 5D 00), XZ (magic number FD 37), and LZ4 (magic number
+02 21).  The uncompressed payload is currently always ELF (magic
+number 7F 45 4C 46).
+::
+
+  Field name:	payload_length
+  Type:		read
+  Offset/size:	0x24c/4
+  Protocol:	2.08+
+
+The length of the payload.
+::
+
+  Field name:	setup_data
+  Type:		write (special)
+  Offset/size:	0x250/8
+  Protocol:	2.09+
+
+The 64-bit physical pointer to a NULL-terminated singly linked list of
+struct setup_data. This is used to define a more extensible boot
+parameters passing mechanism. The definition of struct setup_data is
+as follows::
+
+  struct setup_data {
+	  u64 next;
+	  u32 type;
+	  u32 len;
+	  u8  data[0];
+  };
+
+Here, next is a 64-bit physical pointer to the next node of the
+linked list (the next field of the last node is 0), type identifies
+the contents of data, len is the length of the data field, and data
+holds the actual payload.
+
+This list may be modified at a number of points during the bootup
+process.  Therefore, when modifying this list one should always make
+sure to consider the case where the linked list already contains
+entries.
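+
+A sketch of appending a node without losing existing entries.  It
+assumes physical addresses can be dereferenced directly (an identity
+mapping), which is an assumption of this example rather than something
+the protocol guarantees; the function name is hypothetical::
+
+  #include <stdint.h>
+
+  struct setup_data {
+          uint64_t next;
+          uint32_t type;
+          uint32_t len;
+          uint8_t  data[];
+  };
+
+  /* head points at the setup_data header field (offset 0x250). */
+  static void setup_data_append(uint64_t *head, struct setup_data *node)
+  {
+          uint64_t *link = head;
+
+          /* Walk to the last "next" field; the list ends with 0. */
+          while (*link)
+                  link = &((struct setup_data *)(uintptr_t)*link)->next;
+
+          node->next = 0;
+          *link = (uint64_t)(uintptr_t)node;
+  }
+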
+::
+
+  Field name:	pref_address
+  Type:		read (reloc)
+  Offset/size:	0x258/8
+  Protocol:	2.10+
+
+This field, if nonzero, represents a preferred load address for the
+kernel.  A relocating bootloader should attempt to load at this
+address if possible.
+
+A non-relocatable kernel will unconditionally move itself to this
+address and run there.
+::
+
+  Field name:	init_size
+  Type:		read
+  Offset/size:	0x260/4
+
+This field indicates the amount of linear contiguous memory starting
+at the kernel runtime start address that the kernel needs before it
+is capable of examining its memory map.  This is not the same thing
+as the total amount of memory the kernel needs to boot, but it can
+be used by a relocating boot loader to help select a safe load
+address for the kernel.
+
+The kernel runtime start address is determined by the following algorithm::
+
+  if (relocatable_kernel)
+    runtime_start = align_up(load_address, kernel_alignment)
+  else
+    runtime_start = pref_address
+
+::
+
+  Field name:	handover_offset
+  Type:		read
+  Offset/size:	0x264/4
+
+This field is the offset from the beginning of the kernel image to
+the EFI handover protocol entry point. Boot loaders using the EFI
+handover protocol to boot the kernel should jump to this offset.
+
+See EFI HANDOVER PROTOCOL below for more details.
+
+
+THE IMAGE CHECKSUM
+==================
+
+From boot protocol version 2.08 onwards the CRC-32 is calculated over
+the entire file using the characteristic polynomial 0x04C11DB7 and an
+initial remainder of 0xffffffff.  The checksum is appended to the
+file; therefore the CRC of the file up to the limit specified in the
+syssize field of the header is always 0.
+
+
+THE KERNEL COMMAND LINE
+=======================
+
+The kernel command line has become an important way for the boot
+loader to communicate with the kernel.  Some of its options are also
+relevant to the boot loader itself; see "special command line options"
+below.
+
+The kernel command line is a null-terminated string. The maximum
+length can be retrieved from the field cmdline_size.  Before protocol
+version 2.06, the maximum was 255 characters.  A string that is too
+long will be automatically truncated by the kernel.
+
+If the boot protocol version is 2.02 or later, the address of the
+kernel command line is given by the header field cmd_line_ptr (see
+above.)  This address can be anywhere between the end of the setup
+heap and 0xA0000.
+
+If the protocol version is *not* 2.02 or higher, the kernel
+command line is entered using the following protocol:
+
+  - At offset 0x0020 (word), "cmd_line_magic", enter the magic
+    number 0xA33F.
+
+  - At offset 0x0022 (word), "cmd_line_offset", enter the offset
+    of the kernel command line (relative to the start of the
+    real-mode kernel).
+
+  - The kernel command line *must* be within the memory region
+    covered by setup_move_size, so you may need to adjust this
+    field.
+
+
+MEMORY LAYOUT OF THE REAL-MODE CODE
+===================================
+
+The real-mode code requires a stack/heap to be set up, as well as
+memory allocated for the kernel command line.  This needs to be done
+in the real-mode accessible memory in the bottom megabyte.
+
+It should be noted that modern machines often have a sizable Extended
+BIOS Data Area (EBDA).  As a result, it is advisable to use as little
+of the low megabyte as possible.
+
+Unfortunately, under the following circumstances the 0x90000 memory
+segment has to be used:
+
+	- When loading a zImage kernel ((loadflags & 0x01) == 0).
+	- When loading a 2.01 or earlier boot protocol kernel.
+
+	     For the 2.00 and 2.01 boot protocols, the real-mode code
+	     can be loaded at another address, but it is internally
+	     relocated to 0x90000.  For the "old" protocol, the
+	     real-mode code must be loaded at 0x90000.
+
+When loading at 0x90000, avoid using memory above 0x9a000.
+
+For boot protocol 2.02 or higher, the command line does not have to be
+located in the same 64K segment as the real-mode setup code; it is
+thus permitted to give the stack/heap the full 64K segment and locate
+the command line above it.
+
+The kernel command line should not be located below the real-mode
+code, nor should it be located in high memory.
+
+
+SAMPLE BOOT CONFIGURATION
+=========================
+
+As a sample configuration, assume the following layout of the real
+mode segment.
+
+When loading below 0x90000, use the entire segment::
+
+	0x0000-0x7fff	Real mode kernel
+	0x8000-0xdfff	Stack and heap
+	0xe000-0xffff	Kernel command line
+
+When loading at 0x90000 OR the protocol version is 2.01 or earlier::
+
+	0x0000-0x7fff	Real mode kernel
+	0x8000-0x97ff	Stack and heap
+	0x9800-0x9fff	Kernel command line
+
+Such a boot loader should enter the following fields in the header::
+
+	unsigned long base_ptr;	/* base address for real-mode segment */
+
+	if ( setup_sects == 0 ) {
+		setup_sects = 4;
+	}
+
+	if ( protocol >= 0x0200 ) {
+		type_of_loader = <type code>;
+		if ( loading_initrd ) {
+			ramdisk_image = <initrd_address>;
+			ramdisk_size = <initrd_size>;
+		}
+
+		if ( protocol >= 0x0202 && loadflags & 0x01 )
+			heap_end = 0xe000;
+		else
+			heap_end = 0x9800;
+
+		if ( protocol >= 0x0201 ) {
+			heap_end_ptr = heap_end - 0x200;
+			loadflags |= 0x80; /* CAN_USE_HEAP */
+		}
+
+		if ( protocol >= 0x0202 ) {
+			cmd_line_ptr = base_ptr + heap_end;
+			strcpy(cmd_line_ptr, cmdline);
+		} else {
+			cmd_line_magic	= 0xA33F;
+			cmd_line_offset = heap_end;
+			setup_move_size = heap_end + strlen(cmdline)+1;
+			strcpy(base_ptr+cmd_line_offset, cmdline);
+		}
+	} else {
+		/* Very old kernel */
+
+		heap_end = 0x9800;
+
+		cmd_line_magic	= 0xA33F;
+		cmd_line_offset = heap_end;
+
+		/* A very old kernel MUST have its real-mode code
+		   loaded at 0x90000 */
+
+		if ( base_ptr != 0x90000 ) {
+			/* Copy the real-mode kernel */
+			memcpy(0x90000, base_ptr, (setup_sects+1)*512);
+			base_ptr = 0x90000;		 /* Relocated */
+		}
+
+		strcpy(0x90000+cmd_line_offset, cmdline);
+
+		/* It is recommended to clear memory up to the 32K mark */
+		memset(0x90000 + (setup_sects+1)*512, 0,
+		       (64-(setup_sects+1))*512);
+	}
+
+
+LOADING THE REST OF THE KERNEL
+==============================
+
+The 32-bit (non-real-mode) kernel starts at offset (setup_sects+1)*512
+in the kernel file (again, if setup_sects == 0 the real value is 4.)
+It should be loaded at address 0x10000 for Image/zImage kernels and
+0x100000 for bzImage kernels.
+
+The kernel is a bzImage kernel if the protocol >= 2.00 and the 0x01
+bit (LOAD_HIGH) in the loadflags field is set::
+
+	is_bzImage = (protocol >= 0x0200) && (loadflags & 0x01);
+	load_address = is_bzImage ? 0x100000 : 0x10000;
+
+Note that Image/zImage kernels can be up to 512K in size, and thus use
+the entire 0x10000-0x90000 range of memory.  This means it is pretty
+much a requirement for these kernels to load the real-mode part at
+0x90000.  bzImage kernels allow much more flexibility.
+
+
+SPECIAL COMMAND LINE OPTIONS
+============================
+
+If the command line provided by the boot loader is entered by the
+user, the user may expect the following command line options to work.
+They should normally not be deleted from the kernel command line even
+though not all of them are actually meaningful to the kernel.  Boot
+loader authors who need additional command line options for the boot
+loader itself should get them registered in
+Documentation/admin-guide/kernel-parameters.rst to make sure they will not
+conflict with actual kernel options now or in the future.
+
+  vga=<mode>
+	<mode> here is either an integer (in C notation, either
+	decimal, octal, or hexadecimal) or one of the strings
+	"normal" (meaning 0xFFFF), "ext" (meaning 0xFFFE) or "ask"
+	(meaning 0xFFFD).  This value should be entered into the
+	vid_mode field, as it is used by the kernel before the command
+	line is parsed.
+
+  mem=<size>
+	<size> is an integer in C notation optionally followed by
+	(case insensitive) K, M, G, T, P or E (meaning << 10, << 20,
+	<< 30, << 40, << 50 or << 60).  This specifies the end of
+	memory to the kernel. This affects the possible placement of
+	an initrd, since an initrd should be placed near end of
+	memory.  Note that this is an option to *both* the kernel and
+	the bootloader!
+
+  initrd=<file>
+	An initrd should be loaded.  The meaning of <file> is
+	obviously bootloader-dependent, and some boot loaders
+	(e.g. LILO) do not have such a command.
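+
+Because mem=<size> is meaningful to the boot loader as well, a boot
+loader may need to parse it. A minimal sketch of the suffix rule
+described above, in C; the function name is hypothetical and no
+overflow checking is done::
+
+	#include <stdlib.h>
+
+	static unsigned long long parse_mem_size(const char *s)
+	{
+		char *end;
+		/* base 0 accepts decimal, octal (0...) and hex (0x...) */
+		unsigned long long v = strtoull(s, &end, 0);
+
+		switch (*end) {		/* each case adds one << 10 */
+		case 'E': case 'e': v <<= 10; /* fall through */
+		case 'P': case 'p': v <<= 10; /* fall through */
+		case 'T': case 't': v <<= 10; /* fall through */
+		case 'G': case 'g': v <<= 10; /* fall through */
+		case 'M': case 'm': v <<= 10; /* fall through */
+		case 'K': case 'k': v <<= 10;
+		}
+		return v;
+	}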
+
+In addition, some boot loaders add the following options to the
+user-specified command line:
+
+  BOOT_IMAGE=<file>
+	The boot image which was loaded.  Again, the meaning of <file>
+	is obviously bootloader-dependent.
+
+  auto
+	The kernel was booted without explicit user intervention.
+
+If these options are added by the boot loader, it is highly
+recommended that they are located *first*, before the user-specified
+or configuration-specified command line.  Otherwise, "init=/bin/sh"
+gets confused by the "auto" option.
+
+
+RUNNING THE KERNEL
+==================
+
+The kernel is started by jumping to the kernel entry point, which is
+located at *segment* offset 0x20 from the start of the real mode
+kernel.  This means that if you loaded your real-mode kernel code at
+0x90000, the kernel entry point is 9020:0000.
+
+At entry, ds = es = ss should point to the start of the real-mode
+kernel code (0x9000 if the code is loaded at 0x90000), sp should be
+set up properly, normally pointing to the top of the heap, and
+interrupts should be disabled.  Furthermore, to guard against bugs in
+the kernel, it is recommended that the boot loader sets fs = gs = ds =
+es = ss.
+
+In our example from above, we would do::
+
+	/* Note: in the case of the "old" kernel protocol, base_ptr must
+	   be == 0x90000 at this point; see the previous sample code */
+
+	seg = base_ptr >> 4;
+
+	cli();	/* Enter with interrupts disabled! */
+
+	/* Set up the real-mode kernel stack */
+	_SS = seg;
+	_SP = heap_end;
+
+	_DS = _ES = _FS = _GS = seg;
+	jmp_far(seg+0x20, 0);	/* Run the kernel */
+
+If your boot sector accesses a floppy drive, it is recommended to
+switch off the floppy motor before running the kernel, since the
+kernel boot leaves interrupts off and thus the motor will not be
+switched off, especially if the loaded kernel has the floppy driver as
+a demand-loaded module!
+
+
+ADVANCED BOOT LOADER HOOKS
+==========================
+
+If the boot loader runs in a particularly hostile environment (such as
+LOADLIN, which runs under DOS) it may be impossible to follow the
+standard memory location requirements.  Such a boot loader may use the
+following hooks that, if set, are invoked by the kernel at the
+appropriate time.  The use of these hooks should probably be
+considered an absolutely last resort!
+
+IMPORTANT: All the hooks are required to preserve %esp, %ebp, %esi and
+%edi across invocation.
+
+  realmode_swtch:
+	A 16-bit real mode far subroutine invoked immediately before
+	entering protected mode.  The default routine disables NMI, so
+	your routine should probably do so, too.
+
+  code32_start:
+	A 32-bit flat-mode routine *jumped* to immediately after the
+	transition to protected mode, but before the kernel is
+	uncompressed.  No segments, except CS, are guaranteed to be
+	set up (current kernels do, but older ones do not); you should
+	set them up to BOOT_DS (0x18) yourself.
+
+	After completing your hook, you should jump to the address
+	that was in this field before your boot loader overwrote it
+	(relocated, if appropriate.)
+
+
+32-bit BOOT PROTOCOL
+====================
+
+On machines whose firmware is not a legacy BIOS, such as EFI or
+LinuxBIOS, and when booting via kexec, the kernel's 16-bit real-mode
+setup code, which relies on the legacy BIOS, cannot be used, so a
+32-bit boot protocol needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should be to set up the boot parameters (struct boot_params,
+traditionally known as the "zero page"). The memory for struct boot_params
+should be allocated and initialized to all zero. Then the setup header,
+starting at offset 0x01f1 of the kernel image, should be loaded into struct
+boot_params and examined. The end of the setup header can be calculated as
+follows::
+
+	0x0202 + byte value at offset 0x0201
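+
+A minimal sketch of this calculation in C, assuming image points to
+the first byte of the kernel image in memory (the function name is
+illustrative only)::
+
+	static unsigned int setup_header_end(const unsigned char *image)
+	{
+		/* 0x0201 holds the offset byte of the jump at 0x0200 */
+		return 0x0202 + image[0x0201];
+	}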
+
+In addition to reading, modifying and writing the setup header in
+struct boot_params as in the 16-bit boot protocol, the boot loader
+should also fill in the additional fields of struct boot_params as
+described in zero-page.txt.
+
+After setting up the struct boot_params, the boot loader can load the
+32/64-bit kernel in the same way as in the 16-bit boot protocol.
+
+In the 32-bit boot protocol, the kernel is started by jumping to the
+32-bit kernel entry point, which is the start address of the loaded
+32/64-bit kernel.
+
+At entry, the CPU must be in 32-bit protected mode with paging
+disabled; a GDT must be loaded with the descriptors for selectors
+__BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
+segments; __BOOT_CS must have execute/read permission, and __BOOT_DS
+must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
+must be __BOOT_DS; interrupts must be disabled; %esi must hold the base
+address of the struct boot_params; %ebp, %edi and %ebx must be zero.
+
+64-bit BOOT PROTOCOL
+====================
+
+On machines with 64-bit CPUs and a 64-bit kernel, a 64-bit boot loader
+can be used, and this requires a 64-bit boot protocol.
+
+In the 64-bit boot protocol, the first step in loading a Linux kernel
+should be to set up the boot parameters (struct boot_params,
+traditionally known as the "zero page"). The memory for struct boot_params
+can be allocated anywhere (even above 4G) and initialized to all zero.
+Then the setup header, starting at offset 0x01f1 of the kernel image,
+should be loaded into struct boot_params and examined. The end of the
+setup header can be calculated as follows::
+
+	0x0202 + byte value at offset 0x0201
+
+In addition to reading, modifying and writing the setup header in
+struct boot_params as in the 16-bit boot protocol, the boot loader
+should also fill in the additional fields of struct boot_params as
+described in zero-page.txt.
+
+After setting up the struct boot_params, the boot loader can load the
+64-bit kernel in the same way as in the 16-bit boot protocol, but the
+kernel can be loaded above 4G.
+
+In the 64-bit boot protocol, the kernel is started by jumping to the
+64-bit kernel entry point, which is the start address of the loaded
+64-bit kernel plus 0x200.
+
+At entry, the CPU must be in 64-bit mode with paging enabled.
+The range of setup_header.init_size bytes from the start address of the
+loaded kernel, the zero page and the command line buffer must be
+identity-mapped; a GDT must be loaded with the descriptors for selectors
+__BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
+segments; __BOOT_CS must have execute/read permission, and __BOOT_DS
+must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
+must be __BOOT_DS; interrupts must be disabled; %rsi must hold the base
+address of the struct boot_params.
+
+EFI HANDOVER PROTOCOL
+=====================
+
+This protocol allows boot loaders to defer initialisation to the EFI
+boot stub. The boot loader is required to load the kernel/initrd(s)
+from the boot media and jump to the EFI handover protocol entry point
+which is hdr->handover_offset bytes from the beginning of
+startup_{32,64}.
+
+The function prototype for the handover entry point looks like this::
+
+    efi_main(void *handle, efi_system_table_t *table, struct boot_params *bp)
+
+'handle' is the EFI image handle passed to the boot loader by the EFI
+firmware, 'table' is the EFI system table - these are the first two
+arguments of the "handoff state" as described in section 2.3 of the
+UEFI specification. 'bp' is the boot loader-allocated boot params.
+
+The boot loader *must* fill out the following fields in bp::
+
+  - hdr.code32_start
+  - hdr.cmd_line_ptr
+  - hdr.ramdisk_image (if applicable)
+  - hdr.ramdisk_size  (if applicable)
+
+All other fields should be zero.
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
deleted file mode 100644
index f4c2a97bfdbd..000000000000
--- a/Documentation/x86/boot.txt
+++ /dev/null
@@ -1,1130 +0,0 @@
-		     THE LINUX/x86 BOOT PROTOCOL
-		     ---------------------------
-
-On the x86 platform, the Linux kernel uses a rather complicated boot
-convention.  This has evolved partially due to historical aspects, as
-well as the desire in the early days to have the kernel itself be a
-bootable image, the complicated PC memory model and due to changed
-expectations in the PC industry caused by the effective demise of
-real-mode DOS as a mainstream operating system.
-
-Currently, the following versions of the Linux/x86 boot protocol exist.
-
-Old kernels:	zImage/Image support only.  Some very early kernels
-		may not even support a command line.
-
-Protocol 2.00:	(Kernel 1.3.73) Added bzImage and initrd support, as
-		well as a formalized way to communicate between the
-		boot loader and the kernel.  setup.S made relocatable,
-		although the traditional setup area still assumed
-		writable.
-
-Protocol 2.01:	(Kernel 1.3.76) Added a heap overrun warning.
-
-Protocol 2.02:	(Kernel 2.4.0-test3-pre3) New command line protocol.
-		Lower the conventional memory ceiling.	No overwrite
-		of the traditional setup area, thus making booting
-		safe for systems which use the EBDA from SMM or 32-bit
-		BIOS entry points.  zImage deprecated but still
-		supported.
-
-Protocol 2.03:	(Kernel 2.4.18-pre1) Explicitly makes the highest possible
-		initrd address available to the bootloader.
-
-Protocol 2.04:	(Kernel 2.6.14) Extend the syssize field to four bytes.
-
-Protocol 2.05:	(Kernel 2.6.20) Make protected mode kernel relocatable.
-		Introduce relocatable_kernel and kernel_alignment fields.
-
-Protocol 2.06:	(Kernel 2.6.22) Added a field that contains the size of
-		the boot command line.
-
-Protocol 2.07:	(Kernel 2.6.24) Added paravirtualised boot protocol.
-		Introduced hardware_subarch and hardware_subarch_data
-		and KEEP_SEGMENTS flag in load_flags.
-
-Protocol 2.08:	(Kernel 2.6.26) Added crc32 checksum and ELF format
-		payload. Introduced payload_offset and payload_length
-		fields to aid in locating the payload.
-
-Protocol 2.09:	(Kernel 2.6.26) Added a field of 64-bit physical
-		pointer to single linked list of struct	setup_data.
-
-Protocol 2.10:	(Kernel 2.6.31) Added a protocol for relaxed alignment
-		beyond the kernel_alignment added, new init_size and
-		pref_address fields.  Added extended boot loader IDs.
-
-Protocol 2.11:	(Kernel 3.6) Added a field for offset of EFI handover
-		protocol entry point.
-
-Protocol 2.12:	(Kernel 3.8) Added the xloadflags field and extension fields
-		to struct boot_params for loading bzImage and ramdisk
-		above 4G in 64bit.
-
-**** MEMORY LAYOUT
-
-The traditional memory map for the kernel loader, used for Image or
-zImage kernels, typically looks like:
-
-	|			 |
-0A0000	+------------------------+
-	|  Reserved for BIOS	 |	Do not use.  Reserved for BIOS EBDA.
-09A000	+------------------------+
-	|  Command line		 |
-	|  Stack/heap		 |	For use by the kernel real-mode code.
-098000	+------------------------+	
-	|  Kernel setup		 |	The kernel real-mode code.
-090200	+------------------------+
-	|  Kernel boot sector	 |	The kernel legacy boot sector.
-090000	+------------------------+
-	|  Protected-mode kernel |	The bulk of the kernel image.
-010000	+------------------------+
-	|  Boot loader		 |	<- Boot sector entry point 0000:7C00
-001000	+------------------------+
-	|  Reserved for MBR/BIOS |
-000800	+------------------------+
-	|  Typically used by MBR |
-000600	+------------------------+ 
-	|  BIOS use only	 |
-000000	+------------------------+
-
-
-When using bzImage, the protected-mode kernel was relocated to
-0x100000 ("high memory"), and the kernel real-mode block (boot sector,
-setup, and stack/heap) was made relocatable to any address between
-0x10000 and end of low memory. Unfortunately, in protocols 2.00 and
-2.01 the 0x90000+ memory range is still used internally by the kernel;
-the 2.02 protocol resolves that problem.
-
-It is desirable to keep the "memory ceiling" -- the highest point in
-low memory touched by the boot loader -- as low as possible, since
-some newer BIOSes have begun to allocate some rather large amounts of
-memory, called the Extended BIOS Data Area, near the top of low
-memory.	 The boot loader should use the "INT 12h" BIOS call to verify
-how much low memory is available.
-
-Unfortunately, if INT 12h reports that the amount of memory is too
-low, there is usually nothing the boot loader can do but to report an
-error to the user.  The boot loader should therefore be designed to
-take up as little space in low memory as it reasonably can.  For
-zImage or old bzImage kernels, which need data written into the
-0x90000 segment, the boot loader should make sure not to use memory
-above the 0x9A000 point; too many BIOSes will break above that point.
-
-For a modern bzImage kernel with boot protocol version >= 2.02, a
-memory layout like the following is suggested:
-
-	~                        ~
-        |  Protected-mode kernel |
-100000  +------------------------+
-	|  I/O memory hole	 |
-0A0000	+------------------------+
-	|  Reserved for BIOS	 |	Leave as much as possible unused
-	~                        ~
-	|  Command line		 |	(Can also be below the X+10000 mark)
-X+10000	+------------------------+
-	|  Stack/heap		 |	For use by the kernel real-mode code.
-X+08000	+------------------------+	
-	|  Kernel setup		 |	The kernel real-mode code.
-	|  Kernel boot sector	 |	The kernel legacy boot sector.
-X       +------------------------+
-	|  Boot loader		 |	<- Boot sector entry point 0000:7C00
-001000	+------------------------+
-	|  Reserved for MBR/BIOS |
-000800	+------------------------+
-	|  Typically used by MBR |
-000600	+------------------------+ 
-	|  BIOS use only	 |
-000000	+------------------------+
-
-... where the address X is as low as the design of the boot loader
-permits.
-
-
-**** THE REAL-MODE KERNEL HEADER
-
-In the following text, and anywhere in the kernel boot sequence, "a
-sector" refers to 512 bytes.  It is independent of the actual sector
-size of the underlying medium.
-
-The first step in loading a Linux kernel should be to load the
-real-mode code (boot sector and setup code) and then examine the
-following header at offset 0x01f1.  The real-mode code can total up to
-32K, although the boot loader may choose to load only the first two
-sectors (1K) and then examine the bootup sector size.
-
-The header looks like:
-
-Offset	Proto	Name		Meaning
-/Size
-
-01F1/1	ALL(1	setup_sects	The size of the setup in sectors
-01F2/2	ALL	root_flags	If set, the root is mounted readonly
-01F4/4	2.04+(2	syssize		The size of the 32-bit code in 16-byte paras
-01F8/2	ALL	ram_size	DO NOT USE - for bootsect.S use only
-01FA/2	ALL	vid_mode	Video mode control
-01FC/2	ALL	root_dev	Default root device number
-01FE/2	ALL	boot_flag	0xAA55 magic number
-0200/2	2.00+	jump		Jump instruction
-0202/4	2.00+	header		Magic signature "HdrS"
-0206/2	2.00+	version		Boot protocol version supported
-0208/4	2.00+	realmode_swtch	Boot loader hook (see below)
-020C/2	2.00+	start_sys_seg	The load-low segment (0x1000) (obsolete)
-020E/2	2.00+	kernel_version	Pointer to kernel version string
-0210/1	2.00+	type_of_loader	Boot loader identifier
-0211/1	2.00+	loadflags	Boot protocol option flags
-0212/2	2.00+	setup_move_size	Move to high memory size (used with hooks)
-0214/4	2.00+	code32_start	Boot loader hook (see below)
-0218/4	2.00+	ramdisk_image	initrd load address (set by boot loader)
-021C/4	2.00+	ramdisk_size	initrd size (set by boot loader)
-0220/4	2.00+	bootsect_kludge	DO NOT USE - for bootsect.S use only
-0224/2	2.01+	heap_end_ptr	Free memory after setup end
-0226/1	2.02+(3 ext_loader_ver	Extended boot loader version
-0227/1	2.02+(3	ext_loader_type	Extended boot loader ID
-0228/4	2.02+	cmd_line_ptr	32-bit pointer to the kernel command line
-022C/4	2.03+	initrd_addr_max	Highest legal initrd address
-0230/4	2.05+	kernel_alignment Physical addr alignment required for kernel
-0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
-0235/1	2.10+	min_alignment	Minimum alignment, as a power of two
-0236/2	2.12+	xloadflags	Boot protocol option flags
-0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
-023C/4	2.07+	hardware_subarch Hardware subarchitecture
-0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
-0248/4	2.08+	payload_offset	Offset of kernel payload
-024C/4	2.08+	payload_length	Length of kernel payload
-0250/8	2.09+	setup_data	64-bit physical pointer to linked list
-				of struct setup_data
-0258/8	2.10+	pref_address	Preferred loading address
-0260/4	2.10+	init_size	Linear memory required during initialization
-0264/4	2.11+	handover_offset	Offset of handover entry point
-
-(1) For backwards compatibility, if the setup_sects field contains 0, the
-    real value is 4.
-
-(2) For boot protocol prior to 2.04, the upper two bytes of the syssize
-    field are unusable, which means the size of a bzImage kernel
-    cannot be determined.
-
-(3) Ignored, but safe to set, for boot protocols 2.02-2.09.
-
-If the "HdrS" (0x53726448) magic number is not found at offset 0x202,
-the boot protocol version is "old".  Loading an old kernel, the
-following parameters should be assumed:
-
-	Image type = zImage
-	initrd not supported
-	Real-mode kernel must be located at 0x90000.
-
-Otherwise, the "version" field contains the protocol version,
-e.g. protocol version 2.01 will contain 0x0201 in this field.  When
-setting fields in the header, you must make sure only to set fields
-supported by the protocol version in use.
-
-
-**** DETAILS OF HEADER FIELDS
-
-For each field, some are information from the kernel to the bootloader
-("read"), some are expected to be filled out by the bootloader
-("write"), and some are expected to be read and modified by the
-bootloader ("modify").
-
-All general purpose boot loaders should write the fields marked
-(obligatory).  Boot loaders who want to load the kernel at a
-nonstandard address should fill in the fields marked (reloc); other
-boot loaders can ignore those fields.
-
-The byte order of all fields is littleendian (this is x86, after all.)
-
-Field name:	setup_sects
-Type:		read
-Offset/size:	0x1f1/1
-Protocol:	ALL
-
-  The size of the setup code in 512-byte sectors.  If this field is
-  0, the real value is 4.  The real-mode code consists of the boot
-  sector (always one 512-byte sector) plus the setup code.
-
-Field name:	 root_flags
-Type:		 modify (optional)
-Offset/size:	 0x1f2/2
-Protocol:	 ALL
-
-  If this field is nonzero, the root defaults to readonly.  The use of
-  this field is deprecated; use the "ro" or "rw" options on the
-  command line instead.
-
-Field name:	syssize
-Type:		read
-Offset/size:	0x1f4/4 (protocol 2.04+) 0x1f4/2 (protocol ALL)
-Protocol:	2.04+
-
-  The size of the protected-mode code in units of 16-byte paragraphs.
-  For protocol versions older than 2.04 this field is only two bytes
-  wide, and therefore cannot be trusted for the size of a kernel if
-  the LOAD_HIGH flag is set.
-
-Field name:	ram_size
-Type:		kernel internal
-Offset/size:	0x1f8/2
-Protocol:	ALL
-
-  This field is obsolete.
-
-Field name:	vid_mode
-Type:		modify (obligatory)
-Offset/size:	0x1fa/2
-
-  Please see the section on SPECIAL COMMAND LINE OPTIONS.
-
-Field name:	root_dev
-Type:		modify (optional)
-Offset/size:	0x1fc/2
-Protocol:	ALL
-
-  The default root device device number.  The use of this field is
-  deprecated, use the "root=" option on the command line instead.
-
-Field name:	boot_flag
-Type:		read
-Offset/size:	0x1fe/2
-Protocol:	ALL
-
-  Contains 0xAA55.  This is the closest thing old Linux kernels have
-  to a magic number.
-
-Field name:	jump
-Type:		read
-Offset/size:	0x200/2
-Protocol:	2.00+
-
-  Contains an x86 jump instruction, 0xEB followed by a signed offset
-  relative to byte 0x202.  This can be used to determine the size of
-  the header.
-
-Field name:	header
-Type:		read
-Offset/size:	0x202/4
-Protocol:	2.00+
-
-  Contains the magic number "HdrS" (0x53726448).
-
-Field name:	version
-Type:		read
-Offset/size:	0x206/2
-Protocol:	2.00+
-
-  Contains the boot protocol version, in (major << 8)+minor format,
-  e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version
-  10.17.
-
-Field name:	realmode_swtch
-Type:		modify (optional)
-Offset/size:	0x208/4
-Protocol:	2.00+
-
-  Boot loader hook (see ADVANCED BOOT LOADER HOOKS below.)
-
-Field name:	start_sys_seg
-Type:		read
-Offset/size:	0x20c/2
-Protocol:	2.00+
-
-  The load low segment (0x1000).  Obsolete.
-
-Field name:	kernel_version
-Type:		read
-Offset/size:	0x20e/2
-Protocol:	2.00+
-
-  If set to a nonzero value, contains a pointer to a NUL-terminated
-  human-readable kernel version number string, less 0x200.  This can
-  be used to display the kernel version to the user.  This value
-  should be less than (0x200*setup_sects).
-
-  For example, if this value is set to 0x1c00, the kernel version
-  number string can be found at offset 0x1e00 in the kernel file.
-  This is a valid value if and only if the "setup_sects" field
-  contains the value 15 or higher, as:
-
-	0x1c00  < 15*0x200 (= 0x1e00) but
-	0x1c00 >= 14*0x200 (= 0x1c00)
-
-	0x1c00 >> 9 = 14, so the minimum value for setup_secs is 15.
-
-Field name:	type_of_loader
-Type:		write (obligatory)
-Offset/size:	0x210/1
-Protocol:	2.00+
-
-  If your boot loader has an assigned id (see table below), enter
-  0xTV here, where T is an identifier for the boot loader and V is
-  a version number.  Otherwise, enter 0xFF here.
-
-  For boot loader IDs above T = 0xD, write T = 0xE to this field and
-  write the extended ID minus 0x10 to the ext_loader_type field.
-  Similarly, the ext_loader_ver field can be used to provide more than
-  four bits for the bootloader version.
-
-  For example, for T = 0x15, V = 0x234, write:
-
-  type_of_loader  <- 0xE4
-  ext_loader_type <- 0x05
-  ext_loader_ver  <- 0x23
-
-  Assigned boot loader ids (hexadecimal):
-
-	0  LILO			(0x00 reserved for pre-2.00 bootloader)
-	1  Loadlin
-	2  bootsect-loader	(0x20, all other values reserved)
-	3  Syslinux
-	4  Etherboot/gPXE/iPXE
-	5  ELILO
-	7  GRUB
-	8  U-Boot
-	9  Xen
-	A  Gujin
-	B  Qemu
-	C  Arcturus Networks uCbootloader
-	D  kexec-tools
-	E  Extended		(see ext_loader_type)
-	F  Special		(0xFF = undefined)
-       10  Reserved
-       11  Minimal Linux Bootloader <http://sebastian-plotz.blogspot.de>
-       12  OVMF UEFI virtualization stack
-
-  Please contact <hpa@zytor.com> if you need a bootloader ID
-  value assigned.
-
-Field name:	loadflags
-Type:		modify (obligatory)
-Offset/size:	0x211/1
-Protocol:	2.00+
-
-  This field is a bitmask.
-
-  Bit 0 (read):	LOADED_HIGH
-	- If 0, the protected-mode code is loaded at 0x10000.
-	- If 1, the protected-mode code is loaded at 0x100000.
-
-  Bit 1 (kernel internal): KASLR_FLAG
-	- Used internally by the compressed kernel to communicate
-	  KASLR status to kernel proper.
-	  If 1, KASLR enabled.
-	  If 0, KASLR disabled.
-
-  Bit 5 (write): QUIET_FLAG
-	- If 0, print early messages.
-	- If 1, suppress early messages.
-		This requests to the kernel (decompressor and early
-		kernel) to not write early messages that require
-		accessing the display hardware directly.
-
-  Bit 6 (write): KEEP_SEGMENTS
-	Protocol: 2.07+
-	- If 0, reload the segment registers in the 32bit entry point.
-	- If 1, do not reload the segment registers in the 32bit entry point.
-		Assume that %cs %ds %ss %es are all set to flat segments with
-		a base of 0 (or the equivalent for their environment).
-
-  Bit 7 (write): CAN_USE_HEAP
-	Set this bit to 1 to indicate that the value entered in the
-	heap_end_ptr is valid.  If this field is clear, some setup code
-	functionality will be disabled.
-
-Field name:	setup_move_size
-Type:		modify (obligatory)
-Offset/size:	0x212/2
-Protocol:	2.00-2.01
-
-  When using protocol 2.00 or 2.01, if the real mode kernel is not
-  loaded at 0x90000, it gets moved there later in the loading
-  sequence.  Fill in this field if you want additional data (such as
-  the kernel command line) moved in addition to the real-mode kernel
-  itself.
-
-  The unit is bytes starting with the beginning of the boot sector.
-  
-  This field is can be ignored when the protocol is 2.02 or higher, or
-  if the real-mode code is loaded at 0x90000.
-
-Field name:	code32_start
-Type:		modify (optional, reloc)
-Offset/size:	0x214/4
-Protocol:	2.00+
-
-  The address to jump to in protected mode.  This defaults to the load
-  address of the kernel, and can be used by the boot loader to
-  determine the proper load address.
-
-  This field can be modified for two purposes:
-
-  1. as a boot loader hook (see ADVANCED BOOT LOADER HOOKS below.)
-
-  2. if a bootloader which does not install a hook loads a
-     relocatable kernel at a nonstandard address it will have to modify
-     this field to point to the load address.
-
-Field name:	ramdisk_image
-Type:		write (obligatory)
-Offset/size:	0x218/4
-Protocol:	2.00+
-
-  The 32-bit linear address of the initial ramdisk or ramfs.  Leave at
-  zero if there is no initial ramdisk/ramfs.
-
-Field name:	ramdisk_size
-Type:		write (obligatory)
-Offset/size:	0x21c/4
-Protocol:	2.00+
-
-  Size of the initial ramdisk or ramfs.  Leave at zero if there is no
-  initial ramdisk/ramfs.
-
-Field name:	bootsect_kludge
-Type:		kernel internal
-Offset/size:	0x220/4
-Protocol:	2.00+
-
-  This field is obsolete.
-
-Field name:	heap_end_ptr
-Type:		write (obligatory)
-Offset/size:	0x224/2
-Protocol:	2.01+
-
-  Set this field to the offset (from the beginning of the real-mode
-  code) of the end of the setup stack/heap, minus 0x0200.
-
-Field name:	ext_loader_ver
-Type:		write (optional)
-Offset/size:	0x226/1
-Protocol:	2.02+
-
-  This field is used as an extension of the version number in the
-  type_of_loader field.  The total version number is considered to be
-  (type_of_loader & 0x0f) + (ext_loader_ver << 4).
-
-  The use of this field is boot loader specific.  If not written, it
-  is zero.
-
-  Kernels prior to 2.6.31 did not recognize this field, but it is safe
-  to write for protocol version 2.02 or higher.
-
-Field name:	ext_loader_type
-Type:		write (obligatory if (type_of_loader & 0xf0) == 0xe0)
-Offset/size:	0x227/1
-Protocol:	2.02+
-
-  This field is used as an extension of the type number in
-  type_of_loader field.  If the type in type_of_loader is 0xE, then
-  the actual type is (ext_loader_type + 0x10).
-
-  This field is ignored if the type in type_of_loader is not 0xE.
-
-  Kernels prior to 2.6.31 did not recognize this field, but it is safe
-  to write for protocol version 2.02 or higher.
-
-Field name:	cmd_line_ptr
-Type:		write (obligatory)
-Offset/size:	0x228/4
-Protocol:	2.02+
-
-  Set this field to the linear address of the kernel command line.
-  The kernel command line can be located anywhere between the end of
-  the setup heap and 0xA0000; it does not have to be located in the
-  same 64K segment as the real-mode code itself.
-
-  Fill in this field even if your boot loader does not support a
-  command line, in which case you can point this to an empty string
-  (or better yet, to the string "auto".)  If this field is left at
-  zero, the kernel will assume that your boot loader does not support
-  the 2.02+ protocol.
-
-Field name:	initrd_addr_max
-Type:		read
-Offset/size:	0x22c/4
-Protocol:	2.03+
-
-  The maximum address that may be occupied by the initial
-  ramdisk/ramfs contents.  For boot protocols 2.02 or earlier, this
-  field is not present, and the maximum address is 0x37FFFFFF.  (This
-  address is defined as the address of the highest safe byte, so if
-  your ramdisk is exactly 131072 bytes long and this field is
-  0x37FFFFFF, you can start your ramdisk at 0x37FE0000.)
-
-Field name:	kernel_alignment
-Type:		read/modify (reloc)
-Offset/size:	0x230/4
-Protocol:	2.05+ (read), 2.10+ (modify)
-
-  Alignment unit required by the kernel (if relocatable_kernel is
-  true.)  A relocatable kernel that is loaded at an alignment
-  incompatible with the value in this field will be realigned during
-  kernel initialization.
-
-  Starting with protocol version 2.10, this reflects the kernel
-  alignment preferred for optimal performance; it is possible for the
-  loader to modify this field to permit a lesser alignment.  See the
-  min_alignment and pref_address field below.
-
-Field name:	relocatable_kernel
-Type:		read (reloc)
-Offset/size:	0x234/1
-Protocol:	2.05+
-
-  If this field is nonzero, the protected-mode part of the kernel can
-  be loaded at any address that satisfies the kernel_alignment field.
-  After loading, the boot loader must set the code32_start field to
-  point to the loaded code, or to a boot loader hook.
-
-Field name:	min_alignment
-Type:		read (reloc)
-Offset/size:	0x235/1
-Protocol:	2.10+
-
-  This field, if nonzero, indicates as a power of two the minimum
-  alignment required, as opposed to preferred, by the kernel to boot.
-  If a boot loader makes use of this field, it should update the
-  kernel_alignment field with the alignment unit desired; typically:
-
-	kernel_alignment = 1 << min_alignment
-
-  There may be a considerable performance cost with an excessively
-  misaligned kernel.  Therefore, a loader should typically try each
-  power-of-two alignment from kernel_alignment down to this alignment.
-
-Field name:     xloadflags
-Type:           read
-Offset/size:    0x236/2
-Protocol:       2.12+
-
-  This field is a bitmask.
-
-  Bit 0 (read):	XLF_KERNEL_64
-	- If 1, this kernel has the legacy 64-bit entry point at 0x200.
-
-  Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
-        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
-
-  Bit 2 (read):	XLF_EFI_HANDOVER_32
-	- If 1, the kernel supports the 32-bit EFI handoff entry point
-          given at handover_offset.
-
-  Bit 3 (read): XLF_EFI_HANDOVER_64
-	- If 1, the kernel supports the 64-bit EFI handoff entry point
-          given at handover_offset + 0x200.
-
-  Bit 4 (read): XLF_EFI_KEXEC
-	- If 1, the kernel supports kexec EFI boot with EFI runtime support.
-
-Field name:	cmdline_size
-Type:		read
-Offset/size:	0x238/4
-Protocol:	2.06+
-
-  The maximum size of the command line without the terminating
-  zero. This means that the command line can contain at most
-  cmdline_size characters. With protocol version 2.05 and earlier, the
-  maximum size was 255.
-
-Field name:	hardware_subarch
-Type:		write (optional, defaults to x86/PC)
-Offset/size:	0x23c/4
-Protocol:	2.07+
-
-  In a paravirtualized environment the hardware low level architectural
-  pieces such as interrupt handling, page table handling, and
-  accessing process control registers needs to be done differently.
-
-  This field allows the bootloader to inform the kernel we are in one
-  one of those environments.
-
-  0x00000000	The default x86/PC environment
-  0x00000001	lguest
-  0x00000002	Xen
-  0x00000003	Moorestown MID
-  0x00000004	CE4100 TV Platform
-
-Field name:	hardware_subarch_data
-Type:		write (subarch-dependent)
-Offset/size:	0x240/8
-Protocol:	2.07+
-
-  A pointer to data that is specific to hardware subarch
-  This field is currently unused for the default x86/PC environment,
-  do not modify.
-
-Field name:	payload_offset
-Type:		read
-Offset/size:	0x248/4
-Protocol:	2.08+
-
-  If non-zero then this field contains the offset from the beginning
-  of the protected-mode code to the payload.
-
-  The payload may be compressed. The format of both the compressed and
-  uncompressed data should be determined using the standard magic
-  numbers.  The currently supported compression formats are gzip
-  (magic numbers 1F 8B or 1F 9E), bzip2 (magic number 42 5A), LZMA
-  (magic number 5D 00), XZ (magic number FD 37), and LZ4 (magic number
-  02 21).  The uncompressed payload is currently always ELF (magic
-  number 7F 45 4C 46).
-
-Field name:	payload_length
-Type:		read
-Offset/size:	0x24c/4
-Protocol:	2.08+
-
-  The length of the payload.
-
-Field name:	setup_data
-Type:		write (special)
-Offset/size:	0x250/8
-Protocol:	2.09+
-
-  The 64-bit physical pointer to NULL terminated single linked list of
-  struct setup_data. This is used to define a more extensible boot
-  parameters passing mechanism. The definition of struct setup_data is
-  as follow:
-
-  struct setup_data {
-	  u64 next;
-	  u32 type;
-	  u32 len;
-	  u8  data[0];
-  };
-
-  Where, the next is a 64-bit physical pointer to the next node of
-  linked list, the next field of the last node is 0; the type is used
-  to identify the contents of data; the len is the length of data
-  field; the data holds the real payload.
-
-  This list may be modified at a number of points during the bootup
-  process.  Therefore, when modifying this list one should always make
-  sure to consider the case where the linked list already contains
-  entries.
-
-Field name:	pref_address
-Type:		read (reloc)
-Offset/size:	0x258/8
-Protocol:	2.10+
-
-  This field, if nonzero, represents a preferred load address for the
-  kernel.  A relocating bootloader should attempt to load at this
-  address if possible.
-
-  A non-relocatable kernel will unconditionally move itself and to run
-  at this address.
-
-Field name:	init_size
-Type:		read
-Offset/size:	0x260/4
-
-  This field indicates the amount of linear contiguous memory starting
-  at the kernel runtime start address that the kernel needs before it
-  is capable of examining its memory map.  This is not the same thing
-  as the total amount of memory the kernel needs to boot, but it can
-  be used by a relocating boot loader to help select a safe load
-  address for the kernel.
-
-  The kernel runtime start address is determined by the following algorithm:
-
-  if (relocatable_kernel)
-	runtime_start = align_up(load_address, kernel_alignment)
-  else
-	runtime_start = pref_address
-
-Field name:	handover_offset
-Type:		read
-Offset/size:	0x264/4
-
-  This field is the offset from the beginning of the kernel image to
-  the EFI handover protocol entry point. Boot loaders using the EFI
-  handover protocol to boot the kernel should jump to this offset.
-
-  See EFI HANDOVER PROTOCOL below for more details.
-
-
-**** THE IMAGE CHECKSUM
-
-From boot protocol version 2.08 onwards the CRC-32 is calculated over
-the entire file using the characteristic polynomial 0x04C11DB7 and an
-initial remainder of 0xffffffff.  The checksum is appended to the
-file; therefore the CRC of the file up to the limit specified in the
-syssize field of the header is always 0.
-
-
-**** THE KERNEL COMMAND LINE
-
-The kernel command line has become an important way for the boot
-loader to communicate with the kernel.  Some of its options are also
-relevant to the boot loader itself, see "special command line options"
-below.
-
-The kernel command line is a null-terminated string. The maximum
-length can be retrieved from the field cmdline_size.  Before protocol
-version 2.06, the maximum was 255 characters.  A string that is too
-long will be automatically truncated by the kernel.
-
-If the boot protocol version is 2.02 or later, the address of the
-kernel command line is given by the header field cmd_line_ptr (see
-above.)  This address can be anywhere between the end of the setup
-heap and 0xA0000.
-
-If the protocol version is *not* 2.02 or higher, the kernel
-command line is entered using the following protocol:
-
-	At offset 0x0020 (word), "cmd_line_magic", enter the magic
-	number 0xA33F.
-
-	At offset 0x0022 (word), "cmd_line_offset", enter the offset
-	of the kernel command line (relative to the start of the
-	real-mode kernel).
-	
-	The kernel command line *must* be within the memory region
-	covered by setup_move_size, so you may need to adjust this
-	field.
-
-
-**** MEMORY LAYOUT OF THE REAL-MODE CODE
-
-The real-mode code requires a stack/heap to be set up, as well as
-memory allocated for the kernel command line.  This needs to be done
-in the real-mode accessible memory in bottom megabyte.
-
-It should be noted that modern machines often have a sizable Extended
-BIOS Data Area (EBDA).  As a result, it is advisable to use as little
-of the low megabyte as possible.
-
-Unfortunately, under the following circumstances the 0x90000 memory
-segment has to be used:
-
-	- When loading a zImage kernel ((loadflags & 0x01) == 0).
-	- When loading a 2.01 or earlier boot protocol kernel.
-
-	  -> For the 2.00 and 2.01 boot protocols, the real-mode code
-	     can be loaded at another address, but it is internally
-	     relocated to 0x90000.  For the "old" protocol, the
-	     real-mode code must be loaded at 0x90000.
-
-When loading at 0x90000, avoid using memory above 0x9a000.
-
-For boot protocol 2.02 or higher, the command line does not have to be
-located in the same 64K segment as the real-mode setup code; it is
-thus permitted to give the stack/heap the full 64K segment and locate
-the command line above it.
-
-The kernel command line should not be located below the real-mode
-code, nor should it be located in high memory.
-
-
-**** SAMPLE BOOT CONFIGURATION
-
-As a sample configuration, assume the following layout of the real
-mode segment:
-
-    When loading below 0x90000, use the entire segment:
-
-	0x0000-0x7fff	Real mode kernel
-	0x8000-0xdfff	Stack and heap
-	0xe000-0xffff	Kernel command line
-
-    When loading at 0x90000 OR the protocol version is 2.01 or earlier:
-
-	0x0000-0x7fff	Real mode kernel
-	0x8000-0x97ff	Stack and heap
-	0x9800-0x9fff	Kernel command line
-
-Such a boot loader should enter the following fields in the header:
-
-	unsigned long base_ptr;	/* base address for real-mode segment */
-
-	if ( setup_sects == 0 ) {
-		setup_sects = 4;
-	}
-
-	if ( protocol >= 0x0200 ) {
-		type_of_loader = <type code>;
-		if ( loading_initrd ) {
-			ramdisk_image = <initrd_address>;
-			ramdisk_size = <initrd_size>;
-		}
-
-		if ( protocol >= 0x0202 && loadflags & 0x01 )
-			heap_end = 0xe000;
-		else
-			heap_end = 0x9800;
-
-		if ( protocol >= 0x0201 ) {
-			heap_end_ptr = heap_end - 0x200;
-			loadflags |= 0x80; /* CAN_USE_HEAP */
-		}
-
-		if ( protocol >= 0x0202 ) {
-			cmd_line_ptr = base_ptr + heap_end;
-			strcpy(cmd_line_ptr, cmdline);
-		} else {
-			cmd_line_magic	= 0xA33F;
-			cmd_line_offset = heap_end;
-			setup_move_size = heap_end + strlen(cmdline)+1;
-			strcpy(base_ptr+cmd_line_offset, cmdline);
-		}
-	} else {
-		/* Very old kernel */
-
-		heap_end = 0x9800;
-
-		cmd_line_magic	= 0xA33F;
-		cmd_line_offset = heap_end;
-
-		/* A very old kernel MUST have its real-mode code
-		   loaded at 0x90000 */
-
-		if ( base_ptr != 0x90000 ) {
-			/* Copy the real-mode kernel */
-			memcpy(0x90000, base_ptr, (setup_sects+1)*512);
-			base_ptr = 0x90000;		 /* Relocated */
-		}
-
-		strcpy(0x90000+cmd_line_offset, cmdline);
-
-		/* It is recommended to clear memory up to the 32K mark */
-		memset(0x90000 + (setup_sects+1)*512, 0,
-		       (64-(setup_sects+1))*512);
-	}
-
-
-**** LOADING THE REST OF THE KERNEL
-
-The 32-bit (non-real-mode) kernel starts at offset (setup_sects+1)*512
-in the kernel file (again, if setup_sects == 0 the real value is 4.)
-It should be loaded at address 0x10000 for Image/zImage kernels and
-0x100000 for bzImage kernels.
-
-The kernel is a bzImage kernel if the protocol >= 2.00 and the 0x01
-bit (LOAD_HIGH) in the loadflags field is set:
-
-	is_bzImage = (protocol >= 0x0200) && (loadflags & 0x01);
-	load_address = is_bzImage ? 0x100000 : 0x10000;
-
-Note that Image/zImage kernels can be up to 512K in size, and thus use
-the entire 0x10000-0x90000 range of memory.  This means it is pretty
-much a requirement for these kernels to load the real-mode part at
-0x90000.  bzImage kernels allow much more flexibility.
-
-
-**** SPECIAL COMMAND LINE OPTIONS
-
-If the command line provided by the boot loader is entered by the
-user, the user may expect the following command line options to work.
-They should normally not be deleted from the kernel command line even
-though not all of them are actually meaningful to the kernel.  Boot
-loader authors who need additional command line options for the boot
-loader itself should get them registered in
-Documentation/admin-guide/kernel-parameters.rst to make sure they will not
-conflict with actual kernel options now or in the future.
-
-  vga=<mode>
-	<mode> here is either an integer (in C notation, either
-	decimal, octal, or hexadecimal) or one of the strings
-	"normal" (meaning 0xFFFF), "ext" (meaning 0xFFFE) or "ask"
-	(meaning 0xFFFD).  This value should be entered into the
-	vid_mode field, as it is used by the kernel before the command
-	line is parsed.
-
-  mem=<size>
-	<size> is an integer in C notation optionally followed by
-	(case insensitive) K, M, G, T, P or E (meaning << 10, << 20,
-	<< 30, << 40, << 50 or << 60).  This specifies the end of
-	memory to the kernel. This affects the possible placement of
-	an initrd, since an initrd should be placed near end of
-	memory.  Note that this is an option to *both* the kernel and
-	the bootloader!
-
-  initrd=<file>
-	An initrd should be loaded.  The meaning of <file> is
-	obviously bootloader-dependent, and some boot loaders
-	(e.g. LILO) do not have such a command.
-
-In addition, some boot loaders add the following options to the
-user-specified command line:
-
-  BOOT_IMAGE=<file>
-	The boot image which was loaded.  Again, the meaning of <file>
-	is obviously bootloader-dependent.
-
-  auto
-	The kernel was booted without explicit user intervention.
-
-If these options are added by the boot loader, it is highly
-recommended that they are located *first*, before the user-specified
-or configuration-specified command line.  Otherwise, "init=/bin/sh"
-gets confused by the "auto" option.
-
-
-**** RUNNING THE KERNEL
-
-The kernel is started by jumping to the kernel entry point, which is
-located at *segment* offset 0x20 from the start of the real mode
-kernel.  This means that if you loaded your real-mode kernel code at
-0x90000, the kernel entry point is 9020:0000.
-
-At entry, ds = es = ss should point to the start of the real-mode
-kernel code (0x9000 if the code is loaded at 0x90000), sp should be
-set up properly, normally pointing to the top of the heap, and
-interrupts should be disabled.  Furthermore, to guard against bugs in
-the kernel, it is recommended that the boot loader sets fs = gs = ds =
-es = ss.
-
-In our example from above, we would do:
-
-	/* Note: in the case of the "old" kernel protocol, base_ptr must
-	   be == 0x90000 at this point; see the previous sample code */
-
-	seg = base_ptr >> 4;
-
-	cli();	/* Enter with interrupts disabled! */
-
-	/* Set up the real-mode kernel stack */
-	_SS = seg;
-	_SP = heap_end;
-
-	_DS = _ES = _FS = _GS = seg;
-	jmp_far(seg+0x20, 0);	/* Run the kernel */
-
-If your boot sector accesses a floppy drive, it is recommended to
-switch off the floppy motor before running the kernel, since the
-kernel boot leaves interrupts off and thus the motor will not be
-switched off, especially if the loaded kernel has the floppy driver as
-a demand-loaded module!
-
-
-**** ADVANCED BOOT LOADER HOOKS
-
-If the boot loader runs in a particularly hostile environment (such as
-LOADLIN, which runs under DOS) it may be impossible to follow the
-standard memory location requirements.  Such a boot loader may use the
-following hooks that, if set, are invoked by the kernel at the
-appropriate time.  The use of these hooks should probably be
-considered an absolutely last resort!
-
-IMPORTANT: All the hooks are required to preserve %esp, %ebp, %esi and
-%edi across invocation.
-
-  realmode_swtch:
-	A 16-bit real mode far subroutine invoked immediately before
-	entering protected mode.  The default routine disables NMI, so
-	your routine should probably do so, too.
-
-  code32_start:
-	A 32-bit flat-mode routine *jumped* to immediately after the
-	transition to protected mode, but before the kernel is
-	uncompressed.  No segments, except CS, are guaranteed to be
-	set up (current kernels do, but older ones do not); you should
-	set them up to BOOT_DS (0x18) yourself.
-
-	After completing your hook, you should jump to the address
-	that was in this field before your boot loader overwrote it
-	(relocated, if appropriate.)
-
-
-**** 32-bit BOOT PROTOCOL
-
-For machine with some new BIOS other than legacy BIOS, such as EFI,
-LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in kernel
-based on legacy BIOS can not be used, so a 32-bit boot protocol needs
-to be defined.
-
-In 32-bit boot protocol, the first step in loading a Linux kernel
-should be to setup the boot parameters (struct boot_params,
-traditionally known as "zero page"). The memory for struct boot_params
-should be allocated and initialized to all zero. Then the setup header
-from offset 0x01f1 of kernel image on should be loaded into struct
-boot_params and examined. The end of setup header can be calculated as
-follow:
-
-	0x0202 + byte value at offset 0x0201
-
-In addition to read/modify/write the setup header of the struct
-boot_params as that of 16-bit boot protocol, the boot loader should
-also fill the additional fields of the struct boot_params as that
-described in zero-page.txt.
-
-After setting up the struct boot_params, the boot loader can load the
-32/64-bit kernel in the same way as that of 16-bit boot protocol.
-
-In 32-bit boot protocol, the kernel is started by jumping to the
-32-bit kernel entry point, which is the start address of loaded
-32/64-bit kernel.
-
-At entry, the CPU must be in 32-bit protected mode with paging
-disabled; a GDT must be loaded with the descriptors for selectors
-__BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
-segment; __BOOT_CS must have execute/read permission, and __BOOT_DS
-must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
-must be __BOOT_DS; interrupt must be disabled; %esi must hold the base
-address of the struct boot_params; %ebp, %edi and %ebx must be zero.
-
-**** 64-bit BOOT PROTOCOL
-
-For machine with 64bit cpus and 64bit kernel, we could use 64bit bootloader
-and we need a 64-bit boot protocol.
-
-In 64-bit boot protocol, the first step in loading a Linux kernel
-should be to setup the boot parameters (struct boot_params,
-traditionally known as "zero page"). The memory for struct boot_params
-could be allocated anywhere (even above 4G) and initialized to all zero.
-Then, the setup header at offset 0x01f1 of kernel image on should be
-loaded into struct boot_params and examined. The end of setup header
-can be calculated as follows:
-
-	0x0202 + byte value at offset 0x0201
-
-In addition to read/modify/write the setup header of the struct
-boot_params as that of 16-bit boot protocol, the boot loader should
-also fill the additional fields of the struct boot_params as described
-in zero-page.txt.
-
-After setting up the struct boot_params, the boot loader can load
-64-bit kernel in the same way as that of 16-bit boot protocol, but
-kernel could be loaded above 4G.
-
-In 64-bit boot protocol, the kernel is started by jumping to the
-64-bit kernel entry point, which is the start address of loaded
-64-bit kernel plus 0x200.
-
-At entry, the CPU must be in 64-bit mode with paging enabled.
-The range with setup_header.init_size from start address of loaded
-kernel and zero page and command line buffer get ident mapping;
-a GDT must be loaded with the descriptors for selectors
-__BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
-segment; __BOOT_CS must have execute/read permission, and __BOOT_DS
-must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
-must be __BOOT_DS; interrupt must be disabled; %rsi must hold the base
-address of the struct boot_params.
-
-**** EFI HANDOVER PROTOCOL
-
-This protocol allows boot loaders to defer initialisation to the EFI
-boot stub. The boot loader is required to load the kernel/initrd(s)
-from the boot media and jump to the EFI handover protocol entry point
-which is hdr->handover_offset bytes from the beginning of
-startup_{32,64}.
-
-The function prototype for the handover entry point looks like this,
-
-    efi_main(void *handle, efi_system_table_t *table, struct boot_params *bp)
-
-'handle' is the EFI image handle passed to the boot loader by the EFI
-firmware, 'table' is the EFI system table - these are the first two
-arguments of the "handoff state" as described in section 2.3 of the
-UEFI specification. 'bp' is the boot loader-allocated boot params.
-
-The boot loader *must* fill out the following fields in bp,
-
-    o hdr.code32_start
-    o hdr.cmd_line_ptr
-    o hdr.ramdisk_image (if applicable)
-    o hdr.ramdisk_size  (if applicable)
-
-All other fields should be zero.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 7612d3142b2a..8f08caf4fbbb 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -7,3 +7,5 @@ Linux x86 Support
 .. toctree::
    :maxdepth: 2
    :numbered:
+
+   boot
-- 
2.20.1
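
For readers following the removed boot.txt text above, the address arithmetic it describes (the runtime start algorithm under init_size, the worked initrd_addr_max example, and the bzImage load address rule) can be sketched in C. This is a minimal illustration and not kernel or boot loader code. The function names are hypothetical, and align_up() assumes the alignment is a power of two, which the text implies but does not state.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * Assumption: align is a nonzero power of two. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Kernel runtime start address, per the algorithm under init_size. */
static uint64_t runtime_start(int relocatable_kernel, uint64_t load_address,
			      uint32_t kernel_alignment, uint64_t pref_address)
{
	if (relocatable_kernel)
		return align_up(load_address, kernel_alignment);
	return pref_address;
}

/* Highest start address for an initrd of `size` bytes (size > 0).
 * initrd_addr_max is the address of the highest safe byte, so the
 * last byte of the initrd may sit exactly on it. */
static uint32_t initrd_highest_start(uint32_t initrd_addr_max, uint32_t size)
{
	return initrd_addr_max - size + 1;
}

/* Load address of the protected-mode code: 0x100000 for bzImage
 * (protocol >= 2.00 and LOAD_HIGH set), 0x10000 otherwise. */
static uint32_t protected_mode_load_address(uint16_t protocol, uint8_t loadflags)
{
	int is_bzimage = (protocol >= 0x0200) && (loadflags & 0x01);

	return is_bzimage ? 0x100000 : 0x10000;
}
```

With the values from the text, initrd_highest_start(0x37FFFFFF, 131072) gives 0x37FE0000, matching the example in the initrd_addr_max description.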

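The loader identification fields and the setup header length rule in the removed boot.txt text can be decoded along the same lines. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the loader type is taken from the high nibble of type_of_loader (as implied by the "(type_of_loader & 0xf0) == 0xe0" condition), and `image` is assumed to point at a buffer holding at least the first 0x0202 bytes of the kernel image.

```c
#include <stdint.h>

/* Full loader version: low nibble of type_of_loader plus
 * ext_loader_ver shifted left by four bits. */
static unsigned loader_version(uint8_t type_of_loader, uint8_t ext_loader_ver)
{
	return (type_of_loader & 0x0f) + ((unsigned)ext_loader_ver << 4);
}

/* Full loader type: the high nibble of type_of_loader, unless that
 * nibble is 0xE, in which case the type is ext_loader_type + 0x10. */
static unsigned loader_type(uint8_t type_of_loader, uint8_t ext_loader_type)
{
	unsigned type = type_of_loader >> 4;

	if (type == 0xe)
		return ext_loader_type + 0x10u;
	return type;	/* ext_loader_type is ignored here */
}

/* End of the setup header: 0x0202 + byte value at offset 0x0201.
 * Assumption: image holds at least 0x0202 bytes. */
static unsigned setup_header_end(const uint8_t *image)
{
	return 0x0202 + image[0x0201];
}
```

A loader would copy the bytes from offset 0x01f1 up to setup_header_end() into a zeroed struct boot_params, as described in the 32-bit and 64-bit boot protocol sections.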


* [PATCH v4 37/63] Documentation: add Linux x86 docs to Sphinx TOC tree
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

Add an index.rst for x86 support. More docs will be added later.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/index.rst     | 1 +
 Documentation/x86/index.rst | 9 +++++++++
 2 files changed, 10 insertions(+)
 create mode 100644 Documentation/x86/index.rst

diff --git a/Documentation/index.rst b/Documentation/index.rst
index d80138284e0f..f185c8040fa9 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -112,6 +112,7 @@ implementation.
 .. toctree::
    :maxdepth: 2
 
+   x86/index
    sh/index
 
 Filesystem Documentation
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
new file mode 100644
index 000000000000..7612d3142b2a
--- /dev/null
+++ b/Documentation/x86/index.rst
@@ -0,0 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Linux x86 Support
+=================
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
-- 
2.20.1



* [PATCH v4 36/63] Documentation: PCI: convert endpoint/pci-test-howto.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/endpoint/index.rst          |  1 +
 ...{pci-test-howto.txt => pci-test-howto.rst} | 81 +++++++++++++------
 2 files changed, 56 insertions(+), 26 deletions(-)
 rename Documentation/PCI/endpoint/{pci-test-howto.txt => pci-test-howto.rst} (78%)

diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst
index b680a3fc4fec..d114ea74b444 100644
--- a/Documentation/PCI/endpoint/index.rst
+++ b/Documentation/PCI/endpoint/index.rst
@@ -10,3 +10,4 @@ PCI Endpoint Framework
    pci-endpoint
    pci-endpoint-cfs
    pci-test-function
+   pci-test-howto
diff --git a/Documentation/PCI/endpoint/pci-test-howto.txt b/Documentation/PCI/endpoint/pci-test-howto.rst
similarity index 78%
rename from Documentation/PCI/endpoint/pci-test-howto.txt
rename to Documentation/PCI/endpoint/pci-test-howto.rst
index 040479f437a5..909f770a07d6 100644
--- a/Documentation/PCI/endpoint/pci-test-howto.txt
+++ b/Documentation/PCI/endpoint/pci-test-howto.rst
@@ -1,38 +1,51 @@
-			    PCI TEST USERGUIDE
-		    Kishon Vijay Abraham I <kishon@ti.com>
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+PCI Test User Guide
+===================
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
 
 This document is a guide to help users use pci-epf-test function driver
 and pci_endpoint_test host driver for testing PCI. The list of steps to
 be followed in the host side and EP side is given below.
 
-1. Endpoint Device
+Endpoint Device
+===============
 
-1.1 Endpoint Controller Devices
+Endpoint Controller Devices
+---------------------------
 
-To find the list of endpoint controller devices in the system:
+To find the list of endpoint controller devices in the system::
 
 	# ls /sys/class/pci_epc/
 	  51000000.pcie_ep
 
-If PCI_ENDPOINT_CONFIGFS is enabled
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
 	# ls /sys/kernel/config/pci_ep/controllers
 	  51000000.pcie_ep
 
-1.2 Endpoint Function Drivers
 
-To find the list of endpoint function drivers in the system:
+Endpoint Function Drivers
+-------------------------
+
+To find the list of endpoint function drivers in the system::
 
 	# ls /sys/bus/pci-epf/drivers
 	  pci_epf_test
 
-If PCI_ENDPOINT_CONFIGFS is enabled
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
 	# ls /sys/kernel/config/pci_ep/functions
 	  pci_epf_test
 
-1.3 Creating pci-epf-test Device
+
+Creating pci-epf-test Device
+----------------------------
 
 PCI endpoint function device can be created using the configfs. To create
-pci-epf-test device, the following commands can be used
+pci-epf-test device, the following commands can be used::
 
 	# mount -t configfs none /sys/kernel/config
 	# cd /sys/kernel/config/pci_ep/
@@ -42,7 +55,7 @@ The "mkdir func1" above creates the pci-epf-test function device that will
 be probed by pci_epf_test driver.
 
 The PCI endpoint framework populates the directory with the following
-configurable fields.
+configurable fields::
 
 	# ls functions/pci_epf_test/func1
 	  baseclass_code	interrupt_pin	progif_code	subsys_id
@@ -51,67 +64,83 @@ configurable fields.
 
 The PCI endpoint function driver populates these entries with default values
 when the device is bound to the driver. The pci-epf-test driver populates
-vendorid with 0xffff and interrupt_pin with 0x0001
+vendorid with 0xffff and interrupt_pin with 0x0001::
 
 	# cat functions/pci_epf_test/func1/vendorid
 	  0xffff
 	# cat functions/pci_epf_test/func1/interrupt_pin
 	  0x0001
 
-1.4 Configuring pci-epf-test Device
+
+Configuring pci-epf-test Device
+-------------------------------
 
 The user can configure the pci-epf-test device using configfs entry. In order
 to change the vendorid and the number of MSI interrupts used by the function
-device, the following commands can be used.
+device, the following commands can be used::
 
 	# echo 0x104c > functions/pci_epf_test/func1/vendorid
 	# echo 0xb500 > functions/pci_epf_test/func1/deviceid
 	# echo 16 > functions/pci_epf_test/func1/msi_interrupts
 	# echo 8 > functions/pci_epf_test/func1/msix_interrupts
 
-1.5 Binding pci-epf-test Device to EP Controller
+
+Binding pci-epf-test Device to EP Controller
+--------------------------------------------
 
 In order for the endpoint function device to be useful, it has to be bound to
 a PCI endpoint controller driver. Use the configfs to bind the function
-device to one of the controller driver present in the system.
+device to one of the controller driver present in the system::
 
 	# ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/
 
 Once the above step is completed, the PCI endpoint is ready to establish a link
 with the host.
 
-1.6 Start the Link
+
+Start the Link
+--------------
 
 In order for the endpoint device to establish a link with the host, the _start_
-field should be populated with '1'.
+field should be populated with '1'::
 
 	# echo 1 > controllers/51000000.pcie_ep/start
 
-2. RootComplex Device
 
-2.1 lspci Output
+RootComplex Device
+==================
+
+lspci Output
+------------
 
-Note that the devices listed here correspond to the value populated in 1.4 above
+Note that the devices listed here correspond to the value populated in 1.4
+above::
 
 	00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
 	01:00.0 Unassigned class [ff00]: Texas Instruments Device b500
 
-2.2 Using Endpoint Test function Device
+
+Using Endpoint Test function Device
+-----------------------------------
 
 pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
-tests. To compile this tool the following commands should be used:
+tests. To compile this tool the following commands should be used::
 
 	# cd <kernel-dir>
 	# make -C tools/pci
 
-or if you desire to compile and install in your system:
+or if you desire to compile and install in your system::
 
 	# cd <kernel-dir>
 	# make -C tools/pci install
 
 The tool and script will be located in <rootfs>/usr/bin/
 
-2.2.1 pcitest.sh Output
+
+pcitest.sh Output
+~~~~~~~~~~~~~~~~~
+::
+
 	# pcitest.sh
 	BAR tests
 
-- 
2.20.1
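
For readers who want to script the configfs steps quoted in this patch, the
mkdir/echo/ln sequence can also be driven from C. What follows is only a
minimal sketch under stated assumptions: the helper names (pci_ep_mkdir,
pci_ep_write_attr, pci_ep_bind) are made up for illustration and are not part
of the kernel or of tools/pci, and "root" is assumed to be the pci_ep
directory inside a mounted configfs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define PCI_EP_PATH_MAX 4096

/* Join root and a relative path; returns 0, or -1 (ENAMETOOLONG) if too long. */
static int pci_ep_path(char *buf, size_t size, const char *root, const char *rel)
{
	int n;

	assert(buf && root && rel);
	n = snprintf(buf, size, "%s/%s", root, rel);
	if (n < 0 || (size_t)n >= size) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return 0;
}

/* Like "mkdir <root>/<rel>", e.g. to create an EPF device directory. */
static int pci_ep_mkdir(const char *root, const char *rel)
{
	char path[PCI_EP_PATH_MAX];

	if (pci_ep_path(path, sizeof(path), root, rel))
		return -1;
	return mkdir(path, 0755);
}

/* Like "echo -n <value> > <root>/<rel>"; no trailing newline is added. */
static int pci_ep_write_attr(const char *root, const char *rel, const char *value)
{
	char path[PCI_EP_PATH_MAX];
	size_t len = strlen(value);
	ssize_t written;
	int fd;

	if (pci_ep_path(path, sizeof(path), root, rel))
		return -1;
	/* Same open flags a shell ">" redirect uses. */
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	written = write(fd, value, len);
	if (close(fd) < 0 || written != (ssize_t)len)
		return -1;
	return 0;
}

/* Like "ln -s <root>/<epf_rel> <root>/<epc_rel>/<last component of epf_rel>". */
static int pci_ep_bind(const char *root, const char *epf_rel, const char *epc_rel)
{
	char target[PCI_EP_PATH_MAX], linkpath[PCI_EP_PATH_MAX];
	const char *name = strrchr(epf_rel, '/');
	int n;

	name = name ? name + 1 : epf_rel;
	if (pci_ep_path(target, sizeof(target), root, epf_rel))
		return -1;
	n = snprintf(linkpath, sizeof(linkpath), "%s/%s/%s", root, epc_rel, name);
	if (n < 0 || (size_t)n >= sizeof(linkpath)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return symlink(target, linkpath);
}
```

With the names used in the howto, a caller would create
functions/pci_epf_test/func1, write "0x104c" to
functions/pci_epf_test/func1/vendorid, bind it to
controllers/51000000.pcie_ep and finally write "1" to
controllers/51000000.pcie_ep/start. The controller name is board specific,
so treat it as an example value only.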



* [PATCH v4 35/63] Documentation: PCI: convert endpoint/pci-test-function.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/endpoint/index.rst          |  1 +
 ...est-function.txt => pci-test-function.rst} | 32 +++++++++++--------
 2 files changed, 20 insertions(+), 13 deletions(-)
 rename Documentation/PCI/endpoint/{pci-test-function.txt => pci-test-function.rst} (84%)

diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst
index 3951de9f923c..b680a3fc4fec 100644
--- a/Documentation/PCI/endpoint/index.rst
+++ b/Documentation/PCI/endpoint/index.rst
@@ -9,3 +9,4 @@ PCI Endpoint Framework
 
    pci-endpoint
    pci-endpoint-cfs
+   pci-test-function
diff --git a/Documentation/PCI/endpoint/pci-test-function.txt b/Documentation/PCI/endpoint/pci-test-function.rst
similarity index 84%
rename from Documentation/PCI/endpoint/pci-test-function.txt
rename to Documentation/PCI/endpoint/pci-test-function.rst
index 5916f1f592bb..ba02cddcec37 100644
--- a/Documentation/PCI/endpoint/pci-test-function.txt
+++ b/Documentation/PCI/endpoint/pci-test-function.rst
@@ -1,5 +1,10 @@
-				PCI TEST
-		    Kishon Vijay Abraham I <kishon@ti.com>
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+PCI Test Function
+=================
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
 
 Traditionally PCI RC has always been validated by using standard
 PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
@@ -23,30 +28,31 @@ The PCI endpoint test device has the following registers:
 	8) PCI_ENDPOINT_TEST_IRQ_TYPE
 	9) PCI_ENDPOINT_TEST_IRQ_NUMBER
 
-*) PCI_ENDPOINT_TEST_MAGIC
+* PCI_ENDPOINT_TEST_MAGIC
 
 This register will be used to test BAR0. A known pattern will be written
 and read back from MAGIC register to verify BAR0.
 
-*) PCI_ENDPOINT_TEST_COMMAND:
+* PCI_ENDPOINT_TEST_COMMAND:
 
 This register will be used by the host driver to indicate the function
 that the endpoint device must perform.
 
-Bitfield Description:
+Bitfield Description::
+
   Bit 0		: raise legacy IRQ
   Bit 1		: raise MSI IRQ
   Bit 2		: raise MSI-X IRQ
   Bit 3		: read command (read data from RC buffer)
   Bit 4		: write command (write data to RC buffer)
-  Bit 5		: copy command (copy data from one RC buffer to another
-		  RC buffer)
+  Bit 5		: copy command (copy data from one RC buffer to another RC buffer)
 
-*) PCI_ENDPOINT_TEST_STATUS
+* PCI_ENDPOINT_TEST_STATUS
 
 This register reflects the status of the PCI endpoint device.
 
-Bitfield Description:
+Bitfield Description::
+
   Bit 0		: read success
   Bit 1		: read fail
   Bit 2		: write success
@@ -57,17 +63,17 @@ Bitfield Description:
   Bit 7		: source address is invalid
   Bit 8		: destination address is invalid
 
-*) PCI_ENDPOINT_TEST_SRC_ADDR
+* PCI_ENDPOINT_TEST_SRC_ADDR
 
 This register contains the source address (RC buffer address) for the
 COPY/READ command.
 
-*) PCI_ENDPOINT_TEST_DST_ADDR
+* PCI_ENDPOINT_TEST_DST_ADDR
 
 This register contains the destination address (RC buffer address) for
 the COPY/WRITE command.
 
-*) PCI_ENDPOINT_TEST_IRQ_TYPE
+* PCI_ENDPOINT_TEST_IRQ_TYPE
 
 This register contains the interrupt type (Legacy/MSI) triggered
 for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands.
@@ -77,7 +83,7 @@ Possible types:
  - MSI		: 1
  - MSI-X	: 2
 
-*) PCI_ENDPOINT_TEST_IRQ_NUMBER
+* PCI_ENDPOINT_TEST_IRQ_NUMBER
 
 This register contains the triggered ID interrupt.
 
-- 
2.20.1
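
The COMMAND bitfield listed in this document maps naturally onto bit masks.
The sketch below only illustrates that mapping: the EPTEST_CMD_* names and
the two helpers are hypothetical (they are not taken from the driver
sources), and the 32-bit register width is an assumption. The STATUS register
is left out because only part of its bit list is visible in this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as given in the COMMAND "Bitfield Description" list. */
#define EPTEST_CMD_RAISE_LEGACY_IRQ	(UINT32_C(1) << 0)
#define EPTEST_CMD_RAISE_MSI_IRQ	(UINT32_C(1) << 1)
#define EPTEST_CMD_RAISE_MSIX_IRQ	(UINT32_C(1) << 2)
#define EPTEST_CMD_READ			(UINT32_C(1) << 3)
#define EPTEST_CMD_WRITE		(UINT32_C(1) << 4)
#define EPTEST_CMD_COPY			(UINT32_C(1) << 5)

/* Return the documented description of a command bit, or NULL if unlisted. */
static const char *eptest_cmd_bit_desc(unsigned int bit)
{
	static const char *const desc[] = {
		"raise legacy IRQ",
		"raise MSI IRQ",
		"raise MSI-X IRQ",
		"read command (read data from RC buffer)",
		"write command (write data to RC buffer)",
		"copy command (copy data from one RC buffer to another RC buffer)",
	};

	if (bit >= sizeof(desc) / sizeof(desc[0]))
		return NULL;
	return desc[bit];
}

/* Test whether a given command flag is set in a register value. */
static int eptest_cmd_has(uint32_t reg, uint32_t flag)
{
	assert(flag != 0);
	return (reg & flag) == flag;
}
```

A host-side test could, for instance, check eptest_cmd_has(reg,
EPTEST_CMD_COPY) before interpreting the source and destination address
registers.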



* [PATCH v4 34/63] Documentation: PCI: convert endpoint/pci-endpoint-cfs.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/endpoint/index.rst          |  1 +
 ...-endpoint-cfs.txt => pci-endpoint-cfs.rst} | 99 +++++++++++--------
 2 files changed, 57 insertions(+), 43 deletions(-)
 rename Documentation/PCI/endpoint/{pci-endpoint-cfs.txt => pci-endpoint-cfs.rst} (64%)

diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst
index 0db4f2fcd7f0..3951de9f923c 100644
--- a/Documentation/PCI/endpoint/index.rst
+++ b/Documentation/PCI/endpoint/index.rst
@@ -8,3 +8,4 @@ PCI Endpoint Framework
    :maxdepth: 2
 
    pci-endpoint
+   pci-endpoint-cfs
diff --git a/Documentation/PCI/endpoint/pci-endpoint-cfs.txt b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
similarity index 64%
rename from Documentation/PCI/endpoint/pci-endpoint-cfs.txt
rename to Documentation/PCI/endpoint/pci-endpoint-cfs.rst
index d740f29960a4..b6d39cdec56e 100644
--- a/Documentation/PCI/endpoint/pci-endpoint-cfs.txt
+++ b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
@@ -1,41 +1,51 @@
-                   CONFIGURING PCI ENDPOINT USING CONFIGFS
-                    Kishon Vijay Abraham I <kishon@ti.com>
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+Configuring PCI Endpoint Using CONFIGFS
+=======================================
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
 
 The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the
 PCI endpoint function and to bind the endpoint function
 with the endpoint controller. (For introducing other mechanisms to
 configure the PCI Endpoint Function refer to [1]).
 
-*) Mounting configfs
+Mounting configfs
+=================
 
 The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
-directory. configfs can be mounted using the following command.
+directory. configfs can be mounted using the following command::
 
 	mount -t configfs none /sys/kernel/config
 
-*) Directory Structure
+Directory Structure
+===================
 
 The pci_ep configfs has two directories at its root: controllers and
 functions. Every EPC device present in the system will have an entry in
 the *controllers* directory and and every EPF driver present in the system
 will have an entry in the *functions* directory.
+::
 
-/sys/kernel/config/pci_ep/
-	.. controllers/
-	.. functions/
+	/sys/kernel/config/pci_ep/
+		.. controllers/
+		.. functions/
 
-*) Creating EPF Device
+Creating EPF Device
+===================
 
 Every registered EPF driver will be listed in controllers directory. The
 entries corresponding to EPF driver will be created by the EPF core.
+::
 
-/sys/kernel/config/pci_ep/functions/
-	.. <EPF Driver1>/
-		... <EPF Device 11>/
-		... <EPF Device 21>/
-	.. <EPF Driver2>/
-		... <EPF Device 12>/
-		... <EPF Device 22>/
+	/sys/kernel/config/pci_ep/functions/
+		.. <EPF Driver1>/
+			... <EPF Device 11>/
+			... <EPF Device 21>/
+		.. <EPF Driver2>/
+			... <EPF Device 12>/
+			... <EPF Device 22>/
 
 In order to create a <EPF device> of the type probed by <EPF Driver>, the
 user has to create a directory inside <EPF DriverN>.
@@ -44,34 +54,37 @@ Every <EPF device> directory consists of the following entries that can be
 used to configure the standard configuration header of the endpoint function.
 (These entries are created by the framework when any new <EPF Device> is
 created)
-
-	.. <EPF Driver1>/
-		... <EPF Device 11>/
-			... vendorid
-			... deviceid
-			... revid
-			... progif_code
-			... subclass_code
-			... baseclass_code
-			... cache_line_size
-			... subsys_vendor_id
-			... subsys_id
-			... interrupt_pin
-
-*) EPC Device
+::
+
+		.. <EPF Driver1>/
+			... <EPF Device 11>/
+				... vendorid
+				... deviceid
+				... revid
+				... progif_code
+				... subclass_code
+				... baseclass_code
+				... cache_line_size
+				... subsys_vendor_id
+				... subsys_id
+				... interrupt_pin
+
+EPC Device
+==========
 
 Every registered EPC device will be listed in controllers directory. The
 entries corresponding to EPC device will be created by the EPC core.
-
-/sys/kernel/config/pci_ep/controllers/
-	.. <EPC Device1>/
-		... <Symlink EPF Device11>/
-		... <Symlink EPF Device12>/
-		... start
-	.. <EPC Device2>/
-		... <Symlink EPF Device21>/
-		... <Symlink EPF Device22>/
-		... start
+::
+
+	/sys/kernel/config/pci_ep/controllers/
+		.. <EPC Device1>/
+			... <Symlink EPF Device11>/
+			... <Symlink EPF Device12>/
+			... start
+		.. <EPC Device2>/
+			... <Symlink EPF Device21>/
+			... <Symlink EPF Device22>/
+			... start
 
 The <EPC Device> directory will have a list of symbolic links to
 <EPF Device>. These symbolic links should be created by the user to
@@ -81,7 +94,7 @@ The <EPC Device> directory will also have a *start* field. Once
 "1" is written to this field, the endpoint device will be ready to
 establish the link with the host. This is usually done after
 all the EPF devices are created and linked with the EPC device.
-
+::
 
 			 | controllers/
 				| <Directory: EPC name>/
@@ -102,4 +115,4 @@ all the EPF devices are created and linked with the EPC device.
 						| interrupt_pin
 						| function
 
-[1] -> Documentation/PCI/endpoint/pci-endpoint.txt
+[1] :doc:`pci-endpoint`
-- 
2.20.1



* [PATCH v4 33/63] Documentation: PCI: convert endpoint/pci-endpoint.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/endpoint/index.rst          | 10 ++
 .../{pci-endpoint.txt => pci-endpoint.rst}    | 95 +++++++++++--------
 Documentation/PCI/index.rst                   |  1 +
 3 files changed, 68 insertions(+), 38 deletions(-)
 create mode 100644 Documentation/PCI/endpoint/index.rst
 rename Documentation/PCI/endpoint/{pci-endpoint.txt => pci-endpoint.rst} (82%)

diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst
new file mode 100644
index 000000000000..0db4f2fcd7f0
--- /dev/null
+++ b/Documentation/PCI/endpoint/index.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+PCI Endpoint Framework
+======================
+
+.. toctree::
+   :maxdepth: 2
+
+   pci-endpoint
diff --git a/Documentation/PCI/endpoint/pci-endpoint.txt b/Documentation/PCI/endpoint/pci-endpoint.rst
similarity index 82%
rename from Documentation/PCI/endpoint/pci-endpoint.txt
rename to Documentation/PCI/endpoint/pci-endpoint.rst
index e86a96b66a6a..6674ce5425bf 100644
--- a/Documentation/PCI/endpoint/pci-endpoint.txt
+++ b/Documentation/PCI/endpoint/pci-endpoint.rst
@@ -1,11 +1,17 @@
-			    PCI ENDPOINT FRAMEWORK
-		    Kishon Vijay Abraham I <kishon@ti.com>
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+PCI Endpoint Framework
+======================
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
 
 This document is a guide to use the PCI Endpoint Framework in order to create
 endpoint controller driver, endpoint function driver, and using configfs
 interface to bind the function driver to the controller driver.
 
-1. Introduction
+Introduction
+============
 
 Linux has a comprehensive PCI subsystem to support PCI controllers that
 operates in Root Complex mode. The subsystem has capability to scan PCI bus,
@@ -19,24 +25,27 @@ add endpoint mode support in Linux. This will help to run Linux in an
 EP system which can have a wide variety of use cases from testing or
 validation, co-processor accelerator, etc.
 
-2. PCI Endpoint Core
+PCI Endpoint Core
+=================
 
 The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
 library, the Endpoint Function library, and the configfs layer to bind the
 endpoint function with the endpoint controller.
 
-2.1 PCI Endpoint Controller(EPC) Library
+PCI Endpoint Controller (EPC) Library
+-------------------------------------
 
 The EPC library provides APIs to be used by the controller that can operate
 in endpoint mode. It also provides APIs to be used by function driver/library
 in order to implement a particular endpoint function.
 
-2.1.1 APIs for the PCI controller Driver
+APIs for the PCI controller Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI controller driver.
 
-*) devm_pci_epc_create()/pci_epc_create()
+* devm_pci_epc_create()/pci_epc_create()
 
    The PCI controller driver should implement the following ops:
 	 * write_header: ops to populate configuration space header
@@ -51,110 +60,116 @@ by the PCI controller driver.
    The PCI controller driver can then create a new EPC device by invoking
    devm_pci_epc_create()/pci_epc_create().
 
-*) devm_pci_epc_destroy()/pci_epc_destroy()
+* devm_pci_epc_destroy()/pci_epc_destroy()
 
    The PCI controller driver can destroy the EPC device created by either
    devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
    pci_epc_destroy().
 
-*) pci_epc_linkup()
+* pci_epc_linkup()
 
    In order to notify all the function devices that the EPC device to which
    they are linked has established a link with the host, the PCI controller
    driver should invoke pci_epc_linkup().
 
-*) pci_epc_mem_init()
+* pci_epc_mem_init()
 
    Initialize the pci_epc_mem structure used for allocating EPC addr space.
 
-*) pci_epc_mem_exit()
+* pci_epc_mem_exit()
 
    Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
 
-2.1.2 APIs for the PCI Endpoint Function Driver
+
+APIs for the PCI Endpoint Function Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint function driver.
 
-*) pci_epc_write_header()
+* pci_epc_write_header()
 
    The PCI endpoint function driver should use pci_epc_write_header() to
    write the standard configuration header to the endpoint controller.
 
-*) pci_epc_set_bar()
+* pci_epc_set_bar()
 
    The PCI endpoint function driver should use pci_epc_set_bar() to configure
    the Base Address Register in order for the host to assign PCI addr space.
    Register space of the function driver is usually configured
    using this API.
 
-*) pci_epc_clear_bar()
+* pci_epc_clear_bar()
 
    The PCI endpoint function driver should use pci_epc_clear_bar() to reset
    the BAR.
 
-*) pci_epc_raise_irq()
+* pci_epc_raise_irq()
 
    The PCI endpoint function driver should use pci_epc_raise_irq() to raise
    Legacy Interrupt, MSI or MSI-X Interrupt.
 
-*) pci_epc_mem_alloc_addr()
+* pci_epc_mem_alloc_addr()
 
    The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to
    allocate memory address from EPC addr space which is required to access
    RC's buffer
 
-*) pci_epc_mem_free_addr()
+* pci_epc_mem_free_addr()
 
    The PCI endpoint function driver should use pci_epc_mem_free_addr() to
    free the memory space allocated using pci_epc_mem_alloc_addr().
 
-2.1.3 Other APIs
+Other APIs
+~~~~~~~~~~
 
 There are other APIs provided by the EPC library. These are used for binding
 the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
 using these APIs.
 
-*) pci_epc_get()
+* pci_epc_get()
 
    Get a reference to the PCI endpoint controller based on the device name of
    the controller.
 
-*) pci_epc_put()
+* pci_epc_put()
 
    Release the reference to the PCI endpoint controller obtained using
    pci_epc_get()
 
-*) pci_epc_add_epf()
+* pci_epc_add_epf()
 
    Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
    can have up to 8 functions according to the specification.
 
-*) pci_epc_remove_epf()
+* pci_epc_remove_epf()
 
    Remove the PCI endpoint function from PCI endpoint controller.
 
-*) pci_epc_start()
+* pci_epc_start()
 
    The PCI endpoint function driver should invoke pci_epc_start() once it
    has configured the endpoint function and wants to start the PCI link.
 
-*) pci_epc_stop()
+* pci_epc_stop()
 
    The PCI endpoint function driver should invoke pci_epc_stop() to stop
    the PCI LINK.
 
-2.2 PCI Endpoint Function(EPF) Library
+
+PCI Endpoint Function (EPF) Library
+-----------------------------------
 
 The EPF library provides APIs to be used by the function driver and the EPC
 library to provide endpoint mode functionality.
 
-2.2.1 APIs for the PCI Endpoint Function Driver
+APIs for the PCI Endpoint Function Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint function driver.
 
-*) pci_epf_register_driver()
+* pci_epf_register_driver()
 
    The PCI Endpoint Function driver should implement the following ops:
 	 * bind: ops to perform when a EPC device has been bound to EPF device
@@ -166,50 +181,54 @@ by the PCI endpoint function driver.
   The PCI Function driver can then register the PCI EPF driver by using
   pci_epf_register_driver().
 
-*) pci_epf_unregister_driver()
+* pci_epf_unregister_driver()
 
   The PCI Function driver can unregister the PCI EPF driver by using
   pci_epf_unregister_driver().
 
-*) pci_epf_alloc_space()
+* pci_epf_alloc_space()
 
   The PCI Function driver can allocate space for a particular BAR using
   pci_epf_alloc_space().
 
-*) pci_epf_free_space()
+* pci_epf_free_space()
 
   The PCI Function driver can free the allocated space
   (using pci_epf_alloc_space) by invoking pci_epf_free_space().
 
-2.2.2 APIs for the PCI Endpoint Controller Library
+APIs for the PCI Endpoint Controller Library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint controller library.
 
-*) pci_epf_linkup()
+* pci_epf_linkup()
 
    The PCI endpoint controller library invokes pci_epf_linkup() when the
    EPC device has established the connection to the host.
 
-2.2.2 Other APIs
+Other APIs
+~~~~~~~~~~
+
 There are other APIs provided by the EPF library. These are used to notify
 the function driver when the EPF device is bound to the EPC device.
 pci-ep-cfs.c can be used as reference for using these APIs.
 
-*) pci_epf_create()
+* pci_epf_create()
 
    Create a new PCI EPF device by passing the name of the PCI EPF device.
    This name will be used to bind the the EPF device to a EPF driver.
 
-*) pci_epf_destroy()
+* pci_epf_destroy()
 
    Destroy the created PCI EPF device.
 
-*) pci_epf_bind()
+* pci_epf_bind()
 
    pci_epf_bind() should be invoked when the EPF device has been bound to
    a EPC device.
 
-*) pci_epf_unbind()
+* pci_epf_unbind()
 
    pci_epf_unbind() should be invoked when the binding between EPC device
    and EPF device is lost.
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 86c76c22810b..c8ea2e626c20 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -15,3 +15,4 @@ Linux PCI Bus Subsystem
    acpi-info
    pci-error-recovery
    pcieaer-howto
+   endpoint/index
-- 
2.20.1



* [PATCH v4 32/63] Documentation: PCI: convert pcieaer-howto.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/index.rst                   |   1 +
 .../{pcieaer-howto.txt => pcieaer-howto.rst}  | 110 ++++++++++++------
 2 files changed, 74 insertions(+), 37 deletions(-)
 rename Documentation/PCI/{pcieaer-howto.txt => pcieaer-howto.rst} (81%)

diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 5ee4dba07116..86c76c22810b 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -14,3 +14,4 @@ Linux PCI Bus Subsystem
    MSI-HOWTO
    acpi-info
    pci-error-recovery
+   pcieaer-howto
diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.rst
similarity index 81%
rename from Documentation/PCI/pcieaer-howto.txt
rename to Documentation/PCI/pcieaer-howto.rst
index 48ce7903e3c6..67f77ff76865 100644
--- a/Documentation/PCI/pcieaer-howto.txt
+++ b/Documentation/PCI/pcieaer-howto.rst
@@ -1,21 +1,29 @@
-   The PCI Express Advanced Error Reporting Driver Guide HOWTO
-		T. Long Nguyen	<tom.l.nguyen@intel.com>
-		Yanmin Zhang	<yanmin.zhang@intel.com>
-				07/29/2006
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
 
+===========================================================
+The PCI Express Advanced Error Reporting Driver Guide HOWTO
+===========================================================
 
-1. Overview
+:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
+          - Yanmin Zhang <yanmin.zhang@intel.com>
 
-1.1 About this guide
+:Copyright: |copy| 2006 Intel Corporation
+
+Overview
+========
+
+About this guide
+----------------
 
 This guide describes the basics of the PCI Express Advanced Error
 Reporting (AER) driver and provides information on how to use it, as
 well as how to enable the drivers of endpoint devices to conform with
 PCI Express AER driver.
 
-1.2 Copyright (C) Intel Corporation 2006.
 
-1.3 What is the PCI Express AER Driver?
+What is the PCI Express AER Driver?
+-----------------------------------
 
 PCI Express error signaling can occur on the PCI Express link itself
 or on behalf of transactions initiated on the link. PCI Express
@@ -30,17 +38,19 @@ The PCI Express AER driver provides the infrastructure to support PCI
 Express Advanced Error Reporting capability. The PCI Express AER
 driver provides three basic functions:
 
--	Gathers the comprehensive error information if errors occurred.
--	Reports error to the users.
--	Performs error recovery actions.
+  - Gathers the comprehensive error information if errors occurred.
+  - Reports error to the users.
+  - Performs error recovery actions.
 
 AER driver only attaches root ports which support PCI-Express AER
 capability.
 
 
-2. User Guide
+User Guide
+==========
 
-2.1 Include the PCI Express AER Root Driver into the Linux Kernel
+Include the PCI Express AER Root Driver into the Linux Kernel
+-------------------------------------------------------------
 
 The PCI Express AER Root driver is a Root Port service driver attached
 to the PCI Express Port Bus driver. If a user wants to use it, the driver
@@ -48,7 +58,8 @@ has to be compiled. Option CONFIG_PCIEAER supports this capability. It
 depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and
 CONFIG_PCIEAER = y.
 
-2.2 Load PCI Express AER Root Driver
+Load PCI Express AER Root Driver
+--------------------------------
 
 Some systems have AER support in firmware. Enabling Linux AER support at
 the same time the firmware handles AER may result in unpredictable
@@ -56,30 +67,34 @@ behavior. Therefore, Linux does not handle AER events unless the firmware
 grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
 Specification for details regarding _OSC usage.
 
-2.3 AER error output
+AER error output
+----------------
 
 When a PCIe AER error is captured, an error message will be output to
 console. If it's a correctable error, it is output as a warning.
 Otherwise, it is printed as an error. So users could choose different
 log level to filter out correctable error messages.
 
-Below shows an example:
-0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
-0000:50:00.0:   device [8086:0329] error status/mask=00100000/00000000
-0000:50:00.0:    [20] Unsupported Request    (First)
-0000:50:00.0:   TLP Header: 04000001 00200a03 05010000 00050100
+Below shows an example::
+
+  0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
+  0000:50:00.0:   device [8086:0329] error status/mask=00100000/00000000
+  0000:50:00.0:    [20] Unsupported Request    (First)
+  0000:50:00.0:   TLP Header: 04000001 00200a03 05010000 00050100
 
 In the example, 'Requester ID' means the ID of the device who sends
 the error message to root port. Pls. refer to pci express specs for
 other fields.
 
-2.4 AER Statistics / Counters
+AER Statistics / Counters
+-------------------------
 
 When PCIe AER errors are captured, the counters / statistics are also exposed
 in the form of sysfs attributes which are documented at
 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
 
-3. Developer Guide
+Developer Guide
+===============
 
 To enable AER aware support requires a software driver to configure
 the AER capability structure within its device and to provide callbacks.
@@ -120,7 +135,8 @@ hierarchy and links. These errors do not include any device specific
 errors because device specific errors will still get sent directly to
 the device driver.
 
-3.1 Configure the AER capability structure
+Configure the AER capability structure
+--------------------------------------
 
 AER aware drivers of PCI Express component need change the device
 control registers to enable AER. They also could change AER registers,
@@ -128,9 +144,11 @@ including mask and severity registers. Helper function
 pci_enable_pcie_error_reporting could be used to enable AER. See
 section 3.3.
 
-3.2. Provide callbacks
+Provide callbacks
+-----------------
 
-3.2.1 callback reset_link to reset pci express link
+callback reset_link to reset pci express link
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This callback is used to reset the pci express physical link when a
 fatal error happens. The root port aer service driver provides a
@@ -140,13 +158,15 @@ upstream ports should provide their own reset_link functions.
 
 In struct pcie_port_service_driver, a new pointer, reset_link, is
 added.
+::
 
-pci_ers_result_t (*reset_link) (struct pci_dev *dev);
+	pci_ers_result_t (*reset_link) (struct pci_dev *dev);
 
 Section 3.2.2.2 provides more detailed info on when to call
 reset_link.
 
-3.2.2 PCI error-recovery callbacks
+PCI error-recovery callbacks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The PCI Express AER Root driver uses error callbacks to coordinate
 with downstream device drivers associated with a hierarchy in question
@@ -161,7 +181,8 @@ definitions of the callbacks.
 
 Below sections specify when to call the error callback functions.
 
-3.2.2.1 Correctable errors
+Correctable errors
+~~~~~~~~~~~~~~~~~~
 
 Correctable errors pose no impacts on the functionality of
 the interface. The PCI Express protocol can recover without any
@@ -169,13 +190,16 @@ software intervention or any loss of data. These errors do not
 require any recovery actions. The AER driver clears the device's
 correctable error status register accordingly and logs these errors.
 
-3.2.2.2 Non-correctable (non-fatal and fatal) errors
+Non-correctable (non-fatal and fatal) errors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 If an error message indicates a non-fatal error, performing link reset
 at upstream is not required. The AER driver calls error_detected(dev,
 pci_channel_io_normal) to all drivers associated within a hierarchy in
-question. for example,
-EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
+question. For example::
+
+  EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
+
 If Upstream port A captures an AER error, the hierarchy consists of
 Downstream port B and EndPoint.
 
@@ -199,23 +223,33 @@ function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
 reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
 to mmio_enabled.
 
-3.3 helper functions
+helper functions
+----------------
+::
+
+  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 
-3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 pci_enable_pcie_error_reporting enables the device to send error
 messages to root port when an error is detected. Note that devices
 don't enable the error reporting by default, so device drivers need
 call this function to enable it.
 
-3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
+::
+
+  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
+
 pci_disable_pcie_error_reporting disables the device to send error
 messages to root port when an error is detected.
 
-3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
+::
+
+  int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
+
 pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
 error status register.
 
-3.4 Frequent Asked Questions
+Frequently Asked Questions
+--------------------------
 
 Q: What happens if a PCI Express device driver does not provide an
 error recovery handler (pci_driver->err_handler is equal to NULL)?
@@ -245,7 +279,8 @@ A: It could call the helper functions to enable AER in devices and
 cleanup uncorrectable status register. Pls. refer to section 3.3.
 
 
-4. Software error injection
+Software error injection
+========================
 
 Debugging PCIe AER error recovery code is quite difficult because it
 is hard to trigger real hardware errors. Software based error
@@ -261,6 +296,7 @@ After reboot with new kernel or insert the module, a device file named
 
 Then, you need a user space tool named aer-inject, which can be gotten
 from:
+
     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
 
 More information about aer-inject can be found in the document comes
-- 
2.20.1
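
The AER log example in this document shows "error status/mask=00100000/00000000"
followed by "[20] Unsupported Request". The bracketed number is the index of
the bit set in the status word (0x00100000 is 1 << 20). The sketch below
shows that relationship in user-space C. It is an illustration only: the
function names are hypothetical, the parser assumes the exact "status/mask="
spelling from the example line, and it does not interpret the mask value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the hex status and mask words from a log line of the form
 * "... error status/mask=00100000/00000000". Returns 0 on success, -1 if
 * the line does not contain the expected field.
 */
static int aer_parse_status(const char *line, uint32_t *status, uint32_t *mask)
{
	const char *p = strstr(line, "status/mask=");

	if (!p)
		return -1;
	if (sscanf(p, "status/mask=%" SCNx32 "/%" SCNx32, status, mask) != 2)
		return -1;
	return 0;
}

/*
 * Collect the indices of the bits set in a status word, lowest first.
 * Returns the number of set bits; at most max indices are stored.
 */
static size_t aer_status_bits(uint32_t status, unsigned int *bits, size_t max)
{
	size_t count = 0;
	unsigned int bit;

	assert(bits != NULL || max == 0);
	for (bit = 0; bit < 32; bit++) {
		if (!(status & (UINT32_C(1) << bit)))
			continue;
		if (count < max)
			bits[count] = bit;
		count++;
	}
	return count;
}
```

Feeding the second line of the example output through aer_parse_status() and
then aer_status_bits() yields a single index, 20, matching the
"[20] Unsupported Request" line printed by the driver.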



* [PATCH v4 31/63] Documentation: PCI: convert pci-error-recovery.txt to reST
From: Changbin Du @ 2019-04-23 16:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/index.rst                   |   1 +
 ...or-recovery.txt => pci-error-recovery.rst} | 178 +++++++++---------
 MAINTAINERS                                   |   2 +-
 3 files changed, 94 insertions(+), 87 deletions(-)
 rename Documentation/PCI/{pci-error-recovery.txt => pci-error-recovery.rst} (80%)

diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index c877a369481d..5ee4dba07116 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -13,3 +13,4 @@ Linux PCI Bus Subsystem
    pci-iov-howto
    MSI-HOWTO
    acpi-info
+   pci-error-recovery
diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.rst
similarity index 80%
rename from Documentation/PCI/pci-error-recovery.txt
rename to Documentation/PCI/pci-error-recovery.rst
index 0b6bb3ef449e..533ec4035bf5 100644
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.rst
@@ -1,12 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
 
-                       PCI Error Recovery
-                       ------------------
-                        February 2, 2006
+==================
+PCI Error Recovery
+==================
 
-                 Current document maintainer:
-             Linas Vepstas <linasvepstas@gmail.com>
-          updated by Richard Lary <rlary@us.ibm.com>
-       and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
+
+:Authors: - Linas Vepstas <linasvepstas@gmail.com>
+          - Richard Lary <rlary@us.ibm.com>
+          - Mike Mason <mmlnx@us.ibm.com>
 
 
 Many PCI bus controllers are able to detect a variety of hardware
@@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets.
 
 
 Detailed Design
----------------
+===============
+
 Design and implementation details below, based on a chain of
 public email discussions with Ben Herrenschmidt, circa 5 April 2005.
 
@@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware",
 and the actual recovery steps taken are platform dependent.  The
 arch/powerpc implementation will simulate a PCI hotplug remove/add.
 
-This structure has the form:
-struct pci_error_handlers
-{
-	int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
-	int (*mmio_enabled)(struct pci_dev *dev);
-	int (*slot_reset)(struct pci_dev *dev);
-	void (*resume)(struct pci_dev *dev);
-};
-
-The possible channel states are:
-enum pci_channel_state {
-	pci_channel_io_normal,  /* I/O channel is in normal state */
-	pci_channel_io_frozen,  /* I/O to channel is blocked */
-	pci_channel_io_perm_failure, /* PCI card is dead */
-};
-
-Possible return values are:
-enum pci_ers_result {
-	PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
-	PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
-	PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
-	PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
-	PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
-};
+This structure has the form::
+
+	struct pci_error_handlers
+	{
+		int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
+		int (*mmio_enabled)(struct pci_dev *dev);
+		int (*slot_reset)(struct pci_dev *dev);
+		void (*resume)(struct pci_dev *dev);
+	};
+
+The possible channel states are::
+
+	enum pci_channel_state {
+		pci_channel_io_normal,  /* I/O channel is in normal state */
+		pci_channel_io_frozen,  /* I/O to channel is blocked */
+		pci_channel_io_perm_failure, /* PCI card is dead */
+	};
+
+Possible return values are::
+
+	enum pci_ers_result {
+		PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
+		PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
+		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
+		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
+		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
+	};
 
 A driver does not have to implement all of these callbacks; however,
 if it implements any, it must implement error_detected(). If a callback
@@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a
 
 All drivers participating in this system must implement this call.
 The driver must return one of the following result codes:
-		- PCI_ERS_RESULT_CAN_RECOVER:
-		  Driver returns this if it thinks it might be able to recover
-		  the HW by just banging IOs or if it wants to be given
-		  a chance to extract some diagnostic information (see
-		  mmio_enable, below).
-		- PCI_ERS_RESULT_NEED_RESET:
-		  Driver returns this if it can't recover without a
-		  slot reset.
-		- PCI_ERS_RESULT_DISCONNECT:
-		  Driver returns this if it doesn't want to recover at all.
+
+  - PCI_ERS_RESULT_CAN_RECOVER:
+    Driver returns this if it thinks it might be able to recover
+    the HW by just banging IOs or if it wants to be given
+    a chance to extract some diagnostic information (see
+    mmio_enable, below).
+  - PCI_ERS_RESULT_NEED_RESET:
+    Driver returns this if it can't recover without a
+    slot reset.
+  - PCI_ERS_RESULT_DISCONNECT:
+    Driver returns this if it doesn't want to recover at all.
 
 The next step taken will depend on the result codes returned by the
 drivers.
@@ -177,7 +183,7 @@ is STEP 6 (Permanent Failure).
 >>> get the device working again.
 
 STEP 2: MMIO Enabled
--------------------
+--------------------
 The platform re-enables MMIO to the device (but typically not the
 DMA), and then calls the mmio_enabled() callback on all affected
 device drivers.
@@ -203,23 +209,23 @@ instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
 >>> into one of the next states, that is, link reset or slot reset.
 
 The driver should return one of the following result codes:
-		- PCI_ERS_RESULT_RECOVERED
-		  Driver returns this if it thinks the device is fully
-		  functional and thinks it is ready to start
-		  normal driver operations again. There is no
-		  guarantee that the driver will actually be
-		  allowed to proceed, as another driver on the
-		  same segment might have failed and thus triggered a
-		  slot reset on platforms that support it.
-
-		- PCI_ERS_RESULT_NEED_RESET
-		  Driver returns this if it thinks the device is not
-		  recoverable in its current state and it needs a slot
-		  reset to proceed.
-
-		- PCI_ERS_RESULT_DISCONNECT
-		  Same as above. Total failure, no recovery even after
-		  reset driver dead. (To be defined more precisely)
+  - PCI_ERS_RESULT_RECOVERED
+    Driver returns this if it thinks the device is fully
+    functional and thinks it is ready to start
+    normal driver operations again. There is no
+    guarantee that the driver will actually be
+    allowed to proceed, as another driver on the
+    same segment might have failed and thus triggered a
+    slot reset on platforms that support it.
+
+  - PCI_ERS_RESULT_NEED_RESET
+    Driver returns this if it thinks the device is not
+    recoverable in its current state and it needs a slot
+    reset to proceed.
+
+  - PCI_ERS_RESULT_DISCONNECT
+    Same as above. Total failure, no recovery even after
+    reset driver dead. (To be defined more precisely)
 
 The next step taken depends on the results returned by the drivers.
 If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
@@ -293,24 +299,24 @@ device will be considered "dead" in this case.
 Drivers for multi-function cards will need to coordinate among
 themselves as to which driver instance will perform any "one-shot"
 or global device initialization. For example, the Symbios sym53cxx2
-driver performs device init only from PCI function 0:
+driver performs device init only from PCI function 0::
 
-+       if (PCI_FUNC(pdev->devfn) == 0)
-+               sym_reset_scsi_bus(np, 0);
+	+       if (PCI_FUNC(pdev->devfn) == 0)
+	+               sym_reset_scsi_bus(np, 0);
 
-	Result codes:
-		- PCI_ERS_RESULT_DISCONNECT
-		Same as above.
+Result codes:
+	- PCI_ERS_RESULT_DISCONNECT
+	  Same as above.
 
 Drivers for PCI Express cards that require a fundamental reset must
 set the needs_freset bit in the pci_dev structure in their probe function.
 For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
-PCI card types:
+PCI card types::
 
-+	/* Set EEH reset type to fundamental if required by hba  */
-+	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
-+		pdev->needs_freset = 1;
-+
+	+	/* Set EEH reset type to fundamental if required by hba  */
+	+	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
+	+		pdev->needs_freset = 1;
+	+
 
 Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
 Failure).
@@ -370,23 +376,23 @@ The current policy is to turn this into a platform policy.
 That is, the recovery API only requires that:
 
  - There is no guarantee that interrupt delivery can proceed from any
-device on the segment starting from the error detection and until the
-slot_reset callback is called, at which point interrupts are expected
-to be fully operational.
+   device on the segment starting from the error detection and until the
+   slot_reset callback is called, at which point interrupts are expected
+   to be fully operational.
 
  - There is no guarantee that interrupt delivery is stopped, that is,
-a driver that gets an interrupt after detecting an error, or that detects
-an error within the interrupt handler such that it prevents proper
-ack'ing of the interrupt (and thus removal of the source) should just
-return IRQ_NOTHANDLED. It's up to the platform to deal with that
-condition, typically by masking the IRQ source during the duration of
-the error handling. It is expected that the platform "knows" which
-interrupts are routed to error-management capable slots and can deal
-with temporarily disabling that IRQ number during error processing (this
-isn't terribly complex). That means some IRQ latency for other devices
-sharing the interrupt, but there is simply no other way. High end
-platforms aren't supposed to share interrupts between many devices
-anyway :)
+   a driver that gets an interrupt after detecting an error, or that detects
+   an error within the interrupt handler such that it prevents proper
+   ack'ing of the interrupt (and thus removal of the source) should just
+   return IRQ_NOTHANDLED. It's up to the platform to deal with that
+   condition, typically by masking the IRQ source during the duration of
+   the error handling. It is expected that the platform "knows" which
+   interrupts are routed to error-management capable slots and can deal
+   with temporarily disabling that IRQ number during error processing (this
+   isn't terribly complex). That means some IRQ latency for other devices
+   sharing the interrupt, but there is simply no other way. High end
+   platforms aren't supposed to share interrupts between many devices
+   anyway :)
 
 >>> Implementation details for the powerpc platform are discussed in
 >>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
diff --git a/MAINTAINERS b/MAINTAINERS
index 87f930bf32ad..403178958b05 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11965,7 +11965,7 @@ M:	Sam Bobroff <sbobroff@linux.ibm.com>
 M:	Oliver O'Halloran <oohall@gmail.com>
 L:	linuxppc-dev@lists.ozlabs.org
 S:	Supported
-F:	Documentation/PCI/pci-error-recovery.txt
+F:	Documentation/PCI/pci-error-recovery.rst
 F:	drivers/pci/pcie/aer.c
 F:	drivers/pci/pcie/dpc.c
 F:	drivers/pci/pcie/err.c
-- 
2.20.1
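
The callback structure and result codes quoted in the patch above can be
exercised outside the kernel. The following is a minimal userspace sketch,
not kernel code: the struct and enum bodies are copied from the listing in
the document, so the numeric values are not the kernel's, and `struct pci_dev`
is an empty stand-in. The `demo_*` names and the state-to-result policy are
hypothetical, chosen only to show one plausible way a driver might fill in
error_detected().

```c
#include <stddef.h>

/* Stand-in for the kernel's struct pci_dev (assumption, illustration only). */
struct pci_dev { int unused; };

/* Copied from the document's listing; values are not the kernel's. */
enum pci_channel_state {
	pci_channel_io_normal,       /* I/O channel is in normal state */
	pci_channel_io_frozen,       /* I/O to channel is blocked */
	pci_channel_io_perm_failure, /* PCI card is dead */
};

enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, enum pci_channel_state state);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
};

/*
 * Hypothetical policy: give up on a dead card, ask for a slot reset when
 * I/O is frozen, otherwise try to recover without a reset.
 */
static int demo_error_detected(struct pci_dev *dev, enum pci_channel_state state)
{
	(void)dev;
	switch (state) {
	case pci_channel_io_perm_failure:
		return PCI_ERS_RESULT_DISCONNECT;
	case pci_channel_io_frozen:
		return PCI_ERS_RESULT_NEED_RESET;
	default:
		return PCI_ERS_RESULT_CAN_RECOVER;
	}
}

/* Hypothetical: pretend re-initialisation after the slot reset succeeded. */
static int demo_slot_reset(struct pci_dev *dev)
{
	(void)dev;
	return PCI_ERS_RESULT_RECOVERED;
}

/*
 * error_detected() is the one callback the document says must be present
 * when any are implemented; mmio_enabled and resume are left NULL here.
 */
const struct pci_error_handlers demo_err_handler = {
	.error_detected = demo_error_detected,
	.slot_reset     = demo_slot_reset,
};
```

In a real driver the table would be referenced from pci_driver->err_handler,
and the callbacks would quiesce and re-initialise hardware rather than return
constants.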



* [PATCH v4 30/63] Documentation: PCI: convert acpi-info.txt to reST
From: Changbin Du @ 2019-04-23 16:28 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/{acpi-info.txt => acpi-info.rst} | 11 ++++++++---
 Documentation/PCI/index.rst                        |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)
 rename Documentation/PCI/{acpi-info.txt => acpi-info.rst} (97%)

diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.rst
similarity index 97%
rename from Documentation/PCI/acpi-info.txt
rename to Documentation/PCI/acpi-info.rst
index 3ffa3b03970e..f7dabb7ca255 100644
--- a/Documentation/PCI/acpi-info.txt
+++ b/Documentation/PCI/acpi-info.rst
@@ -1,4 +1,8 @@
-		ACPI considerations for PCI host bridges
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+ACPI considerations for PCI host bridges
+========================================
 
 The general rule is that the ACPI namespace should describe everything the
 OS might use unless there's another way for the OS to find it [1, 2].
@@ -135,8 +139,9 @@ address always corresponds to bus 0, even if the bus range below the bridge
 
     Extended Address Space Descriptor (.4)
     General Flags: Bit [0] Consumer/Producer:
-	1–This device consumes this resource
-	0–This device produces and consumes this resource
+
+        * 1 – This device consumes this resource
+        * 0 – This device produces and consumes this resource
 
 [5] ACPI 6.2, sec 19.6.43:
     ResourceUsage specifies whether the Memory range is consumed by
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 1b25bcc1edca..c877a369481d 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -12,3 +12,4 @@ Linux PCI Bus Subsystem
    PCIEBUS-HOWTO
    pci-iov-howto
    MSI-HOWTO
+   acpi-info
-- 
2.20.1



* [PATCH v4 29/63] Documentation: PCI: convert MSI-HOWTO.txt to reST
From: Changbin Du @ 2019-04-23 16:28 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>

---
v2:
  o drop numbering.
  o simplify author list
---
 .../PCI/{MSI-HOWTO.txt => MSI-HOWTO.rst}      | 83 +++++++++++--------
 Documentation/PCI/index.rst                   |  1 +
 2 files changed, 50 insertions(+), 34 deletions(-)
 rename Documentation/PCI/{MSI-HOWTO.txt => MSI-HOWTO.rst} (88%)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.rst
similarity index 88%
rename from Documentation/PCI/MSI-HOWTO.txt
rename to Documentation/PCI/MSI-HOWTO.rst
index 618e13d5e276..18cc3700489b 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.rst
@@ -1,13 +1,14 @@
-		The MSI Driver Guide HOWTO
-	Tom L Nguyen tom.l.nguyen@intel.com
-			10/03/2003
-	Revised Feb 12, 2004 by Martine Silbermann
-		email: Martine.Silbermann@hp.com
-	Revised Jun 25, 2004 by Tom L Nguyen
-	Revised Jul  9, 2008 by Matthew Wilcox <willy@linux.intel.com>
-		Copyright 2003, 2008 Intel Corporation
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
 
-1. About this guide
+==========================
+The MSI Driver Guide HOWTO
+==========================
+
+:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox
+
+About this guide
+================
 
 This guide describes the basics of Message Signaled Interrupts (MSIs),
 the advantages of using MSI over traditional interrupt mechanisms, how
@@ -15,7 +16,8 @@ to change your driver to use MSI or MSI-X and some basic diagnostics to
 try if a device doesn't support MSIs.
 
 
-2. What are MSIs?
+What are MSIs?
+==============
 
 A Message Signaled Interrupt is a write from the device to a special
 address which causes an interrupt to be received by the CPU.
@@ -29,7 +31,8 @@ Devices may support both MSI and MSI-X, but only one can be enabled at
 a time.
 
 
-3. Why use MSIs?
+Why use MSIs?
+=============
 
 There are three reasons why using MSIs can give an advantage over
 traditional pin-based interrupts.
@@ -61,14 +64,16 @@ Other possible designs include giving one interrupt to each packet queue
 in a network card or each port in a storage controller.
 
 
-4. How to use MSIs
+How to use MSIs
+===============
 
 PCI devices are initialised to use pin-based interrupts.  The device
 driver has to set up the device to use MSI or MSI-X.  Not all machines
 support MSIs correctly, and for those machines, the APIs described below
 will simply fail and the device will continue to use pin-based interrupts.
 
-4.1 Include kernel support for MSIs
+Include kernel support for MSIs
+-------------------------------
 
 To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
 option enabled.  This option is only available on some architectures,
@@ -76,14 +81,15 @@ and it may depend on some other options also being set.  For example,
 on x86, you must also enable X86_UP_APIC or SMP in order to see the
 CONFIG_PCI_MSI option.
 
-4.2 Using MSI
+Using MSI
+---------
 
 Most of the hard work is done for the driver in the PCI layer.  The driver
 simply has to request that the PCI layer set up the MSI capability for this
 device.
 
 To automatically use MSI or MSI-X interrupt vectors, use the following
-function:
+function::
 
   int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		unsigned int max_vecs, unsigned int flags);
@@ -101,12 +107,12 @@ any possible kind of interrupt.  If the PCI_IRQ_AFFINITY flag is set,
 pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
 
 To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
-vectors, use the following function:
+vectors, use the following function::
 
   int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
 
 Any allocated resources should be freed before removing the device using
-the following function:
+the following function::
 
   void pci_free_irq_vectors(struct pci_dev *dev);
 
@@ -126,7 +132,7 @@ The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
 as possible, likely up to the limit supported by the device.  If nvec is
 larger than the number supported by the device it will automatically be
 capped to the supported limit, so there is no need to query the number of
-vectors supported beforehand:
+vectors supported beforehand::
 
 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES)
 	if (nvec < 0)
@@ -135,7 +141,7 @@ vectors supported beforehand:
 If a driver is unable or unwilling to deal with a variable number of MSI
 interrupts it can request a particular number of interrupts by passing that
 number to pci_alloc_irq_vectors() function as both 'min_vecs' and
-'max_vecs' parameters:
+'max_vecs' parameters::
 
 	ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
 	if (ret < 0)
@@ -143,23 +149,24 @@ number to pci_alloc_irq_vectors() function as both 'min_vecs' and
 
 The most notorious example of the request type described above is enabling
 the single MSI mode for a device.  It could be done by passing two 1s as
-'min_vecs' and 'max_vecs':
+'min_vecs' and 'max_vecs'::
 
 	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
 	if (ret < 0)
 		goto out_err;
 
 Some devices might not support using legacy line interrupts, in which case
-the driver can specify that only MSI or MSI-X is acceptable:
+the driver can specify that only MSI or MSI-X is acceptable::
 
 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
 	if (nvec < 0)
 		goto out_err;
 
-4.3 Legacy APIs
+Legacy APIs
+-----------
 
 The following old APIs to enable and disable MSI or MSI-X interrupts should
-not be used in new code:
+not be used in new code::
 
   pci_enable_msi()		/* deprecated */
   pci_disable_msi()		/* deprecated */
@@ -174,9 +181,11 @@ number of vectors.  If you have a legitimate special use case for the count
 of vectors we might have to revisit that decision and add a
 pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
 
-4.4 Considerations when using MSIs
+Considerations when using MSIs
+------------------------------
 
-4.4.1 Spinlocks
+Spinlocks
+~~~~~~~~~
 
 Most device drivers have a per-device spinlock which is taken in the
 interrupt handler.  With pin-based interrupts or a single MSI, it is not
@@ -188,7 +197,8 @@ acquire the spinlock.  Such deadlocks can be avoided by using
 spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
 and acquire the lock (see Documentation/kernel-hacking/locking.rst).
 
-4.5 How to tell whether MSI/MSI-X is enabled on a device
+How to tell whether MSI/MSI-X is enabled on a device
+----------------------------------------------------
 
 Using 'lspci -v' (as root) may show some devices with "MSI", "Message
 Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities
@@ -196,7 +206,8 @@ has an 'Enable' flag which is followed with either "+" (enabled)
 or "-" (disabled).
 
 
-5. MSI quirks
+MSI quirks
+==========
 
 Several PCI chipsets or devices are known not to support MSIs.
 The PCI stack provides three ways to disable MSIs:
@@ -205,7 +216,8 @@ The PCI stack provides three ways to disable MSIs:
 2. on all devices behind a specific bridge
 3. on a single device
 
-5.1. Disabling MSIs globally
+Disabling MSIs globally
+-----------------------
 
 Some host chipsets simply don't support MSIs properly.  If we're
 lucky, the manufacturer knows this and has indicated it in the ACPI
@@ -219,7 +231,8 @@ on the kernel command line to disable MSIs on all devices.  It would be
 in your best interests to report the problem to linux-pci@vger.kernel.org
 including a full 'lspci -v' so we can add the quirks to the kernel.
 
-5.2. Disabling MSIs below a bridge
+Disabling MSIs below a bridge
+-----------------------------
 
 Some PCI bridges are not able to route MSIs between busses properly.
 In this case, MSIs must be disabled on all devices behind the bridge.
@@ -230,7 +243,7 @@ as the nVidia nForce and Serverworks HT2000).  As with host chipsets,
 Linux mostly knows about them and automatically enables MSIs if it can.
 If you have a bridge unknown to Linux, you can enable
 MSIs in configuration space using whatever method you know works, then
-enable MSIs on that bridge by doing:
+enable MSIs on that bridge by doing::
 
        echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
 
@@ -244,7 +257,8 @@ below this bridge.
 Again, please notify linux-pci@vger.kernel.org of any bridges that need
 special handling.
 
-5.3. Disabling MSIs on a single device
+Disabling MSIs on a single device
+---------------------------------
 
 Some devices are known to have faulty MSI implementations.  Usually this
 is handled in the individual device driver, but occasionally it's necessary
@@ -252,7 +266,8 @@ to handle this with a quirk.  Some drivers have an option to disable use
 of MSI.  While this is a convenient workaround for the driver author,
 it is not good practice, and should not be emulated.
 
-5.4. Finding why MSIs are disabled on a device
+Finding why MSIs are disabled on a device
+-----------------------------------------
 
 From the above three sections, you can see that there are many reasons
 why MSIs may not be enabled for a given device.  Your first step should
@@ -260,8 +275,8 @@ be to examine your dmesg carefully to determine whether MSIs are enabled
 for your machine.  You should also check your .config to be sure you
 have enabled CONFIG_PCI_MSI.
 
-Then, 'lspci -t' gives the list of bridges above a device.  Reading
-/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1)
+Then, 'lspci -t' gives the list of bridges above a device. Reading
+`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
 or disabled (0).  If 0 is found in any of the msi_bus files belonging
 to bridges between the PCI root and the device, MSIs are disabled.
 
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index e1c19962a7f8..1b25bcc1edca 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -11,3 +11,4 @@ Linux PCI Bus Subsystem
    pci
    PCIEBUS-HOWTO
    pci-iov-howto
+   MSI-HOWTO
-- 
2.20.1
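
The MSI-HOWTO text in the patch above says a request for more vectors than
the device supports is capped to the supported limit, and that passing the
same value as 'min_vecs' and 'max_vecs' requests an exact count. The sketch
below is a simplified userspace model of that arithmetic only. It is not the
kernel implementation: `model_alloc_irq_vectors` and its `supported`
parameter are hypothetical, it ignores the choice between MSI-X, MSI and
legacy interrupts, and returning -ENOSPC when the minimum cannot be met is
my understanding of the kernel behaviour rather than something stated in
this excerpt.

```c
#include <errno.h>

/*
 * Model only: 'supported' stands in for the number of vectors the device
 * and platform can actually provide. Returns the number of vectors granted,
 * or a negative errno if fewer than min_vecs are available.
 */
static int model_alloc_irq_vectors(unsigned int supported,
				   unsigned int min_vecs,
				   unsigned int max_vecs)
{
	/* Cap the request at what the device supports. */
	unsigned int granted = max_vecs < supported ? max_vecs : supported;

	/* The caller's minimum cannot be satisfied. */
	if (granted < min_vecs)
		return -ENOSPC;

	return (int)granted;
}
```

With this model, asking for 1..32 vectors on a device that supports 8 yields
8, while insisting on exactly 4 vectors on a device that supports 2 fails,
which is why a driver that can cope with a variable count should pass a low
'min_vecs'.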



* [PATCH v4 28/63] Documentation: PCI: convert pci-iov-howto.txt to reST
From: Changbin Du @ 2019-04-23 16:28 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/index.rst                   |   1 +
 .../{pci-iov-howto.txt => pci-iov-howto.rst}  | 161 ++++++++++--------
 2 files changed, 94 insertions(+), 68 deletions(-)
 rename Documentation/PCI/{pci-iov-howto.txt => pci-iov-howto.rst} (63%)

diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 452723318405..e1c19962a7f8 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -10,3 +10,4 @@ Linux PCI Bus Subsystem
 
    pci
    PCIEBUS-HOWTO
+   pci-iov-howto
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.rst
similarity index 63%
rename from Documentation/PCI/pci-iov-howto.txt
rename to Documentation/PCI/pci-iov-howto.rst
index d2a84151e99c..b9fd003206f1 100644
--- a/Documentation/PCI/pci-iov-howto.txt
+++ b/Documentation/PCI/pci-iov-howto.rst
@@ -1,14 +1,19 @@
-		PCI Express I/O Virtualization Howto
-		Copyright (C) 2009 Intel Corporation
-		    Yu Zhao <yu.zhao@intel.com>
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
 
-		Update: November 2012
-			-- sysfs-based SRIOV enable-/disable-ment
-		Donald Dutile <ddutile@redhat.com>
+====================================
+PCI Express I/O Virtualization Howto
+====================================
 
-1. Overview
+:Copyright: |copy| 2009 Intel Corporation
+:Authors: - Yu Zhao <yu.zhao@intel.com>
+          - Donald Dutile <ddutile@redhat.com>
 
-1.1 What is SR-IOV
+Overview
+========
+
+What is SR-IOV
+--------------
 
 Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
 capability which makes one physical device appear as multiple virtual
@@ -23,9 +28,11 @@ Memory Space, which is used to map its register set. VF device driver
 operates on the register set so it can be functional and appear as a
 real existing PCI device.
 
-2. User Guide
+User Guide
+==========
 
-2.1 How can I enable SR-IOV capability
+How can I enable SR-IOV capability
+----------------------------------
 
 Multiple methods are available for SR-IOV enablement.
 In the first method, the device driver (PF driver) will control the
@@ -43,105 +50,123 @@ checks, e.g., check numvfs == 0 if enabling VFs, ensure
 numvfs <= totalvfs.
 The second method is the recommended method for new/future VF devices.
 
-2.2 How can I use the Virtual Functions
+How can I use the Virtual Functions
+-----------------------------------
 
 The VF is treated as hot-plugged PCI devices in the kernel, so they
 should be able to work in the same way as real PCI devices. The VF
 requires device driver that is same as a normal PCI device's.
 
-3. Developer Guide
+Developer Guide
+===============
 
-3.1 SR-IOV API
+SR-IOV API
+----------
 
 To enable SR-IOV capability:
-(a) For the first method, in the driver:
+
+(a) For the first method, in the driver::
+
 	int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
-	'nr_virtfn' is number of VFs to be enabled.
-(b) For the second method, from sysfs:
+
+'nr_virtfn' is number of VFs to be enabled.
+
+(b) For the second method, from sysfs::
+
 	echo 'nr_virtfn' > \
         /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
 
 To disable SR-IOV capability:
-(a) For the first method, in the driver:
+
+(a) For the first method, in the driver::
+
 	void pci_disable_sriov(struct pci_dev *dev);
-(b) For the second method, from sysfs:
+
+(b) For the second method, from sysfs::
+
 	echo  0 > \
         /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
 
 To enable auto probing VFs by a compatible driver on the host, run
 command below before enabling SR-IOV capabilities. This is the
 default behavior.
+::
+
 	echo 1 > \
         /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
 
 To disable auto probing VFs by a compatible driver on the host, run
 command below before enabling SR-IOV capabilities. Updating this
 entry will not affect VFs which are already probed.
+::
+
 	echo  0 > \
         /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
 
-3.2 Usage example
+Usage example
+-------------
 
 Following piece of code illustrates the usage of the SR-IOV API.
+::
 
-static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
-{
-	pci_enable_sriov(dev, NR_VIRTFN);
+	static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
+	{
+		pci_enable_sriov(dev, NR_VIRTFN);
 
-	...
-
-	return 0;
-}
+		...
 
-static void dev_remove(struct pci_dev *dev)
-{
-	pci_disable_sriov(dev);
+		return 0;
+	}
 
-	...
-}
+	static void dev_remove(struct pci_dev *dev)
+	{
+		pci_disable_sriov(dev);
 
-static int dev_suspend(struct pci_dev *dev, pm_message_t state)
-{
-	...
+		...
+	}
 
-	return 0;
-}
+	static int dev_suspend(struct pci_dev *dev, pm_message_t state)
+	{
+		...
 
-static int dev_resume(struct pci_dev *dev)
-{
-	...
+		return 0;
+	}
 
-	return 0;
-}
+	static int dev_resume(struct pci_dev *dev)
+	{
+		...
 
-static void dev_shutdown(struct pci_dev *dev)
-{
-	...
-}
+		return 0;
+	}
 
-static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
-{
-	if (numvfs > 0) {
-		...
-		pci_enable_sriov(dev, numvfs);
+	static void dev_shutdown(struct pci_dev *dev)
+	{
 		...
-		return numvfs;
 	}
-	if (numvfs == 0) {
-		....
-		pci_disable_sriov(dev);
-		...
-		return 0;
+
+	static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
+	{
+		if (numvfs > 0) {
+			...
+			pci_enable_sriov(dev, numvfs);
+			...
+			return numvfs;
+		}
+		if (numvfs == 0) {
+			....
+			pci_disable_sriov(dev);
+			...
+			return 0;
+		}
 	}
-}
-
-static struct pci_driver dev_driver = {
-	.name =		"SR-IOV Physical Function driver",
-	.id_table =	dev_id_table,
-	.probe =	dev_probe,
-	.remove =	dev_remove,
-	.suspend =	dev_suspend,
-	.resume =	dev_resume,
-	.shutdown =	dev_shutdown,
-	.sriov_configure = dev_sriov_configure,
-};
+
+	static struct pci_driver dev_driver = {
+		.name =		"SR-IOV Physical Function driver",
+		.id_table =	dev_id_table,
+		.probe =	dev_probe,
+		.remove =	dev_remove,
+		.suspend =	dev_suspend,
+		.resume =	dev_resume,
+		.shutdown =	dev_shutdown,
+		.sriov_configure = dev_sriov_configure,
+	};
-- 
2.20.1
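
The SR-IOV howto in the patch above mentions that the sysfs method performs
sanity checks before calling the driver, for example that numvfs is 0 when
enabling VFs and that numvfs <= totalvfs. The following userspace sketch
models those checks under stated assumptions. `struct model_pf` and
`model_sriov_numvfs_store` are hypothetical names, the specific errno values
are this model's choice rather than a statement about the kernel, and the
assignments to `numvfs` merely stand in for pci_enable_sriov() and
pci_disable_sriov().

```c
#include <errno.h>

/* Hypothetical stand-in for the SR-IOV state of a Physical Function. */
struct model_pf {
	int totalvfs; /* maximum number of VFs the device supports */
	int numvfs;   /* number of VFs currently enabled */
};

/*
 * Model of a write to sriov_numvfs. Returns the resulting VF count,
 * or a negative errno if the request is rejected.
 */
static int model_sriov_numvfs_store(struct model_pf *pf, int requested)
{
	/* Ensure numvfs <= totalvfs. */
	if (requested < 0 || requested > pf->totalvfs)
		return -ERANGE;

	/* Nothing to change. */
	if (requested == pf->numvfs)
		return pf->numvfs;

	/* Writing 0 disables all VFs. */
	if (requested == 0) {
		pf->numvfs = 0; /* stands in for pci_disable_sriov() */
		return 0;
	}

	/* Enabling requires that no VFs are currently enabled. */
	if (pf->numvfs != 0)
		return -EBUSY;

	pf->numvfs = requested; /* stands in for pci_enable_sriov() */
	return requested;
}
```

Under this model, changing the VF count from 4 to 6 takes two writes: 0 first
to disable the existing VFs, then 6.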



* [PATCH v4 27/63] Documentation: PCI: convert PCIEBUS-HOWTO.txt to reST
From: Changbin Du @ 2019-04-23 16:28 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-pci, linux-gpio,
	x86, rjw, linux-kernel, linux-acpi, mingo, Bjorn Helgaas, tglx,
	linuxppc-dev, Changbin Du
In-Reply-To: <20190423162932.21428-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 .../{PCIEBUS-HOWTO.txt => PCIEBUS-HOWTO.rst}  | 140 ++++++++++--------
 Documentation/PCI/index.rst                   |   1 +
 2 files changed, 82 insertions(+), 59 deletions(-)
 rename Documentation/PCI/{PCIEBUS-HOWTO.txt => PCIEBUS-HOWTO.rst} (70%)

diff --git a/Documentation/PCI/PCIEBUS-HOWTO.txt b/Documentation/PCI/PCIEBUS-HOWTO.rst
similarity index 70%
rename from Documentation/PCI/PCIEBUS-HOWTO.txt
rename to Documentation/PCI/PCIEBUS-HOWTO.rst
index 15f0bb3b5045..f882ff62c51f 100644
--- a/Documentation/PCI/PCIEBUS-HOWTO.txt
+++ b/Documentation/PCI/PCIEBUS-HOWTO.rst
@@ -1,16 +1,23 @@
-		The PCI Express Port Bus Driver Guide HOWTO
-	Tom L Nguyen tom.l.nguyen@intel.com
-			11/03/2004
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
 
-1. About this guide
+===========================================
+The PCI Express Port Bus Driver Guide HOWTO
+===========================================
+
+:Author: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004
+:Copyright: |copy| 2004 Intel Corporation
+
+About this guide
+================
 
 This guide describes the basics of the PCI Express Port Bus driver
 and provides information on how to enable the service drivers to
 register/unregister with the PCI Express Port Bus Driver.
 
-2. Copyright 2004 Intel Corporation
 
-3. What is the PCI Express Port Bus Driver
+What is the PCI Express Port Bus Driver
+=======================================
 
 A PCI Express Port is a logical PCI-PCI Bridge structure. There
 are two types of PCI Express Port: the Root Port and the Switch
@@ -30,7 +37,8 @@ support (AER), and virtual channel support (VC). These services may
 be handled by a single complex driver or be individually distributed
 and handled by corresponding service drivers.
 
-4. Why use the PCI Express Port Bus Driver?
+Why use the PCI Express Port Bus Driver?
+========================================
 
 In existing Linux kernels, the Linux Device Driver Model allows a
 physical device to be handled by only a single driver. The PCI
@@ -51,28 +59,31 @@ PCI Express Ports and distributes all provided service requests
 to the corresponding service drivers as required. Some key
 advantages of using the PCI Express Port Bus driver are listed below:
 
-	- Allow multiple service drivers to run simultaneously on
-	  a PCI-PCI Bridge Port device.
+  - Allow multiple service drivers to run simultaneously on
+    a PCI-PCI Bridge Port device.
 
-	- Allow service drivers implemented in an independent
-	  staged approach.
+  - Allow service drivers implemented in an independent
+    staged approach.
 
-	- Allow one service driver to run on multiple PCI-PCI Bridge
-	  Port devices.
+  - Allow one service driver to run on multiple PCI-PCI Bridge
+    Port devices.
 
-	- Manage and distribute resources of a PCI-PCI Bridge Port
-	  device to requested service drivers.
+  - Manage and distribute resources of a PCI-PCI Bridge Port
+    device to requested service drivers.
 
-5. Configuring the PCI Express Port Bus Driver vs. Service Drivers
+Configuring the PCI Express Port Bus Driver vs. Service Drivers
+===============================================================
 
-5.1 Including the PCI Express Port Bus Driver Support into the Kernel
+Including the PCI Express Port Bus Driver Support into the Kernel
+-----------------------------------------------------------------
 
 Including the PCI Express Port Bus driver depends on whether the PCI
 Express support is included in the kernel config. The kernel will
 automatically include the PCI Express Port Bus driver as a kernel
 driver when the PCI Express support is enabled in the kernel.
 
-5.2 Enabling Service Driver Support
+Enabling Service Driver Support
+-------------------------------
 
 PCI device drivers are implemented based on Linux Device Driver Model.
 All service drivers are PCI device drivers. As discussed above, it is
@@ -89,9 +100,11 @@ header file /include/linux/pcieport_if.h, before calling these APIs.
 Failure to do so will result an identity mismatch, which prevents
 the PCI Express Port Bus driver from loading a service driver.
 
-5.2.1 pcie_port_service_register
+pcie_port_service_register
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+::
 
-int pcie_port_service_register(struct pcie_port_service_driver *new)
+  int pcie_port_service_register(struct pcie_port_service_driver *new)
 
 This API replaces the Linux Driver Model's pci_register_driver API. A
 service driver should always calls pcie_port_service_register at
@@ -99,69 +112,76 @@ module init. Note that after service driver being loaded, calls
 such as pci_enable_device(dev) and pci_set_master(dev) are no longer
 necessary since these calls are executed by the PCI Port Bus driver.
 
-5.2.2 pcie_port_service_unregister
+pcie_port_service_unregister
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+::
 
-void pcie_port_service_unregister(struct pcie_port_service_driver *new)
+  void pcie_port_service_unregister(struct pcie_port_service_driver *new)
 
 pcie_port_service_unregister replaces the Linux Driver Model's
 pci_unregister_driver. It's always called by service driver when a
 module exits.
 
-5.2.3 Sample Code
+Sample Code
+~~~~~~~~~~~
 
 Below is sample service driver code to initialize the port service
 driver data structure.
+::
 
-static struct pcie_port_service_id service_id[] = { {
-	.vendor = PCI_ANY_ID,
-	.device = PCI_ANY_ID,
-	.port_type = PCIE_RC_PORT,
-	.service_type = PCIE_PORT_SERVICE_AER,
-	}, { /* end: all zeroes */ }
-};
+  static struct pcie_port_service_id service_id[] = { {
+    .vendor = PCI_ANY_ID,
+    .device = PCI_ANY_ID,
+    .port_type = PCIE_RC_PORT,
+    .service_type = PCIE_PORT_SERVICE_AER,
+    }, { /* end: all zeroes */ }
+  };
 
-static struct pcie_port_service_driver root_aerdrv = {
-	.name		= (char *)device_name,
-	.id_table	= &service_id[0],
+  static struct pcie_port_service_driver root_aerdrv = {
+    .name		= (char *)device_name,
+    .id_table	= &service_id[0],
 
-	.probe		= aerdrv_load,
-	.remove		= aerdrv_unload,
+    .probe		= aerdrv_load,
+    .remove		= aerdrv_unload,
 
-	.suspend	= aerdrv_suspend,
-	.resume		= aerdrv_resume,
-};
+    .suspend	= aerdrv_suspend,
+    .resume		= aerdrv_resume,
+  };
 
 Below is a sample code for registering/unregistering a service
 driver.
+::
 
-static int __init aerdrv_service_init(void)
-{
-	int retval = 0;
+  static int __init aerdrv_service_init(void)
+  {
+    int retval = 0;
 
-	retval = pcie_port_service_register(&root_aerdrv);
-	if (!retval) {
-		/*
-		 * FIX ME
-		 */
-	}
-	return retval;
-}
+    retval = pcie_port_service_register(&root_aerdrv);
+    if (!retval) {
+      /*
+       * FIX ME
+       */
+    }
+    return retval;
+  }
 
-static void __exit aerdrv_service_exit(void)
-{
-	pcie_port_service_unregister(&root_aerdrv);
-}
+  static void __exit aerdrv_service_exit(void)
+  {
+    pcie_port_service_unregister(&root_aerdrv);
+  }
 
-module_init(aerdrv_service_init);
-module_exit(aerdrv_service_exit);
+  module_init(aerdrv_service_init);
+  module_exit(aerdrv_service_exit);
 
-6. Possible Resource Conflicts
+Possible Resource Conflicts
+===========================
 
 Since all service drivers of a PCI-PCI Bridge Port device are
 allowed to run simultaneously, below lists a few of possible resource
 conflicts with proposed solutions.
 
-6.1 MSI and MSI-X Vector Resource
+MSI and MSI-X Vector Resource
+-----------------------------
 
 Once MSI or MSI-X interrupts are enabled on a device, it stays in this
 mode until they are disabled again.  Since service drivers of the same
@@ -179,7 +199,8 @@ driver. Service drivers should use (struct pcie_device*)dev->irq to
 call request_irq/free_irq. In addition, the interrupt mode is stored
 in the field interrupt_mode of struct pcie_device.
 
-6.3 PCI Memory/IO Mapped Regions
+PCI Memory/IO Mapped Regions
+----------------------------
 
 Service drivers for PCI Express Power Management (PME), Advanced
 Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
@@ -188,7 +209,8 @@ registers accessed are independent of each other. This patch assumes
 that all service drivers will be well behaved and not overwrite
 other service driver's configuration settings.
 
-6.4 PCI Config Registers
+PCI Config Registers
+--------------------
 
 Each service driver runs its PCI config operations on its own
 capability structure except the PCI Express capability structure, in
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 7babf43709b0..452723318405 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -9,3 +9,4 @@ Linux PCI Bus Subsystem
    :numbered:
 
    pci
+   PCIEBUS-HOWTO
-- 
2.20.1

