LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 01/11] powerpc/mm: fix erroneous duplicate slb_addr_limit init
From: Christophe Leroy @ 2019-04-25 14:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	aneesh.kumar
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1556202028.git.christophe.leroy@c-s.fr>

Commit 67fda38f0d68 ("powerpc/mm: Move slb_addr_linit to
early_init_mmu") moved slb_addr_limit init out of setup_arch().

Commit 701101865f5d ("powerpc/mm: Reduce memory usage for mm_context_t
for radix") brought it back into setup_arch() by error.

This patch reverts that erroneous regress.

Fixes: 701101865f5d ("powerpc/mm: Reduce memory usage for mm_context_t for radix")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/setup-common.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 1729bf409562..7af085b38cd1 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -950,12 +950,6 @@ void __init setup_arch(char **cmdline_p)
 	init_mm.end_data = (unsigned long) _edata;
 	init_mm.brk = klimit;
 
-#ifdef CONFIG_PPC_MM_SLICES
-#if defined(CONFIG_PPC_8xx)
-	init_mm.context.slb_addr_limit = DEFAULT_MAP_WINDOW;
-#endif
-#endif
-
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	mm_iommu_init(&init_mm);
 #endif
-- 
2.13.3


^ permalink raw reply related

* [PATCH v2 00/11] Reduce ifdef mess in slice.c
From: Christophe Leroy @ 2019-04-25 14:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	aneesh.kumar
  Cc: linuxppc-dev, linux-kernel

This series is a split out of the v1 series "Reduce ifdef mess in hugetlbpage.c and slice.c".

It is also rebased after the series from Aneesh to reduce context size for Radix.

See http://kisskb.ellerman.id.au/kisskb/branch/chleroy/head/f263887b4ca31f4bb0fe77823e301c28ba27c796/ for wide compilation.

Christophe Leroy (11):
  powerpc/mm: fix erroneous duplicate slb_addr_limit init
  powerpc/mm: no slice for nohash/64
  powerpc/mm: hand a context_t over to slice_mask_for_size() instead of
    mm_struct
  powerpc/mm: move slice_mask_for_size() into mmu.h
  powerpc/mm: get rid of mm_ctx_slice_mask_xxx()
  powerpc/mm: remove unnecessary #ifdef CONFIG_PPC64
  powerpc/mm: remove a couple of #ifdef CONFIG_PPC_64K_PAGES in
    mm/slice.c
  powerpc/8xx: get rid of #ifdef CONFIG_HUGETLB_PAGE for slices
  powerpc/mm: define get_slice_psize() all the time
  powerpc/mm: define subarch SLB_ADDR_LIMIT_DEFAULT
  powerpc/mm: drop slice DEBUG

 arch/powerpc/include/asm/book3s/64/mmu.h     |  29 +++---
 arch/powerpc/include/asm/book3s/64/slice.h   |   2 +
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h |  51 +++++------
 arch/powerpc/include/asm/nohash/32/slice.h   |   2 +
 arch/powerpc/include/asm/nohash/64/slice.h   |  12 ---
 arch/powerpc/include/asm/slice.h             |   9 +-
 arch/powerpc/kernel/setup-common.c           |   6 --
 arch/powerpc/mm/hash_utils_64.c              |   2 +-
 arch/powerpc/mm/hugetlbpage.c                |   4 +-
 arch/powerpc/mm/slice.c                      | 132 ++++-----------------------
 arch/powerpc/mm/tlb_nohash.c                 |   4 +-
 arch/powerpc/platforms/Kconfig.cputype       |   4 +
 12 files changed, 69 insertions(+), 188 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/nohash/64/slice.h

-- 
2.13.3


^ permalink raw reply

* [PATCH v2 03/11] powerpc/mm: hand a context_t over to slice_mask_for_size() instead of mm_struct
From: Christophe Leroy @ 2019-04-25 14:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	aneesh.kumar
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1556202028.git.christophe.leroy@c-s.fr>

slice_mask_for_size() only uses mm->context, so hand directly a
pointer to the context. This will help moving the function in
subarch mmu.h in the next patch by avoiding having to include
the definition of struct mm_struct

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/mm/slice.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 35b278082391..8eb7e8b09c75 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -151,32 +151,32 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
 }
 
 #ifdef CONFIG_PPC_BOOK3S_64
-static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
+static struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int psize)
 {
 #ifdef CONFIG_PPC_64K_PAGES
 	if (psize == MMU_PAGE_64K)
-		return mm_ctx_slice_mask_64k(&mm->context);
+		return mm_ctx_slice_mask_64k(&ctx);
 #endif
 	if (psize == MMU_PAGE_4K)
-		return mm_ctx_slice_mask_4k(&mm->context);
+		return mm_ctx_slice_mask_4k(&ctx);
 #ifdef CONFIG_HUGETLB_PAGE
 	if (psize == MMU_PAGE_16M)
-		return mm_ctx_slice_mask_16m(&mm->context);
+		return mm_ctx_slice_mask_16m(&ctx);
 	if (psize == MMU_PAGE_16G)
-		return mm_ctx_slice_mask_16g(&mm->context);
+		return mm_ctx_slice_mask_16g(&ctx);
 #endif
 	BUG();
 }
 #elif defined(CONFIG_PPC_8xx)
-static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
+static struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int psize)
 {
 	if (psize == mmu_virtual_psize)
-		return &mm->context.mask_base_psize;
+		return &ctx->mask_base_psize;
 #ifdef CONFIG_HUGETLB_PAGE
 	if (psize == MMU_PAGE_512K)
-		return &mm->context.mask_512k;
+		return &ctx->mask_512k;
 	if (psize == MMU_PAGE_8M)
-		return &mm->context.mask_8m;
+		return &ctx->mask_8m;
 #endif
 	BUG();
 }
@@ -246,7 +246,7 @@ static void slice_convert(struct mm_struct *mm,
 	slice_dbg("slice_convert(mm=%p, psize=%d)\n", mm, psize);
 	slice_print_mask(" mask", mask);
 
-	psize_mask = slice_mask_for_size(mm, psize);
+	psize_mask = slice_mask_for_size(&mm->context, psize);
 
 	/* We need to use a spinlock here to protect against
 	 * concurrent 64k -> 4k demotion ...
@@ -263,7 +263,7 @@ static void slice_convert(struct mm_struct *mm,
 
 		/* Update the slice_mask */
 		old_psize = (lpsizes[index] >> (mask_index * 4)) & 0xf;
-		old_mask = slice_mask_for_size(mm, old_psize);
+		old_mask = slice_mask_for_size(&mm->context, old_psize);
 		old_mask->low_slices &= ~(1u << i);
 		psize_mask->low_slices |= 1u << i;
 
@@ -282,7 +282,7 @@ static void slice_convert(struct mm_struct *mm,
 
 		/* Update the slice_mask */
 		old_psize = (hpsizes[index] >> (mask_index * 4)) & 0xf;
-		old_mask = slice_mask_for_size(mm, old_psize);
+		old_mask = slice_mask_for_size(&mm->context, old_psize);
 		__clear_bit(i, old_mask->high_slices);
 		__set_bit(i, psize_mask->high_slices);
 
@@ -538,7 +538,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	/* First make up a "good" mask of slices that have the right size
 	 * already
 	 */
-	maskp = slice_mask_for_size(mm, psize);
+	maskp = slice_mask_for_size(&mm->context, psize);
 
 	/*
 	 * Here "good" means slices that are already the right page size,
@@ -565,7 +565,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * a pointer to good mask for the next code to use.
 	 */
 	if (IS_ENABLED(CONFIG_PPC_64K_PAGES) && psize == MMU_PAGE_64K) {
-		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
+		compat_maskp = slice_mask_for_size(&mm->context, MMU_PAGE_4K);
 		if (fixed)
 			slice_or_mask(&good_mask, maskp, compat_maskp);
 		else
@@ -760,7 +760,7 @@ void slice_init_new_context_exec(struct mm_struct *mm)
 	/*
 	 * Slice mask cache starts zeroed, fill the default size cache.
 	 */
-	mask = slice_mask_for_size(mm, psize);
+	mask = slice_mask_for_size(&mm->context, psize);
 	mask->low_slices = ~0UL;
 	if (SLICE_NUM_HIGH)
 		bitmap_fill(mask->high_slices, SLICE_NUM_HIGH);
@@ -819,14 +819,14 @@ int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 
 	VM_BUG_ON(radix_enabled());
 
-	maskp = slice_mask_for_size(mm, psize);
+	maskp = slice_mask_for_size(&mm->context, psize);
 #ifdef CONFIG_PPC_64K_PAGES
 	/* We need to account for 4k slices too */
 	if (psize == MMU_PAGE_64K) {
 		const struct slice_mask *compat_maskp;
 		struct slice_mask available;
 
-		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
+		compat_maskp = slice_mask_for_size(&mm->context, MMU_PAGE_4K);
 		slice_or_mask(&available, maskp, compat_maskp);
 		return !slice_check_range_fits(mm, &available, addr, len);
 	}
-- 
2.13.3


^ permalink raw reply related

* [PATCH v2 04/11] powerpc/mm: move slice_mask_for_size() into mmu.h
From: Christophe Leroy @ 2019-04-25 14:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	aneesh.kumar
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1556202028.git.christophe.leroy@c-s.fr>

Move slice_mask_for_size() into subarch mmu.h

At the same time, replace BUG() by VM_BUG_ON() as those BUG() are not
there to catch runtime errors but to catch errors during development
cycle only.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/book3s/64/mmu.h     | 17 +++++++++++
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 42 +++++++++++++++++++---------
 arch/powerpc/mm/slice.c                      | 34 ----------------------
 3 files changed, 46 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 230a9dec7677..ad00355f874f 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -203,6 +203,23 @@ static inline struct slice_mask *mm_ctx_slice_mask_16g(mm_context_t *ctx)
 }
 #endif
 
+static inline struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int psize)
+{
+#ifdef CONFIG_PPC_64K_PAGES
+	if (psize == MMU_PAGE_64K)
+		return mm_ctx_slice_mask_64k(&ctx);
+#endif
+#ifdef CONFIG_HUGETLB_PAGE
+	if (psize == MMU_PAGE_16M)
+		return mm_ctx_slice_mask_16m(&ctx);
+	if (psize == MMU_PAGE_16G)
+		return mm_ctx_slice_mask_16g(&ctx);
+#endif
+	VM_BUG_ON(psize != MMU_PAGE_4K);
+
+	return mm_ctx_slice_mask_4k(&ctx);
+}
+
 #ifdef CONFIG_PPC_SUBPAGE_PROT
 static inline struct subpage_prot_table *mm_ctx_subpage_prot(mm_context_t *ctx)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index c503e2f05e61..a0f6844a1498 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -184,7 +184,23 @@
 #define LOW_SLICE_ARRAY_SZ	SLICE_ARRAY_SIZE
 #endif
 
+#if defined(CONFIG_PPC_4K_PAGES)
+#define mmu_virtual_psize	MMU_PAGE_4K
+#elif defined(CONFIG_PPC_16K_PAGES)
+#define mmu_virtual_psize	MMU_PAGE_16K
+#define PTE_FRAG_NR		4
+#define PTE_FRAG_SIZE_SHIFT	12
+#define PTE_FRAG_SIZE		(1UL << 12)
+#else
+#error "Unsupported PAGE_SIZE"
+#endif
+
+#define mmu_linear_psize	MMU_PAGE_8M
+
 #ifndef __ASSEMBLY__
+
+#include <linux/mmdebug.h>
+
 struct slice_mask {
 	u64 low_slices;
 	DECLARE_BITMAP(high_slices, 0);
@@ -255,6 +271,19 @@ static inline struct slice_mask *mm_ctx_slice_mask_8m(mm_context_t *ctx)
 	return &ctx->mask_8m;
 }
 #endif
+
+static inline struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int psize)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	if (psize == MMU_PAGE_512K)
+		return &ctx->mask_512k;
+	if (psize == MMU_PAGE_8M)
+		return &ctx->mask_8m;
+#endif
+	VM_BUG_ON(psize != mmu_virtual_psize);
+
+	return &ctx->mask_base_psize;
+}
 #endif /* CONFIG_PPC_MM_SLICE */
 
 #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff80000)
@@ -306,17 +335,4 @@ extern s32 patch__itlbmiss_perf, patch__dtlbmiss_perf;
 
 #endif /* !__ASSEMBLY__ */
 
-#if defined(CONFIG_PPC_4K_PAGES)
-#define mmu_virtual_psize	MMU_PAGE_4K
-#elif defined(CONFIG_PPC_16K_PAGES)
-#define mmu_virtual_psize	MMU_PAGE_16K
-#define PTE_FRAG_NR		4
-#define PTE_FRAG_SIZE_SHIFT	12
-#define PTE_FRAG_SIZE		(1UL << 12)
-#else
-#error "Unsupported PAGE_SIZE"
-#endif
-
-#define mmu_linear_psize	MMU_PAGE_8M
-
 #endif /* _ASM_POWERPC_MMU_8XX_H_ */
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 8eb7e8b09c75..31de91b65a64 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -150,40 +150,6 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
 			__set_bit(i, ret->high_slices);
 }
 
-#ifdef CONFIG_PPC_BOOK3S_64
-static struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int psize)
-{
-#ifdef CONFIG_PPC_64K_PAGES
-	if (psize == MMU_PAGE_64K)
-		return mm_ctx_slice_mask_64k(&ctx);
-#endif
-	if (psize == MMU_PAGE_4K)
-		return mm_ctx_slice_mask_4k(&ctx);
-#ifdef CONFIG_HUGETLB_PAGE
-	if (psize == MMU_PAGE_16M)
-		return mm_ctx_slice_mask_16m(&ctx);
-	if (psize == MMU_PAGE_16G)
-		return mm_ctx_slice_mask_16g(&ctx);
-#endif
-	BUG();
-}
-#elif defined(CONFIG_PPC_8xx)
-static struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int psize)
-{
-	if (psize == mmu_virtual_psize)
-		return &ctx->mask_base_psize;
-#ifdef CONFIG_HUGETLB_PAGE
-	if (psize == MMU_PAGE_512K)
-		return &ctx->mask_512k;
-	if (psize == MMU_PAGE_8M)
-		return &ctx->mask_8m;
-#endif
-	BUG();
-}
-#else
-#error "Must define the slice masks for page sizes supported by the platform"
-#endif
-
 static bool slice_check_range_fits(struct mm_struct *mm,
 			   const struct slice_mask *available,
 			   unsigned long start, unsigned long len)
-- 
2.13.3


^ permalink raw reply related

* Re: [PATCH v2 0/5] Allow CPU0 to be nohz full
From: Peter Zijlstra @ 2019-04-25 12:04 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Frederic Weisbecker, Rafael J . Wysocki, linux-kernel,
	Ingo Molnar, Thomas Gleixner, linuxppc-dev
In-Reply-To: <20190411033448.20842-1-npiggin@gmail.com>

On Thu, Apr 11, 2019 at 01:34:43PM +1000, Nicholas Piggin wrote:
> Since last time, I added a compile time option to opt-out of this
> if the platform does not support suspend on non-zero, and tried to
> improve legibility of changelogs and explain the justification
> better.
> 
> I have been testing this on powerpc/pseries and it seems to work
> fine (the firmware call to suspend can be called on any CPU and
> resumes where it left off), but not included here because the
> code has some bitrot unrelated to this series which I hacked to
> fix. I will discuss it and either send an acked patch to go with
> this series if it is small, or fix it in powerpc tree.
> 

Rafael, Frederic, any comments?

^ permalink raw reply

* Re: [PATCH v2 3/5] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
From: Peter Zijlstra @ 2019-04-25 12:02 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Frederic Weisbecker, Rafael J . Wysocki, linux-kernel,
	Ingo Molnar, Thomas Gleixner, linuxppc-dev
In-Reply-To: <20190411033448.20842-4-npiggin@gmail.com>

On Thu, Apr 11, 2019 at 01:34:46PM +1000, Nicholas Piggin wrote:
> This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to
> opt-in to allowing suspend to occur on one of the housekeeping CPUs
> rather than hardcoded CPU0.
> 
> This will allow CPU0 to be a nohz_full CPU with a later change.
> 
> It may be possible for platforms with hardware/firmware restrictions
> on suspend/wake effectively support this by handing off the final
> stage to CPU0 when kernel housekeeping is no longer required. Another
> option is to make housekeeping / nohz_full mask dynamic at runtime,
> but the complexity could not be justified at this time.

Should we not tie this into whatever already allows an achitecture to
hotplug CPU-0? For instance, x86 default disallows this but has
cpu0_hotpluggable to allow this.

Presumably POWER already allows hotplugging CPU-0 ?



^ permalink raw reply

* Re: [PATCH v5 00/23] Include linux ACPI docs into Sphinx TOC tree
From: Rafael J. Wysocki @ 2019-04-25  8:44 UTC (permalink / raw)
  To: Changbin Du
  Cc: Fenghua Yu, Mauro Carvalho Chehab, open list:DOCUMENTATION,
	linux-gpio, Jonathan Corbet, Rafael J. Wysocki,
	Linux Kernel Mailing List, ACPI Devel Maling List, Bjorn Helgaas,
	linuxppc-dev
In-Reply-To: <20190424175306.25880-1-changbin.du@gmail.com>

.On Wed, Apr 24, 2019 at 7:54 PM Changbin Du <changbin.du@gmail.com> wrote:
>
> Hi All,
> The kernel now uses Sphinx to generate intelligent and beautiful documentation
> from reStructuredText files. I converted all of the Linux ACPI/PCI/X86 docs to
> reST format in this serias.
>
> The hieararchy of ACPI docs are based on Corbet's suggestion:
> https://lkml.org/lkml/2019/4/3/1047
> I did some adjustment according to the content and finally they are placed as:
> Documentation/firmware-guide/acpi/

I'd like to queue up this series, but it is missing a patch to create
Documentation/firmware-guide/acpi/index.rst.

Care to provide one?

^ permalink raw reply

* Re: [PATCHv2] kernel/crash: make parse_crashkernel()'s return value more indicant
From: Pingfan Liu @ 2019-04-25  8:20 UTC (permalink / raw)
  To: Matthias Brugger
  Cc: Rich Felker, linux-ia64, Julien Thierry, Yangtao Li,
	Palmer Dabbelt, Heiko Carstens, Stefan Agner, linux-mips,
	Paul Mackerras, H. Peter Anvin, Thomas Gleixner, Logan Gunthorpe,
	linux-s390, Florian Fainelli, Yoshinori Sato, linux-sh, x86,
	Russell King, Ingo Molnar, Hari Bathini, Catalin Marinas,
	James Hogan, Dave Young, Fenghua Yu, Will Deacon, Johannes Weiner,
	Borislav Petkov, David Hildenbrand, linux-arm-kernel, Jens Axboe,
	Tony Luck, Baoquan He, Ard Biesheuvel, Robin Murphy, LKML,
	Ralf Baechle, Thomas Bogendoerfer, Paul Burton,
	Greg Kroah-Hartman, Martin Schwidefsky, Andrew Morton,
	linuxppc-dev, Greg Hackmann
In-Reply-To: <10dc5468-6cd9-85c7-ba66-1dfa5aa922b7@suse.com>

On Wed, Apr 24, 2019 at 4:31 PM Matthias Brugger <mbrugger@suse.com> wrote:
>
>
[...]
> > @@ -139,6 +141,8 @@ static int __init parse_crashkernel_simple(char *cmdline,
> >               pr_warn("crashkernel: unrecognized char: %c\n", *cur);
> >               return -EINVAL;
> >       }
> > +     if (*crash_size == 0)
> > +             return -EINVAL;
>
> This covers the case where I pass an argument like "crashkernel=0M" ?
> Can't we fix that by using kstrtoull() in memparse and check if the return value
> is < 0? In that case we could return without updating the retptr and we will be
> fine.
>
It seems that kstrtoull() treats 0M as invalid parameter, while
simple_strtoull() does not.

If changed like your suggestion, then all the callers of memparse()
will treats 0M as invalid parameter. This affects many components
besides kexec.  Not sure this can be done or not.

Regards,
Pingfan

> >
> >       return 0;
> >  }
> > @@ -181,6 +185,8 @@ static int __init parse_crashkernel_suffix(char *cmdline,
> >               pr_warn("crashkernel: unrecognized char: %c\n", *cur);
> >               return -EINVAL;
> >       }
> > +     if (*crash_size == 0)
> > +             return -EINVAL;
>
> Same here.
>
> >
> >       return 0;
> >  }
> > @@ -266,6 +272,8 @@ static int __init __parse_crashkernel(char *cmdline,
> >  /*
> >   * That function is the entry point for command line parsing and should be
> >   * called from the arch-specific code.
> > + * On success 0. On error for either syntax error or crash_size=0, -EINVAL is
> > + * returned.
> >   */
> >  int __init parse_crashkernel(char *cmdline,
> >                            unsigned long long system_ram,
> >

^ permalink raw reply

* Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
From: Jan Kara @ 2019-04-25  7:31 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-nvdimm, Aneesh Kumar K.V, Matthew Wilcox,
	Linux MM, Chandan Rajendra, stable, Andrew Morton, linuxppc-dev
In-Reply-To: <CAPcyv4gLGUa69svQnwjvruALZ0ChqUJZHQJ1Mt_Cjr1Jh_6vbQ@mail.gmail.com>

On Wed 24-04-19 11:13:48, Dan Williams wrote:
> On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> > > I think unaligned addresses have always been passed to
> > > vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> > > the only change needed is the following, thoughts?
> > >
> > > diff --git a/fs/dax.c b/fs/dax.c
> > > index ca0671d55aa6..82aee9a87efa 100644
> > > --- a/fs/dax.c
> > > +++ b/fs/dax.c
> > > @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> > > vm_fault *vmf, pfn_t *pfnp,
> > >                 }
> > >
> > >                 trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> > > -               result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> > > +               result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
> > >                                             write);
> >
> > We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
> > that need to change too?
> 
> It wasn't clear to me that it was a problem. I think that one already
> happens to be pmd-aligned.

Why would it need to be? The address is taken from vmf->address and that's
set up in __handle_mm_fault() like .address = address & PAGE_MASK. So I
don't see anything forcing PMD alignment of the virtual address...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: powerpc hugepage leak caused by 576ed913 "block: use bio_add_page in bio_iov_iter_get_pages"
From: Christoph Hellwig @ 2019-04-25  6:19 UTC (permalink / raw)
  To: David Gibson
  Cc: Jens Axboe, linux-kernel, Nick Piggin, Michael Ellerman,
	Paul Mackerras, linuxppc-dev, Christoph Hellwig
In-Reply-To: <20190423054131.GB31496@umbus.fritz.box>

Just curious:  What exact trees do you see this with?  This area
changed a lot with the multipage bvec support, and subsequent fixes.

So I'd be really curious if it can be reproduced with Jens' latest
block for-5.2 tree (which should be in latest linux-next).

^ permalink raw reply

* Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
From: Dan Williams @ 2019-04-25  4:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Jan Kara, linux-nvdimm, Matthew Wilcox, Linux MM,
	Chandan Rajendra, stable, Andrew Morton, linuxppc-dev
In-Reply-To: <444ca26b-ec38-ae4b-512b-7e915c575098@linux.ibm.com>

On Wed, Apr 24, 2019 at 6:37 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 4/24/19 11:43 PM, Dan Williams wrote:
> > On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >> On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> >>> I think unaligned addresses have always been passed to
> >>> vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> >>> the only change needed is the following, thoughts?
> >>>
> >>> diff --git a/fs/dax.c b/fs/dax.c
> >>> index ca0671d55aa6..82aee9a87efa 100644
> >>> --- a/fs/dax.c
> >>> +++ b/fs/dax.c
> >>> @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> >>> vm_fault *vmf, pfn_t *pfnp,
> >>>                  }
> >>>
> >>>                  trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> >>> -               result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> >>> +               result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
> >>>                                              write);
> >>
> >> We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
> >> that need to change too?
> >
> > It wasn't clear to me that it was a problem. I think that one already
> > happens to be pmd-aligned.
> >
>
> How about vmf_insert_pfn_pud()?

That is currently not used by fsdax, only devdax, but it does seem
that it passes the unaligned fault address rather than the pud aligned
address. I'll add that to the proposed fix.

^ permalink raw reply

* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
From: Daniel Jordan @ 2019-04-25  1:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexey Kardashevskiy, linux-kernel@vger.kernel.org, Daniel Jordan,
	linux-mm@kvack.org, Paul Mackerras, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, Christoph Lameter
In-Reply-To: <20190424111018.GA16077@mellanox.com>

On Wed, Apr 24, 2019 at 11:10:24AM +0000, Jason Gunthorpe wrote:
> On Tue, Apr 23, 2019 at 07:15:44PM -0700, Davidlohr Bueso wrote:
> > Wouldn't the cmpxchg alternative also be exposed the locked_vm changing between
> > validating the new value and the cmpxchg() and we'd bogusly fail even when there
> > is still just because the value changed (I'm assuming we don't hold any locks,
> > otherwise all this is pointless).

That's true, I hadn't considered that we could retry even when there's enough
locked_vm.  Seems like another one is that RLIMIT_MEMLOCK could change after
it's read.  I guess nothing's going to be perfect.  :/

> Well it needs a loop..
> 
> again:
>    current_locked = atomic_read(&mm->locked_vm);
>    new_locked = current_locked + npages;
>    if (new_locked < lock_limit)
>       if (cmpxchg(&mm->locked_vm, current_locked, new_locked) != current_locked)
>             goto again;
> 
> So it won't have bogus failures as there is no unwind after
> error. Basically this is a load locked/store conditional style of
> locking pattern.

This is basically what I have so far.

> > > That's a good idea, and especially worth doing considering that an arbitrary
> > > number of threads that charge a low amount of locked_vm can fail just because
> > > one thread charges lots of it.
> > 
> > Yeah but the window for this is quite small, I doubt it would be a real issue.
>
> > What if before doing the atomic_add_return(), we first did the racy new_locked
> > check for ENOMEM, then do the speculative add and cleanup, if necessary. This
> > would further reduce the scope of the window where false ENOMEM can occur.

So the upside of this is that there's no retry loop so tasks don't spin under
heavy contention?  Seems better to always guard against false ENOMEM, at least
from the locked_vm side if not from the rlimit changing.

> > > pinned_vm appears to be broken the same way, so I can fix it too unless someone
> > > beats me to it.
> > 
> > This should not be a surprise for the rdma folks. Cc'ing Jason nonetheless.
> 
> I think we accepted this tiny race as a side effect of removing the
> lock, which was very beneficial. Really the time window between the
> atomic failing and unwind is very small, and there are enough other
> ways a hostile user could DOS locked_vm that I don't think it really
> matters in practice..
> 
> However, the cmpxchg seems better, so a helper to implement that would
> probably be the best thing to do.

I've collapsed all the locked_vm users into such a helper and am now working on
converting the pinned_vm users to the same helper.  Taking longer than I
thought.

^ permalink raw reply

* Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
From: Aneesh Kumar K.V @ 2019-04-25  1:37 UTC (permalink / raw)
  To: Dan Williams, Matthew Wilcox
  Cc: Jan Kara, linux-nvdimm, stable, Linux MM, Chandan Rajendra,
	Andrew Morton, linuxppc-dev
In-Reply-To: <CAPcyv4gLGUa69svQnwjvruALZ0ChqUJZHQJ1Mt_Cjr1Jh_6vbQ@mail.gmail.com>

On 4/24/19 11:43 PM, Dan Williams wrote:
> On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
>>> I think unaligned addresses have always been passed to
>>> vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
>>> the only change needed is the following, thoughts?
>>>
>>> diff --git a/fs/dax.c b/fs/dax.c
>>> index ca0671d55aa6..82aee9a87efa 100644
>>> --- a/fs/dax.c
>>> +++ b/fs/dax.c
>>> @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
>>> vm_fault *vmf, pfn_t *pfnp,
>>>                  }
>>>
>>>                  trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
>>> -               result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
>>> +               result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
>>>                                              write);
>>
>> We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
>> that need to change too?
> 
> It wasn't clear to me that it was a problem. I think that one already
> happens to be pmd-aligned.
> 

How about vmf_insert_pfn_pud()?

-aneesh


^ permalink raw reply

* Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
From: Michael S. Tsirkin @ 2019-04-25  1:18 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Mike Anderson, Michael Roth, Jean-Philippe Brucker, Jason Wang,
	Alexey Kardashevskiy, Ram Pai, linux-kernel, virtualization,
	iommu, linuxppc-dev, Christoph Hellwig, David Gibson
In-Reply-To: <875zr228zf.fsf@morokweng.localdomain>

On Wed, Apr 24, 2019 at 10:01:56PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin <mst@redhat.com> writes:
> 
> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin <mst@redhat.com> writes:
> >>
> >> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >> Michael S. Tsirkin <mst@redhat.com> writes:
> >> >>
> >> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> >> >From what I understand of the ACCESS_PLATFORM definition, the host will
> >> >> >> only ever try to access memory addresses that are supplied to it by the
> >> >> >> guest, so all of the secure guest memory that the host cares about is
> >> >> >> accessible:
> >> >> >>
> >> >> >>     If this feature bit is set to 0, then the device has same access to
> >> >> >>     memory addresses supplied to it as the driver has. In particular,
> >> >> >>     the device will always use physical addresses matching addresses
> >> >> >>     used by the driver (typically meaning physical addresses used by the
> >> >> >>     CPU) and not translated further, and can access any address supplied
> >> >> >>     to it by the driver. When clear, this overrides any
> >> >> >>     platform-specific description of whether device access is limited or
> >> >> >>     translated in any way, e.g. whether an IOMMU may be present.
> >> >> >>
> >> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> >> guests or not.
> >> >> >>
> >> >> >> Or are you saying that a virtio device may want to access memory
> >> >> >> addresses that weren't supplied to it by the driver?
> >> >> >
> >> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
> >> >> > specific encrypted memory regions that driver has access to but device
> >> >> > does not. that seems to violate the constraint.
> >> >>
> >> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> >> the device can ignore the IOMMU for all practical purposes I would
> >> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >> >>
> >> >> I guess I'm still struggling with the purpose of signalling to the
> >> >> driver that the host may not have access to memory addresses that it
> >> >> will never try to access.
> >> >
> >> > For example, one of the benefits is to signal to host that driver does
> >> > not expect ability to access all memory. If it does, host can
> >> > fail initialization gracefully.
> >>
> >> But why would the ability to access all memory be necessary or even
> >> useful? When would the host access memory that the driver didn't tell it
> >> to access?
> >
> > When I say all memory I mean even memory not allowed by the IOMMU.
> 
> Yes, but why? How is that memory relevant?

It's relevant when driver is not trusted to only supply correct
addresses. The feature was originally designed to support userspace
drivers within guests.

> >> >> >> >> > But the name "sev_active" makes me scared because at least AMD guys who
> >> >> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >> >> >> >>
> >> >> >> >> My understanding is, AMD guest-platform knows in advance that their
> >> >> >> >> guest will run in secure mode and hence sets the flag at the time of VM
> >> >> >> >> instantiation. Unfortunately we dont have that luxury on our platforms.
> >> >> >> >
> >> >> >> > Well you do have that luxury. It looks like that there are existing
> >> >> >> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
> >> >> >> > with how that path is slow. So you are trying to optimize for
> >> >> >> > them by clearing ACCESS_PLATFORM and then you have lost ability
> >> >> >> > to invoke DMA API.
> >> >> >> >
> >> >> >> > For example if there was another flag just like ACCESS_PLATFORM
> >> >> >> > just not yet used by anyone, you would be all fine using that right?
> >> >> >>
> >> >> >> Yes, a new flag sounds like a great idea. What about the definition
> >> >> >> below?
> >> >> >>
> >> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
> >> >> >>     VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
> >> >> >>     exception that the IOMMU is explicitly defined to be off or bypassed
> >> >> >>     when accessing memory addresses supplied to the device by the
> >> >> >>     driver. This flag should be set by the guest if offered, but to
> >> >> >>     allow for backward-compatibility device implementations allow for it
> >> >> >>     to be left unset by the guest. It is an error to set both this flag
> >> >> >>     and VIRTIO_F_ACCESS_PLATFORM.
> >> >> >
> >> >> > It looks kind of narrow but it's an option.
> >> >>
> >> >> Great!
> >> >>
> >> >> > I wonder how we'll define what's an iommu though.
> >> >>
> >> >> Hm, it didn't occur to me it could be an issue. I'll try.
> >>
> >> I rephrased it in terms of address translation. What do you think of
> >> this version? The flag name is slightly different too:
> >>
> >>
> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >>     meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> >>     with the exception that address translation is guaranteed to be
> >>     unnecessary when accessing memory addresses supplied to the device
> >>     by the driver. Which is to say, the device will always use physical
> >>     addresses matching addresses used by the driver (typically meaning
> >>     physical addresses used by the CPU) and not translated further. This
> >>     flag should be set by the guest if offered, but to allow for
> >>     backward-compatibility device implementations allow for it to be
> >>     left unset by the guest. It is an error to set both this flag and
> >>     VIRTIO_F_ACCESS_PLATFORM.
> >
> > Thanks, I'll think about this approach. Will respond next week.
> 
> Thanks!
> 
> >> >> > Another idea is maybe something like virtio-iommu?
> >> >>
> >> >> You mean, have legacy guests use virtio-iommu to request an IOMMU
> >> >> bypass? If so, it's an interesting idea for new guests but it doesn't
> >> >> help with guests that are out today in the field, which don't have A
> >> >> virtio-iommu driver.
> >> >
> >> > I presume legacy guests don't use encrypted memory so why do we
> >> > worry about them at all?
> >>
> >> They don't use encrypted memory, but a host machine will run a mix of
> >> secure and legacy guests. And since the hypervisor doesn't know whether
> >> a guest will be secure or not at the time it is launched, legacy guests
> >> will have to be launched with the same configuration as secure guests.
> >
> > OK and so I think the issue is that hosts generally fail if they set
> > ACCESS_PLATFORM and guests do not negotiate it.
> > So you can not just set ACCESS_PLATFORM for everyone.
> > Is that the issue here?
> 
> Yes, that is one half of the issue. The other is that even if hosts
> didn't fail, existing legacy guests wouldn't "take the initiative" of
> not negotiating ACCESS_PLATFORM to get the improved performance. They'd
> have to be modified to do that.

So there's a non-encrypted guest, hypervisor wants to set
ACCESS_PLATFORM to allow encrypted guests but that will slow down legacy
guests since their vIOMMU emulation is very slow.

So enabling support for encryption slows down non-encrypted guests. Not
great but not the end of the world, considering even older guests that
don't support ACCESS_PLATFORM are completely broken and you do not seem
to be too worried by that.

For future non-encrypted guests, bypassing the emulated IOMMU for when
that emulated IOMMU is very slow might be solvable in some other way,
e.g. with virtio-iommu. Which reminds me, could you look at
virtio-iommu as a solution for some of the issues?
Review of that patchset from that POV would be appreciated.

> --
> Thiago Jung Bauermann
> IBM Linux Technology Center

^ permalink raw reply

* Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
From: Thiago Jung Bauermann @ 2019-04-25  1:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Mike Anderson, Michael Roth, Jean-Philippe Brucker, Jason Wang,
	Alexey Kardashevskiy, Ram Pai, linux-kernel, virtualization,
	iommu, linuxppc-dev, Christoph Hellwig, David Gibson
In-Reply-To: <20190419190258-mutt-send-email-mst@kernel.org>


Michael S. Tsirkin <mst@redhat.com> writes:

> On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
>>
>> Michael S. Tsirkin <mst@redhat.com> writes:
>>
>> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
>> >>
>> >> Michael S. Tsirkin <mst@redhat.com> writes:
>> >>
>> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
>> >> >> >From what I understand of the ACCESS_PLATFORM definition, the host will
>> >> >> only ever try to access memory addresses that are supplied to it by the
>> >> >> guest, so all of the secure guest memory that the host cares about is
>> >> >> accessible:
>> >> >>
>> >> >>     If this feature bit is set to 0, then the device has same access to
>> >> >>     memory addresses supplied to it as the driver has. In particular,
>> >> >>     the device will always use physical addresses matching addresses
>> >> >>     used by the driver (typically meaning physical addresses used by the
>> >> >>     CPU) and not translated further, and can access any address supplied
>> >> >>     to it by the driver. When clear, this overrides any
>> >> >>     platform-specific description of whether device access is limited or
>> >> >>     translated in any way, e.g. whether an IOMMU may be present.
>> >> >>
>> >> >> All of the above is true for POWER guests, whether they are secure
>> >> >> guests or not.
>> >> >>
>> >> >> Or are you saying that a virtio device may want to access memory
>> >> >> addresses that weren't supplied to it by the driver?
>> >> >
>> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
>> >> > specific encrypted memory regions that driver has access to but device
>> >> > does not. that seems to violate the constraint.
>> >>
>> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
>> >> the device can ignore the IOMMU for all practical purposes I would
>> >> indeed say that the logic would apply to IOMMUs as well. :-)
>> >>
>> >> I guess I'm still struggling with the purpose of signalling to the
>> >> driver that the host may not have access to memory addresses that it
>> >> will never try to access.
>> >
>> > For example, one of the benefits is to signal to host that driver does
>> > not expect ability to access all memory. If it does, host can
>> > fail initialization gracefully.
>>
>> But why would the ability to access all memory be necessary or even
>> useful? When would the host access memory that the driver didn't tell it
>> to access?
>
> When I say all memory I mean even memory not allowed by the IOMMU.

Yes, but why? How is that memory relevant?

>> >> >> >> > But the name "sev_active" makes me scared because at least AMD guys who
>> >> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
>> >> >> >>
>> >> >> >> My understanding is, AMD guest-platform knows in advance that their
>> >> >> >> guest will run in secure mode and hence sets the flag at the time of VM
>> >> >> >> instantiation. Unfortunately we dont have that luxury on our platforms.
>> >> >> >
>> >> >> > Well you do have that luxury. It looks like that there are existing
>> >> >> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
>> >> >> > with how that path is slow. So you are trying to optimize for
>> >> >> > them by clearing ACCESS_PLATFORM and then you have lost ability
>> >> >> > to invoke DMA API.
>> >> >> >
>> >> >> > For example if there was another flag just like ACCESS_PLATFORM
>> >> >> > just not yet used by anyone, you would be all fine using that right?
>> >> >>
>> >> >> Yes, a new flag sounds like a great idea. What about the definition
>> >> >> below?
>> >> >>
>> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
>> >> >>     VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
>> >> >>     exception that the IOMMU is explicitly defined to be off or bypassed
>> >> >>     when accessing memory addresses supplied to the device by the
>> >> >>     driver. This flag should be set by the guest if offered, but to
>> >> >>     allow for backward-compatibility device implementations allow for it
>> >> >>     to be left unset by the guest. It is an error to set both this flag
>> >> >>     and VIRTIO_F_ACCESS_PLATFORM.
>> >> >
>> >> > It looks kind of narrow but it's an option.
>> >>
>> >> Great!
>> >>
>> >> > I wonder how we'll define what's an iommu though.
>> >>
>> >> Hm, it didn't occur to me it could be an issue. I'll try.
>>
>> I rephrased it in terms of address translation. What do you think of
>> this version? The flag name is slightly different too:
>>
>>
>> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
>>     meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
>>     with the exception that address translation is guaranteed to be
>>     unnecessary when accessing memory addresses supplied to the device
>>     by the driver. Which is to say, the device will always use physical
>>     addresses matching addresses used by the driver (typically meaning
>>     physical addresses used by the CPU) and not translated further. This
>>     flag should be set by the guest if offered, but to allow for
>>     backward-compatibility device implementations allow for it to be
>>     left unset by the guest. It is an error to set both this flag and
>>     VIRTIO_F_ACCESS_PLATFORM.
>
> Thanks, I'll think about this approach. Will respond next week.

Thanks!

>> >> > Another idea is maybe something like virtio-iommu?
>> >>
>> >> You mean, have legacy guests use virtio-iommu to request an IOMMU
>> >> bypass? If so, it's an interesting idea for new guests but it doesn't
>> >> help with guests that are out today in the field, which don't have A
>> >> virtio-iommu driver.
>> >
>> > I presume legacy guests don't use encrypted memory so why do we
>> > worry about them at all?
>>
>> They don't use encrypted memory, but a host machine will run a mix of
>> secure and legacy guests. And since the hypervisor doesn't know whether
>> a guest will be secure or not at the time it is launched, legacy guests
>> will have to be launched with the same configuration as secure guests.
>
> OK and so I think the issue is that hosts generally fail if they set
> ACCESS_PLATFORM and guests do not negotiate it.
> So you can not just set ACCESS_PLATFORM for everyone.
> Is that the issue here?

Yes, that is one half of the issue. The other is that even if hosts
didn't fail, existing legacy guests wouldn't "take the initiative" of
not negotiating ACCESS_PLATFORM to get the improved performance. They'd
have to be modified to do that.

--
Thiago Jung Bauermann
IBM Linux Technology Center


^ permalink raw reply

* Re: [PATCH v5 20/23] Documentation: ACPI: move cppc_sysfs.txt to admin-guide/acpi and convert to reST
From: Changbin Du @ 2019-04-25  0:44 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: fenghua.yu, Jonathan Corbet, linux-gpio, linux-doc, rjw,
	linux-kernel, linux-acpi, Bjorn Helgaas, linuxppc-dev,
	Changbin Du
In-Reply-To: <20190424151236.776d9678@coco.lan>

On Wed, Apr 24, 2019 at 03:12:36PM -0300, Mauro Carvalho Chehab wrote:
> Em Thu, 25 Apr 2019 01:53:03 +0800
> Changbin Du <changbin.du@gmail.com> escreveu:
> 
> > This converts the plain text documentation to reStructuredText format and
> > add it to Sphinx TOC tree. No essential content change.
> > 
> > Signed-off-by: Changbin Du <changbin.du@gmail.com>
> > ---
> >  .../acpi/cppc_sysfs.rst}                      | 71 ++++++++++---------
> >  Documentation/admin-guide/acpi/index.rst      |  1 +
> >  2 files changed, 40 insertions(+), 32 deletions(-)
> >  rename Documentation/{acpi/cppc_sysfs.txt => admin-guide/acpi/cppc_sysfs.rst} (51%)
> > 
> > diff --git a/Documentation/acpi/cppc_sysfs.txt b/Documentation/admin-guide/acpi/cppc_sysfs.rst
> > similarity index 51%
> > rename from Documentation/acpi/cppc_sysfs.txt
> > rename to Documentation/admin-guide/acpi/cppc_sysfs.rst
> > index f20fb445135d..a4b99afbe331 100644
> > --- a/Documentation/acpi/cppc_sysfs.txt
> > +++ b/Documentation/admin-guide/acpi/cppc_sysfs.rst
> > @@ -1,5 +1,11 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> >  
> > -	Collaborative Processor Performance Control (CPPC)
> > +==================================================
> > +Collaborative Processor Performance Control (CPPC)
> > +==================================================
> > +
> > +CPPC
> > +====
> >  
> >  CPPC defined in the ACPI spec describes a mechanism for the OS to manage the
> >  performance of a logical processor on a contigious and abstract performance
> > @@ -10,31 +16,28 @@ For more details on CPPC please refer to the ACPI specification at:
> >  
> >  http://uefi.org/specifications
> >  
> > -Some of the CPPC registers are exposed via sysfs under:
> > -
> > -/sys/devices/system/cpu/cpuX/acpi_cppc/
> 
> 
> > -
> > -for each cpu X
> 
> Yeah, there it is: you're removing those two lines.
>
They are indented, see below diffs. I did remove "-----...." two lines.
> > +Some of the CPPC registers are exposed via sysfs under::
> >  
> > ---------------------------------------------------------------------------------
> > +  /sys/devices/system/cpu/cpuX/acpi_cppc/
> 
> As per my reply to v4.
> 
> Thanks,
> Mauro

-- 
Cheers,
Changbin Du

^ permalink raw reply

* Re: [PATCH v5 23/23] Documentation: ACPI: move video_extension.txt to firmware-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-24 18:26 UTC (permalink / raw)
  To: Changbin Du, linux-acpi
  Cc: fenghua.yu, linux-doc, Jonathan Corbet, rjw, linux-kernel,
	linux-gpio, Bjorn Helgaas, linuxppc-dev
In-Reply-To: <20190424175306.25880-24-changbin.du@gmail.com>

Em Thu, 25 Apr 2019 01:53:06 +0800
Changbin Du <changbin.du@gmail.com> escreveu:

> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
> 
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
> ---
>  Documentation/firmware-guide/acpi/index.rst   |  1 +
>  .../acpi/video_extension.rst}                 | 83 +++++++++++--------
>  2 files changed, 50 insertions(+), 34 deletions(-)
>  rename Documentation/{acpi/video_extension.txt => firmware-guide/acpi/video_extension.rst} (70%)
> 
> diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
> index 0e60f4b7129a..ae609eec4679 100644
> --- a/Documentation/firmware-guide/acpi/index.rst
> +++ b/Documentation/firmware-guide/acpi/index.rst
> @@ -23,3 +23,4 @@ ACPI Support
>     i2c-muxes
>     acpi-lid
>     lpit
> +   video_extension
> diff --git a/Documentation/acpi/video_extension.txt b/Documentation/firmware-guide/acpi/video_extension.rst
> similarity index 70%
> rename from Documentation/acpi/video_extension.txt
> rename to Documentation/firmware-guide/acpi/video_extension.rst
> index 79bf6a4921be..099b8607e07b 100644
> --- a/Documentation/acpi/video_extension.txt
> +++ b/Documentation/firmware-guide/acpi/video_extension.rst
> @@ -1,5 +1,8 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
>  ACPI video extensions
> -~~~~~~~~~~~~~~~~~~~~~
> +=====================
>  
>  This driver implement the ACPI Extensions For Display Adapters for
>  integrated graphics devices on motherboard, as specified in ACPI 2.0
> @@ -8,9 +11,10 @@ defining the video POST device, retrieving EDID information or to
>  setup a video output, etc.  Note that this is an ref. implementation
>  only.  It may or may not work for your integrated video device.
>  
> -The ACPI video driver does 3 things regarding backlight control:
> +The ACPI video driver does 3 things regarding backlight control.

Hmm... didn't notice this change before... The way it is, this
sentence sounds incomplete, specially since you removed the
numbering from the paragraphs. Perhaps you could, instead, be
explicit about what the video driver does, e. g.:

	The ACPI video driver exports the backlight control via a
	sysfs interface, notifies userspace with events and
	changes the backlight level via ACPI firmware, as detailed
	at the following chapters:

@ACPI maintainers: 

Please check if the above properly summarizes the activity done 
with regards to backlight control.

If you agree with that, feel free to add:

Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

>  
> -1 Export a sysfs interface for user space to control backlight level
> +Export a sysfs interface for user space to control backlight level
> +==================================================================
>  
>  If the ACPI table has a video device, and acpi_backlight=vendor kernel
>  command line is not present, the driver will register a backlight device
> @@ -22,36 +26,41 @@ The backlight sysfs interface has a standard definition here:
>  Documentation/ABI/stable/sysfs-class-backlight.
>  
>  And what ACPI video driver does is:
> -actual_brightness: on read, control method _BQC will be evaluated to
> -get the brightness level the firmware thinks it is at;
> -bl_power: not implemented, will set the current brightness instead;
> -brightness: on write, control method _BCM will run to set the requested
> -brightness level;
> -max_brightness: Derived from the _BCL package(see below);
> -type: firmware
> +
> +actual_brightness:
> +  on read, control method _BQC will be evaluated to
> +  get the brightness level the firmware thinks it is at;
> +bl_power:
> +  not implemented, will set the current brightness instead;
> +brightness:
> +  on write, control method _BCM will run to set the requested brightness level;
> +max_brightness:
> +  Derived from the _BCL package(see below);
> +type:
> +  firmware
>  
>  Note that ACPI video backlight driver will always use index for
>  brightness, actual_brightness and max_brightness. So if we have
> -the following _BCL package:
> +the following _BCL package::
>  
> -Method (_BCL, 0, NotSerialized)
> -{
> -	Return (Package (0x0C)
> +	Method (_BCL, 0, NotSerialized)
>  	{
> -		0x64,
> -		0x32,
> -		0x0A,
> -		0x14,
> -		0x1E,
> -		0x28,
> -		0x32,
> -		0x3C,
> -		0x46,
> -		0x50,
> -		0x5A,
> -		0x64
> -	})
> -}
> +		Return (Package (0x0C)
> +		{
> +			0x64,
> +			0x32,
> +			0x0A,
> +			0x14,
> +			0x1E,
> +			0x28,
> +			0x32,
> +			0x3C,
> +			0x46,
> +			0x50,
> +			0x5A,
> +			0x64
> +		})
> +	}
>  
>  The first two levels are for when laptop are on AC or on battery and are
>  not used by Linux currently. The remaining 10 levels are supported levels
> @@ -62,13 +71,15 @@ as a "brightness level" indicator. Thus from the user space perspective
>  the range of available brightness levels is from 0 to 9 (max_brightness)
>  inclusive.
>  
> -2 Notify user space about hotkey event
> +Notify user space about hotkey event
> +====================================
>  
>  There are generally two cases for hotkey event reporting:
> +
>  i) For some laptops, when user presses the hotkey, a scancode will be
>     generated and sent to user space through the input device created by
>     the keyboard driver as a key type input event, with proper remap, the
> -   following key code will appear to user space:
> +   following key code will appear to user space::
>  
>  	EV_KEY, KEY_BRIGHTNESSUP
>  	EV_KEY, KEY_BRIGHTNESSDOWN
> @@ -84,23 +95,27 @@ ii) For some laptops, the press of the hotkey will not generate the
>      notify value it received and send the event to user space through the
>      input device it created:
>  
> +	=====		==================
>  	event		keycode
> +	=====		==================
>  	0x86		KEY_BRIGHTNESSUP
>  	0x87		KEY_BRIGHTNESSDOWN
>  	etc.
> +	=====		==================
>  
>  so this would lead to the same effect as case i) now.
>  
>  Once user space tool receives this event, it can modify the backlight
>  level through the sysfs interface.
>  
> -3 Change backlight level in the kernel
> +Change backlight level in the kernel
> +====================================
>  
>  This works for machines covered by case ii) in Section 2. Once the driver
>  received a notification, it will set the backlight level accordingly. This does
>  not affect the sending of event to user space, they are always sent to user
>  space regardless of whether or not the video module controls the backlight level
>  directly. This behaviour can be controlled through the brightness_switch_enabled
> -module parameter as documented in admin-guide/kernel-parameters.rst. It is recommended to
> -disable this behaviour once a GUI environment starts up and wants to have full
> -control of the backlight level.
> +module parameter as documented in admin-guide/kernel-parameters.rst. It is
> +recommended to disable this behaviour once a GUI environment starts up and
> +wants to have full control of the backlight level.



Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH v2 5/5] arm64/speculation: Support 'mitigations=' cmdline option
From: Thomas Gleixner @ 2019-04-24 18:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Heiko Carstens, Paul Mackerras, H . Peter Anvin,
	Ingo Molnar, Andrea Arcangeli, linux-s390, x86, Steven Price,
	Linus Torvalds, Catalin Marinas, Waiman Long, linux-arch,
	Jon Masters, Jiri Kosina, Borislav Petkov, Andy Lutomirski,
	Josh Poimboeuf, linux-arm-kernel, Phil Auld, Greg Kroah-Hartman,
	Randy Dunlap, linux-kernel, Tyler Hicks, Martin Schwidefsky,
	linuxppc-dev
In-Reply-To: <20190424141631.GH14829@fuggles.cambridge.arm.com>

On Wed, 24 Apr 2019, Will Deacon wrote:

> Hi Thomas,
> 
> On Tue, Apr 16, 2019 at 09:26:13PM +0200, Thomas Gleixner wrote:
> > On Fri, 12 Apr 2019, Josh Poimboeuf wrote:
> > 
> > > Configure arm64 runtime CPU speculation bug mitigations in accordance
> > > with the 'mitigations=' cmdline option.  This affects Meltdown, Spectre
> > > v2, and Speculative Store Bypass.
> > > 
> > > The default behavior is unchanged.
> > > 
> > > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > > ---
> > > NOTE: This is based on top of Jeremy Linton's patches:
> > >       https://lkml.kernel.org/r/20190410231237.52506-1-jeremy.linton@arm.com
> > 
> > So I keep that out and we have to revisit that once the ARM64 stuff hits a
> > tree, right? I can have a branch with just the 4 first patches applied
> > which ARM64 folks can pull in when they apply Jeremy's patches before te
> > merge window.
> 
> I'm assuming that this refers to the core/speculation branch in tip:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=core/speculation
> 
> but please can you confirm that I'm good to pull that into arm64?

Yes. It's all yours :)

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
From: Dan Williams @ 2019-04-24 18:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, linux-nvdimm, Aneesh Kumar K.V, stable, Linux MM,
	Chandan Rajendra, Andrew Morton, linuxppc-dev
In-Reply-To: <20190424173833.GE19031@bombadil.infradead.org>

On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> > I think unaligned addresses have always been passed to
> > vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> > the only change needed is the following, thoughts?
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index ca0671d55aa6..82aee9a87efa 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> > vm_fault *vmf, pfn_t *pfnp,
> >                 }
> >
> >                 trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> > -               result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> > +               result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
> >                                             write);
>
> We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
> that need to change too?

It wasn't clear to me that it was a problem. I think that one already
happens to be pmd-aligned.

^ permalink raw reply

* Re: [PATCH v5 20/23] Documentation: ACPI: move cppc_sysfs.txt to admin-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-24 18:12 UTC (permalink / raw)
  To: Changbin Du
  Cc: fenghua.yu, linux-doc, linux-gpio, Jonathan Corbet, rjw,
	linux-kernel, linux-acpi, Bjorn Helgaas, linuxppc-dev
In-Reply-To: <20190424175306.25880-21-changbin.du@gmail.com>

Em Thu, 25 Apr 2019 01:53:03 +0800
Changbin Du <changbin.du@gmail.com> escreveu:

> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
> 
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
> ---
>  .../acpi/cppc_sysfs.rst}                      | 71 ++++++++++---------
>  Documentation/admin-guide/acpi/index.rst      |  1 +
>  2 files changed, 40 insertions(+), 32 deletions(-)
>  rename Documentation/{acpi/cppc_sysfs.txt => admin-guide/acpi/cppc_sysfs.rst} (51%)
> 
> diff --git a/Documentation/acpi/cppc_sysfs.txt b/Documentation/admin-guide/acpi/cppc_sysfs.rst
> similarity index 51%
> rename from Documentation/acpi/cppc_sysfs.txt
> rename to Documentation/admin-guide/acpi/cppc_sysfs.rst
> index f20fb445135d..a4b99afbe331 100644
> --- a/Documentation/acpi/cppc_sysfs.txt
> +++ b/Documentation/admin-guide/acpi/cppc_sysfs.rst
> @@ -1,5 +1,11 @@
> +.. SPDX-License-Identifier: GPL-2.0
>  
> -	Collaborative Processor Performance Control (CPPC)
> +==================================================
> +Collaborative Processor Performance Control (CPPC)
> +==================================================
> +
> +CPPC
> +====
>  
>  CPPC defined in the ACPI spec describes a mechanism for the OS to manage the
>  performance of a logical processor on a contigious and abstract performance
> @@ -10,31 +16,28 @@ For more details on CPPC please refer to the ACPI specification at:
>  
>  http://uefi.org/specifications
>  
> -Some of the CPPC registers are exposed via sysfs under:
> -
> -/sys/devices/system/cpu/cpuX/acpi_cppc/


> -
> -for each cpu X

Yeah, there it is: you're removing those two lines.

> +Some of the CPPC registers are exposed via sysfs under::
>  
> ---------------------------------------------------------------------------------
> +  /sys/devices/system/cpu/cpuX/acpi_cppc/

As per my reply to v4.

Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH v5 14/23] Documentation: ACPI: move dsd/data-node-references.txt to firmware-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-24 18:10 UTC (permalink / raw)
  To: Changbin Du
  Cc: fenghua.yu, linux-doc, linux-gpio, Jonathan Corbet, rjw,
	linux-kernel, linux-acpi, Bjorn Helgaas, linuxppc-dev
In-Reply-To: <20190424175306.25880-15-changbin.du@gmail.com>

Em Thu, 25 Apr 2019 01:52:57 +0800
Changbin Du <changbin.du@gmail.com> escreveu:

> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
> 
> Signed-off-by: Changbin Du <changbin.du@gmail.com>

Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

> ---
>  .../acpi/dsd/data-node-references.rst}        | 36 ++++++++++---------
>  Documentation/firmware-guide/acpi/index.rst   |  1 +
>  2 files changed, 21 insertions(+), 16 deletions(-)
>  rename Documentation/{acpi/dsd/data-node-references.txt => firmware-guide/acpi/dsd/data-node-references.rst} (71%)
> 
> diff --git a/Documentation/acpi/dsd/data-node-references.txt b/Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> similarity index 71%
> rename from Documentation/acpi/dsd/data-node-references.txt
> rename to Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> index c3871565c8cf..1351984e767c 100644
> --- a/Documentation/acpi/dsd/data-node-references.txt
> +++ b/Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> @@ -1,9 +1,12 @@
> -Copyright (C) 2018 Intel Corporation
> -Author: Sakari Ailus <sakari.ailus@linux.intel.com>
> -
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
>  
> +===================================
>  Referencing hierarchical data nodes
> ------------------------------------
> +===================================
> +
> +:Copyright: |copy| 2018 Intel Corporation
> +:Author: Sakari Ailus <sakari.ailus@linux.intel.com>
>  
>  ACPI in general allows referring to device objects in the tree only.
>  Hierarchical data extension nodes may not be referred to directly, hence this
> @@ -28,13 +31,14 @@ extension key.
>  
>  
>  Example
> --------
> +=======
>  
> -	In the ASL snippet below, the "reference" _DSD property [2] contains a
> -	device object reference to DEV0 and under that device object, a
> -	hierarchical data extension key "node@1" referring to the NOD1 object
> -	and lastly, a hierarchical data extension key "anothernode" referring to
> -	the ANOD object which is also the final target node of the reference.
> +In the ASL snippet below, the "reference" _DSD property [2] contains a
> +device object reference to DEV0 and under that device object, a
> +hierarchical data extension key "node@1" referring to the NOD1 object
> +and lastly, a hierarchical data extension key "anothernode" referring to
> +the ANOD object which is also the final target node of the reference.
> +::
>  
>  	Device (DEV0)
>  	{
> @@ -75,15 +79,15 @@ Example
>  	    })
>  	}
>  
> -Please also see a graph example in graph.txt .
> +Please also see a graph example in :doc:`graph`.
>  
>  References
> -----------
> +==========
>  
>  [1] Hierarchical Data Extension UUID For _DSD.
> -    <URL:http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
> -    referenced 2018-07-17.
> +<http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
> +referenced 2018-07-17.
>  
>  [2] Device Properties UUID For _DSD.
> -    <URL:http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf>,
> -    referenced 2016-10-04.
> +<http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf>,
> +referenced 2016-10-04.
> diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
> index f81cfbcb6878..6d4e0df4f063 100644
> --- a/Documentation/firmware-guide/acpi/index.rst
> +++ b/Documentation/firmware-guide/acpi/index.rst
> @@ -9,6 +9,7 @@ ACPI Support
>  
>     namespace
>     dsd/graph
> +   dsd/data-node-references
>     enumeration
>     osi
>     method-customizing



Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH v5 08/23] Documentation: ACPI: move method-customizing.txt to firmware-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-24 18:08 UTC (permalink / raw)
  To: Changbin Du
  Cc: fenghua.yu, linux-doc, linux-gpio, Jonathan Corbet, rjw,
	linux-kernel, linux-acpi, Bjorn Helgaas, linuxppc-dev
In-Reply-To: <20190424175306.25880-9-changbin.du@gmail.com>

Em Thu, 25 Apr 2019 01:52:51 +0800
Changbin Du <changbin.du@gmail.com> escreveu:

> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
> 
> Signed-off-by: Changbin Du <changbin.du@gmail.com>

Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

> ---
>  Documentation/acpi/method-customizing.txt     | 73 ---------------
>  Documentation/firmware-guide/acpi/index.rst   |  3 +-
>  .../acpi/method-customizing.rst               | 89 +++++++++++++++++++
>  3 files changed, 91 insertions(+), 74 deletions(-)
>  delete mode 100644 Documentation/acpi/method-customizing.txt
>  create mode 100644 Documentation/firmware-guide/acpi/method-customizing.rst
> 
> diff --git a/Documentation/acpi/method-customizing.txt b/Documentation/acpi/method-customizing.txt
> deleted file mode 100644
> index 7235da975f23..000000000000
> --- a/Documentation/acpi/method-customizing.txt
> +++ /dev/null
> @@ -1,73 +0,0 @@
> -Linux ACPI Custom Control Method How To
> -=======================================
> -
> -Written by Zhang Rui <rui.zhang@intel.com>
> -
> -
> -Linux supports customizing ACPI control methods at runtime.
> -
> -Users can use this to
> -1. override an existing method which may not work correctly,
> -   or just for debugging purposes.
> -2. insert a completely new method in order to create a missing
> -   method such as _OFF, _ON, _STA, _INI, etc.
> -For these cases, it is far simpler to dynamically install a single
> -control method rather than override the entire DSDT, because kernel
> -rebuild/reboot is not needed and test result can be got in minutes.
> -
> -Note: Only ACPI METHOD can be overridden, any other object types like
> -      "Device", "OperationRegion", are not recognized. Methods
> -      declared inside scope operators are also not supported.
> -Note: The same ACPI control method can be overridden for many times,
> -      and it's always the latest one that used by Linux/kernel.
> -Note: To get the ACPI debug object output (Store (AAAA, Debug)),
> -      please run "echo 1 > /sys/module/acpi/parameters/aml_debug_output".
> -
> -1. override an existing method
> -   a) get the ACPI table via ACPI sysfs I/F. e.g. to get the DSDT,
> -      just run "cat /sys/firmware/acpi/tables/DSDT > /tmp/dsdt.dat"
> -   b) disassemble the table by running "iasl -d dsdt.dat".
> -   c) rewrite the ASL code of the method and save it in a new file,
> -   d) package the new file (psr.asl) to an ACPI table format.
> -      Here is an example of a customized \_SB._AC._PSR method,
> -
> -      DefinitionBlock ("", "SSDT", 1, "", "", 0x20080715)
> -      {
> -	Method (\_SB_.AC._PSR, 0, NotSerialized)
> -	{
> -		Store ("In AC _PSR", Debug)
> -		Return (ACON)
> -	}
> -      }
> -      Note that the full pathname of the method in ACPI namespace
> -      should be used.
> -   e) assemble the file to generate the AML code of the method.
> -      e.g. "iasl -vw 6084 psr.asl" (psr.aml is generated as a result)
> -      If parameter "-vw 6084" is not supported by your iASL compiler,
> -      please try a newer version.
> -   f) mount debugfs by "mount -t debugfs none /sys/kernel/debug"
> -   g) override the old method via the debugfs by running
> -      "cat /tmp/psr.aml > /sys/kernel/debug/acpi/custom_method"
> -
> -2. insert a new method
> -   This is easier than overriding an existing method.
> -   We just need to create the ASL code of the method we want to
> -   insert and then follow the step c) ~ g) in section 1.
> -
> -3. undo your changes
> -   The "undo" operation is not supported for a new inserted method
> -   right now, i.e. we can not remove a method currently.
> -   For an overridden method, in order to undo your changes, please
> -   save a copy of the method original ASL code in step c) section 1,
> -   and redo step c) ~ g) to override the method with the original one.
> -
> -
> -Note: We can use a kernel with multiple custom ACPI method running,
> -      But each individual write to debugfs can implement a SINGLE
> -      method override. i.e. if we want to insert/override multiple
> -      ACPI methods, we need to redo step c) ~ g) for multiple times.
> -
> -Note: Be aware that root can mis-use this driver to modify arbitrary
> -      memory and gain additional rights, if root's privileges got
> -      restricted (for example if root is not allowed to load additional
> -      modules after boot).
> diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
> index 61d67763851b..d1d069b26bbc 100644
> --- a/Documentation/firmware-guide/acpi/index.rst
> +++ b/Documentation/firmware-guide/acpi/index.rst
> @@ -10,5 +10,6 @@ ACPI Support
>     namespace
>     enumeration
>     osi
> +   method-customizing
>     DSD-properties-rules
> -   gpio-properties
> +   gpio-properties
> \ No newline at end of file
> diff --git a/Documentation/firmware-guide/acpi/method-customizing.rst b/Documentation/firmware-guide/acpi/method-customizing.rst
> new file mode 100644
> index 000000000000..de3ebcaed4cf
> --- /dev/null
> +++ b/Documentation/firmware-guide/acpi/method-customizing.rst
> @@ -0,0 +1,89 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=======================================
> +Linux ACPI Custom Control Method How To
> +=======================================
> +
> +:Author: Zhang Rui <rui.zhang@intel.com>
> +
> +
> +Linux supports customizing ACPI control methods at runtime.
> +
> +Users can use this to:
> +
> +1. override an existing method which may not work correctly,
> +   or just for debugging purposes.
> +2. insert a completely new method in order to create a missing
> +   method such as _OFF, _ON, _STA, _INI, etc.
> +
> +For these cases, it is far simpler to dynamically install a single
> +control method rather than override the entire DSDT, because kernel
> +rebuild/reboot is not needed and test result can be got in minutes.
> +
> +.. note::
> +
> +  - Only ACPI METHOD can be overridden, any other object types like
> +    "Device", "OperationRegion", are not recognized. Methods
> +    declared inside scope operators are also not supported.
> +
> +  - The same ACPI control method can be overridden for many times,
> +    and it's always the latest one that used by Linux/kernel.
> +
> +  - To get the ACPI debug object output (Store (AAAA, Debug)),
> +    please run::
> +
> +      echo 1 > /sys/module/acpi/parameters/aml_debug_output
> +
> +
> +1. override an existing method
> +==============================
> +a) get the ACPI table via ACPI sysfs I/F. e.g. to get the DSDT,
> +   just run "cat /sys/firmware/acpi/tables/DSDT > /tmp/dsdt.dat"
> +b) disassemble the table by running "iasl -d dsdt.dat".
> +c) rewrite the ASL code of the method and save it in a new file,
> +d) package the new file (psr.asl) to an ACPI table format.
> +   Here is an example of a customized \_SB._AC._PSR method::
> +
> +      DefinitionBlock ("", "SSDT", 1, "", "", 0x20080715)
> +      {
> +         Method (\_SB_.AC._PSR, 0, NotSerialized)
> +         {
> +            Store ("In AC _PSR", Debug)
> +            Return (ACON)
> +         }
> +      }
> +
> +   Note that the full pathname of the method in ACPI namespace
> +   should be used.
> +e) assemble the file to generate the AML code of the method.
> +   e.g. "iasl -vw 6084 psr.asl" (psr.aml is generated as a result)
> +   If parameter "-vw 6084" is not supported by your iASL compiler,
> +   please try a newer version.
> +f) mount debugfs by "mount -t debugfs none /sys/kernel/debug"
> +g) override the old method via the debugfs by running
> +   "cat /tmp/psr.aml > /sys/kernel/debug/acpi/custom_method"
> +
> +2. insert a new method
> +======================
> +This is easier than overriding an existing method.
> +We just need to create the ASL code of the method we want to
> +insert and then follow the step c) ~ g) in section 1.
> +
> +3. undo your changes
> +====================
> +The "undo" operation is not supported for a new inserted method
> +right now, i.e. we can not remove a method currently.
> +For an overridden method, in order to undo your changes, please
> +save a copy of the method original ASL code in step c) section 1,
> +and redo step c) ~ g) to override the method with the original one.
> +
> +
> +.. note:: We can use a kernel with multiple custom ACPI method running,
> +   But each individual write to debugfs can implement a SINGLE
> +   method override. i.e. if we want to insert/override multiple
> +   ACPI methods, we need to redo step c) ~ g) for multiple times.
> +
> +.. note:: Be aware that root can mis-use this driver to modify arbitrary
> +   memory and gain additional rights, if root's privileges got
> +   restricted (for example if root is not allowed to load additional
> +   modules after boot).



Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH v4 21/63] Documentation: ACPI: move cppc_sysfs.txt to admin-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-24 18:04 UTC (permalink / raw)
  To: Changbin Du
  Cc: fenghua.yu, x86, linux-doc, linux-pci, linux-gpio,
	Jonathan Corbet, rjw, linux-kernel, linux-acpi, mingo,
	Bjorn Helgaas, tglx, linuxppc-dev
In-Reply-To: <20190424172232.qtzogm6sweaa4gva@mail.google.com>

Em Thu, 25 Apr 2019 01:22:34 +0800
Changbin Du <changbin.du@gmail.com> escreveu:

> On Wed, Apr 24, 2019 at 11:48:44AM -0300, Mauro Carvalho Chehab wrote:
> > Em Wed, 24 Apr 2019 00:28:50 +0800
> > Changbin Du <changbin.du@gmail.com> escreveu:
> >   
> > > This converts the plain text documentation to reStructuredText format and
> > > add it to Sphinx TOC tree. No essential content change.
> > > 
> > > Signed-off-by: Changbin Du <changbin.du@gmail.com>
> > > ---
> > >  .../acpi/cppc_sysfs.rst}                      | 71 ++++++++++---------
> > >  Documentation/admin-guide/acpi/index.rst      |  1 +
> > >  2 files changed, 40 insertions(+), 32 deletions(-)
> > >  rename Documentation/{acpi/cppc_sysfs.txt => admin-guide/acpi/cppc_sysfs.rst} (51%)
> > > 
> > > diff --git a/Documentation/acpi/cppc_sysfs.txt b/Documentation/admin-guide/acpi/cppc_sysfs.rst
> > > similarity index 51%
> > > rename from Documentation/acpi/cppc_sysfs.txt
> > > rename to Documentation/admin-guide/acpi/cppc_sysfs.rst
> > > index f20fb445135d..a4b99afbe331 100644
> > > --- a/Documentation/acpi/cppc_sysfs.txt
> > > +++ b/Documentation/admin-guide/acpi/cppc_sysfs.rst
> > > @@ -1,5 +1,11 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > >  
> > > -	Collaborative Processor Performance Control (CPPC)
> > > +==================================================
> > > +Collaborative Processor Performance Control (CPPC)
> > > +==================================================
> > > +
> > > +CPPC
> > > +====
> > >  
> > >  CPPC defined in the ACPI spec describes a mechanism for the OS to manage the
> > >  performance of a logical processor on a contigious and abstract performance
> > > @@ -10,31 +16,28 @@ For more details on CPPC please refer to the ACPI specification at:
> > >  
> > >  http://uefi.org/specifications
> > >  
> > > -Some of the CPPC registers are exposed via sysfs under:
> > > -
> > > -/sys/devices/system/cpu/cpuX/acpi_cppc/
> > > -  
> > 
> >   
> > > -for each cpu X  
> > 
> > Hmm... removed by mistake?
> >  
> I comfirmed that no content removed.

At this patch, it looks that you removed the line: "for each cpu X"
(or am I reading it wrong?)

> 
> > > +Some of the CPPC registers are exposed via sysfs under::
> > >  
> > > ---------------------------------------------------------------------------------
> > > +  /sys/devices/system/cpu/cpuX/acpi_cppc/  
> > 
> > Did you parse this with Sphinx? It doesn't sound a valid ReST construction
> > to my eyes, as:
> > 
> > 1) I've seen some versions of Sphinx to abort with severe errors when 
> >    there's no blank line after the horizontal bar markup;
> > 
> > 2) It will very likely ignore the "::" (I didn't test it myself), as you're
> >    not indenting the horizontal bar. End of indentation will mean the end
> >    of an (empty) literal block.
> > 
> > So, I would stick with:
> > 
> > 
> > 	Some of the CPPC registers are exposed via sysfs under:
> > 
> > 	  /sys/devices/system/cpu/cpuX/acpi_cppc/
> > 
> > 	---------------------------------------------------------------------------------
> > 
> > 	for each cpu X::
> > 
> > 
> > or:
> > 
> > 	Some of the CPPC registers are exposed via sysfs under:
> > 
> > 		/sys/devices/system/cpu/cpuX/acpi_cppc/
> > 
> > 	for each cpu X
> > 
> > 	--------------------------------------------------------------------------------
> > 
> > 	::
> > 
> > (with is closer to the original author's intent)
> > 
> > Same applies to the other similar changes on this document.
> >  
> I didn't seen any warning here and the generated html is good. So I think it is
> ok.

Basically, what you're doing is:

<rst>

::

foo
   literal-block bar

</rst>

(where "foo" is the horizontal bar markup)

I would avoid such pattern for two reasons:

1) it sounds a violation of ReST syntax to format an in
   indented paragraph some non-blank lines after a non-indented
   line. As such, I won't doubt that different versions of Sphinx
   would handle it differently. I'm even tempted to open a BZ
   to Sphinx in order for them to provide a fix for that, if the
   latest version of Sphinx accepts such crazy markup.

2) It is very confusing for any human reading it.

Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH v12 00/31] Speculative page faults
From: Laurent Dufour @ 2019-04-24 18:01 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Jan Kara, sergey.senozhatsky.work, Peter Zijlstra, Will Deacon,
	Michal Hocko, linux-mm, Paul Mackerras, Punit Agrawal,
	H. Peter Anvin, Mike Rapoport, Alexei Starovoitov,
	Andrea Arcangeli, Andi Kleen, Minchan Kim, aneesh.kumar, x86,
	Matthew Wilcox, Daniel Jordan, Ingo Molnar, David Rientjes,
	Paul E. McKenney, Haiyan Song, Nick Piggin, sj38.park,
	Jerome Glisse, dave, kemi.wang, Kirill A. Shutemov,
	Thomas Gleixner, zhong jiang, Ganesh Mahendran, Yang Shi,
	linuxppc-dev, LKML, Sergey Senozhatsky, vinayak menon,
	Andrew Morton, Tim Chen, haren
In-Reply-To: <CANN689F1h9XoHPzr_FQY2WfN5bb2TTd6M3HLqoJ-DQuHkNbA7g@mail.gmail.com>

Le 22/04/2019 à 23:29, Michel Lespinasse a écrit :
> Hi Laurent,
> 
> Thanks a lot for copying me on this patchset. It took me a few days to
> go through it - I had not been following the previous iterations of
> this series so I had to catch up. I will be sending comments for
> individual commits, but before tat I would like to discuss the series
> as a whole.

Hi Michel,

Thanks for reviewing this series.

> I think these changes are a big step in the right direction. My main
> reservation about them is that they are additive - adding some complexity
> for speculative page faults - and I wonder if it'd be possible, over the
> long term, to replace the existing complexity we have in mmap_sem retry
> mechanisms instead of adding to it. This is not something that should
> block your progress, but I think it would be good, as we introduce spf,
> to evaluate whether we could eventually get all the way to removing the
> mmap_sem retry mechanism, or if we will actually have to keep both.

Until we get rid of the mmap_sem which seems to be a very long story, I 
can't see how we could get rid of the retry mechanism.

> The proposed spf mechanism only handles anon vmas. Is there a
> fundamental reason why it couldn't handle mapped files too ?
> My understanding is that the mechanism of verifying the vma after
> taking back the ptl at the end of the fault would work there too ?
> The file has to stay referenced during the fault, but holding the vma's
> refcount could be made to cover that ? the vm_file refcount would have
> to be released in __free_vma() instead of remove_vma; I'm not quite sure
> if that has more implications than I realize ?

The only concern is the flow of operation  done in the vm_ops->fault() 
processing. Most of the file system relie on the generic filemap_fault() 
which should be safe to use. But we need a clever way to identify fault 
processing which are compatible with the SPF handler. This could be done 
using a tag/flag in the vm_ops structure or in the vma's flags.

This would be the next step.


> The proposed spf mechanism only works at the pte level after the page
> tables have already been created. The non-spf page fault path takes the
> mm->page_table_lock to protect against concurrent page table allocation
> by multiple page faults; I think unmapping/freeing page tables could
> be done under mm->page_table_lock too so that spf could implement
> allocating new page tables by verifying the vma after taking the
> mm->page_table_lock ?

I've to admit that I didn't dig further here.
Do you have a patch? ;)

> 
> The proposed spf mechanism depends on ARCH_HAS_PTE_SPECIAL.
> I am not sure what is the issue there - is this due to the vma->vm_start
> and vma->vm_pgoff reads in *__vm_normal_page() ?

Yes that's the reason, no way to guarantee the value of these fields in 
the SPF path.

> 
> My last potential concern is about performance. The numbers you have
> look great, but I worry about potential regressions in PF performance
> for threaded processes that don't currently encounter contention
> (i.e. there may be just one thread actually doing all the work while
> the others are blocked). I think one good proxy for measuring that
> would be to measure a single threaded workload - kernbench would be
> fine - without the special-case optimization in patch 22 where
> handle_speculative_fault() immediately aborts in the single-threaded case.

I'll have to give it a try.

> Reviewed-by: Michel Lespinasse <walken@google.com>
> This is for the series as a whole; I expect to do another review pass on
> individual commits in the series when we have agreement on the toplevel
> stuff (I noticed a few things like out-of-date commit messages but that's
> really minor stuff).

Thanks a lot for reviewing this long series.

> 
> I want to add a note about mmap_sem. In the past there has been
> discussions about replacing it with an interval lock, but these never
> went anywhere because, mostly, of the fact that such mechanisms were
> too expensive to use in the page fault path. I think adding the spf
> mechanism would invite us to revisit this issue - interval locks may
> be a great way to avoid blocking between unrelated mmap_sem writers
> (for example, do not delay stack creation for new threads while a
> large mmap or munmap may be going on), and probably also to handle
> mmap_sem readers that can't easily use the spf mechanism (for example,
> gup callers which make use of the returned vmas). But again that is a
> separate topic to explore which doesn't have to get resolved before
> spf goes in.
> 


^ permalink raw reply

* [PATCH v5 23/23] Documentation: ACPI: move video_extension.txt to firmware-guide/acpi and convert to reST
From: Changbin Du @ 2019-04-24 17:53 UTC (permalink / raw)
  To: rjw, Jonathan Corbet
  Cc: fenghua.yu, mchehab+samsung, linux-doc, linux-gpio, linux-kernel,
	linux-acpi, Bjorn Helgaas, linuxppc-dev, Changbin Du
In-Reply-To: <20190424175306.25880-1-changbin.du@gmail.com>

This converts the plain text documentation to reStructuredText format and
add it to Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 Documentation/firmware-guide/acpi/index.rst   |  1 +
 .../acpi/video_extension.rst}                 | 83 +++++++++++--------
 2 files changed, 50 insertions(+), 34 deletions(-)
 rename Documentation/{acpi/video_extension.txt => firmware-guide/acpi/video_extension.rst} (70%)

diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
index 0e60f4b7129a..ae609eec4679 100644
--- a/Documentation/firmware-guide/acpi/index.rst
+++ b/Documentation/firmware-guide/acpi/index.rst
@@ -23,3 +23,4 @@ ACPI Support
    i2c-muxes
    acpi-lid
    lpit
+   video_extension
diff --git a/Documentation/acpi/video_extension.txt b/Documentation/firmware-guide/acpi/video_extension.rst
similarity index 70%
rename from Documentation/acpi/video_extension.txt
rename to Documentation/firmware-guide/acpi/video_extension.rst
index 79bf6a4921be..099b8607e07b 100644
--- a/Documentation/acpi/video_extension.txt
+++ b/Documentation/firmware-guide/acpi/video_extension.rst
@@ -1,5 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
 ACPI video extensions
-~~~~~~~~~~~~~~~~~~~~~
+=====================
 
 This driver implement the ACPI Extensions For Display Adapters for
 integrated graphics devices on motherboard, as specified in ACPI 2.0
@@ -8,9 +11,10 @@ defining the video POST device, retrieving EDID information or to
 setup a video output, etc.  Note that this is an ref. implementation
 only.  It may or may not work for your integrated video device.
 
-The ACPI video driver does 3 things regarding backlight control:
+The ACPI video driver does 3 things regarding backlight control.
 
-1 Export a sysfs interface for user space to control backlight level
+Export a sysfs interface for user space to control backlight level
+==================================================================
 
 If the ACPI table has a video device, and acpi_backlight=vendor kernel
 command line is not present, the driver will register a backlight device
@@ -22,36 +26,41 @@ The backlight sysfs interface has a standard definition here:
 Documentation/ABI/stable/sysfs-class-backlight.
 
 And what ACPI video driver does is:
-actual_brightness: on read, control method _BQC will be evaluated to
-get the brightness level the firmware thinks it is at;
-bl_power: not implemented, will set the current brightness instead;
-brightness: on write, control method _BCM will run to set the requested
-brightness level;
-max_brightness: Derived from the _BCL package(see below);
-type: firmware
+
+actual_brightness:
+  on read, control method _BQC will be evaluated to
+  get the brightness level the firmware thinks it is at;
+bl_power:
+  not implemented, will set the current brightness instead;
+brightness:
+  on write, control method _BCM will run to set the requested brightness level;
+max_brightness:
+  Derived from the _BCL package(see below);
+type:
+  firmware
 
 Note that ACPI video backlight driver will always use index for
 brightness, actual_brightness and max_brightness. So if we have
-the following _BCL package:
+the following _BCL package::
 
-Method (_BCL, 0, NotSerialized)
-{
-	Return (Package (0x0C)
+	Method (_BCL, 0, NotSerialized)
 	{
-		0x64,
-		0x32,
-		0x0A,
-		0x14,
-		0x1E,
-		0x28,
-		0x32,
-		0x3C,
-		0x46,
-		0x50,
-		0x5A,
-		0x64
-	})
-}
+		Return (Package (0x0C)
+		{
+			0x64,
+			0x32,
+			0x0A,
+			0x14,
+			0x1E,
+			0x28,
+			0x32,
+			0x3C,
+			0x46,
+			0x50,
+			0x5A,
+			0x64
+		})
+	}
 
 The first two levels are for when laptop are on AC or on battery and are
 not used by Linux currently. The remaining 10 levels are supported levels
@@ -62,13 +71,15 @@ as a "brightness level" indicator. Thus from the user space perspective
 the range of available brightness levels is from 0 to 9 (max_brightness)
 inclusive.
 
-2 Notify user space about hotkey event
+Notify user space about hotkey event
+====================================
 
 There are generally two cases for hotkey event reporting:
+
 i) For some laptops, when user presses the hotkey, a scancode will be
    generated and sent to user space through the input device created by
    the keyboard driver as a key type input event, with proper remap, the
-   following key code will appear to user space:
+   following key code will appear to user space::
 
 	EV_KEY, KEY_BRIGHTNESSUP
 	EV_KEY, KEY_BRIGHTNESSDOWN
@@ -84,23 +95,27 @@ ii) For some laptops, the press of the hotkey will not generate the
     notify value it received and send the event to user space through the
     input device it created:
 
+	=====		==================
 	event		keycode
+	=====		==================
 	0x86		KEY_BRIGHTNESSUP
 	0x87		KEY_BRIGHTNESSDOWN
 	etc.
+	=====		==================
 
 so this would lead to the same effect as case i) now.
 
 Once user space tool receives this event, it can modify the backlight
 level through the sysfs interface.
 
-3 Change backlight level in the kernel
+Change backlight level in the kernel
+====================================
 
 This works for machines covered by case ii) in Section 2. Once the driver
 received a notification, it will set the backlight level accordingly. This does
 not affect the sending of event to user space, they are always sent to user
 space regardless of whether or not the video module controls the backlight level
 directly. This behaviour can be controlled through the brightness_switch_enabled
-module parameter as documented in admin-guide/kernel-parameters.rst. It is recommended to
-disable this behaviour once a GUI environment starts up and wants to have full
-control of the backlight level.
+module parameter as documented in admin-guide/kernel-parameters.rst. It is
+recommended to disable this behaviour once a GUI environment starts up and
+wants to have full control of the backlight level.
-- 
2.20.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox