LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH RFC PKS/PMEM 26/58] fs/zonefs: Utilize new kmap_thread()
From: Damien Le Moal @ 2020-10-12  2:30 UTC (permalink / raw)
  To: ira.weiny@intel.com, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra
  Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org,
	kvm@vger.kernel.org, linux-doc@vger.kernel.org,
	kexec@lists.infradead.org, Dave Hansen,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
	target-devel@vger.kernel.org, linux-mtd@lists.infradead.org,
	linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org,
	ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com,
	Naohiro Aota, linux-cifs@vger.kernel.org,
	linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org,
	x86@kernel.org, amd-gfx@lists.freedesktop.org,
	linux-afs@lists.infradead.org, cluster-devel@redhat.com,
	linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org,
	xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org,
	Fenghua Yu, devel@driverdev.osuosl.org,
	linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org,
	ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-bcache@vger.kernel.org, Dan Williams,
	io-uring@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org,
	linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org
In-Reply-To: <20201009195033.3208459-27-ira.weiny@intel.com>

On 2020/10/10 4:52, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The kmap() calls in this FS are localized to a single thread.  To avoid
> the over head of global PKRS updates use the new kmap_thread() call.
> 
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Naohiro Aota <naohiro.aota@wdc.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  fs/zonefs/super.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 8ec7c8f109d7..2fd6c86beee1 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -1297,7 +1297,7 @@ static int zonefs_read_super(struct super_block *sb)
>  	if (ret)
>  		goto free_page;
>  
> -	super = kmap(page);
> +	super = kmap_thread(page);
>  
>  	ret = -EINVAL;
>  	if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC)
> @@ -1349,7 +1349,7 @@ static int zonefs_read_super(struct super_block *sb)
>  	ret = 0;
>  
>  unmap:
> -	kunmap(page);
> +	kunmap_thread(page);
>  free_page:
>  	__free_page(page);
>  
> 

acked-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply

* Re: [PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
From: Daniel Axtens @ 2020-10-12  1:51 UTC (permalink / raw)
  To: Ravi Bangoria, mpe
  Cc: ravi.bangoria, jniethe5, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201011050908.72173-2-ravi.bangoria@linux.ibm.com>

Hi,

Apologies if this has come up in a previous revision.

>  	case 1:
> +		if (!cpu_has_feature(CPU_FTR_ARCH_31))
> +			return -1;
> +
>  		prefix_r = GET_PREFIX_R(word);
>  		ra = GET_PREFIX_RA(suffix);

The comment above analyse_instr reads in part:

 * Return value is 1 if the instruction can be emulated just by
 * updating *regs with the information in *op, -1 if we need the
 * GPRs but *regs doesn't contain the full register set, or 0
 * otherwise.

I was wondering why returning -1 if the instruction isn't supported the
right thing to do - it seemed to me that it should return 0?

I did look and see that there are other cases where the code returns -1
for an unsupported operation, e.g.:

#ifdef __powerpc64__
	case 4:
		if (!cpu_has_feature(CPU_FTR_ARCH_300))
			return -1;

		switch (word & 0x3f) {
		case 48:	/* maddhd */

That's from commit 930d6288a267 ("powerpc: sstep: Add support for
maddhd, maddhdu, maddld instructions"), but it's not explained there either

I see the same pattern in a number of commits: commit 6324320de609
("powerpc sstep: Add support for modsd, modud instructions"), commit
6c180071509a ("powerpc sstep: Add support for modsw, moduw
instructions"), commit a23987ef267a ("powerpc: sstep: Add support for
darn instruction") and a few others, all of which seem to have come
through Sandipan in February 2019. I haven't spotted any explanation for
why -1 was chosen, but I haven't checked the mailing list archives.

If -1 is OK, would it be possible to update the comment to explain why?

Kind regards,
Daniel

^ permalink raw reply

* Re: [PATCH] powerpc/smp: Use GFP_ATOMIC while allocating tmp mask
From: Michael Ellerman @ 2020-10-12  1:45 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Srikar Dronamraju, Qian Cai, LKML,
	Valentin Schneider, Peter Zijlstra, linuxppc-dev, Ingo Molnar
In-Reply-To: <20201008034240.34059-1-srikar@linux.vnet.ibm.com>

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> Qian Cai reported a regression where CPU Hotplug fails with the latest
> powerpc/next
>
> BUG: sleeping function called from invalid context at mm/slab.h:494
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88
> no locks held by swapper/88/0.
> irq event stamp: 18074448
> hardirqs last  enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110
> hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0
> do_idle at kernel/sched/idle.c:253 (discriminator 1)
> softirqs last  enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0
> softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0
> CPU: 88 PID: 0 Comm: swapper/88 Tainted: G        W         5.9.0-rc8-next-20201007 #1
> Call Trace:
> [c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable)
> [c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310
> [c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190
> [c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0
> slab_alloc_node at mm/slub.c:2817
> (inlined by) __kmalloc_node at mm/slub.c:4013
> [c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80
> kmalloc_node at include/linux/slab.h:577
> (inlined by) alloc_cpumask_var_node at lib/cpumask.c:116
> [c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800
> update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267
> (inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387
> (inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420
> [c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14
>
> Allocating a temporary mask while performing a CPU Hotplug operation
> with CONFIG_CPUMASK_OFFSTACK enabled, leads to calling a sleepable
> function from a atomic context. Fix this by allocating the temporary
> mask with GFP_ATOMIC flag.
>
> If there is a failure to allocate a mask, scheduler is going to observe
> that this CPU's topology is broken. Instead of having to speculate why
> the topology is broken, add a WARN_ON_ONCE.
>
> Fixes: 70a94089d7f7 ("powerpc/smp: Optimize update_coregroup_mask")
> Fixes: 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2")
> Reported-by: Qian Cai <cai@redhat.com>
> Suggested-by: Qian Cai <cai@redhat.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: LKML <linux-kernel@vger.kernel.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Qian Cai <cai@redhat.com>
> ---
>  arch/powerpc/kernel/smp.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 0dc1b85..1268558 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1264,7 +1264,8 @@ static bool update_mask_by_l2(int cpu)
>  		return false;
>  	}
>  
> -	alloc_cpumask_var_node(&mask, GFP_KERNEL, cpu_to_node(cpu));
> +	/* In CPU-hotplug path, hence use GFP_ATOMIC */
> +	WARN_ON_ONCE(!alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu)));

A failed memory allocation is not something that should trigger a WARN,
a pr_warn() maybe.

But ...

>  	cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));

If the allocation failed this will oops (mask will be NULL).

cheers

^ permalink raw reply

* Re: [PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test
From: Guenter Roeck @ 2020-10-11 20:02 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-mm, akpm, Anshuman Khandual, linuxppc-dev
In-Reply-To: <20200902114222.181353-14-aneesh.kumar@linux.ibm.com>

On Wed, Sep 02, 2020 at 05:12:22PM +0530, Aneesh Kumar K.V wrote:
> pte_clear_tests operate on an existing pte entry. Make sure that
> is not a none pte entry.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

This patch causes all riscv64 images to crash. Reverting it
as well as the follow-up patch fixes the problem, but there are
still several warning messages starting with
	BUG kmem_cache (Not tainted): Freechain corrupt
I did not try to track down this other problem.

A detailed crash log is at
	https://kerneltests.org/builders/qemu-riscv64-next/builds/523/steps/qemubuildcommand/logs/stdio

Bisect log is attached.

Guenter

---
# bad: [d67bc7812221606e1886620a357b13f906814af7] Add linux-next specific files for 20201009
# good: [549738f15da0e5a00275977623be199fbbf7df50] Linux 5.9-rc8
git bisect start 'HEAD' 'v5.9-rc8'
# good: [b71be15b496cc71a3434a198fc1a1b9e08af6c57] Merge remote-tracking branch 'bpf-next/master' into master
git bisect good b71be15b496cc71a3434a198fc1a1b9e08af6c57
# good: [3542e5a87341bdca83e3d7f061b0f4e0f4c23f73] Merge remote-tracking branch 'spi/for-next' into master
git bisect good 3542e5a87341bdca83e3d7f061b0f4e0f4c23f73
# good: [65f9c957115c749ba79fb469083caf14101a93bb] Merge remote-tracking branch 'char-misc/char-misc-next' into master
git bisect good 65f9c957115c749ba79fb469083caf14101a93bb
# good: [aadfe5ecb55641d5994a5f3b27f074beead8f49b] Merge remote-tracking branch 'scsi-mkp/for-next' into master
git bisect good aadfe5ecb55641d5994a5f3b27f074beead8f49b
# good: [060196553d119d5c252f09ed5b0316929cc25983] Merge remote-tracking branch 'memblock/for-next' into master
git bisect good 060196553d119d5c252f09ed5b0316929cc25983
# bad: [d2340d89ffa6d2643f0689600dbf0969d86fdd3c] x86/setup: simplify initrd relocation and reservation
git bisect bad d2340d89ffa6d2643f0689600dbf0969d86fdd3c
# bad: [4b23734f5ba1a0487b2b569a9e64e4e5360009c1] mm/swapfile.c: remove unnecessary goto out in _swap_info_get()
git bisect bad 4b23734f5ba1a0487b2b569a9e64e4e5360009c1
# good: [b30f5277e97be65a99ca5af3d4337cb9acbf8fa7] device-dax: introduce 'struct dev_dax' typed-driver operations
git bisect good b30f5277e97be65a99ca5af3d4337cb9acbf8fa7
# good: [8c2075296dbbbce685fa14bbf46d29fba54f8a62] mm/debug_vm_pgtable: drop hugetlb_advanced_tests()
git bisect good 8c2075296dbbbce685fa14bbf46d29fba54f8a62
# bad: [52ce97889b3c85826995d22a690cd1430c14f316] mm/filemap: fix filemap_map_pages for THP
git bisect bad 52ce97889b3c85826995d22a690cd1430c14f316
# bad: [1ddc0b707be99faece6cdd77f34544f408c6617d] proc: optimise smaps for shmem entries
git bisect bad 1ddc0b707be99faece6cdd77f34544f408c6617d
# bad: [5d685be3785638202430b6baa2ecde482b52c41e] mm: factor find_get_incore_page out of mincore_page
git bisect bad 5d685be3785638202430b6baa2ecde482b52c41e
# bad: [0797f84d689b9f1e7256d954280b28bfeaf5b1fc] mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.
git bisect bad 0797f84d689b9f1e7256d954280b28bfeaf5b1fc
# bad: [4f9a78e6bcd60a25b187adc1526ab3815fc40dae] mm/debug_vm_pgtable: avoid none pte in pte_clear_test
git bisect bad 4f9a78e6bcd60a25b187adc1526ab3815fc40dae
# first bad commit: [4f9a78e6bcd60a25b187adc1526ab3815fc40dae] mm/debug_vm_pgtable: avoid none pte in pte_clear_test

^ permalink raw reply

* Re: [PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
From: Sandipan Das @ 2020-10-11 15:06 UTC (permalink / raw)
  To: Ravi Bangoria, bala24; +Cc: naveen.n.rao, linuxppc-dev, paulus, jniethe5
In-Reply-To: <20201011050908.72173-2-ravi.bangoria@linux.ibm.com>


On 11/10/20 10:39 am, Ravi Bangoria wrote:
> From: Balamuruhan S <bala24@linux.ibm.com>
> 
> Unconditional emulation of prefixed instructions will allow
> emulation of them on Power10 predecessors which might cause
> issues. Restrict that.
> 
> Fixes: 3920742b92f5 ("powerpc sstep: Add support for prefixed fixed-point arithmetic")
> Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
> Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---
>  arch/powerpc/lib/sstep.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index e9dcaba9a4f8..e6242744c71b 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>  	switch (opcode) {
>  #ifdef __powerpc64__
>  	case 1:
> +		if (!cpu_has_feature(CPU_FTR_ARCH_31))
> +			return -1;
> +
>  		prefix_r = GET_PREFIX_R(word);
>  		ra = GET_PREFIX_RA(suffix);
>  		rd = (suffix >> 21) & 0x1f;
> @@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>  		}
>  		break;
>  	case 1: /* Prefixed instructions */
> +		if (!cpu_has_feature(CPU_FTR_ARCH_31))
> +			return -1;
> +
>  		prefix_r = GET_PREFIX_R(word);
>  		ra = GET_PREFIX_RA(suffix);
>  		op->update_reg = ra;
> 

LGTM

Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>

^ permalink raw reply

* [PATCH v5 5/5] powerpc/sstep: Add testcases for VSX vector paired load/store instructions
From: Ravi Bangoria @ 2020-10-11  5:09 UTC (permalink / raw)
  To: mpe
  Cc: ravi.bangoria, jniethe5, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201011050908.72173-1-ravi.bangoria@linux.ibm.com>

From: Balamuruhan S <bala24@linux.ibm.com>

Add testcases for VSX vector paired load/store instructions.
Sample o/p:

  emulate_step_test: lxvp           : PASS
  emulate_step_test: stxvp          : PASS
  emulate_step_test: lxvpx          : PASS
  emulate_step_test: stxvpx         : PASS
  emulate_step_test: plxvp          : PASS
  emulate_step_test: pstxvp         : PASS

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/lib/test_emulate_step.c | 270 +++++++++++++++++++++++++++
 1 file changed, 270 insertions(+)

diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 0a201b771477..783d1b85ecfe 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -612,6 +612,273 @@ static void __init test_lxvd2x_stxvd2x(void)
 }
 #endif /* CONFIG_VSX */
 
+#ifdef CONFIG_VSX
+static void __init test_lxvp_stxvp(void)
+{
+	struct pt_regs regs;
+	union {
+		vector128 a;
+		u32 b[4];
+	} c[2];
+	u32 cached_b[8];
+	int stepped = -1;
+
+	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+		show_result("lxvp", "SKIP (!CPU_FTR_ARCH_31)");
+		show_result("stxvp", "SKIP (!CPU_FTR_ARCH_31)");
+		return;
+	}
+
+	init_pt_regs(&regs);
+
+	/*** lxvp ***/
+
+	cached_b[0] = c[0].b[0] = 18233;
+	cached_b[1] = c[0].b[1] = 34863571;
+	cached_b[2] = c[0].b[2] = 834;
+	cached_b[3] = c[0].b[3] = 6138911;
+	cached_b[4] = c[1].b[0] = 1234;
+	cached_b[5] = c[1].b[1] = 5678;
+	cached_b[6] = c[1].b[2] = 91011;
+	cached_b[7] = c[1].b[3] = 121314;
+
+	regs.gpr[4] = (unsigned long)&c[0].a;
+
+	/*
+	 * lxvp XTp,DQ(RA)
+	 * XTp = 32xTX + 2xTp
+	 * let TX=1 Tp=1 RA=4 DQ=0
+	 */
+	stepped = emulate_step(&regs, ppc_inst(PPC_RAW_LXVP(34, 4, 0)));
+
+	if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("lxvp", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("lxvp", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("lxvp", "FAIL");
+	}
+
+	/*** stxvp ***/
+
+	c[0].b[0] = 21379463;
+	c[0].b[1] = 87;
+	c[0].b[2] = 374234;
+	c[0].b[3] = 4;
+	c[1].b[0] = 90;
+	c[1].b[1] = 122;
+	c[1].b[2] = 555;
+	c[1].b[3] = 32144;
+
+	/*
+	 * stxvp XSp,DQ(RA)
+	 * XSp = 32xSX + 2xSp
+	 * let SX=1 Sp=1 RA=4 DQ=0
+	 */
+	stepped = emulate_step(&regs, ppc_inst(PPC_RAW_STXVP(34, 4, 0)));
+
+	if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] &&
+	    cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+	    cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+	    cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+	    cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("stxvp", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("stxvp", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("stxvp", "FAIL");
+	}
+}
+#else
+static void __init test_lxvp_stxvp(void)
+{
+	show_result("lxvp", "SKIP (CONFIG_VSX is not set)");
+	show_result("stxvp", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
+#ifdef CONFIG_VSX
+static void __init test_lxvpx_stxvpx(void)
+{
+	struct pt_regs regs;
+	union {
+		vector128 a;
+		u32 b[4];
+	} c[2];
+	u32 cached_b[8];
+	int stepped = -1;
+
+	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+		show_result("lxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+		show_result("stxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+		return;
+	}
+
+	init_pt_regs(&regs);
+
+	/*** lxvpx ***/
+
+	cached_b[0] = c[0].b[0] = 18233;
+	cached_b[1] = c[0].b[1] = 34863571;
+	cached_b[2] = c[0].b[2] = 834;
+	cached_b[3] = c[0].b[3] = 6138911;
+	cached_b[4] = c[1].b[0] = 1234;
+	cached_b[5] = c[1].b[1] = 5678;
+	cached_b[6] = c[1].b[2] = 91011;
+	cached_b[7] = c[1].b[3] = 121314;
+
+	regs.gpr[3] = (unsigned long)&c[0].a;
+	regs.gpr[4] = 0;
+
+	/*
+	 * lxvpx XTp,RA,RB
+	 * XTp = 32xTX + 2xTp
+	 * let TX=1 Tp=1 RA=3 RB=4
+	 */
+	stepped = emulate_step(&regs, ppc_inst(PPC_RAW_LXVPX(34, 3, 4)));
+
+	if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("lxvpx", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("lxvpx", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("lxvpx", "FAIL");
+	}
+
+	/*** stxvpx ***/
+
+	c[0].b[0] = 21379463;
+	c[0].b[1] = 87;
+	c[0].b[2] = 374234;
+	c[0].b[3] = 4;
+	c[1].b[0] = 90;
+	c[1].b[1] = 122;
+	c[1].b[2] = 555;
+	c[1].b[3] = 32144;
+
+	/*
+	 * stxvpx XSp,RA,RB
+	 * XSp = 32xSX + 2xSp
+	 * let SX=1 Sp=1 RA=3 RB=4
+	 */
+	stepped = emulate_step(&regs, ppc_inst(PPC_RAW_STXVPX(34, 3, 4)));
+
+	if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] &&
+	    cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+	    cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+	    cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+	    cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("stxvpx", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("stxvpx", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("stxvpx", "FAIL");
+	}
+}
+#else
+static void __init test_lxvpx_stxvpx(void)
+{
+	show_result("lxvpx", "SKIP (CONFIG_VSX is not set)");
+	show_result("stxvpx", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
+#ifdef CONFIG_VSX
+static void __init test_plxvp_pstxvp(void)
+{
+	struct ppc_inst instr;
+	struct pt_regs regs;
+	union {
+		vector128 a;
+		u32 b[4];
+	} c[2];
+	u32 cached_b[8];
+	int stepped = -1;
+
+	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+		show_result("plxvp", "SKIP (!CPU_FTR_ARCH_31)");
+		show_result("pstxvp", "SKIP (!CPU_FTR_ARCH_31)");
+		return;
+	}
+
+	/*** plxvp ***/
+
+	cached_b[0] = c[0].b[0] = 18233;
+	cached_b[1] = c[0].b[1] = 34863571;
+	cached_b[2] = c[0].b[2] = 834;
+	cached_b[3] = c[0].b[3] = 6138911;
+	cached_b[4] = c[1].b[0] = 1234;
+	cached_b[5] = c[1].b[1] = 5678;
+	cached_b[6] = c[1].b[2] = 91011;
+	cached_b[7] = c[1].b[3] = 121314;
+
+	init_pt_regs(&regs);
+	regs.gpr[3] = (unsigned long)&c[0].a;
+
+	/*
+	 * plxvp XTp,D(RA),R
+	 * XTp = 32xTX + 2xTp
+	 * let RA=3 R=0 D=d0||d1=0 R=0 Tp=1 TX=1
+	 */
+	instr = ppc_inst_prefix(PPC_RAW_PLXVP(34, 0, 3, 0) >> 32,
+			PPC_RAW_PLXVP(34, 0, 3, 0) & 0xffffffff);
+
+	stepped = emulate_step(&regs, instr);
+	if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("plxvp", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("plxvp", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("plxvp", "FAIL");
+	}
+
+	/*** pstxvp ***/
+
+	c[0].b[0] = 21379463;
+	c[0].b[1] = 87;
+	c[0].b[2] = 374234;
+	c[0].b[3] = 4;
+	c[1].b[0] = 90;
+	c[1].b[1] = 122;
+	c[1].b[2] = 555;
+	c[1].b[3] = 32144;
+
+	/*
+	 * pstxvp XSp,D(RA),R
+	 * XSp = 32xSX + 2xSp
+	 * let RA=3 D=d0||d1=0 R=0 Sp=1 SX=1
+	 */
+	instr = ppc_inst_prefix(PPC_RAW_PSTXVP(34, 0, 3, 0) >> 32,
+			PPC_RAW_PSTXVP(34, 0, 3, 0) & 0xffffffff);
+
+	stepped = emulate_step(&regs, instr);
+
+	if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] &&
+	    cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+	    cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+	    cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+	    cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("pstxvp", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("pstxvp", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("pstxvp", "FAIL");
+	}
+}
+#else
+static void __init test_plxvp_pstxvp(void)
+{
+	show_result("plxvp", "SKIP (CONFIG_VSX is not set)");
+	show_result("pstxvp", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
 static void __init run_tests_load_store(void)
 {
 	test_ld();
@@ -628,6 +895,9 @@ static void __init run_tests_load_store(void)
 	test_plfd_pstfd();
 	test_lvx_stvx();
 	test_lxvd2x_stxvd2x();
+	test_lxvp_stxvp();
+	test_lxvpx_stxvpx();
+	test_plxvp_pstxvp();
 }
 
 struct compute_test {
-- 
2.26.2


^ permalink raw reply related

* [PATCH v5 4/5] powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions
From: Ravi Bangoria @ 2020-10-11  5:09 UTC (permalink / raw)
  To: mpe
  Cc: ravi.bangoria, jniethe5, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201011050908.72173-1-ravi.bangoria@linux.ibm.com>

From: Balamuruhan S <bala24@linux.ibm.com>

Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as
macros for new VSX vector paired instructions,
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/include/asm/ppc-opcode.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index a6e3700c4566..5e7918ca4fb7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -78,6 +78,9 @@
 
 #define IMM_L(i)               ((uintptr_t)(i) & 0xffff)
 #define IMM_DS(i)              ((uintptr_t)(i) & 0xfffc)
+#define IMM_DQ(i)              ((uintptr_t)(i) & 0xfff0)
+#define IMM_D0(i)              (((uintptr_t)(i) >> 16) & 0x3ffff)
+#define IMM_D1(i)              IMM_L(i)
 
 /*
  * 16-bit immediate helper macros: HA() is for use with sign-extending instrs
@@ -295,6 +298,8 @@
 #define __PPC_XB(b)	((((b) & 0x1f) << 11) | (((b) & 0x20) >> 4))
 #define __PPC_XS(s)	((((s) & 0x1f) << 21) | (((s) & 0x20) >> 5))
 #define __PPC_XT(s)	__PPC_XS(s)
+#define __PPC_XSP(s)	((((s) & 0x1e) | (((s) >> 5) & 0x1)) << 21)
+#define __PPC_XTP(s)	__PPC_XSP(s)
 #define __PPC_T_TLB(t)	(((t) & 0x3) << 21)
 #define __PPC_WC(w)	(((w) & 0x3) << 21)
 #define __PPC_WS(w)	(((w) & 0x1f) << 11)
@@ -395,6 +400,14 @@
 #define PPC_RAW_XVCPSGNDP(t, a, b)	((0xf0000780 | VSX_XX3((t), (a), (b))))
 #define PPC_RAW_VPERMXOR(vrt, vra, vrb, vrc) \
 	((0x1000002d | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | (((vrc) & 0x1f) << 6)))
+#define PPC_RAW_LXVP(xtp, a, i)		(0x18000000 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_STXVP(xsp, a, i)	(0x18000001 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_LXVPX(xtp, a, b)	(0x7c00029a | __PPC_XTP(xtp) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_STXVPX(xsp, a, b)	(0x7c00039a | __PPC_XSP(xsp) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_PLXVP(xtp, i, a, pr) \
+	((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xe8000000 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_D1(i)))
+#define PPC_RAW_PSTXVP(xsp, i, a, pr) \
+	((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xf8000000 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_D1(i)))
 #define PPC_RAW_NAP			(0x4c000364)
 #define PPC_RAW_SLEEP			(0x4c0003a4)
 #define PPC_RAW_WINKLE			(0x4c0003e4)
-- 
2.26.2


^ permalink raw reply related

* [PATCH v5 3/5] powerpc/sstep: Support VSX vector paired storage access instructions
From: Ravi Bangoria @ 2020-10-11  5:09 UTC (permalink / raw)
  To: mpe
  Cc: ravi.bangoria, jniethe5, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201011050908.72173-1-ravi.bangoria@linux.ibm.com>

From: Balamuruhan S <bala24@linux.ibm.com>

VSX Vector Paired instructions loads/stores an octword (32 bytes)
from/to storage into two sequential VSRs. Add emulation support
for these new instructions:
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
[kernel test robot reported a build failure]
Reported-by: kernel test robot <lkp@intel.com>
---
 arch/powerpc/lib/sstep.c | 150 +++++++++++++++++++++++++++++++++------
 1 file changed, 129 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index faf0bbf3efb7..96ca813a65e7 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -32,6 +32,10 @@ extern char system_call_vectored_emulate[];
 #define XER_OV32	0x00080000U
 #define XER_CA32	0x00040000U
 
+#ifdef CONFIG_VSX
+#define VSX_REGISTER_XTP(rd)   ((((rd) & 1) << 5) | ((rd) & 0xfe))
+#endif
+
 #ifdef CONFIG_PPC_FPU
 /*
  * Functions in ldstfp.S
@@ -279,6 +283,19 @@ static nokprobe_inline void do_byte_reverse(void *ptr, int nb)
 		up[1] = tmp;
 		break;
 	}
+	case 32: {
+		unsigned long *up = (unsigned long *)ptr;
+		unsigned long tmp;
+
+		tmp = byterev_8(up[0]);
+		up[0] = byterev_8(up[3]);
+		up[3] = tmp;
+		tmp = byterev_8(up[2]);
+		up[2] = byterev_8(up[1]);
+		up[1] = tmp;
+		break;
+	}
+
 #endif
 	default:
 		WARN_ON_ONCE(1);
@@ -709,6 +726,8 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 	reg->d[0] = reg->d[1] = 0;
 
 	switch (op->element_size) {
+	case 32:
+		/* [p]lxvp[x] */
 	case 16:
 		/* whole vector; lxv[x] or lxvl[l] */
 		if (size == 0)
@@ -717,7 +736,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		if (IS_LE && (op->vsx_flags & VSX_LDLEFT))
 			rev = !rev;
 		if (rev)
-			do_byte_reverse(reg, 16);
+			do_byte_reverse(reg, size);
 		break;
 	case 8:
 		/* scalar loads, lxvd2x, lxvdsx */
@@ -793,6 +812,20 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 	size = GETSIZE(op->type);
 
 	switch (op->element_size) {
+	case 32:
+		/* [p]stxvp[x] */
+		if (size == 0)
+			break;
+		if (rev) {
+			/* reverse 32 bytes */
+			buf.d[0] = byterev_8(reg->d[3]);
+			buf.d[1] = byterev_8(reg->d[2]);
+			buf.d[2] = byterev_8(reg->d[1]);
+			buf.d[3] = byterev_8(reg->d[0]);
+			reg = &buf;
+		}
+		memcpy(mem, reg, size);
+		break;
 	case 16:
 		/* stxv, stxvx, stxvl, stxvll */
 		if (size == 0)
@@ -861,28 +894,43 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op,
 				       bool cross_endian)
 {
 	int reg = op->reg;
-	u8 mem[16];
-	union vsx_reg buf;
+	int i, j, nr_vsx_regs;
+	u8 mem[32];
+	union vsx_reg buf[2];
 	int size = GETSIZE(op->type);
 
 	if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
 		return -EFAULT;
 
-	emulate_vsx_load(op, &buf, mem, cross_endian);
+	nr_vsx_regs = size / sizeof(__vector128);
+	emulate_vsx_load(op, buf, mem, cross_endian);
 	preempt_disable();
 	if (reg < 32) {
 		/* FP regs + extensions */
 		if (regs->msr & MSR_FP) {
-			load_vsrn(reg, &buf);
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				load_vsrn(reg + i, &buf[j].v);
+			}
 		} else {
-			current->thread.fp_state.fpr[reg][0] = buf.d[0];
-			current->thread.fp_state.fpr[reg][1] = buf.d[1];
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				current->thread.fp_state.fpr[reg + i][0] = buf[j].d[0];
+				current->thread.fp_state.fpr[reg + i][1] = buf[j].d[1];
+			}
 		}
 	} else {
-		if (regs->msr & MSR_VEC)
-			load_vsrn(reg, &buf);
-		else
-			current->thread.vr_state.vr[reg - 32] = buf.v;
+		if (regs->msr & MSR_VEC) {
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				load_vsrn(reg + i, &buf[j].v);
+			}
+		} else {
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				current->thread.vr_state.vr[reg - 32 + i] = buf[j].v;
+			}
+		}
 	}
 	preempt_enable();
 	return 0;
@@ -893,30 +941,45 @@ static nokprobe_inline int do_vsx_store(struct instruction_op *op,
 					bool cross_endian)
 {
 	int reg = op->reg;
-	u8 mem[16];
-	union vsx_reg buf;
+	int i, j, nr_vsx_regs;
+	u8 mem[32];
+	union vsx_reg buf[2];
 	int size = GETSIZE(op->type);
 
 	if (!address_ok(regs, ea, size))
 		return -EFAULT;
 
+	nr_vsx_regs = size / sizeof(__vector128);
 	preempt_disable();
 	if (reg < 32) {
 		/* FP regs + extensions */
 		if (regs->msr & MSR_FP) {
-			store_vsrn(reg, &buf);
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				store_vsrn(reg + i, &buf[j].v);
+			}
 		} else {
-			buf.d[0] = current->thread.fp_state.fpr[reg][0];
-			buf.d[1] = current->thread.fp_state.fpr[reg][1];
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				buf[j].d[0] = current->thread.fp_state.fpr[reg + i][0];
+				buf[j].d[1] = current->thread.fp_state.fpr[reg + i][1];
+			}
 		}
 	} else {
-		if (regs->msr & MSR_VEC)
-			store_vsrn(reg, &buf);
-		else
-			buf.v = current->thread.vr_state.vr[reg - 32];
+		if (regs->msr & MSR_VEC) {
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				store_vsrn(reg + i, &buf[j].v);
+			}
+		} else {
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				buf[j].v = current->thread.vr_state.vr[reg - 32 + i];
+			}
+		}
 	}
 	preempt_enable();
-	emulate_vsx_store(op, &buf, mem, cross_endian);
+	emulate_vsx_store(op, buf, mem, cross_endian);
 	return  copy_mem_out(mem, ea, size, regs);
 }
 #endif /* CONFIG_VSX */
@@ -2403,6 +2466,14 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			op->vsx_flags = VSX_SPLAT;
 			break;
 
+		case 333:       /* lxvpx */
+			if (!cpu_has_feature(CPU_FTR_ARCH_31))
+				return -1;
+			op->reg = VSX_REGISTER_XTP(rd);
+			op->type = MKOP(LOAD_VSX, 0, 32);
+			op->element_size = 32;
+			break;
+
 		case 364:	/* lxvwsx */
 			op->reg = rd | ((word & 1) << 5);
 			op->type = MKOP(LOAD_VSX, 0, 4);
@@ -2431,6 +2502,13 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 				VSX_CHECK_VEC;
 			break;
 		}
+		case 461:       /* stxvpx */
+			if (!cpu_has_feature(CPU_FTR_ARCH_31))
+				return -1;
+			op->reg = VSX_REGISTER_XTP(rd);
+			op->type = MKOP(STORE_VSX, 0, 32);
+			op->element_size = 32;
+			break;
 		case 524:	/* lxsspx */
 			op->reg = rd | ((word & 1) << 5);
 			op->type = MKOP(LOAD_VSX, 0, 4);
@@ -2672,6 +2750,22 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #endif
 
 #ifdef CONFIG_VSX
+	case 6:
+		if (!cpu_has_feature(CPU_FTR_ARCH_31))
+			return -1;
+		op->ea = dqform_ea(word, regs);
+		op->reg = VSX_REGISTER_XTP(rd);
+		op->element_size = 32;
+		switch (word & 0xf) {
+		case 0:         /* lxvp */
+			op->type = MKOP(LOAD_VSX, 0, 32);
+			break;
+		case 1:         /* stxvp */
+			op->type = MKOP(STORE_VSX, 0, 32);
+			break;
+		}
+		break;
+
 	case 61:	/* stfdp, lxv, stxsd, stxssp, stxv */
 		switch (word & 7) {
 		case 0:		/* stfdp with LSB of DS field = 0 */
@@ -2805,12 +2899,26 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			case 57:	/* pld */
 				op->type = MKOP(LOAD, PREFIXED, 8);
 				break;
+#ifdef CONFIG_VSX
+			case 58:        /* plxvp */
+				op->reg = VSX_REGISTER_XTP(rd);
+				op->type = MKOP(LOAD_VSX, PREFIXED, 32);
+				op->element_size = 32;
+				break;
+#endif /* CONFIG_VSX */
 			case 60:        /* pstq */
 				op->type = MKOP(STORE, PREFIXED, 16);
 				break;
 			case 61:	/* pstd */
 				op->type = MKOP(STORE, PREFIXED, 8);
 				break;
+#ifdef CONFIG_VSX
+			case 62:        /* pstxvp */
+				op->reg = VSX_REGISTER_XTP(rd);
+				op->type = MKOP(STORE_VSX, PREFIXED, 32);
+				op->element_size = 32;
+				break;
+#endif /* CONFIG_VSX */
 			}
 			break;
 		case 1: /* Type 01 Eight-Byte Register-to-Register */
-- 
2.26.2


^ permalink raw reply related

* [PATCH v5 2/5] powerpc/sstep: Cover new VSX instructions under CONFIG_VSX
From: Ravi Bangoria @ 2020-10-11  5:09 UTC (permalink / raw)
  To: mpe
  Cc: ravi.bangoria, jniethe5, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201011050908.72173-1-ravi.bangoria@linux.ibm.com>

Recently added Power10 prefixed VSX instruction are included
unconditionally in the kernel. If they are executed on a
machine without VSX support, it might create issues. Fix that.
Also fix one mnemonics spelling mistake in comment.

Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/lib/sstep.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e6242744c71b..faf0bbf3efb7 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2757,6 +2757,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			case 41:	/* plwa */
 				op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4);
 				break;
+#ifdef CONFIG_VSX
 			case 42:        /* plxsd */
 				op->reg = rd + 32;
 				op->type = MKOP(LOAD_VSX, PREFIXED, 8);
@@ -2797,13 +2798,14 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 				op->element_size = 16;
 				op->vsx_flags = VSX_CHECK_VEC;
 				break;
+#endif /* CONFIG_VSX */
 			case 56:        /* plq */
 				op->type = MKOP(LOAD, PREFIXED, 16);
 				break;
 			case 57:	/* pld */
 				op->type = MKOP(LOAD, PREFIXED, 8);
 				break;
-			case 60:        /* stq */
+			case 60:        /* pstq */
 				op->type = MKOP(STORE, PREFIXED, 16);
 				break;
 			case 61:	/* pstd */
-- 
2.26.2


^ permalink raw reply related

* [PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
From: Ravi Bangoria @ 2020-10-11  5:09 UTC (permalink / raw)
  To: mpe
  Cc: ravi.bangoria, jniethe5, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201011050908.72173-1-ravi.bangoria@linux.ibm.com>

From: Balamuruhan S <bala24@linux.ibm.com>

Unconditional emulation of prefixed instructions will allow
emulation of them on Power10 predecessors which might cause
issues. Restrict that.

Fixes: 3920742b92f5 ("powerpc sstep: Add support for prefixed fixed-point arithmetic")
Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/lib/sstep.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e9dcaba9a4f8..e6242744c71b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 	switch (opcode) {
 #ifdef __powerpc64__
 	case 1:
+		if (!cpu_has_feature(CPU_FTR_ARCH_31))
+			return -1;
+
 		prefix_r = GET_PREFIX_R(word);
 		ra = GET_PREFIX_RA(suffix);
 		rd = (suffix >> 21) & 0x1f;
@@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		}
 		break;
 	case 1: /* Prefixed instructions */
+		if (!cpu_has_feature(CPU_FTR_ARCH_31))
+			return -1;
+
 		prefix_r = GET_PREFIX_R(word);
 		ra = GET_PREFIX_RA(suffix);
 		op->update_reg = ra;
-- 
2.26.2


^ permalink raw reply related

* [PATCH v5 0/5] powerpc/sstep: VSX 32-byte vector paired load/store instructions
From: Ravi Bangoria @ 2020-10-11  5:09 UTC (permalink / raw)
  To: mpe
  Cc: ravi.bangoria, jniethe5, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev

VSX vector paired instructions operates with octword (32-byte)
operand for loads and stores between storage and a pair of two
sequential Vector-Scalar Registers (VSRs). There are 4 word
instructions and 2 prefixed instructions that provides this
32-byte storage access operations - lxvp, lxvpx, stxvp, stxvpx,
plxvp, pstxvp.

Emulation infrastructure doesn't have support for these instructions,
to operate with 32-byte storage access and to operate with 2 VSX
registers. This patch series enables the instruction emulation
support and adds test cases for them respectively.

v4: https://lore.kernel.org/r/20201008072726.233086-1-ravi.bangoria@linux.ibm.com

Changes in v5:
-------------
* Fix build breakage reported by Kernel test robote
* Patch #2 is new. CONFIG_VSX check was missing for some VSX
  instructions. Patch #2 adds that check.

Changes in v4:
-------------
* Patch #1 is (kind of) new.
* Patch #2 now enables both analyse_instr() and emulate_step()
  unlike prev series where both were in separate patches.
* Patch #2 also has important fix for emulation on LE.
* Patch #3 and #4. Added XSP/XTP, D0/D1 instruction operands,
  removed *_EX_OP, __PPC_T[P][X] macros which are incorrect,
  and adhered to PPC_RAW_* convention.
* Added `CPU_FTR_ARCH_31` check in testcases to avoid failing
  in p8/p9.
* Some consmetic changes.
* Rebased to powerpc/next

Changes in v3:
-------------
Worked on review comments and suggestions from Ravi and Naveen,

* Fix the do_vsx_load() to handle vsx instructions if MSR_FP/MSR_VEC
  cleared in exception conditions and it reaches to read/write to
  thread_struct member fp_state/vr_state respectively.
* Fix wrongly used `__vector128 v[2]` in struct vsx_reg as it should
  hold a single vsx register size.
* Remove unnecessary `VSX_CHECK_VEC` flag set and condition to check
  `VSX_LDLEFT` that is not applicable for these vsx instructions.
* Fix comments in emulate_vsx_load() that were misleading.
* Rebased on latest powerpc next branch.

Changes in v2:
-------------
* Fix suggestion from Sandipan, wrap ISA 3.1 instructions with
  cpu_has_feature(CPU_FTR_ARCH_31) check.

Balamuruhan S (4):
  powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31
    is set
  powerpc/sstep: Support VSX vector paired storage access instructions
  powerpc/ppc-opcode: Add encoding macros for VSX vector paired
    instructions
  powerpc/sstep: Add testcases for VSX vector paired load/store
    instructions

Ravi Bangoria (1):
  powerpc/sstep: Cover new VSX instructions under CONFIG_VSX

 arch/powerpc/include/asm/ppc-opcode.h |  13 ++
 arch/powerpc/lib/sstep.c              | 160 ++++++++++++---
 arch/powerpc/lib/test_emulate_step.c  | 270 ++++++++++++++++++++++++++
 3 files changed, 421 insertions(+), 22 deletions(-)

-- 
2.26.2

^ permalink raw reply

* Re: [PATCH RFC PKS/PMEM 09/58] drivers/gpu: Utilize new kmap_thread()
From: Ira Weiny @ 2020-10-10 23:01 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Peter Zijlstra, David Airlie, Patrik Jakobsson,
	x86, Dave Hansen, Dan Williams, Fenghua Yu, linux-doc,
	linux-kernel, linux-nvdimm, linux-fsdevel, linux-mm,
	linux-kselftest, linuxppc-dev, kvm, netdev, bpf, kexec,
	linux-bcache, linux-mtd, devel, linux-efi, linux-mmc, linux-scsi,
	target-devel, linux-nfs, ceph-devel, linux-ext4, linux-aio,
	io-uring, linux-erofs, linux-um, linux-ntfs-dev, reiserfs-devel,
	linux-f2fs-devel, linux-nilfs, cluster-devel, ecryptfs,
	linux-cifs, linux-btrfs, linux-afs, linux-rdma, amd-gfx,
	dri-devel, intel-gfx, drbd-dev, linux-block, xen-devel,
	linux-cachefs, samba-technical, intel-wired-lan
In-Reply-To: <20201009220349.GQ438822@phenom.ffwll.local>

On Sat, Oct 10, 2020 at 12:03:49AM +0200, Daniel Vetter wrote:
> On Fri, Oct 09, 2020 at 12:49:44PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > These kmap() calls in the gpu stack are localized to a single thread.
> > To avoid the over head of global PKRS updates use the new kmap_thread()
> > call.
> > 
> > Cc: David Airlie <airlied@linux.ie>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> I'm guessing the entire pile goes in through some other tree.
>

Apologies for not realizing there were multiple maintainers here.

But, I was thinking it would land together through the mm tree once the core
support lands.  I've tried to split these out in a way they can be easily
reviewed/acked by the correct developers.

> If so:
> 
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> If you want this to land through maintainer trees, then we need a
> per-driver split (since aside from amdgpu and radeon they're all different
> subtrees).

It is just RFC for the moment.  I need to get the core support accepted first
then this can land.

> 
> btw the two kmap calls in drm you highlight in the cover letter should
> also be convertible to kmap_thread. We only hold vmalloc mappings for a
> longer time (or it'd be quite a driver bug). So if you want maybe throw
> those two as two additional patches on top, and we can do some careful
> review & testing for them.

Cool.  I'll add them in.

Ira

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c              | 12 ++++++------
> >  drivers/gpu/drm/gma500/gma_display.c                 |  4 ++--
> >  drivers/gpu/drm/gma500/mmu.c                         | 10 +++++-----
> >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c            |  4 ++--
> >  .../gpu/drm/i915/gem/selftests/i915_gem_context.c    |  4 ++--
> >  drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c   |  8 ++++----
> >  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c         |  4 ++--
> >  drivers/gpu/drm/i915/gt/intel_gtt.c                  |  4 ++--
> >  drivers/gpu/drm/i915/gt/shmem_utils.c                |  4 ++--
> >  drivers/gpu/drm/i915/i915_gem.c                      |  8 ++++----
> >  drivers/gpu/drm/i915/i915_gpu_error.c                |  4 ++--
> >  drivers/gpu/drm/i915/selftests/i915_perf.c           |  4 ++--
> >  drivers/gpu/drm/radeon/radeon_ttm.c                  |  4 ++--
> >  13 files changed, 37 insertions(+), 37 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index 978bae731398..bd564bccb7a3 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -2437,11 +2437,11 @@ static ssize_t amdgpu_ttm_gtt_read(struct file *f, char __user *buf,
> >  
> >  		page = adev->gart.pages[p];
> >  		if (page) {
> > -			ptr = kmap(page);
> > +			ptr = kmap_thread(page);
> >  			ptr += off;
> >  
> >  			r = copy_to_user(buf, ptr, cur_size);
> > -			kunmap(adev->gart.pages[p]);
> > +			kunmap_thread(adev->gart.pages[p]);
> >  		} else
> >  			r = clear_user(buf, cur_size);
> >  
> > @@ -2507,9 +2507,9 @@ static ssize_t amdgpu_iomem_read(struct file *f, char __user *buf,
> >  		if (p->mapping != adev->mman.bdev.dev_mapping)
> >  			return -EPERM;
> >  
> > -		ptr = kmap(p);
> > +		ptr = kmap_thread(p);
> >  		r = copy_to_user(buf, ptr + off, bytes);
> > -		kunmap(p);
> > +		kunmap_thread(p);
> >  		if (r)
> >  			return -EFAULT;
> >  
> > @@ -2558,9 +2558,9 @@ static ssize_t amdgpu_iomem_write(struct file *f, const char __user *buf,
> >  		if (p->mapping != adev->mman.bdev.dev_mapping)
> >  			return -EPERM;
> >  
> > -		ptr = kmap(p);
> > +		ptr = kmap_thread(p);
> >  		r = copy_from_user(ptr + off, buf, bytes);
> > -		kunmap(p);
> > +		kunmap_thread(p);
> >  		if (r)
> >  			return -EFAULT;
> >  
> > diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c
> > index 3df6d6e850f5..35f4e55c941f 100644
> > --- a/drivers/gpu/drm/gma500/gma_display.c
> > +++ b/drivers/gpu/drm/gma500/gma_display.c
> > @@ -400,9 +400,9 @@ int gma_crtc_cursor_set(struct drm_crtc *crtc,
> >  		/* Copy the cursor to cursor mem */
> >  		tmp_dst = dev_priv->vram_addr + cursor_gt->offset;
> >  		for (i = 0; i < cursor_pages; i++) {
> > -			tmp_src = kmap(gt->pages[i]);
> > +			tmp_src = kmap_thread(gt->pages[i]);
> >  			memcpy(tmp_dst, tmp_src, PAGE_SIZE);
> > -			kunmap(gt->pages[i]);
> > +			kunmap_thread(gt->pages[i]);
> >  			tmp_dst += PAGE_SIZE;
> >  		}
> >  
> > diff --git a/drivers/gpu/drm/gma500/mmu.c b/drivers/gpu/drm/gma500/mmu.c
> > index 505044c9a673..fba7a3a461fd 100644
> > --- a/drivers/gpu/drm/gma500/mmu.c
> > +++ b/drivers/gpu/drm/gma500/mmu.c
> > @@ -192,20 +192,20 @@ struct psb_mmu_pd *psb_mmu_alloc_pd(struct psb_mmu_driver *driver,
> >  		pd->invalid_pte = 0;
> >  	}
> >  
> > -	v = kmap(pd->dummy_pt);
> > +	v = kmap_thread(pd->dummy_pt);
> >  	for (i = 0; i < (PAGE_SIZE / sizeof(uint32_t)); ++i)
> >  		v[i] = pd->invalid_pte;
> >  
> > -	kunmap(pd->dummy_pt);
> > +	kunmap_thread(pd->dummy_pt);
> >  
> > -	v = kmap(pd->p);
> > +	v = kmap_thread(pd->p);
> >  	for (i = 0; i < (PAGE_SIZE / sizeof(uint32_t)); ++i)
> >  		v[i] = pd->invalid_pde;
> >  
> > -	kunmap(pd->p);
> > +	kunmap_thread(pd->p);
> >  
> >  	clear_page(kmap(pd->dummy_page));
> > -	kunmap(pd->dummy_page);
> > +	kunmap_thread(pd->dummy_page);
> >  
> >  	pd->tables = vmalloc_user(sizeof(struct psb_mmu_pt *) * 1024);
> >  	if (!pd->tables)
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > index 38113d3c0138..274424795fb7 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > @@ -566,9 +566,9 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv,
> >  		if (err < 0)
> >  			goto fail;
> >  
> > -		vaddr = kmap(page);
> > +		vaddr = kmap_thread(page);
> >  		memcpy(vaddr, data, len);
> > -		kunmap(page);
> > +		kunmap_thread(page);
> >  
> >  		err = pagecache_write_end(file, file->f_mapping,
> >  					  offset, len, len,
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> > index 7ffc3c751432..b466c677d007 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> > @@ -1754,7 +1754,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
> >  		return -EINVAL;
> >  	}
> >  
> > -	vaddr = kmap(page);
> > +	vaddr = kmap_thread(page);
> >  	if (!vaddr) {
> >  		pr_err("No (mappable) scratch page!\n");
> >  		return -EINVAL;
> > @@ -1765,7 +1765,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
> >  		pr_err("Inconsistent initial state of scratch page!\n");
> >  		err = -EINVAL;
> >  	}
> > -	kunmap(page);
> > +	kunmap_thread(page);
> >  
> >  	return err;
> >  }
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> > index 9c7402ce5bf9..447df22e2e06 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> > @@ -143,7 +143,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
> >  	intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt);
> >  
> >  	p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
> > -	cpu = kmap(p) + offset_in_page(offset);
> > +	cpu = kmap_thread(p) + offset_in_page(offset);
> >  	drm_clflush_virt_range(cpu, sizeof(*cpu));
> >  	if (*cpu != (u32)page) {
> >  		pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n",
> > @@ -161,7 +161,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
> >  	}
> >  	*cpu = 0;
> >  	drm_clflush_virt_range(cpu, sizeof(*cpu));
> > -	kunmap(p);
> > +	kunmap_thread(p);
> >  
> >  out:
> >  	__i915_vma_put(vma);
> > @@ -236,7 +236,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
> >  		intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt);
> >  
> >  		p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
> > -		cpu = kmap(p) + offset_in_page(offset);
> > +		cpu = kmap_thread(p) + offset_in_page(offset);
> >  		drm_clflush_virt_range(cpu, sizeof(*cpu));
> >  		if (*cpu != (u32)page) {
> >  			pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n",
> > @@ -254,7 +254,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
> >  		}
> >  		*cpu = 0;
> >  		drm_clflush_virt_range(cpu, sizeof(*cpu));
> > -		kunmap(p);
> > +		kunmap_thread(p);
> >  		if (err)
> >  			return err;
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > index 7fb36b12fe7a..38da348282f1 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> > @@ -731,7 +731,7 @@ static void swizzle_page(struct page *page)
> >  	char *vaddr;
> >  	int i;
> >  
> > -	vaddr = kmap(page);
> > +	vaddr = kmap_thread(page);
> >  
> >  	for (i = 0; i < PAGE_SIZE; i += 128) {
> >  		memcpy(temp, &vaddr[i], 64);
> > @@ -739,7 +739,7 @@ static void swizzle_page(struct page *page)
> >  		memcpy(&vaddr[i + 64], temp, 64);
> >  	}
> >  
> > -	kunmap(page);
> > +	kunmap_thread(page);
> >  }
> >  
> >  /**
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > index 2a72cce63fd9..4cfb24e9ed62 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > @@ -312,9 +312,9 @@ static void poison_scratch_page(struct page *page, unsigned long size)
> >  	do {
> >  		void *vaddr;
> >  
> > -		vaddr = kmap(page);
> > +		vaddr = kmap_thread(page);
> >  		memset(vaddr, POISON_FREE, PAGE_SIZE);
> > -		kunmap(page);
> > +		kunmap_thread(page);
> >  
> >  		page = pfn_to_page(page_to_pfn(page) + 1);
> >  		size -= PAGE_SIZE;
> > diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
> > index 43c7acbdc79d..a40d3130cebf 100644
> > --- a/drivers/gpu/drm/i915/gt/shmem_utils.c
> > +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
> > @@ -142,12 +142,12 @@ static int __shmem_rw(struct file *file, loff_t off,
> >  		if (IS_ERR(page))
> >  			return PTR_ERR(page);
> >  
> > -		vaddr = kmap(page);
> > +		vaddr = kmap_thread(page);
> >  		if (write)
> >  			memcpy(vaddr + offset_in_page(off), ptr, this);
> >  		else
> >  			memcpy(ptr, vaddr + offset_in_page(off), this);
> > -		kunmap(page);
> > +		kunmap_thread(page);
> >  		put_page(page);
> >  
> >  		len -= this;
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 9aa3066cb75d..cae8300fd224 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -312,14 +312,14 @@ shmem_pread(struct page *page, int offset, int len, char __user *user_data,
> >  	char *vaddr;
> >  	int ret;
> >  
> > -	vaddr = kmap(page);
> > +	vaddr = kmap_thread(page);
> >  
> >  	if (needs_clflush)
> >  		drm_clflush_virt_range(vaddr + offset, len);
> >  
> >  	ret = __copy_to_user(user_data, vaddr + offset, len);
> >  
> > -	kunmap(page);
> > +	kunmap_thread(page);
> >  
> >  	return ret ? -EFAULT : 0;
> >  }
> > @@ -708,7 +708,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
> >  	char *vaddr;
> >  	int ret;
> >  
> > -	vaddr = kmap(page);
> > +	vaddr = kmap_thread(page);
> >  
> >  	if (needs_clflush_before)
> >  		drm_clflush_virt_range(vaddr + offset, len);
> > @@ -717,7 +717,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
> >  	if (!ret && needs_clflush_after)
> >  		drm_clflush_virt_range(vaddr + offset, len);
> >  
> > -	kunmap(page);
> > +	kunmap_thread(page);
> >  
> >  	return ret ? -EFAULT : 0;
> >  }
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 3e6cbb0d1150..aecd469b6b6e 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1058,9 +1058,9 @@ i915_vma_coredump_create(const struct intel_gt *gt,
> >  
> >  			drm_clflush_pages(&page, 1);
> >  
> > -			s = kmap(page);
> > +			s = kmap_thread(page);
> >  			ret = compress_page(compress, s, dst, false);
> > -			kunmap(page);
> > +			kunmap_thread(page);
> >  
> >  			drm_clflush_pages(&page, 1);
> >  
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
> > index c2d001d9c0ec..7f7ef2d056f4 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
> > @@ -307,7 +307,7 @@ static int live_noa_gpr(void *arg)
> >  	}
> >  
> >  	/* Poison the ce->vm so we detect writes not to the GGTT gt->scratch */
> > -	scratch = kmap(ce->vm->scratch[0].base.page);
> > +	scratch = kmap_thread(ce->vm->scratch[0].base.page);
> >  	memset(scratch, POISON_FREE, PAGE_SIZE);
> >  
> >  	rq = intel_context_create_request(ce);
> > @@ -405,7 +405,7 @@ static int live_noa_gpr(void *arg)
> >  out_rq:
> >  	i915_request_put(rq);
> >  out_ce:
> > -	kunmap(ce->vm->scratch[0].base.page);
> > +	kunmap_thread(ce->vm->scratch[0].base.page);
> >  	intel_context_put(ce);
> >  out:
> >  	stream_destroy(stream);
> > diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> > index 004344dce140..0aba0cac51e1 100644
> > --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> > +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> > @@ -1013,11 +1013,11 @@ static ssize_t radeon_ttm_gtt_read(struct file *f, char __user *buf,
> >  
> >  		page = rdev->gart.pages[p];
> >  		if (page) {
> > -			ptr = kmap(page);
> > +			ptr = kmap_thread(page);
> >  			ptr += off;
> >  
> >  			r = copy_to_user(buf, ptr, cur_size);
> > -			kunmap(rdev->gart.pages[p]);
> > +			kunmap_thread(rdev->gart.pages[p]);
> >  		} else
> >  			r = clear_user(buf, cur_size);
> >  
> > -- 
> > 2.28.0.rc0.12.gb6a658bd00c9
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply

* Re: [PATCH RFC PKS/PMEM 10/58] drivers/rdma: Utilize new kmap_thread()
From: Bernard Metzler @ 2020-10-10 11:36 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
	Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
	linux-kselftest, samba-technical, Thomas Gleixner, devel,
	linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
	x86, ceph-devel, io-uring, cluster-devel, Jason Gunthorpe,
	Doug Ledford, Ingo Molnar, intel-wired-lan, xen-devel, linux-ext4,
	Fenghua Yu, linux-afs, Faisal Latif, linux-um, intel-gfx,
	ecryptfs, linux-erofs, linux-nfs, reiserfs-devel, linux-block,
	linux-bcache, Borislav Petkov, Andy Lutomirski, drbd-dev, amd-gfx,
	Dan Williams, Shiraz Saleem, bpf, linux-cachefs, Mike Marciniszyn,
	linux-ntfs-dev, netdev, Dennis Dalessandro, kexec, linux-kernel,
	linux-f2fs-devel, linux-fsdevel, Andrew Morton, linuxppc-dev,
	linux-btrfs
In-Reply-To: <20201009195033.3208459-11-ira.weiny@intel.com>

-----ira.weiny@intel.com wrote: -----

>To: "Andrew Morton" <akpm@linux-foundation.org>, "Thomas Gleixner"
><tglx@linutronix.de>, "Ingo Molnar" <mingo@redhat.com>, "Borislav
>Petkov" <bp@alien8.de>, "Andy Lutomirski" <luto@kernel.org>, "Peter
>Zijlstra" <peterz@infradead.org>
>From: ira.weiny@intel.com
>Date: 10/09/2020 09:52PM
>Cc: "Ira Weiny" <ira.weiny@intel.com>, "Mike Marciniszyn"
><mike.marciniszyn@intel.com>, "Dennis Dalessandro"
><dennis.dalessandro@intel.com>, "Doug Ledford" <dledford@redhat.com>,
>"Jason Gunthorpe" <jgg@ziepe.ca>, "Faisal Latif"
><faisal.latif@intel.com>, "Shiraz Saleem" <shiraz.saleem@intel.com>,
>"Bernard Metzler" <bmt@zurich.ibm.com>, x86@kernel.org, "Dave Hansen"
><dave.hansen@linux.intel.com>, "Dan Williams"
><dan.j.williams@intel.com>, "Fenghua Yu" <fenghua.yu@intel.com>,
>linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
>linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org,
>linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
>linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
>netdev@vger.kernel.org, bpf@vger.kernel.org,
>kexec@lists.infradead.org, linux-bcache@vger.kernel.org,
>linux-mtd@lists.infradead.org, devel@driverdev.osuosl.org,
>linux-efi@vger.kernel.org, linux-mmc@vger.kernel.org,
>linux-scsi@vger.kernel.org, target-devel@vger.kernel.org,
>linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org,
>linux-ext4@vger.kernel.org, linux-aio@kvack.org,
>io-uring@vger.kernel.org, linux-erofs@lists.ozlabs.org,
>linux-um@lists.infradead.org, linux-ntfs-dev@lists.sourceforge.net,
>reiserfs-devel@vger.kernel.org,
>linux-f2fs-devel@lists.sourceforge.net, linux-nilfs@vger.kernel.org,
>cluster-devel@redhat.com, ecryptfs@vger.kernel.org,
>linux-cifs@vger.kernel.org, linux-btrfs@vger.kernel.org,
>linux-afs@lists.infradead.org, linux-rdma@vger.kernel.org,
>amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
>intel-gfx@lists.freedesktop.org, drbd-dev@tron.linbit.com,
>linux-block@vger.kernel.org, xen-devel@lists.xenproject.org,
>linux-cachefs@redhat.com, samba-technical@lists.samba.org,
>intel-wired-lan@lists.osuosl.org
>Subject: [EXTERNAL] [PATCH RFC PKS/PMEM 10/58] drivers/rdma: Utilize
>new kmap_thread()
>
>From: Ira Weiny <ira.weiny@intel.com>
>
>The kmap() calls in these drivers are localized to a single thread.
>To
>avoid the over head of global PKRS updates use the new kmap_thread()
>call.
>
>Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
>Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
>Cc: Doug Ledford <dledford@redhat.com>
>Cc: Jason Gunthorpe <jgg@ziepe.ca>
>Cc: Faisal Latif <faisal.latif@intel.com>
>Cc: Shiraz Saleem <shiraz.saleem@intel.com>
>Cc: Bernard Metzler <bmt@zurich.ibm.com>
>Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>---
> drivers/infiniband/hw/hfi1/sdma.c      |  4 ++--
> drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +++++-----
> drivers/infiniband/sw/siw/siw_qp_tx.c  | 14 +++++++-------
> 3 files changed, 14 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/infiniband/hw/hfi1/sdma.c
>b/drivers/infiniband/hw/hfi1/sdma.c
>index 04575c9afd61..09d206e3229a 100644
>--- a/drivers/infiniband/hw/hfi1/sdma.c
>+++ b/drivers/infiniband/hw/hfi1/sdma.c
>@@ -3130,7 +3130,7 @@ int ext_coal_sdma_tx_descs(struct hfi1_devdata
>*dd, struct sdma_txreq *tx,
> 		}
> 
> 		if (type == SDMA_MAP_PAGE) {
>-			kvaddr = kmap(page);
>+			kvaddr = kmap_thread(page);
> 			kvaddr += offset;
> 		} else if (WARN_ON(!kvaddr)) {
> 			__sdma_txclean(dd, tx);
>@@ -3140,7 +3140,7 @@ int ext_coal_sdma_tx_descs(struct hfi1_devdata
>*dd, struct sdma_txreq *tx,
> 		memcpy(tx->coalesce_buf + tx->coalesce_idx, kvaddr, len);
> 		tx->coalesce_idx += len;
> 		if (type == SDMA_MAP_PAGE)
>-			kunmap(page);
>+			kunmap_thread(page);
> 
> 		/* If there is more data, return */
> 		if (tx->tlen - tx->coalesce_idx)
>diff --git a/drivers/infiniband/hw/i40iw/i40iw_cm.c
>b/drivers/infiniband/hw/i40iw/i40iw_cm.c
>index a3b95805c154..122d7a5642a1 100644
>--- a/drivers/infiniband/hw/i40iw/i40iw_cm.c
>+++ b/drivers/infiniband/hw/i40iw/i40iw_cm.c
>@@ -3721,7 +3721,7 @@ int i40iw_accept(struct iw_cm_id *cm_id, struct
>iw_cm_conn_param *conn_param)
> 		ibmr->device = iwpd->ibpd.device;
> 		iwqp->lsmm_mr = ibmr;
> 		if (iwqp->page)
>-			iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
>+			iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
> 		dev->iw_priv_qp_ops->qp_send_lsmm(&iwqp->sc_qp,
> 							iwqp->ietf_mem.va,
> 							(accept.size + conn_param->private_data_len),
>@@ -3729,12 +3729,12 @@ int i40iw_accept(struct iw_cm_id *cm_id,
>struct iw_cm_conn_param *conn_param)
> 
> 	} else {
> 		if (iwqp->page)
>-			iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
>+			iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
> 		dev->iw_priv_qp_ops->qp_send_lsmm(&iwqp->sc_qp, NULL, 0, 0);
> 	}
> 
> 	if (iwqp->page)
>-		kunmap(iwqp->page);
>+		kunmap_thread(iwqp->page);
> 
> 	iwqp->cm_id = cm_id;
> 	cm_node->cm_id = cm_id;
>@@ -4102,10 +4102,10 @@ static void i40iw_cm_event_connected(struct
>i40iw_cm_event *event)
> 	i40iw_cm_init_tsa_conn(iwqp, cm_node);
> 	read0 = (cm_node->send_rdma0_op == SEND_RDMA_READ_ZERO);
> 	if (iwqp->page)
>-		iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
>+		iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
> 	dev->iw_priv_qp_ops->qp_send_rtt(&iwqp->sc_qp, read0);
> 	if (iwqp->page)
>-		kunmap(iwqp->page);
>+		kunmap_thread(iwqp->page);
> 
> 	memset(&attr, 0, sizeof(attr));
> 	attr.qp_state = IB_QPS_RTS;
>diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
>b/drivers/infiniband/sw/siw/siw_qp_tx.c
>index d19d8325588b..4ed37c328d02 100644
>--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
>+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
>@@ -76,7 +76,7 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx,
>void *paddr)
> 			if (unlikely(!p))
> 				return -EFAULT;
> 
>-			buffer = kmap(p);
>+			buffer = kmap_thread(p);
> 
> 			if (likely(PAGE_SIZE - off >= bytes)) {
> 				memcpy(paddr, buffer + off, bytes);
>@@ -84,7 +84,7 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx,
>void *paddr)
> 				unsigned long part = bytes - (PAGE_SIZE - off);
> 
> 				memcpy(paddr, buffer + off, part);
>-				kunmap(p);
>+				kunmap_thread(p);
> 
> 				if (!mem->is_pbl)
> 					p = siw_get_upage(mem->umem,
>@@ -96,10 +96,10 @@ static int siw_try_1seg(struct siw_iwarp_tx
>*c_tx, void *paddr)
> 				if (unlikely(!p))
> 					return -EFAULT;
> 
>-				buffer = kmap(p);
>+				buffer = kmap_thread(p);
> 				memcpy(paddr + part, buffer, bytes - part);
> 			}
>-			kunmap(p);
>+			kunmap_thread(p);
> 		}
> 	}
> 	return (int)bytes;
>@@ -505,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> 				page_array[seg] = p;
> 
> 				if (!c_tx->use_sendpage) {
>-					iov[seg].iov_base = kmap(p) + fp_off;
>+					iov[seg].iov_base = kmap_thread(p) + fp_off;

This misses a corresponding kunmap_thread() in siw_unmap_pages()
(pls change line 403 in siw_qp_tx.c as well)

Thanks,
Bernard.

> 					iov[seg].iov_len = plen;
> 
> 					/* Remember for later kunmap() */
>@@ -518,9 +518,9 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> 							plen);
> 				} else if (do_crc) {
> 					crypto_shash_update(c_tx->mpa_crc_hd,
>-							    kmap(p) + fp_off,
>+							    kmap_thread(p) + fp_off,
> 							    plen);
>-					kunmap(p);
>+					kunmap_thread(p);
> 				}
> 			} else {
> 				u64 va = sge->laddr + sge_off;
>-- 
>2.28.0.rc0.12.gb6a658bd00c9
>
>


^ permalink raw reply

* [PATCH] powerpc/mm: Fix verification of MMU_FTR_TYPE_44x
From: Christophe Leroy @ 2020-10-10 17:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Alastair D'Silva
  Cc: linuxppc-dev, linux-kernel

MMU_FTR_TYPE_44x cannot be checked by cpu_has_feature()

Use mmu_has_feature() instead

Fixes: 23eb7f560a2a ("powerpc: Convert flush_icache_range & friends to C")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/mm/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index ddc32cc1b6cf..b7586d8c835b 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -530,7 +530,7 @@ void __flush_dcache_icache(void *p)
 	 * space occurs, before returning to user space.
 	 */
 
-	if (cpu_has_feature(MMU_FTR_TYPE_44x))
+	if (mmu_has_feature(MMU_FTR_TYPE_44x))
 		return;
 
 	invalidate_icache_range(addr, addr + PAGE_SIZE);
-- 
2.25.0


^ permalink raw reply related

* [PATCH 4/4] powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry
From: Christophe Leroy @ 2020-10-10 15:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <3f2386f94fada00b541be76d6bc4b6409746a72d.1602342819.git.christophe.leroy@csgroup.eu>

When _PAGE_ACCESSED is not set, a minor fault is expected.
To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED
into the L2 entry valid bit.

To simplify the processing and reduce the number of instructions in
TLB miss exceptions, manage it as an APG bit and get it next to
_PAGE_GUARDED bit to allow a copy in one go. Then declare the
corresponding groups as handling all accesses as user accesses.
As the PP bits always define user as No Access, it will generate
a fault.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/nohash/32/kup-8xx.h |  2 +-
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 47 +++++++-------------
 arch/powerpc/include/asm/nohash/32/pte-8xx.h |  9 ++--
 arch/powerpc/kernel/head_8xx.S               | 36 +++------------
 4 files changed, 28 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 85ed2390fb99..567cdc557402 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -63,7 +63,7 @@ static inline void restore_user_access(unsigned long flags)
 static inline bool
 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
-	return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xf0000000),
+	return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xff000000),
 		    "Bug: fault blocked by AP register !");
 }
 
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index 1d9ac0f9c794..0bd1b144eb76 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -33,19 +33,18 @@
  * respectively NA for All or X for Supervisor and no access for User.
  * Then we use the APG to say whether accesses are according to Page rules or
  * "all Supervisor" rules (Access to all)
- * Therefore, we define 2 APG groups. lsb is _PMD_USER
- * 0 => Kernel => 01 (all accesses performed according to page definition)
- * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
- * 2-15 => Not Used
- */
-#define MI_APG_INIT	0x40000000
-
-/*
- * 0 => Kernel => 01 (all accesses performed according to page definition)
- * 1 => User => 10 (all accesses performed according to swaped page definition)
- * 2-15 => Not Used
- */
-#define MI_APG_KUEP	0x60000000
+ * _PAGE_ACCESSED is also managed via APG. When _PAGE_ACCESSED is not set, say
+ * "all User" rules, that will lead to NA for all.
+ * Therefore, we define 4 APG groups. lsb is _PAGE_ACCESSED
+ * 0 => Kernel => 11 (all accesses performed according as user iaw page definition)
+ * 1 => Kernel+Accessed => 01 (all accesses performed according to page definition)
+ * 2 => User => 11 (all accesses performed according as user iaw page definition)
+ * 3 => User+Accessed => 00 (all accesses performed as supervisor iaw page definition) for INIT
+ *                    => 10 (all accesses performed according to swaped page definition) for KUEP
+ * 4-15 => Not Used
+ */
+#define MI_APG_INIT	0xdc000000
+#define MI_APG_KUEP	0xde000000
 
 /* The effective page number register.  When read, contains the information
  * about the last instruction TLB miss.  When MI_RPN is written, bits in
@@ -106,25 +105,9 @@
 #define MD_Ks		0x80000000	/* Should not be set */
 #define MD_Kp		0x40000000	/* Should always be set */
 
-/*
- * All pages' PP data bits are set to either 000 or 011 or 001, which means
- * respectively RW for Supervisor and no access for User, or RO for
- * Supervisor and no access for user and NA for ALL.
- * Then we use the APG to say whether accesses are according to Page rules or
- * "all Supervisor" rules (Access to all)
- * Therefore, we define 2 APG groups. lsb is _PMD_USER
- * 0 => Kernel => 01 (all accesses performed according to page definition)
- * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
- * 2-15 => Not Used
- */
-#define MD_APG_INIT	0x40000000
-
-/*
- * 0 => No user => 01 (all accesses performed according to page definition)
- * 1 => User => 10 (all accesses performed according to swaped page definition)
- * 2-15 => Not Used
- */
-#define MD_APG_KUAP	0x60000000
+/* See explanation above at the definition of MI_APG_INIT */
+#define MD_APG_INIT	0xdc000000
+#define MD_APG_KUAP	0xde000000
 
 /* The effective page number register.  When read, contains the information
  * about the last instruction TLB miss.  When MD_RPN is written, bits in
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 66f403a7da44..1581204467e1 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -39,9 +39,9 @@
  * into the TLB.
  */
 #define _PAGE_GUARDED	0x0010	/* Copied to L1 G entry in DTLB */
-#define _PAGE_SPECIAL	0x0020	/* SW entry */
+#define _PAGE_ACCESSED	0x0020	/* Copied to L1 APG 1 entry in I/DTLB */
 #define _PAGE_EXEC	0x0040	/* Copied to PP (bit 21) in ITLB */
-#define _PAGE_ACCESSED	0x0080	/* software: page referenced */
+#define _PAGE_SPECIAL	0x0080	/* SW entry */
 
 #define _PAGE_NA	0x0200	/* Supervisor NA, User no access */
 #define _PAGE_RO	0x0600	/* Supervisor RO, User no access */
@@ -59,11 +59,12 @@
 
 #define _PMD_PRESENT	0x0001
 #define _PMD_PRESENT_MASK	_PMD_PRESENT
-#define _PMD_BAD	0x0fd0
+#define _PMD_BAD	0x0f90
 #define _PMD_PAGE_MASK	0x000c
 #define _PMD_PAGE_8M	0x000c
 #define _PMD_PAGE_512K	0x0004
-#define _PMD_USER	0x0020	/* APG 1 */
+#define _PMD_ACCESSED	0x0020	/* APG 1 */
+#define _PMD_USER	0x0040	/* APG 2 */
 
 #define _PTE_NONE_MASK	0
 
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 6f3799a04121..ee0bfebc375f 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -222,23 +222,13 @@ InstructionTLBMiss:
 3:
 	mtcr	r11
 #endif
-#if defined(CONFIG_HUGETLBFS) || !defined(CONFIG_PIN_TLB_TEXT)
 	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r10)	/* Get level 1 entry */
 	mtspr	SPRN_MD_TWC, r11
-#else
-	lwz	r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10)	/* Get level 1 entry */
-	mtspr	SPRN_MI_TWC, r10	/* Set segment attributes */
-	mtspr	SPRN_MD_TWC, r10
-#endif
 	mfspr	r10, SPRN_MD_TWC
 	lwz	r10, 0(r10)	/* Get the pte */
-#if defined(CONFIG_HUGETLBFS) || !defined(CONFIG_PIN_TLB_TEXT)
+	rlwimi	r11, r10, 0, _PAGE_GUARDED | _PAGE_ACCESSED
 	rlwimi	r11, r10, 32 - 9, _PMD_PAGE_512K
 	mtspr	SPRN_MI_TWC, r11
-#endif
-	rlwinm	r11, r10, 32-7, _PAGE_PRESENT
-	and	r11, r11, r10
-	rlwimi	r10, r11, 0, _PAGE_PRESENT
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 20 and 23 must be clear.
 	 * Software indicator bits 22, 24, 25, 26, and 27 must be
@@ -289,28 +279,16 @@ DataStoreTLBMiss:
 	mfspr	r10, SPRN_MD_TWC
 	lwz	r10, 0(r10)	/* Get the pte */
 
-	/* Insert the Guarded flag into the TWC from the Linux PTE.
+	/* Insert Guarded and Accessed flags into the TWC from the Linux PTE.
 	 * It is bit 27 of both the Linux PTE and the TWC (at least
 	 * I got that right :-).  It will be better when we can put
 	 * this into the Linux pgd/pmd and load it in the operation
 	 * above.
 	 */
-	rlwimi	r11, r10, 0, _PAGE_GUARDED
+	rlwimi	r11, r10, 0, _PAGE_GUARDED | _PAGE_ACCESSED
 	rlwimi	r11, r10, 32 - 9, _PMD_PAGE_512K
 	mtspr	SPRN_MD_TWC, r11
 
-	/* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
-	 * We also need to know if the insn is a load/store, so:
-	 * Clear _PAGE_PRESENT and load that which will
-	 * trap into DTLB Error with store bit set accordinly.
-	 */
-	/* PRESENT=0x1, ACCESSED=0x20
-	 * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
-	 * r10 = (r10 & ~PRESENT) | r11;
-	 */
-	rlwinm	r11, r10, 32-7, _PAGE_PRESENT
-	and	r11, r11, r10
-	rlwimi	r10, r11, 0, _PAGE_PRESENT
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
@@ -701,7 +679,7 @@ initial_mmu:
 	li	r9, 4				/* up to 4 pages of 8M */
 	mtctr	r9
 	lis	r9, KERNELBASE@h		/* Create vaddr for TLB */
-	li	r10, MI_PS8MEG | MI_SVALID	/* Set 8M byte page */
+	li	r10, MI_PS8MEG | _PMD_ACCESSED | MI_SVALID
 	li	r11, MI_BOOTINIT		/* Create RPN for address 0 */
 1:
 	mtspr	SPRN_MI_CTR, r8	/* Set instruction MMU control */
@@ -765,7 +743,7 @@ _GLOBAL(mmu_pin_tlb)
 #ifdef CONFIG_PIN_TLB_TEXT
 	LOAD_REG_IMMEDIATE(r5, 28 << 8)
 	LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET)
-	LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG)
+	LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG | _PMD_ACCESSED)
 	LOAD_REG_IMMEDIATE(r8, 0xf0 | _PAGE_RO | _PAGE_SPS | _PAGE_SH | _PAGE_PRESENT)
 	LOAD_REG_ADDR(r9, _sinittext)
 	li	r0, 4
@@ -787,7 +765,7 @@ _GLOBAL(mmu_pin_tlb)
 	LOAD_REG_IMMEDIATE(r5, 28 << 8 | MD_TWAM)
 #ifdef CONFIG_PIN_TLB_DATA
 	LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET)
-	LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG)
+	LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG | _PMD_ACCESSED)
 #ifdef CONFIG_PIN_TLB_IMMR
 	li	r0, 3
 #else
@@ -824,7 +802,7 @@ _GLOBAL(mmu_pin_tlb)
 #endif
 #ifdef CONFIG_PIN_TLB_IMMR
 	LOAD_REG_IMMEDIATE(r0, VIRT_IMMR_BASE | MD_EVALID)
-	LOAD_REG_IMMEDIATE(r7, MD_SVALID | MD_PS512K | MD_GUARDED)
+	LOAD_REG_IMMEDIATE(r7, MD_SVALID | MD_PS512K | MD_GUARDED | _PMD_ACCESSED)
 	mfspr   r8, SPRN_IMMR
 	rlwinm	r8, r8, 0, 0xfff80000
 	ori	r8, r8, 0xf0 | _PAGE_DIRTY | _PAGE_SPS | _PAGE_SH | \
-- 
2.25.0


^ permalink raw reply related

* [PATCH] powerpc/40x: Always fault when _PAGE_ACCESSED is not set
From: Christophe Leroy @ 2020-10-10 15:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_40x.S | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 44c9018aed1b..a1ae00689e0f 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -284,11 +284,7 @@ _ENTRY(saved_ksp_limit)
 
 	rlwimi	r11, r10, 22, 20, 29	/* Compute PTE address */
 	lwz	r11, 0(r11)		/* Get Linux PTE */
-#ifdef CONFIG_SWAP
 	li	r9, _PAGE_PRESENT | _PAGE_ACCESSED
-#else
-	li	r9, _PAGE_PRESENT
-#endif
 	andc.	r9, r9, r11		/* Check permission */
 	bne	5f
 
@@ -369,11 +365,7 @@ _ENTRY(saved_ksp_limit)
 
 	rlwimi	r11, r10, 22, 20, 29	/* Compute PTE address */
 	lwz	r11, 0(r11)		/* Get Linux PTE */
-#ifdef CONFIG_SWAP
 	li	r9, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
-#else
-	li	r9, _PAGE_PRESENT | _PAGE_EXEC
-#endif
 	andc.	r9, r9, r11		/* Check permission */
 	bne	5f
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH 3/4] powerpc/8xx: Always fault when _PAGE_ACCESSED is not set
From: Christophe Leroy @ 2020-10-10 15:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <3f2386f94fada00b541be76d6bc4b6409746a72d.1602342819.git.christophe.leroy@csgroup.eu>

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

This adds at least 3 instructions to the TLB miss exception
handlers fast path. Following patch will reduce this overhead.

Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_8xx.S | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 83101c5888e3..6f3799a04121 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -202,9 +202,7 @@ SystemCall:
 
 InstructionTLBMiss:
 	mtspr	SPRN_SPRG_SCRATCH0, r10
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS)
 	mtspr	SPRN_SPRG_SCRATCH1, r11
-#endif
 
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
@@ -238,11 +236,9 @@ InstructionTLBMiss:
 	rlwimi	r11, r10, 32 - 9, _PMD_PAGE_512K
 	mtspr	SPRN_MI_TWC, r11
 #endif
-#ifdef CONFIG_SWAP
 	rlwinm	r11, r10, 32-7, _PAGE_PRESENT
 	and	r11, r11, r10
 	rlwimi	r10, r11, 0, _PAGE_PRESENT
-#endif
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 20 and 23 must be clear.
 	 * Software indicator bits 22, 24, 25, 26, and 27 must be
@@ -256,9 +252,7 @@ InstructionTLBMiss:
 
 	/* Restore registers */
 0:	mfspr	r10, SPRN_SPRG_SCRATCH0
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS)
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-#endif
 	rfi
 	patch_site	0b, patch__itlbmiss_exit_1
 
@@ -268,9 +262,7 @@ InstructionTLBMiss:
 	addi	r10, r10, 1
 	stw	r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS)
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-#endif
 	rfi
 #endif
 
@@ -316,11 +308,9 @@ DataStoreTLBMiss:
 	 * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
 	 * r10 = (r10 & ~PRESENT) | r11;
 	 */
-#ifdef CONFIG_SWAP
 	rlwinm	r11, r10, 32-7, _PAGE_PRESENT
 	and	r11, r11, r10
 	rlwimi	r10, r11, 0, _PAGE_PRESENT
-#endif
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
-- 
2.25.0


^ permalink raw reply related

* [PATCH 2/4] powerpc/8xx: Fix TLB Miss exceptions with CONFIG_SWAP
From: Christophe Leroy @ 2020-10-10 15:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <3f2386f94fada00b541be76d6bc4b6409746a72d.1602342819.git.christophe.leroy@csgroup.eu>

When CONFIG_SWAP is set, _PAGE_ACCESSED gets ANDed with
_PAGE_PRESENT. But during several changes of _PAGE_ACCESSED value,
this operation which is based on bit rotation has been forgotten.

As of today it doesn't work.

Update to the rotation to the correct number of bits.

Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_8xx.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 32d85387bdc5..83101c5888e3 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -239,7 +239,7 @@ InstructionTLBMiss:
 	mtspr	SPRN_MI_TWC, r11
 #endif
 #ifdef CONFIG_SWAP
-	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
+	rlwinm	r11, r10, 32-7, _PAGE_PRESENT
 	and	r11, r11, r10
 	rlwimi	r10, r11, 0, _PAGE_PRESENT
 #endif
@@ -317,7 +317,7 @@ DataStoreTLBMiss:
 	 * r10 = (r10 & ~PRESENT) | r11;
 	 */
 #ifdef CONFIG_SWAP
-	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
+	rlwinm	r11, r10, 32-7, _PAGE_PRESENT
 	and	r11, r11, r10
 	rlwimi	r10, r11, 0, _PAGE_PRESENT
 #endif
-- 
2.25.0


^ permalink raw reply related

* [PATCH 1/4] powerpc/8xx: Fix instruction TLB miss exception with perf enabled
From: Christophe Leroy @ 2020-10-10 15:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

When perf is enabled, r11 must also be restored when CONFIG_HUGETLBFS
is selected.

Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_8xx.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 9f359d3fba74..32d85387bdc5 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -268,7 +268,7 @@ InstructionTLBMiss:
 	addi	r10, r10, 1
 	stw	r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS)
 	mfspr	r11, SPRN_SPRG_SCRATCH1
 #endif
 	rfi
-- 
2.25.0


^ permalink raw reply related

* [PATCH] powerpc/603: Always fault when _PAGE_ACCESSED is not set
From: Christophe Leroy @ 2020-10-10 15:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_book3s_32.S | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 5eb9eedac920..2aa16d5368e1 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -457,11 +457,7 @@ InstructionTLBMiss:
 	cmplw	0,r1,r3
 #endif
 	mfspr	r2, SPRN_SPRG_PGDIR
-#ifdef CONFIG_SWAP
 	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
-#else
-	li	r1,_PAGE_PRESENT | _PAGE_EXEC
-#endif
 #if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC)
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
@@ -523,11 +519,7 @@ DataLoadTLBMiss:
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SPRG_PGDIR
-#ifdef CONFIG_SWAP
 	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
-#else
-	li	r1, _PAGE_PRESENT
-#endif
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
@@ -603,11 +595,7 @@ DataStoreTLBMiss:
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SPRG_PGDIR
-#ifdef CONFIG_SWAP
 	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
-#else
-	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT
-#endif
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
From: Catalin Marinas @ 2020-10-10 11:09 UTC (permalink / raw)
  To: Khalid Aziz
  Cc: Jann Horn, linuxppc-dev, linux-kernel, Christoph Hellwig,
	linux-mm, Paul Mackerras, sparclinux, Anthony Yznaga,
	Andrew Morton, Will Deacon, David S. Miller, linux-arm-kernel
In-Reply-To: <d5332a7b-c300-6d28-18b9-4b7d4110ef86@oracle.com>

Hi Khalid,

On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote:
> On 10/7/20 1:39 AM, Jann Horn wrote:
> > arch_validate_prot() is a hook that can validate whether a given set of
> > protection flags is valid in an mprotect() operation. It is given the set
> > of protection flags and the address being modified.
> > 
> > However, the address being modified can currently not actually be used in
> > a meaningful way because:
> > 
> > 1. Only the address is given, but not the length, and the operation can
> >    span multiple VMAs. Therefore, the callee can't actually tell which
> >    virtual address range, or which VMAs, are being targeted.
> > 2. The mmap_lock is not held, meaning that if the callee were to check
> >    the VMA at @addr, that VMA would be unrelated to the one the
> >    operation is performed on.
> > 
> > Currently, custom arch_validate_prot() handlers are defined by
> > arm64, powerpc and sparc.
> > arm64 and powerpc don't care about the address range, they just check the
> > flags against CPU support masks.
> > sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
> > the mmap_lock.
> > 
> > Change the function signature to also take a length, and move the
> > arch_validate_prot() call in mm/mprotect.c down into the locked region.
[...]
> As Chris pointed out, the call to arch_validate_prot() from do_mmap2()
> is made without holding mmap_lock. Lock is not acquired until
> vm_mmap_pgoff(). This variance is uncomfortable but I am more
> uncomfortable forcing all implementations of validate_prot to require
> mmap_lock be held when non-sparc implementations do not have such need
> yet. Since do_mmap2() is in powerpc specific code, for now this patch
> solves a current problem.

I still think sparc should avoid walking the vmas in
arch_validate_prot(). The core code already has the vmas, though not
when calling arch_validate_prot(). That's one of the reasons I added
arch_validate_flags() with the MTE patches. For sparc, this could be
(untested, just copied the arch_validate_prot() code):

static inline bool arch_validate_flags(unsigned long vm_flags)
{
	if (!(vm_flags & VM_SPARC_ADI))
		return true;

	if (!adi_capable())
		return false;

	/* ADI can not be enabled on PFN mapped pages */
	if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
		return false;

	/*
	 * Mergeable pages can become unmergeable if ADI is enabled on
	 * them even if they have identical data on them. This can be
	 * because ADI enabled pages with identical data may still not
	 * have identical ADI tags on them. Disallow ADI on mergeable
	 * pages.
	 */
	if (vma->vm_flags & VM_MERGEABLE)
		return false;

	return true;
}

> That leaves open the question of should
> generic mmap call arch_validate_prot and return EINVAL for invalid
> combination of protection bits, but that is better addressed in a
> separate patch.

The above would cover mmap() as well.

The current sparc_validate_prot() relies on finding the vma for the
corresponding address. However, if you call this early in the mmap()
path, there's no such vma. It is only created later in mmap_region()
which no longer has the original PROT_* flags (all converted to VM_*
flags).

Calling arch_validate_flags() on mmap() has a small side-effect on the
user ABI: if the CPU is not adi_capable(), PROT_ADI is currently ignored
on mmap() but rejected by sparc_validate_prot(). Powerpc already does
this already and I think it should be fine for arm64 (it needs checking
though as we have another flag, PROT_BTI, hopefully dynamic loaders
don't pass this flag unconditionally).

However, as I said above, it doesn't solve the mmap() PROT_ADI checking
for sparc since there's no vma yet. I'd strongly recommend the
arch_validate_flags() approach and reverting the "start" parameter added
to arch_validate_prot() if you go for the flags route.

Thanks.

-- 
Catalin

^ permalink raw reply

* Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()
From: James Bottomley @ 2020-10-10  2:43 UTC (permalink / raw)
  To: Eric Biggers, ira.weiny
  Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
	Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
	linux-kselftest, Thomas Gleixner, drbd-dev, devel, linux-cifs,
	linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma, x86, amd-gfx,
	linux-afs, cluster-devel, Ingo Molnar, intel-wired-lan, kexec,
	xen-devel, linux-ext4, bpf, Dan Williams, Fenghua Yu, intel-gfx,
	ecryptfs, linux-um, reiserfs-devel, linux-block, linux-bcache,
	Borislav Petkov, Andy Lutomirski, Jaegeuk Kim, ceph-devel,
	io-uring, linux-cachefs, linux-nfs, linux-ntfs-dev, netdev,
	linuxppc-dev, samba-technical, linux-kernel, linux-f2fs-devel,
	linux-fsdevel, Andrew Morton, linux-erofs, linux-btrfs
In-Reply-To: <20201009213434.GA839@sol.localdomain>

On Fri, 2020-10-09 at 14:34 -0700, Eric Biggers wrote:
> On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > The kmap() calls in this FS are localized to a single thread.  To
> > avoid the over head of global PKRS updates use the new
> > kmap_thread() call.
> > 
> > Cc: Jaegeuk Kim <jaegeuk@kernel.org>
> > Cc: Chao Yu <chao@kernel.org>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  fs/f2fs/f2fs.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index d9e52a7f3702..ff72a45a577e 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -2410,12 +2410,12 @@ static inline struct page
> > *f2fs_pagecache_get_page(
> >  
> >  static inline void f2fs_copy_page(struct page *src, struct page
> > *dst)
> >  {
> > -	char *src_kaddr = kmap(src);
> > -	char *dst_kaddr = kmap(dst);
> > +	char *src_kaddr = kmap_thread(src);
> > +	char *dst_kaddr = kmap_thread(dst);
> >  
> >  	memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > -	kunmap(dst);
> > -	kunmap(src);
> > +	kunmap_thread(dst);
> > +	kunmap_thread(src);
> >  }
> 
> Wouldn't it make more sense to switch cases like this to
> kmap_atomic()?
> The pages are only mapped to do a memcpy(), then they're immediately
> unmapped.

On a VIPT/VIVT architecture, this is horrendously wasteful.  You're
taking something that was mapped at colour c_src mapping it to a new
address src_kaddr, which is likely a different colour and necessitates
flushing the original c_src, then you copy it to dst_kaddr, which is
also likely a different colour from c_dst, so dst_kaddr has to be
flushed on kunmap and c_dst has to be invalidated on kmap.  What we
should have is an architectural primitive for doing this, something
like kmemcopy_arch(dst, src).  PIPT architectures can implement it as
the above (possibly losing kmap if they don't need it) but VIPT/VIVT
architectures can set up a correctly coloured mapping so they can
simply copy from c_src to c_dst without any need to flush and the data
arrives cache hot at c_dst.

James



^ permalink raw reply

* Re: [PATCH RFC PKS/PMEM 51/58] kernel: Utilize new kmap_thread()
From: Eric W. Biederman @ 2020-10-10  3:43 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
	Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
	linux-kselftest, samba-technical, Thomas Gleixner, drbd-dev,
	devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
	linux-rdma, x86, ceph-devel, amd-gfx, io-uring, cluster-devel,
	Ingo Molnar, intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu,
	linux-afs, linux-um, intel-gfx, ecryptfs, linux-erofs,
	reiserfs-devel, linux-block, linux-bcache, Borislav Petkov,
	Andy Lutomirski, Dan Williams, Andrew Morton, linux-cachefs,
	linux-nfs, linux-ntfs-dev, netdev, kexec, linux-kernel,
	linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-52-ira.weiny@intel.com>

ira.weiny@intel.com writes:

> From: Ira Weiny <ira.weiny@intel.com>
>
> This kmap() call is localized to a single thread.  To avoid the over
> head of global PKRS updates use the new kmap_thread() call.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  kernel/kexec_core.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..272a9920c0d6 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -815,7 +815,7 @@ static int kimage_load_normal_segment(struct kimage *image,
>  		if (result < 0)
>  			goto out;
>  
> -		ptr = kmap(page);
> +		ptr = kmap_thread(page);
>  		/* Start with a clear page */
>  		clear_page(ptr);
>  		ptr += maddr & ~PAGE_MASK;
> @@ -828,7 +828,7 @@ static int kimage_load_normal_segment(struct kimage *image,
>  			memcpy(ptr, kbuf, uchunk);
>  		else
>  			result = copy_from_user(ptr, buf, uchunk);
> -		kunmap(page);
> +		kunmap_thread(page);
>  		if (result) {
>  			result = -EFAULT;
>  			goto out;
> @@ -879,7 +879,7 @@ static int kimage_load_crash_segment(struct kimage *image,
>  			goto out;
>  		}
>  		arch_kexec_post_alloc_pages(page_address(page), 1, 0);
> -		ptr = kmap(page);
> +		ptr = kmap_thread(page);
>  		ptr += maddr & ~PAGE_MASK;
>  		mchunk = min_t(size_t, mbytes,
>  				PAGE_SIZE - (maddr & ~PAGE_MASK));
> @@ -895,7 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
>  		else
>  			result = copy_from_user(ptr, buf, uchunk);
>  		kexec_flush_icache_page(page);
> -		kunmap(page);
> +		kunmap_thread(page);
>  		arch_kexec_pre_free_pages(page_address(page), 1);
>  		if (result) {
>  			result = -EFAULT;

^ permalink raw reply

* Re: [PATCH RFC PKS/PMEM 57/58] nvdimm/pmem: Stray access protection for pmem->virt_addr
From: John Hubbard @ 2020-10-10  2:53 UTC (permalink / raw)
  To: ira.weiny, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Peter Zijlstra
  Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
	dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
	samba-technical, ceph-devel, drbd-dev, devel, linux-cifs,
	linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma, x86, amd-gfx,
	linux-afs, cluster-devel, linux-cachefs, intel-wired-lan,
	xen-devel, linux-ext4, Fenghua Yu, linux-um, intel-gfx, ecryptfs,
	linux-erofs, reiserfs-devel, linux-block, linux-bcache,
	Dan Williams, io-uring, linux-nfs, linux-ntfs-dev, netdev, kexec,
	linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev,
	linux-btrfs
In-Reply-To: <20201009195033.3208459-58-ira.weiny@intel.com>

On 10/9/20 12:50 PM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The pmem driver uses a cached virtual address to access its memory
> directly.  Because the nvdimm driver is well aware of the special
> protections it has mapped memory with, we call dev_access_[en|dis]able()
> around the direct pmem->virt_addr (pmem_addr) usage instead of the
> unnecessary overhead of trying to get a page to kmap.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>   drivers/nvdimm/pmem.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index fab29b514372..e4dc1ae990fc 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -148,7 +148,9 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
>   	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
>   		return BLK_STS_IOERR;
>   
> +	dev_access_enable(false);
>   	rc = read_pmem(page, page_off, pmem_addr, len);
> +	dev_access_disable(false);

Hi Ira!

The APIs should be tweaked to use a symbol (GLOBAL, PER_THREAD), instead of
true/false. Try reading the above and you'll see that it sounds like it's
doing the opposite of what it is ("enable_this(false)" sounds like a clumsy
API design to *disable*, right?). And there is no hint about the scope.

And it *could* be so much more readable like this:

     dev_access_enable(DEV_ACCESS_THIS_THREAD);



thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH RFC PKS/PMEM 48/58] drivers/md: Utilize new kmap_thread()
From: Coly Li @ 2020-10-10  2:20 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
	Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
	linux-kselftest, samba-technical, Thomas Gleixner, drbd-dev,
	devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
	linux-rdma, x86, ceph-devel, amd-gfx, io-uring, cluster-devel,
	Ingo Molnar, intel-wired-lan, xen-devel, linux-ext4,
	Kent Overstreet, Fenghua Yu, linux-afs, linux-um, intel-gfx,
	ecryptfs, linux-erofs, reiserfs-devel, linux-block, linux-bcache,
	Borislav Petkov, Andy Lutomirski, Dan Williams, Andrew Morton,
	linux-cachefs, linux-nfs, linux-ntfs-dev, netdev, kexec,
	linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev,
	linux-btrfs
In-Reply-To: <20201009195033.3208459-49-ira.weiny@intel.com>

On 2020/10/10 03:50, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> These kmap() calls are localized to a single thread.  To avoid the over
> head of global PKRS updates use the new kmap_thread() call.
> 

Hi Ira,

There were a number of options considered.

1) Attempt to change all the thread local kmap() calls to kmap_atomic()
2) Introduce a flags parameter to kmap() to indicate if the mapping
should be global or not
3) Change ~20-30 call sites to 'kmap_global()' to indicate that they
require a global mapping of the pages
4) Change ~209 call sites to 'kmap_thread()' to indicate that the
mapping is to be used within that thread of execution only


I copied the above information from patch 00/58 to this message. The
idea behind kmap_thread() is fine to me, but as you said the new api is
very easy to be missed in new code (even for me). I would like to be
supportive to option 2) introduce a flag to kmap(), then we won't forget
the new thread-localized kmap method, and people won't ask why a
_thread() function is called but no kthread created.

Thanks.


Coly Li



> Cc: Coly Li <colyli@suse.de> (maintainer:BCACHE (BLOCK LAYER CACHE))
> Cc: Kent Overstreet <kent.overstreet@gmail.com> (maintainer:BCACHE (BLOCK LAYER CACHE))
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  drivers/md/bcache/request.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index c7cadaafa947..a4571f6d09dd 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -44,10 +44,10 @@ static void bio_csum(struct bio *bio, struct bkey *k)
>  	uint64_t csum = 0;
>  
>  	bio_for_each_segment(bv, bio, iter) {
> -		void *d = kmap(bv.bv_page) + bv.bv_offset;
> +		void *d = kmap_thread(bv.bv_page) + bv.bv_offset;
>  
>  		csum = bch_crc64_update(csum, d, bv.bv_len);
> -		kunmap(bv.bv_page);
> +		kunmap_thread(bv.bv_page);
>  	}
>  
>  	k->ptr[KEY_PTRS(k)] = csum & (~0ULL >> 1);
> 


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox