LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v8 00/30] Initial Prefixed Instruction support
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
  To: linuxppc-dev, Jordan Niethe; +Cc: alistair, naveen.n.rao, dja, npiggin, bala24
In-Reply-To: <20200506034050.24806-1-jniethe5@gmail.com>

On Wed, 6 May 2020 13:40:20 +1000, Jordan Niethe wrote:
> A future revision of the ISA will introduce prefixed instructions. A
> prefixed instruction is composed of a 4-byte prefix followed by a
> 4-byte suffix.
> 
> All prefixes have the major opcode 1. A prefix will never be a valid
> word instruction. A suffix may be an existing word instruction or a
> new instruction.
> 
> [...]

Applied to powerpc/next.

[01/30] powerpc/xmon: Remove store_inst() for patch_instruction()
        https://git.kernel.org/powerpc/c/802268fd82676ffce432776f60b93a0b15e58e0c
[02/30] powerpc/xmon: Move breakpoint instructions to own array
        https://git.kernel.org/powerpc/c/51c9ba11f17f25ace1ea6bbfd4586c59105432de
[03/30] powerpc/xmon: Move breakpoints to text section
        https://git.kernel.org/powerpc/c/4eff2b4f32a309e2171bfe53db3e93b5614f77cb
[04/30] powerpc/xmon: Use bitwise calculations in_breakpoint_table()
        https://git.kernel.org/powerpc/c/5a7fdcab54ef17c395fc47e73c828a1432e51683
[05/30] powerpc: Change calling convention for create_branch() et. al.
        https://git.kernel.org/powerpc/c/7c95d8893fb55869882c9f68f4c94840dc43f18f
[06/30] powerpc: Use a macro for creating instructions from u32s
        https://git.kernel.org/powerpc/c/753462512868674a788ecc77bb96752efb818785
[07/30] powerpc: Use an accessor for instructions
        https://git.kernel.org/powerpc/c/777e26f0edf8dab58b8dd474d35d83bde0ac6d76
[08/30] powerpc: Use a function for getting the instruction op code
        https://git.kernel.org/powerpc/c/8094892d1aff14269d3b7bfcd8b941217eecd81f
[09/30] powerpc: Use a function for byte swapping instructions
        https://git.kernel.org/powerpc/c/aabd2233b6aefeee6d7a2f667076d8346be1d30a
[10/30] powerpc: Introduce functions for instruction equality
        https://git.kernel.org/powerpc/c/217862d9b98bf08958d57fd7b31b9de0f1a9477d
[11/30] powerpc: Use a datatype for instructions
        https://git.kernel.org/powerpc/c/94afd069d937d84fb4f696eb9a78db4084e43d21
[12/30] powerpc: Use a function for reading instructions
        https://git.kernel.org/powerpc/c/f8faaffaa7d99028e457ef2d1dcb43a98f736938
[13/30] powerpc: Add a probe_user_read_inst() function
        https://git.kernel.org/powerpc/c/7ba68b2172c19031fdc2a2caf37328edd146e299
[14/30] powerpc: Add a probe_kernel_read_inst() function
        https://git.kernel.org/powerpc/c/95b980a00d1220ca67550a933166704db8bc5c14
[15/30] powerpc/kprobes: Use patch_instruction()
        https://git.kernel.org/powerpc/c/a8646f43ba5046e7f5c4396125d5136bfcb17b49
[16/30] powerpc: Define and use get_user_instr() et. al.
        https://git.kernel.org/powerpc/c/5249385ad7f0ac178433f0ae9cc5b64612c8ff77
[17/30] powerpc: Introduce a function for reporting instruction length
        https://git.kernel.org/powerpc/c/622cf6f436a12338bbcfbb3474629755547fd112
[18/30] powerpc/xmon: Use a function for reading instructions
        https://git.kernel.org/powerpc/c/6c7a4f0a9f66fc7fdc6e208559e5d562f53e0991
[19/30] powerpc/xmon: Move insertion of breakpoint for xol'ing
        https://git.kernel.org/powerpc/c/7fccfcfba04f9cb46438f368755d368f6c57f3a0
[20/30] powerpc: Make test_translate_branch() independent of instruction length
        https://git.kernel.org/powerpc/c/0b582db5490a1f250ef63337dd46d5c7599dae80
[21/30] powerpc: Enable Prefixed Instructions
        https://git.kernel.org/powerpc/c/2aa6195e43b3740258ead93aee42ac719dd4c4b0
[22/30] powerpc: Define new SRR1 bits for a ISA v3.1
        https://git.kernel.org/powerpc/c/b691505ef9232a6e82f1c160911afcb4cb20487b
[23/30] powerpc: Add prefixed instructions to instruction data type
        https://git.kernel.org/powerpc/c/650b55b707fdfa764e9f2b81314d3eb4216fb962
[24/30] powerpc: Test prefixed code patching
        https://git.kernel.org/powerpc/c/f77f8ff7f13e6411c2e0ba25bb7e012a5ae6c927
[25/30] powerpc: Test prefixed instructions in feature fixups
        https://git.kernel.org/powerpc/c/785b79d1e02873c2088ee1301154c66dace66ce5
[26/30] powerpc/xmon: Don't allow breakpoints on suffixes
        https://git.kernel.org/powerpc/c/c9c831aebd8663d0129bbcee4d76be889f0627fe
[27/30] powerpc/kprobes: Don't allow breakpoints on suffixes
        https://git.kernel.org/powerpc/c/b4657f7650babc9bfb41ce875abe41b18604a105
[28/30] powerpc: Support prefixed instructions in alignment handler
        https://git.kernel.org/powerpc/c/9409d2f9dad2f0679d67dc24d8116dd3e837b035
[29/30] powerpc sstep: Add support for prefixed load/stores
        https://git.kernel.org/powerpc/c/50b80a12e4ccff46d53b93754d817acd98bc9ae0
[30/30] powerpc sstep: Add support for prefixed fixed-point arithmetic
        https://git.kernel.org/powerpc/c/3920742b92f5ea19a220edb947b6f33c99f501da

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: Replace zero-length array with flexible-array
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Gustavo A. R. Silva; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200507185749.GA14994@embeddedor>

On Thu, 7 May 2020 13:57:49 -0500, Gustavo A. R. Silva wrote:
> The current codebase makes use of the zero-length array language
> extension to the C90 standard, but the preferred mechanism to declare
> variable-length types such as these ones is a flexible array member[1][2],
> introduced in C99:
> 
> struct foo {
>         int stuff;
>         struct boo array[];
> };
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Replace zero-length array with flexible-array
      https://git.kernel.org/powerpc/c/0f6be41c60699fd8cdfa93e5e85a306cec1ac1d0

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Replace zero-length array with flexible-array
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
  To: Michael Ellerman, Gustavo A. R. Silva
  Cc: Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <20200507185755.GA15014@embeddedor>

On Thu, 7 May 2020 13:57:55 -0500, Gustavo A. R. Silva wrote:
> The current codebase makes use of the zero-length array language
> extension to the C90 standard, but the preferred mechanism to declare
> variable-length types such as these ones is a flexible array member[1][2],
> introduced in C99:
> 
> struct foo {
>         int stuff;
>         struct boo array[];
> };
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/mm: Replace zero-length array with flexible-array
      https://git.kernel.org/powerpc/c/02bddf21c34d0a918acc8647195ba4507e3db8fc

cheers

^ permalink raw reply

* Re: [PATCH v2 0/9] powerpc + ps3 patches
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
  To: Michael Ellerman, Jens Axboe, Geoff Levand, David S. Miller
  Cc: Geert Uytterhoeven, linuxppc-dev, Markus Elfring,
	Emmanuel Nicolet
In-Reply-To: <cover.1589049250.git.geoff@infradead.org>

On Sat, 09 May 2020 18:58:31 +0000, Geoff Levand wrote:
> This is a combined V2 of the two patch sets I sent out on March 27th,
> 'PS3 patches for v5.7' and 'powerpc: Minor updates to improve build debugging'.
> 
> I've dropped these two patches that were in my 'PS3 patches for v5.7' set:
> 
>       powerpc/ps3: Add lv1_panic
>       powerpc/ps3: Add udbg_panic
> 
> [...]

Patches 1-6 and 8 applied to powerpc/next.

[1/9] powerpc/head_check: Automatic verbosity
      https://git.kernel.org/powerpc/c/4c592a34391ea4987d29c1718f931b50416ca015
[2/9] powerpc/wrapper: Output linker map file
      https://git.kernel.org/powerpc/c/f61200d3e3386e78d49677dfb3911c9d7c0dfe4b
[3/9] powerpc/head_check: Avoid broken pipe
      https://git.kernel.org/powerpc/c/331aa46aaf51325d8532a4948f5127b2edc441a5
[4/9] drivers/ps3: Remove duplicate error messages
      https://git.kernel.org/powerpc/c/6a8aa782cece2330322c33452a767f53f8ba38c9
[5/9] net/ps3_gelic_net: Remove duplicate error message
      https://git.kernel.org/powerpc/c/7b27b95a894d6a85c076f8d1f00e35316739bf51
[6/9] ps3disk: use the default segment boundary
      https://git.kernel.org/powerpc/c/720bc316690bd27dea9d71510b50f0cd698ffc32
[8/9] powerpc/ps3: Fix kexec shutdown hang
      https://git.kernel.org/powerpc/c/126554465d93b10662742128918a5fc338cda4aa

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: Update email address in MAINTAINERS
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Michael Ellerman, Christophe Leroy,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <9fd0f9a827ebbeae64ad7a6f6c595d242f4dd5fc.1588747860.git.christophe.leroy@csgroup.eu>

On Wed, 6 May 2020 06:51:59 +0000 (UTC), Christophe Leroy wrote:
> Since 01 May 2020, our email adresses have changed to @csgroup.eu
> 
> Update MAINTAINERS accordingly.

Applied to powerpc/next.

[1/1] powerpc/8xx: Update email address in MAINTAINERS
      https://git.kernel.org/powerpc/c/679d74abc4e14cb369e46f080d90f4dc8c143e65

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
  To: Christophe Leroy, erhard_f, Paul Mackerras,
	Benjamin Herrenschmidt, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <2c50f3b1c9bbaa4217c9a98f3044bd2a36c46a4f.1586361277.git.christophe.leroy@c-s.fr>

On Wed, 8 Apr 2020 15:58:49 +0000 (UTC), Christophe Leroy wrote:
> When CONFIG_KASAN is selected, the stack usage is increased.
> 
> In the same way as x86 and arm64 architectures, increase
> THREAD_SHIFT when CONFIG_KASAN is selected.

Applied to powerpc/next.

[1/1] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
      https://git.kernel.org/powerpc/c/edbadaf0671072298e506074128b64e003c5812c

cheers

^ permalink raw reply

* Re: [PATCH 1/5] drivers/powerpc: Replace _ALIGN_UP() by ALIGN()
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Michael Ellerman, Christophe Leroy,
	Paul Mackerras
  Cc: linux-fbdev, kvm, alsa-devel, dri-devel, linux-kernel,
	linuxppc-dev
In-Reply-To: <a5945463f86c984151962a475a3ee56a2893e85d.1587407777.git.christophe.leroy@c-s.fr>

On Mon, 20 Apr 2020 18:36:34 +0000 (UTC), Christophe Leroy wrote:
> _ALIGN_UP() is specific to powerpc
> ALIGN() is generic and does the same
> 
> Replace _ALIGN_UP() by ALIGN()

Applied to powerpc/next.

[1/5] drivers/powerpc: Replace _ALIGN_UP() by ALIGN()
      https://git.kernel.org/powerpc/c/7bfc3c84cbf5167d943cff9b3d2619dab0b7894c
[2/5] powerpc: Replace _ALIGN_DOWN() by ALIGN_DOWN()
      https://git.kernel.org/powerpc/c/e96d904ede6756641563d27daa746875b478a6c8
[3/5] powerpc: Replace _ALIGN_UP() by ALIGN()
      https://git.kernel.org/powerpc/c/b711531641038f3ff3723914f3d5ba79848d347e
[4/5] powerpc: Replace _ALIGN() by ALIGN()
      https://git.kernel.org/powerpc/c/d3f3d3bf76cfb04e73436a15e3987d3573e7523a
[5/5] powerpc: Remove _ALIGN_UP(), _ALIGN_DOWN() and _ALIGN()
      https://git.kernel.org/powerpc/c/4cdb2da654033d76e1b1cb4ac427d9193dce816b

cheers

^ permalink raw reply

* [PATCH] Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."
From: Christophe Leroy @ 2020-05-20 10:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rui Salvaterra
  Cc: linuxppc-dev, linux-kernel

This reverts commit 697ece78f8f749aeea40f2711389901f0974017a.

The implementation of SWAP on powerpc requires page protection
bits to not be one of the least significant PTE bits.

Until the SWAP implementation is changed and this requirement voids,
we have to keep at least _PAGE_RW outside of the 3 last bits.

For now, revert to previous PTE bits order. A further rework
may come later.

Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Fixes: 697ece78f8f7 ("powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits.")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/book3s/32/hash.h |  8 ++++----
 arch/powerpc/kernel/head_32.S             |  9 ++++++---
 arch/powerpc/mm/book3s32/hash_low.S       | 14 ++++++++------
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/hash.h b/arch/powerpc/include/asm/book3s/32/hash.h
index 34a7215ae81e..2a0a467d2985 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -17,9 +17,9 @@
  * updating the accessed and modified bits in the page table tree.
  */
 
-#define _PAGE_USER	0x001	/* usermode access allowed */
-#define _PAGE_RW	0x002	/* software: user write access allowed */
-#define _PAGE_PRESENT	0x004	/* software: pte contains a translation */
+#define _PAGE_PRESENT	0x001	/* software: pte contains a translation */
+#define _PAGE_HASHPTE	0x002	/* hash_page has made an HPTE for this pte */
+#define _PAGE_USER	0x004	/* usermode access allowed */
 #define _PAGE_GUARDED	0x008	/* G: prohibit speculative access */
 #define _PAGE_COHERENT	0x010	/* M: enforce memory coherence (SMP systems) */
 #define _PAGE_NO_CACHE	0x020	/* I: cache inhibit */
@@ -27,7 +27,7 @@
 #define _PAGE_DIRTY	0x080	/* C: page changed */
 #define _PAGE_ACCESSED	0x100	/* R: page referenced */
 #define _PAGE_EXEC	0x200	/* software: exec allowed */
-#define _PAGE_HASHPTE	0x400	/* hash_page has made an HPTE for this pte */
+#define _PAGE_RW	0x400	/* software: user write access allowed */
 #define _PAGE_SPECIAL	0x800	/* software: Special page */
 
 #ifdef CONFIG_PTE_64BIT
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index daaa153950c2..97c887950c3c 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -348,7 +348,7 @@ BEGIN_MMU_FTR_SECTION
 	andis.	r0, r5, (DSISR_BAD_FAULT_32S | DSISR_DABRMATCH)@h
 #endif
 	bne	handle_page_fault_tramp_2	/* if not, try to put a PTE */
-	rlwinm	r3, r5, 32 - 24, 30, 30		/* DSISR_STORE -> _PAGE_RW */
+	rlwinm	r3, r5, 32 - 15, 21, 21		/* DSISR_STORE -> _PAGE_RW */
 	bl	hash_page
 	b	handle_page_fault_tramp_1
 FTR_SECTION_ELSE
@@ -497,6 +497,7 @@ InstructionTLBMiss:
 	andc.	r1,r1,r0		/* check access & ~permission */
 	bne-	InstructionAddressInvalid /* return if access not permitted */
 	/* Convert linux-style PTE to low word of PPC-style PTE */
+	rlwimi	r0,r0,32-2,31,31	/* _PAGE_USER -> PP lsb */
 	ori	r1, r1, 0xe06		/* clear out reserved bits */
 	andc	r1, r0, r1		/* PP = user? 1 : 0 */
 BEGIN_FTR_SECTION
@@ -564,8 +565,9 @@ DataLoadTLBMiss:
 	 * we would need to update the pte atomically with lwarx/stwcx.
 	 */
 	/* Convert linux-style PTE to low word of PPC-style PTE */
-	rlwinm	r1,r0,0,30,30		/* _PAGE_RW -> PP msb */
-	rlwimi	r0,r0,1,30,30		/* _PAGE_USER -> PP msb */
+	rlwinm	r1,r0,32-9,30,30	/* _PAGE_RW -> PP msb */
+	rlwimi	r0,r0,32-1,30,30	/* _PAGE_USER -> PP msb */
+	rlwimi	r0,r0,32-1,31,31	/* _PAGE_USER -> PP lsb */
 	ori	r1,r1,0xe04		/* clear out reserved bits */
 	andc	r1,r0,r1		/* PP = user? rw? 1: 3: 0 */
 BEGIN_FTR_SECTION
@@ -643,6 +645,7 @@ DataStoreTLBMiss:
 	 * we would need to update the pte atomically with lwarx/stwcx.
 	 */
 	/* Convert linux-style PTE to low word of PPC-style PTE */
+	rlwimi	r0,r0,32-2,31,31	/* _PAGE_USER -> PP lsb */
 	li	r1,0xe06		/* clear out reserved bits & PP msb */
 	andc	r1,r0,r1		/* PP = user? 1: 0 */
 BEGIN_FTR_SECTION
diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index 6d236080cb1a..877d880890fe 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -35,7 +35,7 @@ mmu_hash_lock:
 /*
  * Load a PTE into the hash table, if possible.
  * The address is in r4, and r3 contains an access flag:
- * _PAGE_RW (0x002) if a write.
+ * _PAGE_RW (0x400) if a write.
  * r9 contains the SRR1 value, from which we use the MSR_PR bit.
  * SPRG_THREAD contains the physical address of the current task's thread.
  *
@@ -69,7 +69,7 @@ _GLOBAL(hash_page)
 	blt+	112f			/* assume user more likely */
 	lis	r5, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
 	addi	r5 ,r5 ,(swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
-	rlwimi	r3,r9,32-14,31,31	/* MSR_PR -> _PAGE_USER */
+	rlwimi	r3,r9,32-12,29,29	/* MSR_PR -> _PAGE_USER */
 112:
 #ifndef CONFIG_PTE_64BIT
 	rlwimi	r5,r4,12,20,29		/* insert top 10 bits of address */
@@ -94,7 +94,7 @@ _GLOBAL(hash_page)
 #else
 	rlwimi	r8,r4,23,20,28		/* compute pte address */
 #endif
-	rlwinm	r0,r3,6,24,24		/* _PAGE_RW access -> _PAGE_DIRTY */
+	rlwinm	r0,r3,32-3,24,24	/* _PAGE_RW access -> _PAGE_DIRTY */
 	ori	r0,r0,_PAGE_ACCESSED|_PAGE_HASHPTE
 
 	/*
@@ -310,9 +310,11 @@ Hash_msk = (((1 << Hash_bits) - 1) * 64)
 
 _GLOBAL(create_hpte)
 	/* Convert linux-style PTE (r5) to low word of PPC-style PTE (r8) */
+	rlwinm	r8,r5,32-9,30,30	/* _PAGE_RW -> PP msb */
 	rlwinm	r0,r5,32-6,30,30	/* _PAGE_DIRTY -> PP msb */
-	and	r8,r5,r0		/* writable if _RW & _DIRTY */
-	rlwimi	r5,r5,1,30,30		/* _PAGE_USER -> PP msb */
+	and	r8,r8,r0		/* writable if _RW & _DIRTY */
+	rlwimi	r5,r5,32-1,30,30	/* _PAGE_USER -> PP msb */
+	rlwimi	r5,r5,32-2,31,31	/* _PAGE_USER -> PP lsb */
 	ori	r8,r8,0xe04		/* clear out reserved bits */
 	andc	r8,r5,r8		/* PP = user? (rw&dirty? 1: 3): 0 */
 BEGIN_FTR_SECTION
@@ -564,7 +566,7 @@ _GLOBAL(flush_hash_pages)
 33:	lwarx	r8,0,r5			/* fetch the pte flags word */
 	andi.	r0,r8,_PAGE_HASHPTE
 	beq	8f			/* done if HASHPTE is already clear */
-	rlwinm	r8,r8,0,~_PAGE_HASHPTE	/* clear HASHPTE bit */
+	rlwinm	r8,r8,0,31,29		/* clear HASHPTE bit */
 	stwcx.	r8,0,r5			/* update the pte */
 	bne-	33b
 
-- 
2.25.0


^ permalink raw reply related

* Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-05-20 10:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, linux-nvdimm, linux-kernel; +Cc: Steven Rostedt
In-Reply-To: <87a723f5fs.fsf@linux.ibm.com>

Thanks for reviewing this patch Aneesh.

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
>
> ....
>
>  +
>> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> +struct nd_pdsm_cmd_pkg {
>> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
>> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
>> +	__u16 payload_offset;	/* In: offset from start of struct */
>> +	__u16 payload_version;	/* In/Out: version of the payload */
>> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
>> +} __packed;
>
> that payload_offset can be avoided if we prevent userspace to user a
> different variant of nd_pdsm_cmd_pkg which different header. We can keep
> things simpler if we can always find payload at
> nd_pdsm_cmd_pkg->payload.
Had introduced this member to handle case where new fields are added to
'struct nd_pdsm_cmd_pkg' without having to break the ABI. But agree with
the point that you made later that this can be simplified by replacing
'payload_offset' with a set of reserved variables. Will address this in
next iteration of this patchset.

>
>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> + */
>> +enum papr_scm_pdsm {
>> +	PAPR_SCM_PDSM_MIN = 0x0,
>> +	PAPR_SCM_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> +{
>> +	return (struct nd_pdsm_cmd_pkg *) cmd;
>> +}
>> +
>> +/* Return the payload pointer for a given pcmd */
>> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> +{
>> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> +		return NULL;
>> +	else
>> +		return (void *)((__u8 *) pcmd + pcmd->payload_offset);
>> +}
>> +
>
> we need to make sure userspace is not passing a wrong payload_offset.
Agree, that this function should have more strict checking for
payload_offset field. However will be getting rid of
'payload_offset' all together in the next iteration as you previously
suggested.

> and in the next patch you do
>
> +	/* Copy the health struct to the payload */
> +	memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
> +	pkg->hdr.nd_fw_size = copysize;
> +
Yes this is already being done in the patchset and changes proposed to
this pdsm_cmd_to_payload() should not impact other patches as
pdsm_cmd_to_payload() abstracts rest of the code from how to access the
payload.

> All this can be simplified if you can keep payload at
> nd_pdsm_cmd_pkg->payload.
>
> If you still want to have the ability to extend the header, then added a
> reserved field similar to nd_cmd_pkg.
>
Agree to this and will address this in V8.

>
> -aneesh

-- 
Cheers
~ Vaibhav

^ permalink raw reply

* [PATCH V3 2/2] tools/perf: Add perf tools support for extended register capability in powerpc
From: Athira Rajeev @ 2020-05-20  9:45 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ravi.bangoria, atrajeev, maddy, linux-kernel, acme, anju, jolsa
In-Reply-To: <1589967933-1503-1-git-send-email-atrajeev@linux.vnet.ibm.com>

From: Anju T Sudhakar <anju@linux.vnet.ibm.com>

Add extended regs to sample_reg_mask in the tool side to use
with `-I?` option. Perf tools side uses extended mask to display
the platform supported register names (with -I? option) to the user
and also send this mask to the kernel to capture the extended registers
in each sample. Hence decide the mask value based on the processor
version.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
[Decide extended mask at run time based on platform]
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++++++-
 tools/perf/arch/powerpc/include/perf_regs.h     |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c        | 55 +++++++++++++++++++++++++
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
 	PERF_REG_POWERPC_DSISR,
 	PERF_REG_POWERPC_SIER,
 	PERF_REG_POWERPC_MMCRA,
-	PERF_REG_POWERPC_MAX,
+	/* Extended registers */
+	PERF_REG_POWERPC_MMCR0,
+	PERF_REG_POWERPC_MMCR1,
+	PERF_REG_POWERPC_MMCR2,
+	/* Max regs without the extended regs */
+	PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
 };
+
+#define PERF_REG_PMU_MASK	((1ULL << PERF_REG_POWERPC_MAX) - 1)
+
+/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
+#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
+				- PERF_REG_PMU_MASK)
+
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h
index e18a355..46ed00d 100644
--- a/tools/perf/arch/powerpc/include/perf_regs.h
+++ b/tools/perf/arch/powerpc/include/perf_regs.h
@@ -64,7 +64,10 @@
 	[PERF_REG_POWERPC_DAR] = "dar",
 	[PERF_REG_POWERPC_DSISR] = "dsisr",
 	[PERF_REG_POWERPC_SIER] = "sier",
-	[PERF_REG_POWERPC_MMCRA] = "mmcra"
+	[PERF_REG_POWERPC_MMCRA] = "mmcra",
+	[PERF_REG_POWERPC_MMCR0] = "mmcr0",
+	[PERF_REG_POWERPC_MMCR1] = "mmcr1",
+	[PERF_REG_POWERPC_MMCR2] = "mmcr2",
 };
 
 static inline const char *perf_reg_name(int id)
diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c
index 0a52429..9179230 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -6,9 +6,14 @@
 
 #include "../../../util/perf_regs.h"
 #include "../../../util/debug.h"
+#include "../../../util/event.h"
+#include "../../../util/header.h"
+#include "../../../perf-sys.h"
 
 #include <linux/kernel.h>
 
+#define PVR_POWER9		0x004E
+
 const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG(r0, PERF_REG_POWERPC_R0),
 	SMPL_REG(r1, PERF_REG_POWERPC_R1),
@@ -55,6 +60,9 @@
 	SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
 	SMPL_REG(sier, PERF_REG_POWERPC_SIER),
 	SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA),
+	SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0),
+	SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1),
+	SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2),
 	SMPL_REG_END
 };
 
@@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
 
 	return SDT_ARG_VALID;
 }
+
+uint64_t arch__intr_reg_mask(void)
+{
+	struct perf_event_attr attr = {
+		.type                   = PERF_TYPE_HARDWARE,
+		.config                 = PERF_COUNT_HW_CPU_CYCLES,
+		.sample_type            = PERF_SAMPLE_REGS_INTR,
+		.precise_ip             = 1,
+		.disabled               = 1,
+		.exclude_kernel         = 1,
+	};
+	int fd, ret;
+	char buffer[64];
+	u32 version;
+	u64 extended_mask = 0;
+
+	/* Get the PVR value to set the extended
+	 * mask specific to platform
+	 */
+	get_cpuid(buffer, sizeof(buffer));
+	ret = sscanf(buffer, "%u,", &version);
+
+	if (ret != 1) {
+		pr_debug("Failed to get the processor version, unable to output extended registers\n");
+		return PERF_REGS_MASK;
+	}
+
+	if (version == PVR_POWER9)
+		extended_mask = PERF_REG_PMU_MASK_300;
+	else
+		return PERF_REGS_MASK;
+
+	attr.sample_regs_intr = extended_mask;
+	attr.sample_period = 1;
+	event_attr_init(&attr);
+
+	/*
+	 * check if the pmu supports perf extended regs, before
+	 * returning the register mask to sample.
+	 */
+	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+	if (fd != -1) {
+		close(fd);
+		return (extended_mask | PERF_REGS_MASK);
+	}
+	return PERF_REGS_MASK;
+}
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH V3 1/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs
From: Athira Rajeev @ 2020-05-20  9:45 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ravi.bangoria, atrajeev, maddy, linux-kernel, acme, anju, jolsa
In-Reply-To: <1589967933-1503-1-git-send-email-atrajeev@linux.vnet.ibm.com>

From: Anju T Sudhakar <anju@linux.vnet.ibm.com>

Add support for perf extended register capability in powerpc.
The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the
PMU which support extended registers. The generic code define the mask
of extended registers as 0 for non supported architectures.

Patch adds extended regs support for power9 platform by
exposing MMCR0, MMCR1 and MMCR2 registers.

REG_RESERVED mask needs update to include extended regs.
`PERF_REG_EXTENDED_MASK`, contains mask value of the supported registers,
is defined at runtime in the kernel based on platform since the supported
registers may differ from one processor version to another and hence the
MASK value.

with patch
----------

available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11
r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26
r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe
trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2

PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0
... intr regs: mask 0xffffffffffff ABI 64-bit
.... r0    0xc00000000012b77c
.... r1    0xc000003fe5e03930
.... r2    0xc000000001b0e000
.... r3    0xc000003fdcddf800
.... r4    0xc000003fc7880000
.... r5    0x9c422724be
.... r6    0xc000003fe5e03908
.... r7    0xffffff63bddc8706
.... r8    0x9e4
.... r9    0x0
.... r10   0x1
.... r11   0x0
.... r12   0xc0000000001299c0
.... r13   0xc000003ffffc4800
.... r14   0x0
.... r15   0x7fffdd8b8b00
.... r16   0x0
.... r17   0x7fffdd8be6b8
.... r18   0x7e7076607730
.... r19   0x2f
.... r20   0xc00000001fc26c68
.... r21   0xc0002041e4227e00
.... r22   0xc00000002018fb60
.... r23   0x1
.... r24   0xc000003ffec4d900
.... r25   0x80000000
.... r26   0x0
.... r27   0x1
.... r28   0x1
.... r29   0xc000000001be1260
.... r30   0x6008010
.... r31   0xc000003ffebb7218
.... nip   0xc00000000012b910
.... msr   0x9000000000009033
.... orig_r3 0xc00000000012b86c
.... ctr   0xc0000000001299c0
.... link  0xc00000000012b77c
.... xer   0x0
.... ccr   0x28002222
.... softe 0x1
.... trap  0xf00
.... dar   0x0
.... dsisr 0x80000000000
.... sier  0x0
.... mmcra 0x80000000000
.... mmcr0 0x82008090
.... mmcr1 0x1e000000
.... mmcr2 0x0
 ... thread: perf:4784

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
[Defined PERF_REG_EXTENDED_MASK at run time to add support for different platforms ]
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/perf_event_server.h |  8 +++++++
 arch/powerpc/include/uapi/asm/perf_regs.h    | 14 +++++++++++-
 arch/powerpc/perf/core-book3s.c              |  1 +
 arch/powerpc/perf/perf_regs.c                | 34 +++++++++++++++++++++++++---
 arch/powerpc/perf/power9-pmu.c               |  6 +++++
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3e9703f..1458e1a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -15,6 +15,9 @@
 #define MAX_EVENT_ALTERNATIVES	8
 #define MAX_LIMITED_HWCOUNTERS	2
 
+extern u64 mask_var;
+#define PERF_REG_EXTENDED_MASK          mask_var
+
 struct perf_event;
 
 /*
@@ -55,6 +58,11 @@ struct power_pmu {
 	int 		*blacklist_ev;
 	/* BHRB entries in the PMU */
 	int		bhrb_nr;
+	/*
+	 * set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if
+	 * the pmu supports extended perf regs capability
+	 */
+	int		capabilities;
 };
 
 /*
diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
 	PERF_REG_POWERPC_DSISR,
 	PERF_REG_POWERPC_SIER,
 	PERF_REG_POWERPC_MMCRA,
-	PERF_REG_POWERPC_MAX,
+	/* Extended registers */
+	PERF_REG_POWERPC_MMCR0,
+	PERF_REG_POWERPC_MMCR1,
+	PERF_REG_POWERPC_MMCR2,
+	/* Max regs without the extended regs */
+	PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
 };
+
+#define PERF_REG_PMU_MASK	((1ULL << PERF_REG_POWERPC_MAX) - 1)
+
+/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
+#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
+				- PERF_REG_PMU_MASK)
+
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3dcfecf..f56b778 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2276,6 +2276,7 @@ int register_power_pmu(struct power_pmu *pmu)
 
 	power_pmu.attr_groups = ppmu->attr_groups;
 
+	power_pmu.capabilities |= (ppmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS);
 #ifdef MSR_HV
 	/*
 	 * Use FCHV to ignore kernel events if MSR.HV is set.
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
index a213a0a..f1dbbc5 100644
--- a/arch/powerpc/perf/perf_regs.c
+++ b/arch/powerpc/perf/perf_regs.c
@@ -13,9 +13,11 @@
 #include <asm/ptrace.h>
 #include <asm/perf_regs.h>
 
+u64 mask_var;
+
 #define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
 
-#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1))
+#define REG_RESERVED (~(PERF_REG_EXTENDED_MASK | PERF_REG_PMU_MASK))
 
 static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
 	PT_REGS_OFFSET(PERF_REG_POWERPC_R0,  gpr[0]),
@@ -69,10 +71,26 @@
 	PT_REGS_OFFSET(PERF_REG_POWERPC_MMCRA, dsisr),
 };
 
+/* Function to return the extended register values */
+static u64 get_ext_regs_value(int idx)
+{
+	switch (idx) {
+	case PERF_REG_POWERPC_MMCR0:
+			return mfspr(SPRN_MMCR0);
+	case PERF_REG_POWERPC_MMCR1:
+			return mfspr(SPRN_MMCR1);
+	case PERF_REG_POWERPC_MMCR2:
+			return mfspr(SPRN_MMCR2);
+	default: return 0;
+	}
+}
+
 u64 perf_reg_value(struct pt_regs *regs, int idx)
 {
-	if (WARN_ON_ONCE(idx >= PERF_REG_POWERPC_MAX))
-		return 0;
+	u64 PERF_REG_EXTENDED_MAX;
+
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		PERF_REG_EXTENDED_MAX = PERF_REG_POWERPC_MMCR2 + 1;
 
 	if (idx == PERF_REG_POWERPC_SIER &&
 	   (IS_ENABLED(CONFIG_FSL_EMB_PERF_EVENT) ||
@@ -85,6 +103,16 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 	    IS_ENABLED(CONFIG_PPC32)))
 		return 0;
 
+	if (idx >= PERF_REG_POWERPC_MAX && idx < PERF_REG_EXTENDED_MAX)
+		return get_ext_regs_value(idx);
+
+	/*
+	 * If the idx is referring to value beyond the
+	 * supported registers, return 0 with a warning
+	 */
+	if (WARN_ON_ONCE(idx >= PERF_REG_EXTENDED_MAX))
+		return 0;
+
 	return regs_get_register(regs, pt_regs_offset[idx]);
 }
 
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 08c3ef7..4525090 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -90,6 +90,8 @@ enum {
 #define POWER9_MMCRA_IFM3		0x00000000C0000000UL
 #define POWER9_MMCRA_BHRB_MASK		0x00000000C0000000UL
 
+extern u64 mask_var;
+
 /* Nasty Power9 specific hack */
 #define PVR_POWER9_CUMULUS		0x00002000
 
@@ -434,6 +436,7 @@ static void power9_config_bhrb(u64 pmu_bhrb_filter)
 	.cache_events		= &power9_cache_events,
 	.attr_groups		= power9_pmu_attr_groups,
 	.bhrb_nr		= 32,
+	.capabilities           = PERF_PMU_CAP_EXTENDED_REGS,
 };
 
 int init_power9_pmu(void)
@@ -457,6 +460,9 @@ int init_power9_pmu(void)
 		}
 	}
 
+	/* Set the PERF_REG_EXTENDED_MASK here */
+	mask_var = PERF_REG_PMU_MASK_300;
+
 	rc = register_power_pmu(&power9_pmu);
 	if (rc)
 		return rc;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH V3 0/2] powerpc/perf: Add support for perf extended regs in powerpc
From: Athira Rajeev @ 2020-05-20  9:45 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ravi.bangoria, atrajeev, maddy, linux-kernel, acme, anju, jolsa

Patch set to add support for perf extended register capability in
powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to
indicate the PMU which support extended registers. The generic code
define the mask of extended registers as 0 for non supported architectures.

patch 1/2 defines this PERF_PMU_CAP_EXTENDED_REGS mask to output the
values of mmcr0,mmcr1,mmcr2 for POWER9. Defines `PERF_REG_EXTENDED_MASK`
at runtime which contains mask value of the supported registers under
extended regs.

Patch 2/2 adds extended regs to sample_reg_mask in the tool side to use
with `-I?` option.

Anju T Sudhakar (2):
  powerpc/perf: Add support for outputting extended regs in perf
    intr_regs
  tools/perf: Add perf tools support for extended register capability in
    powerpc
---
Changes from v2 -> v3
- Split kernel and tools side patches as suggested by Arnaldo
- Addressed review comment from Madhavan Srinivasn

Changes from v1 -> v2

- PERF_REG_EXTENDED_MASK` is defined at runtime in the kernel
based on platform. This will give flexibility in using extended
regs for all processor versions where the supported registers may differ.
- removed PERF_REG_EXTENDED_MASK from the perf tools side. Based on the
processor version(from PVR value), tool side will return the appropriate
extended mask
- Since tool changes can handle without a "PERF_REG_EXTENDED_MASK" macro,
dropped patch to set NO_AUXTRACE.
- Addressed review comments from Ravi Bangoria for V1

---

 arch/powerpc/include/asm/perf_event_server.h    |  8 ++++
 arch/powerpc/include/uapi/asm/perf_regs.h       | 14 ++++++-
 arch/powerpc/perf/core-book3s.c                 |  1 +
 arch/powerpc/perf/perf_regs.c                   | 34 +++++++++++++--
 arch/powerpc/perf/power9-pmu.c                  |  6 +++
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++++++-
 tools/perf/arch/powerpc/include/perf_regs.h     |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c        | 55 +++++++++++++++++++++++++
 8 files changed, 131 insertions(+), 6 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
From: Lucas Stach @ 2020-05-20  9:42 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: sumit.semwal, linaro-mm-sig, Linux-ALSA, linuxppc-dev,
	linux-kernel, Timur Tabi, Xiubo Li, shawnguo, Shengjiu Wang,
	Takashi Iwai, Liam Girdwood, dri-devel, perex, Nicolin Chen,
	Mark Brown, linux-imx, kernel, Fabio Estevam, s.hauer,
	linux-arm-kernel, linux-media
In-Reply-To: <CAA+D8APhHvA39wmCayeCsAEKmOJ0n7qOQiT1tZmFHr4+yASgTw@mail.gmail.com>

Am Mittwoch, den 20.05.2020, 16:20 +0800 schrieb Shengjiu Wang:
> Hi
> 
> On Tue, May 19, 2020 at 6:04 PM Lucas Stach <l.stach@pengutronix.de> wrote:
> > Am Dienstag, den 19.05.2020, 17:41 +0800 schrieb Shengjiu Wang:
> > > There are two requirements that we need to move the request
> > > of dma channel from probe to open.
> > 
> > How do you handle -EPROBE_DEFER return code from the channel request if
> > you don't do it in probe?
> 
> I use the dma_request_slave_channel or dma_request_channel instead
> of dmaengine_pcm_request_chan_of. so there should be not -EPROBE_DEFER
> return code.

This is a pretty weak argument. The dmaengine device might probe after
you try to get the channel. Using a function to request the channel
that doesn't allow you to handle probe deferral is IMHO a bug and
should be fixed, instead of building even more assumptions on top of
it.

> > > - When dma device binds with power-domains, the power will
> > > be enabled when we request dma channel. If the request of dma
> > > channel happen on probe, then the power-domains will be always
> > > enabled after kernel boot up,  which is not good for power
> > > saving,  so we need to move the request of dma channel to .open();
> > 
> > This is certainly something which could be fixed in the dmaengine
> > driver.
> 
> Dma driver always call the pm_runtime_get_sync in
> device_alloc_chan_resources, the device_alloc_chan_resources is
> called when channel is requested. so power is enabled on channel
> request.

So why can't you fix the dmaengine driver to do that RPM call at a
later time when the channel is actually going to be used? This will
allow further power savings with other slave devices than the audio
PCM.

Regards,
Lucas

> > > - With FE-BE case, if the dma channel is requested in probe,
> > > then there will be below issue, which is caused by that the
> > > dma channel will be requested duplicately
> > 
> > Why is this requested a second time? Is this just some missing cleanup
> > on a deferred probe path?
> 
> Not relate with deferred probe.  With DMA1->ASRC->DMA2->ESAI case,
> the DMA1->ASRC->DMA2 is in FE,  ESAI is in BE.  When ESAI drvier
> probe,  DMA3 channel is created with ESAI's "dma:tx" (DMA3 channel
> is not used in this FE-BE case).    When FE-BE startup, DMA2
> channel is created, it needs the ESAI's "dma:tx", so below warning
> comes out.
> 
> > Regards,
> > Lucas
> > 
> > > [  638.906268] sysfs: cannot create duplicate filename '/devices/soc0/soc/2000000.bus/2000000.spba-bus/2024000.esai/dma:tx'
> > > [  638.919061] CPU: 1 PID: 673 Comm: aplay Not tainted 5.7.0-rc1-12956-gfc64b2585593 #287
> > > [  638.927113] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > > [  638.933690] [<c0110dd8>] (unwind_backtrace) from [<c010b8ec>] (show_stack+0x10/0x14)
> > > [  638.941464] [<c010b8ec>] (show_stack) from [<c0557fc0>] (dump_stack+0xe4/0x118)
> > > [  638.948808] [<c0557fc0>] (dump_stack) from [<c032aeb4>] (sysfs_warn_dup+0x50/0x64)
> > > [  638.956406] [<c032aeb4>] (sysfs_warn_dup) from [<c032b1a8>] (sysfs_do_create_link_sd+0xc8/0xd4)
> > > [  638.965134] [<c032b1a8>] (sysfs_do_create_link_sd) from [<c05dc668>] (dma_request_chan+0xb0/0x210)
> > > [  638.974120] [<c05dc668>] (dma_request_chan) from [<c05dc7d0>] (dma_request_slave_channel+0x8/0x14)
> > > [  638.983111] [<c05dc7d0>] (dma_request_slave_channel) from [<c09d5548>] (fsl_asrc_dma_hw_params+0x1e0/0x438)
> > > [  638.992881] [<c09d5548>] (fsl_asrc_dma_hw_params) from [<c09c1654>] (soc_pcm_hw_params+0x4a0/0x6a8)
> > > [  639.001952] [<c09c1654>] (soc_pcm_hw_params) from [<c09c39d4>] (dpcm_fe_dai_hw_params+0x70/0xe4)
> > > [  639.010765] [<c09c39d4>] (dpcm_fe_dai_hw_params) from [<c099b274>] (snd_pcm_hw_params+0x158/0x418)
> > > [  639.019750] [<c099b274>] (snd_pcm_hw_params) from [<c099c5a0>] (snd_pcm_ioctl+0x734/0x183c)
> > > [  639.028129] [<c099c5a0>] (snd_pcm_ioctl) from [<c029ff94>] (ksys_ioctl+0x2ac/0xb98)
> > > [  639.035812] [<c029ff94>] (ksys_ioctl) from [<c0100080>] (ret_fast_syscall+0x0/0x28)
> > > [  639.043490] Exception stack(0xec529fa8 to 0xec529ff0)
> > > [  639.048565] 9fa0:                   bee84650 01321870 00000004 c25c4111 bee84650 0002000f
> > > [  639.056766] 9fc0: bee84650 01321870 01321820 00000036 00001f40 00000000 0002c2f8 00000003
> > > [  639.064964] 9fe0: b6f483fc bee8451c b6ee2655 b6e1dcf8
> > > [  639.070339] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink
> > > 
> > > Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
> > > ---
> > >  sound/soc/fsl/imx-pcm-dma.c | 173 +++++++++++++++++++++++++++++++++---
> > >  1 file changed, 159 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/sound/soc/fsl/imx-pcm-dma.c b/sound/soc/fsl/imx-pcm-dma.c
> > > index 04a9bc749016..dae53b384df4 100644
> > > --- a/sound/soc/fsl/imx-pcm-dma.c
> > > +++ b/sound/soc/fsl/imx-pcm-dma.c
> > > @@ -11,6 +11,7 @@
> > >  #include <linux/dmaengine.h>
> > >  #include <linux/types.h>
> > >  #include <linux/module.h>
> > > +#include <linux/dma-mapping.h>
> > > 
> > >  #include <sound/core.h>
> > >  #include <sound/pcm.h>
> > > @@ -29,24 +30,168 @@ static bool filter(struct dma_chan *chan, void *param)
> > >       return true;
> > >  }
> > > 
> > > -static const struct snd_dmaengine_pcm_config imx_dmaengine_pcm_config = {
> > > -     .prepare_slave_config = snd_dmaengine_pcm_prepare_slave_config,
> > > -     .compat_filter_fn = filter,
> > > -};
> > > +static int imx_pcm_hw_params(struct snd_soc_component *component,
> > > +                          struct snd_pcm_substream *substream,
> > > +                          struct snd_pcm_hw_params *params)
> > > +{
> > > +     struct snd_pcm_runtime *runtime = substream->runtime;
> > > +     struct snd_soc_pcm_runtime *rtd = substream->private_data;
> > > +     struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0);
> > > +     struct snd_dmaengine_dai_dma_data *dma_data;
> > > +     struct dma_slave_config config;
> > > +     struct dma_chan *chan;
> > > +     int ret = 0;
> > > 
> > > -int imx_pcm_dma_init(struct platform_device *pdev, size_t size)
> > > +     snd_pcm_set_runtime_buffer(substream, &substream->dma_buffer);
> > > +     runtime->dma_bytes = params_buffer_bytes(params);
> > > +
> > > +     chan = snd_dmaengine_pcm_get_chan(substream);
> > > +     if (!chan)
> > > +             return -EINVAL;
> > > +
> > > +     ret = snd_hwparams_to_dma_slave_config(substream, params, &config);
> > > +     if (ret)
> > > +             return ret;
> > > +
> > > +     dma_data = snd_soc_dai_get_dma_data(cpu_dai, substream);
> > > +     if (!dma_data)
> > > +             return -EINVAL;
> > > +
> > > +     snd_dmaengine_pcm_set_config_from_dai_data(substream,
> > > +                                                dma_data,
> > > +                                                &config);
> > > +     return dmaengine_slave_config(chan, &config);
> > > +}
> > > +
> > > +static int imx_pcm_hw_free(struct snd_soc_component *component,
> > > +                        struct snd_pcm_substream *substream)
> > >  {
> > > -     struct snd_dmaengine_pcm_config *config;
> > > +     snd_pcm_set_runtime_buffer(substream, NULL);
> > > +     return 0;
> > > +}
> > > +
> > > +static snd_pcm_uframes_t imx_pcm_pointer(struct snd_soc_component *component,
> > > +                                      struct snd_pcm_substream *substream)
> > > +{
> > > +     return snd_dmaengine_pcm_pointer(substream);
> > > +}
> > > +
> > > +static int imx_pcm_trigger(struct snd_soc_component *component,
> > > +                        struct snd_pcm_substream *substream, int cmd)
> > > +{
> > > +     return snd_dmaengine_pcm_trigger(substream, cmd);
> > > +}
> > > +
> > > +static int imx_pcm_open(struct snd_soc_component *component,
> > > +                     struct snd_pcm_substream *substream)
> > > +{
> > > +     struct snd_soc_pcm_runtime *rtd = substream->private_data;
> > > +     bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
> > > +     struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0);
> > > +     struct snd_dmaengine_dai_dma_data *dma_data;
> > > +     struct device *dev = component->dev;
> > > +     struct snd_pcm_hardware hw;
> > > +     struct dma_chan *chan;
> > > +     int ret;
> > > +
> > > +     ret = snd_pcm_hw_constraint_integer(substream->runtime,
> > > +                                         SNDRV_PCM_HW_PARAM_PERIODS);
> > > +     if (ret < 0) {
> > > +             dev_err(dev, "failed to set pcm hw params periods\n");
> > > +             return ret;
> > > +     }
> > > +
> > > +     dma_data = snd_soc_dai_get_dma_data(cpu_dai, substream);
> > > +     if (!dma_data)
> > > +             return -EINVAL;
> > > +
> > > +     chan = dma_request_slave_channel(cpu_dai->dev, tx ? "tx" : "rx");
> > > +     if (!chan) {
> > > +             /* Try to request channel using compat_filter_fn */
> > > +             chan = snd_dmaengine_pcm_request_channel(filter,
> > > +                                                      dma_data->filter_data);
> > > +             if (!chan)
> > > +                     return -ENXIO;
> > > +     }
> > > 
> > > -     config = devm_kzalloc(&pdev->dev,
> > > -                     sizeof(struct snd_dmaengine_pcm_config), GFP_KERNEL);
> > > -     if (!config)
> > > -             return -ENOMEM;
> > > -     *config = imx_dmaengine_pcm_config;
> > > +     ret = snd_dmaengine_pcm_open(substream, chan);
> > > +     if (ret)
> > > +             goto pcm_open_fail;
> > > 
> > > -     return devm_snd_dmaengine_pcm_register(&pdev->dev,
> > > -             config,
> > > -             SND_DMAENGINE_PCM_FLAG_COMPAT);
> > > +     memset(&hw, 0, sizeof(hw));
> > > +     hw.info = SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_MMAP_VALID |
> > > +                     SNDRV_PCM_INFO_INTERLEAVED;
> > > +     hw.periods_min = 2;
> > > +     hw.periods_max = UINT_MAX;
> > > +     hw.period_bytes_min = 256;
> > > +     hw.period_bytes_max = dma_get_max_seg_size(chan->device->dev);
> > > +     hw.buffer_bytes_max = IMX_DEFAULT_DMABUF_SIZE;
> > > +     hw.fifo_size = dma_data->fifo_size;
> > > +
> > > +     /* Refine the hw according to caps of DMA. */
> > > +     ret = snd_dmaengine_pcm_refine_runtime_hwparams(substream,
> > > +                                                     dma_data,
> > > +                                                     &hw,
> > > +                                                     chan);
> > > +     if (ret < 0)
> > > +             goto refine_runtime_hwparams_fail;
> > > +
> > > +     snd_soc_set_runtime_hwparams(substream, &hw);
> > > +
> > > +     /* Support allocate memory from IRAM */
> > > +     ret = snd_dma_alloc_pages(SNDRV_DMA_TYPE_DEV_IRAM,
> > > +                               chan->device->dev,
> > > +                               hw.buffer_bytes_max,
> > > +                               &substream->dma_buffer);
> > > +     if (ret < 0)
> > > +             goto alloc_pagas_fail;
> > > +
> > > +     return 0;
> > > +
> > > +alloc_pagas_fail:
> > > +refine_runtime_hwparams_fail:
> > > +     snd_dmaengine_pcm_close(substream);
> > > +pcm_open_fail:
> > > +     dma_release_channel(chan);
> > > +
> > > +     return ret;
> > > +}
> > > +
> > > +static int imx_pcm_close(struct snd_soc_component *component,
> > > +                      struct snd_pcm_substream *substream)
> > > +{
> > > +     if (substream) {
> > > +             snd_dma_free_pages(&substream->dma_buffer);
> > > +             substream->dma_buffer.area = NULL;
> > > +             substream->dma_buffer.addr = 0;
> > > +     }
> > > +
> > > +     return snd_dmaengine_pcm_close_release_chan(substream);
> > > +}
> > > +
> > > +static int imx_pcm_new(struct snd_soc_component *component,
> > > +                    struct snd_soc_pcm_runtime *rtd)
> > > +{
> > > +     struct snd_card *card = rtd->card->snd_card;
> > > +
> > > +     return dma_coerce_mask_and_coherent(card->dev, DMA_BIT_MASK(32));
> > > +}
> > > +
> > > +static const struct snd_soc_component_driver imx_pcm_component = {
> > > +     .name           = "imx-pcm-dma",
> > > +     .pcm_construct  = imx_pcm_new,
> > > +     .open           = imx_pcm_open,
> > > +     .close          = imx_pcm_close,
> > > +     .hw_params      = imx_pcm_hw_params,
> > > +     .hw_free        = imx_pcm_hw_free,
> > > +     .trigger        = imx_pcm_trigger,
> > > +     .pointer        = imx_pcm_pointer,
> > > +};
> > > +
> > > +int imx_pcm_dma_init(struct platform_device *pdev, size_t size)
> > > +{
> > > +     return devm_snd_soc_register_component(&pdev->dev,
> > > +                                            &imx_pcm_component, NULL, 0);
> > >  }
> > >  EXPORT_SYMBOL_GPL(imx_pcm_dma_init);
> > > 


^ permalink raw reply

* Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
From: Jiri Slaby @ 2020-05-20  9:38 UTC (permalink / raw)
  To: rananta, Greg KH; +Cc: andrew, linuxppc-dev, linux-kernel
In-Reply-To: <0ab0b49f19b824ac1c4db4c4937ed388@codeaurora.org>

[-- Attachment #1: Type: text/plain, Size: 1897 bytes --]

On 15. 05. 20, 1:22, rananta@codeaurora.org wrote:
> On 2020-05-13 00:04, Greg KH wrote:
>> On Tue, May 12, 2020 at 02:39:50PM -0700, rananta@codeaurora.org wrote:
>>> On 2020-05-12 01:25, Greg KH wrote:
>>> > On Tue, May 12, 2020 at 09:22:15AM +0200, Jiri Slaby wrote:
>>> > > commit bdb498c20040616e94b05c31a0ceb3e134b7e829
>>> > > Author: Jiri Slaby <jslaby@suse.cz>
>>> > > Date:   Tue Aug 7 21:48:04 2012 +0200
>>> > >
>>> > >     TTY: hvc_console, add tty install
>>> > >
>>> > > added hvc_install but did not move 'tty->driver_data = NULL;' from
>>> > > hvc_open's fail path to hvc_cleanup.
>>> > >
>>> > > IOW hvc_open now NULLs tty->driver_data even for another task which
>>> > > opened the tty earlier. The same holds for
>>> > > "tty_port_tty_set(&hp->port,
>>> > > NULL);" there. And actually "tty_port_put(&hp->port);" is also
>>> > > incorrect
>>> > > for the 2nd task opening the tty.
>>> > >

...

> These are the traces you get when the issue happens:
> [  154.212291] hvc_install called for pid: 666
> [  154.216705] hvc_open called for pid: 666
> [  154.233657] hvc_open: request_irq failed with rc -22.
> [  154.238934] hvc_open called for pid: 678
> [  154.243012] Unable to handle kernel NULL pointer dereference at
> virtual address 00000000000000c4
> # hvc_install isn't called for pid: 678 as the file wasn't closed yet.

Nice. Does the attached help?

I wonder how comes the tty_port_put in hvc_open does not cause a UAF? I
would say hvc_open fails, tty_port_put is called. It decrements the
reference taken in hvc_install. So far so good.

Now, this should happen IMO:
tty_open
  -> hvc_open (fails)
    -> tty_port_put
  -> tty_release
    -> tty_release_struct
      -> tty_kref_put
        -> queue_release_one_tty
SCHEDULED WORKQUEUE
release_one_tty
  -> hvc_cleanup
    -> tty_port_put (should die terrible death now)

What am I missing?

thanks,
-- 
js
suse labs

[-- Attachment #2: 0001-hvc_console-fix-open.patch --]
[-- Type: text/x-patch, Size: 2381 bytes --]

From d891cdfcbd3b41eb23ddfc8d9e6cbe038ff8fb72 Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jslaby@suse.cz>
Date: Wed, 20 May 2020 11:29:25 +0200
Subject: [PATCH] hvc_console: fix open

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 drivers/tty/hvc/hvc_console.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 436cc51c92c3..cdcc64ea2554 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -371,15 +371,14 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	 * tty fields and return the kref reference.
 	 */
 	if (rc) {
-		tty_port_tty_set(&hp->port, NULL);
-		tty->driver_data = NULL;
-		tty_port_put(&hp->port);
 		printk(KERN_ERR "hvc_open: request_irq failed with rc %d.\n", rc);
-	} else
+	} else {
 		/* We are ready... raise DTR/RTS */
 		if (C_BAUD(tty))
 			if (hp->ops->dtr_rts)
 				hp->ops->dtr_rts(hp, 1);
+		tty_port_set_initialized(&hp->port, true);
+	}
 
 	/* Force wakeup of the polling thread */
 	hvc_kick();
@@ -389,22 +388,12 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 
 static void hvc_close(struct tty_struct *tty, struct file * filp)
 {
-	struct hvc_struct *hp;
+	struct hvc_struct *hp = tty->driver_data;
 	unsigned long flags;
 
 	if (tty_hung_up_p(filp))
 		return;
 
-	/*
-	 * No driver_data means that this close was issued after a failed
-	 * hvc_open by the tty layer's release_dev() function and we can just
-	 * exit cleanly because the kref reference wasn't made.
-	 */
-	if (!tty->driver_data)
-		return;
-
-	hp = tty->driver_data;
-
 	spin_lock_irqsave(&hp->port.lock, flags);
 
 	if (--hp->port.count == 0) {
@@ -412,6 +401,9 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		/* We are done with the tty pointer now. */
 		tty_port_tty_set(&hp->port, NULL);
 
+		if (!tty_port_initialized(&hp->port))
+			return;
+
 		if (C_HUPCL(tty))
 			if (hp->ops->dtr_rts)
 				hp->ops->dtr_rts(hp, 0);
@@ -428,6 +420,7 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		 * waking periodically to check chars_in_buffer().
 		 */
 		tty_wait_until_sent(tty, HVC_CLOSE_WAIT);
+		tty_port_set_initialized(&hp->port, false);
 	} else {
 		if (hp->port.count < 0)
 			printk(KERN_ERR "hvc_close %X: oops, count is %d\n",
-- 
2.26.2


^ permalink raw reply related

* [v2 1/2] dts: ppc: t4240rdb: remove interrupts property
From: Biwen Li @ 2020-05-20  9:15 UTC (permalink / raw)
  To: leoyang.li, robh+dt, mpe, benh, a.zummo, alexandre.belloni
  Cc: linux-rtc, devicetree, linuxppc-dev, linux-kernel, Biwen Li

From: Biwen Li <biwen.li@nxp.com>

This removes interrupts property to drop warning as follows:
    - $ hwclock.util-linux
      hwclock.util-linux: select() to /dev/rtc0
      to wait for clock tick timed out

My case:
    - RTC ds1374's INT pin is connected to VCC on T4240RDB,
      then the RTC cannot inform cpu about the alarm interrupt

Signed-off-by: Biwen Li <biwen.li@nxp.com>
---
 arch/powerpc/boot/dts/fsl/t4240rdb.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
index a56a705d41f7..145896f2eef6 100644
--- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
@@ -144,7 +144,6 @@
 			rtc@68 {
 				compatible = "dallas,ds1374";
 				reg = <0x68>;
-				interrupts = <0x1 0x1 0 0>;
 			};
 		};
 
-- 
2.17.1


^ permalink raw reply related

* [v2 2/2] dts: ppc: t1024rdb: remove interrupts property
From: Biwen Li @ 2020-05-20  9:15 UTC (permalink / raw)
  To: leoyang.li, robh+dt, mpe, benh, a.zummo, alexandre.belloni
  Cc: linux-rtc, devicetree, linuxppc-dev, linux-kernel, Biwen Li
In-Reply-To: <20200520091543.44692-1-biwen.li@oss.nxp.com>

From: Biwen Li <biwen.li@nxp.com>

This removes interrupts property to drop warning as follows:
    - $ hwclock.util-linux
      hwclock.util-linux: select() to /dev/rtc0
      to wait for clock tick timed out

My case:
    - RTC ds1339s INT pin isn't connected to cpus INT pin on T1024RDB,
      then the RTC cannot inform cpu about alarm interrupt

How to fix it?
    - remove IRQ line

Signed-off-by: Biwen Li <biwen.li@nxp.com>
---
 arch/powerpc/boot/dts/fsl/t1024rdb.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/fsl/t1024rdb.dts b/arch/powerpc/boot/dts/fsl/t1024rdb.dts
index 645caff98ed1..605ceec66af3 100644
--- a/arch/powerpc/boot/dts/fsl/t1024rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t1024rdb.dts
@@ -161,7 +161,6 @@
 			rtc@68 {
 				compatible = "dallas,ds1339";
 				reg = <0x68>;
-				interrupts = <0x1 0x1 0 0>;
 			};
 		};
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH -next] scsi: ibmvscsi: Make some functions static
From: Chen Tao @ 2020-05-20  9:10 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: tyreld, linux-scsi, linux-kernel, chentao107, paulus,
	linuxppc-dev

Fix the following warning:

drivers/scsi/ibmvscsi/ibmvscsi.c:2387:12: warning: symbol
'ibmvscsi_module_init' was not declared. Should it be static?
drivers/scsi/ibmvscsi/ibmvscsi.c:2409:13: warning: symbol
'ibmvscsi_module_exit' was not declared. Should it be static?

Signed-off-by: Chen Tao <chentao107@huawei.com>
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 59f0f1030c54..44e64aa21194 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2384,7 +2384,7 @@ static struct vio_driver ibmvscsi_driver = {
 static struct srp_function_template ibmvscsi_transport_functions = {
 };
 
-int __init ibmvscsi_module_init(void)
+static int __init ibmvscsi_module_init(void)
 {
 	int ret;
 
@@ -2406,7 +2406,7 @@ int __init ibmvscsi_module_init(void)
 	return ret;
 }
 
-void __exit ibmvscsi_module_exit(void)
+static void __exit ibmvscsi_module_exit(void)
 {
 	vio_unregister_driver(&ibmvscsi_driver);
 	srp_release_transport(ibmvscsi_transport_template);
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH v2] tty: hvc: Fix data abort due to race in hvc_open
From: Jiri Slaby @ 2020-05-20  8:59 UTC (permalink / raw)
  To: Raghavendra Rao Ananta, gregkh, andrew; +Cc: linuxppc-dev, linux-kernel, stable
In-Reply-To: <20200520064708.24278-1-rananta@codeaurora.org>

On 20. 05. 20, 8:47, Raghavendra Rao Ananta wrote:
> Potentially, hvc_open() can be called in parallel when two tasks calls
> open() on /dev/hvcX. In such a scenario, if the hp->ops->notifier_add()
> callback in the function fails, where it sets the tty->driver_data to
> NULL, the parallel hvc_open() can see this NULL and cause a memory abort.
> Hence, do a NULL check at the beginning, before proceeding ahead.
> 
> The issue can be easily reproduced by launching two tasks simultaneously
> that does an open() call on /dev/hvcX.
> For example:
> $ cat /dev/hvc0 & cat /dev/hvc0 &
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Raghavendra Rao Ananta <rananta@codeaurora.org>
> ---
>  drivers/tty/hvc/hvc_console.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> index 436cc51c92c3..80709f754cc8 100644
> --- a/drivers/tty/hvc/hvc_console.c
> +++ b/drivers/tty/hvc/hvc_console.c
> @@ -350,6 +350,9 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
>  	unsigned long flags;
>  	int rc = 0;
> 
> +	if (!hp)
> +		return -ENODEV;
> +

This is still not fixing the bug properly. See:
https://lore.kernel.org/linuxppc-dev/0f7791f5-0a53-59f6-7277-247a789f30c2@suse.cz/

In particular, the paragraph starting "IOW".

thanks,
-- 
js
suse labs

^ permalink raw reply

* [PATCH v4 6/6] asm-generic/tlb: avoid potential double flush
From: Santosh Sivaraj @ 2020-05-20  8:30 UTC (permalink / raw)
  To: stable, linuxppc-dev
  Cc: Sasha Levin, Santosh Sivaraj, Peter Zijlstra, Greg KH,
	Aneesh Kumar K . V
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>

From: Peter Zijlstra <peterz@infradead.org>

commit 0758cd8304942292e95a0f750c374533db378b32 upstream

Aneesh reported that:

	tlb_flush_mmu()
	  tlb_flush_mmu_tlbonly()
	    tlb_flush()			<-- #1
	  tlb_flush_mmu_free()
	    tlb_table_flush()
	      tlb_table_invalidate()
		tlb_flush_mmu_tlbonly()
		  tlb_flush()		<-- #2

does two TLBIs when tlb->fullmm, because __tlb_reset_range() will not
clear tlb->end in that case.

Observe that any caller to __tlb_adjust_range() also sets at least one of
the tlb->freed_tables || tlb->cleared_p* bits, and those are
unconditionally cleared by __tlb_reset_range().

Change the condition for actually issuing TLBI to having one of those bits
set, as opposed to having tlb->end != 0.

Link: http://lkml.kernel.org/r/20200116064531.483522-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org>  # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: backported to 4.19 stable]
---
 include/asm-generic/tlb.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 19934cdd143e..427a70c56ddd 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -179,7 +179,12 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
-	if (!tlb->end)
+	/*
+	 * Anything calling __tlb_adjust_range() also sets at least one of
+	 * these bits.
+	 */
+	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
+	      tlb->cleared_puds || tlb->cleared_p4ds))
 		return;
 
 	tlb_flush(tlb);
-- 
2.25.4


^ permalink raw reply related

* [PATCH v4 5/6] mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
From: Santosh Sivaraj @ 2020-05-20  8:30 UTC (permalink / raw)
  To: stable, linuxppc-dev
  Cc: Sasha Levin, Santosh Sivaraj, Peter Zijlstra, Greg KH,
	Aneesh Kumar K . V
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>

From: Peter Zijlstra <peterz@infradead.org>

commit 0ed1325967ab5f7a4549a2641c6ebe115f76e228 upstream

Architectures for which we have hardware walkers of Linux page table
should flush TLB on mmu gather batch allocation failures and batch flush.
Some architectures like POWER supports multiple translation modes (hash
and radix) and in the case of POWER only radix translation mode needs the
above TLBI.  This is because for hash translation mode kernel wants to
avoid this extra flush since there are no hardware walkers of linux page
table.  With radix translation, the hardware also walks linux page table
and with that, kernel needs to make sure to TLB invalidate page walk cache
before page table pages are freed.

More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating
TLB caches for RCU_TABLE_FREE")

The changes to sparc are to make sure we keep the old behavior since we
are now removing HAVE_RCU_TABLE_NO_INVALIDATE.  The default value for
tlb_needs_table_invalidate is to always force an invalidate and sparc can
avoid the table invalidate.  Hence we define tlb_needs_table_invalidate to
false for sparc architecture.

Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com
Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org>  # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: backported to 4.19 stable]
---
 arch/Kconfig                    |  3 ---
 arch/powerpc/Kconfig            |  1 -
 arch/powerpc/include/asm/tlb.h  | 11 +++++++++++
 arch/sparc/Kconfig              |  1 -
 arch/sparc/include/asm/tlb_64.h |  9 +++++++++
 include/asm-generic/tlb.h       | 15 +++++++++++++++
 mm/memory.c                     | 16 ++++++++--------
 7 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 061a12b8140e..3abbdb0cea44 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -363,9 +363,6 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
 	bool
 
-config HAVE_RCU_TABLE_NO_INVALIDATE
-	bool
-
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1a00ce4b0040..e5bc0cfea2b1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,7 +216,6 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE
-	select HAVE_RCU_TABLE_NO_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
 	select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index f0e571b2dc7c..63418275f402 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -30,6 +30,17 @@
 #define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
 
 extern void tlb_flush(struct mmu_gather *tlb);
+/*
+ * book3s:
+ * Hash does not use the linux page-tables, so we can avoid
+ * the TLB invalidate for page-table freeing, Radix otoh does use the
+ * page-tables and needs the TLBI.
+ *
+ * nohash:
+ * We still do TLB invalidate in the __pte_free_tlb routine before we
+ * add the page table pages to mmu gather table batch.
+ */
+#define tlb_needs_table_invalidate()	radix_enabled()
 
 /* Get the generic bits... */
 #include <asm-generic/tlb.h>
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index d90d632868aa..e6f2a38d2e61 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,7 +64,6 @@ config SPARC64
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
-	select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_DYNAMIC_FTRACE
diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h
index a2f3fa61ee36..8cb8f3833239 100644
--- a/arch/sparc/include/asm/tlb_64.h
+++ b/arch/sparc/include/asm/tlb_64.h
@@ -28,6 +28,15 @@ void flush_tlb_pending(void);
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 #define tlb_flush(tlb)	flush_tlb_pending()
 
+/*
+ * SPARC64's hardware TLB fill does not use the Linux page-tables
+ * and therefore we don't need a TLBI when freeing page-table pages.
+ */
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define tlb_needs_table_invalidate()	(false)
+#endif
+
 #include <asm-generic/tlb.h>
 
 #endif /* _SPARC64_TLB_H */
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index f2b9dc9cbaf8..19934cdd143e 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -61,8 +61,23 @@ struct mmu_table_batch {
 extern void tlb_table_flush(struct mmu_gather *tlb);
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
+/*
+ * This allows an architecture that does not use the linux page-tables for
+ * hardware to skip the TLBI when freeing page tables.
+ */
+#ifndef tlb_needs_table_invalidate
+#define tlb_needs_table_invalidate() (true)
 #endif
 
+#else
+
+#ifdef tlb_needs_table_invalidate
+#error tlb_needs_table_invalidate() requires HAVE_RCU_TABLE_FREE
+#endif
+
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
+
 /*
  * If we can't allocate a page to make a big batch of page pointers
  * to work on, then just handle a few from the on-stack structure.
diff --git a/mm/memory.c b/mm/memory.c
index ba5689610c04..7daa7ae1b046 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -327,14 +327,14 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
-	/*
-	 * Invalidate page-table caches used by hardware walkers. Then we still
-	 * need to RCU-sched wait while freeing the pages because software
-	 * walkers can still be in-flight.
-	 */
-	tlb_flush_mmu_tlbonly(tlb);
-#endif
+	if (tlb_needs_table_invalidate()) {
+		/*
+		 * Invalidate page-table caches used by hardware walkers. Then
+		 * we still need to RCU-sched wait while freeing the pages
+		 * because software walkers can still be in-flight.
+		 */
+		tlb_flush_mmu_tlbonly(tlb);
+	}
 }
 
 static void tlb_remove_table_smp_sync(void *arg)
-- 
2.25.4


^ permalink raw reply related

* [PATCH v4 4/6] powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
From: Santosh Sivaraj @ 2020-05-20  8:30 UTC (permalink / raw)
  To: stable, linuxppc-dev
  Cc: Sasha Levin, Santosh Sivaraj, Aneesh Kumar K.V, Greg KH
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>

commit 12e4d53f3f04e81f9e83d6fc10edc7314ab9f6b9 upstream

Patch series "Fixup page directory freeing", v4.

This is a repost of patch series from Peter with the arch specific changes
except ppc64 dropped.  ppc64 changes are added here because we are redoing
the patch series on top of ppc64 changes.  This makes it easy to backport
these changes.  Only the first 2 patches need to be backported to stable.

The thing is, on anything SMP, freeing page directories should observe the
exact same order as normal page freeing:

 1) unhook page/directory
 2) TLB invalidate
 3) free page/directory

Without this, any concurrent page-table walk could end up with a
Use-after-Free.  This is esp.  trivial for anything that has software
page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware
caches partial page-walks (ie.  caches page directories).

Even on UP this might give issues since mmu_gather is preemptible these
days.  An interrupt or preempted task accessing user pages might stumble
into the free page if the hardware caches page directories.

This patch series fixes ppc64 and add generic MMU_GATHER changes to
support the conversion of other architectures.  I haven't added patches
w.r.t other architecture because they are yet to be acked.

This patch (of 9):

A followup patch is going to make sure we correctly invalidate page walk
cache before we free page table pages.  In order to keep things simple
enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the
!SMP case differently in the followup patch

!SMP case is right now broken for radix translation w.r.t page walk
cache flush.  We can get interrupted in between page table free and
that would imply we have page walk cache entries pointing to tables
which got freed already.  Michael said "both our platforms that run on
Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a
problem for anyone in practice, unless they've hacked their kernel to
build it !SMP."

Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: backported for 4.19 stable]
---
 arch/powerpc/Kconfig                         | 2 +-
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 --------
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 --
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 --------
 arch/powerpc/mm/pgtable-book3s64.c           | 7 -------
 5 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e09cfb109b8c..1a00ce4b0040 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -215,7 +215,7 @@ config PPC
 	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
-	select HAVE_RCU_TABLE_FREE		if SMP
+	select HAVE_RCU_TABLE_FREE
 	select HAVE_RCU_TABLE_NO_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 82e44b1a00ae..79ba3fbb512e 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -110,7 +110,6 @@ static inline void pgtable_free(void *table, unsigned index_size)
 #define check_pgt_cache()	do { } while (0)
 #define get_hugepd_cache_index(x)  (x)
 
-#ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
 				    void *table, int shift)
 {
@@ -127,13 +126,6 @@ static inline void __tlb_remove_table(void *_table)
 
 	pgtable_free(table, shift);
 }
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-				    void *table, int shift)
-{
-	pgtable_free(table, shift);
-}
-#endif
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
 				  unsigned long address)
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index f9019b579903..1013c0214213 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -47,9 +47,7 @@ extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
 extern void pte_fragment_free(unsigned long *, int);
 extern void pmd_fragment_free(unsigned long *);
 extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
-#ifdef CONFIG_SMP
 extern void __tlb_remove_table(void *_table);
-#endif
 
 static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 8825953c225b..96eed46d5684 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -111,7 +111,6 @@ static inline void pgtable_free(void *table, unsigned index_size)
 #define check_pgt_cache()	do { } while (0)
 #define get_hugepd_cache_index(x)	(x)
 
-#ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
 				    void *table, int shift)
 {
@@ -128,13 +127,6 @@ static inline void __tlb_remove_table(void *_table)
 
 	pgtable_free(table, shift);
 }
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-				    void *table, int shift)
-{
-	pgtable_free(table, shift);
-}
-#endif
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
 				  unsigned long address)
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 297db665d953..5b4e9fd8990c 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -432,7 +432,6 @@ static inline void pgtable_free(void *table, int index)
 	}
 }
 
-#ifdef CONFIG_SMP
 void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int index)
 {
 	unsigned long pgf = (unsigned long)table;
@@ -449,12 +448,6 @@ void __tlb_remove_table(void *_table)
 
 	return pgtable_free(table, index);
 }
-#else
-void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int index)
-{
-	return pgtable_free(table, index);
-}
-#endif
 
 #ifdef CONFIG_PROC_FS
 atomic_long_t direct_pages_count[MMU_PAGE_COUNT];
-- 
2.25.4


^ permalink raw reply related

* [PATCH v4 3/6] asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
From: Santosh Sivaraj @ 2020-05-20  8:30 UTC (permalink / raw)
  To: stable, linuxppc-dev
  Cc: Sasha Levin, Peter Zijlstra, Santosh Sivaraj, Greg KH
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>

From: Peter Zijlstra <peterz@infradead.org>

commit 96bc9567cbe112e9320250f01b9c060c882e8619 upstream

Make issuing a TLB invalidate for page-table pages the normal case.

The reason is twofold:

 - too many invalidates is safer than too few,
 - most architectures use the linux page-tables natively
   and would thus require this.

Make it an opt-out, instead of an opt-in.

No change in behavior intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: prerequisite for upcoming tlbflush backports]
---
 arch/Kconfig         | 2 +-
 arch/powerpc/Kconfig | 1 +
 arch/sparc/Kconfig   | 1 +
 arch/x86/Kconfig     | 1 -
 mm/memory.c          | 2 +-
 5 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index a336548487e6..061a12b8140e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -363,7 +363,7 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
 	bool
 
-config HAVE_RCU_TABLE_INVALIDATE
+config HAVE_RCU_TABLE_NO_INVALIDATE
 	bool
 
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f475dc5829b..e09cfb109b8c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,6 +216,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
 	select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index e6f2a38d2e61..d90d632868aa 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,6 +64,7 @@ config SPARC64
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_DYNAMIC_FTRACE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index af35f5caadbe..181d0d522977 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -181,7 +181,6 @@ config X86
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if PARAVIRT
-	select HAVE_RCU_TABLE_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
 	select HAVE_STACKPROTECTOR		if CC_HAS_SANE_STACKPROTECTOR
diff --git a/mm/memory.c b/mm/memory.c
index 1832c5ed6ac0..ba5689610c04 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -327,7 +327,7 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
 	/*
 	 * Invalidate page-table caches used by hardware walkers. Then we still
 	 * need to RCU-sched wait while freeing the pages because software
-- 
2.25.4


^ permalink raw reply related

* [PATCH v4 2/6] asm-generic/tlb: Track which levels of the page tables have been cleared
From: Santosh Sivaraj @ 2020-05-20  8:30 UTC (permalink / raw)
  To: stable, linuxppc-dev; +Cc: Sasha Levin, Santosh Sivaraj, Will Deacon, Greg KH
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>

From: Will Deacon <will.deacon@arm.com>

commit a6d60245d6d9b1caf66b0d94419988c4836980af upstream

It is common for architectures with hugepage support to require only a
single TLB invalidation operation per hugepage during unmap(), rather than
iterating through the mapping at a PAGE_SIZE increment. Currently,
however, the level in the page table where the unmap() operation occurs
is not stored in the mmu_gather structure, therefore forcing
architectures to issue additional TLB invalidation operations or to give
up and over-invalidate by e.g. invalidating the entire TLB.

Ideally, we could add an interval rbtree to the mmu_gather structure,
which would allow us to associate the correct mapping granule with the
various sub-mappings within the range being invalidated. However, this
is costly in terms of book-keeping and memory management, so instead we
approximate by keeping track of the page table levels that are cleared
and provide a means to query the smallest granule required for invalidation.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: prerequisite for upcoming tlbflush backports]
---
 include/asm-generic/tlb.h | 58 +++++++++++++++++++++++++++++++++------
 mm/memory.c               |  4 ++-
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 97306b32d8d2..f2b9dc9cbaf8 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -114,6 +114,14 @@ struct mmu_gather {
 	 */
 	unsigned int		freed_tables : 1;
 
+	/*
+	 * at which levels have we cleared entries?
+	 */
+	unsigned int		cleared_ptes : 1;
+	unsigned int		cleared_pmds : 1;
+	unsigned int		cleared_puds : 1;
+	unsigned int		cleared_p4ds : 1;
+
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
@@ -148,6 +156,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 		tlb->end = 0;
 	}
 	tlb->freed_tables = 0;
+	tlb->cleared_ptes = 0;
+	tlb->cleared_pmds = 0;
+	tlb->cleared_puds = 0;
+	tlb->cleared_p4ds = 0;
 }
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -197,6 +209,25 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 }
 #endif
 
+static inline unsigned long tlb_get_unmap_shift(struct mmu_gather *tlb)
+{
+	if (tlb->cleared_ptes)
+		return PAGE_SHIFT;
+	if (tlb->cleared_pmds)
+		return PMD_SHIFT;
+	if (tlb->cleared_puds)
+		return PUD_SHIFT;
+	if (tlb->cleared_p4ds)
+		return P4D_SHIFT;
+
+	return PAGE_SHIFT;
+}
+
+static inline unsigned long tlb_get_unmap_size(struct mmu_gather *tlb)
+{
+	return 1UL << tlb_get_unmap_shift(tlb);
+}
+
 /*
  * In the case of tlb vma handling, we can optimise these away in the
  * case where we're doing a full MM flush.  When we're doing a munmap,
@@ -230,13 +261,19 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_tlb_entry(tlb, ptep, address)		\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->cleared_ptes = 1;				\
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	     \
-	do {							     \
-		__tlb_adjust_range(tlb, address, huge_page_size(h)); \
-		__tlb_remove_tlb_entry(tlb, ptep, address);	     \
+#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+	do {							\
+		unsigned long _sz = huge_page_size(h);		\
+		__tlb_adjust_range(tlb, address, _sz);		\
+		if (_sz == PMD_SIZE)				\
+			tlb->cleared_pmds = 1;			\
+		else if (_sz == PUD_SIZE)			\
+			tlb->cleared_puds = 1;			\
+		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
 /**
@@ -250,6 +287,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pmd_tlb_entry(tlb, pmdp, address)			\
 	do {								\
 		__tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE);	\
+		tlb->cleared_pmds = 1;					\
 		__tlb_remove_pmd_tlb_entry(tlb, pmdp, address);		\
 	} while (0)
 
@@ -264,6 +302,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pud_tlb_entry(tlb, pudp, address)			\
 	do {								\
 		__tlb_adjust_range(tlb, address, HPAGE_PUD_SIZE);	\
+		tlb->cleared_puds = 1;					\
 		__tlb_remove_pud_tlb_entry(tlb, pudp, address);		\
 	} while (0)
 
@@ -289,7 +328,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pte_free_tlb(tlb, ptep, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->freed_tables = 1;			\
+		tlb->freed_tables = 1;				\
+		tlb->cleared_pmds = 1;				\
 		__pte_free_tlb(tlb, ptep, address);		\
 	} while (0)
 #endif
@@ -298,7 +338,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pmd_free_tlb(tlb, pmdp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->freed_tables = 1;			\
+		tlb->freed_tables = 1;				\
+		tlb->cleared_puds = 1;				\
 		__pmd_free_tlb(tlb, pmdp, address);		\
 	} while (0)
 #endif
@@ -308,7 +349,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pud_free_tlb(tlb, pudp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->freed_tables = 1;			\
+		tlb->freed_tables = 1;				\
+		tlb->cleared_p4ds = 1;				\
 		__pud_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
@@ -319,7 +361,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define p4d_free_tlb(tlb, pudp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->freed_tables = 1;			\
+		tlb->freed_tables = 1;				\
 		__p4d_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index bbf0cc4066c8..1832c5ed6ac0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -267,8 +267,10 @@ void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 {
 	struct mmu_gather_batch *batch, *next;
 
-	if (force)
+	if (force) {
+		__tlb_reset_range(tlb);
 		__tlb_adjust_range(tlb, start, end - start);
+	}
 
 	tlb_flush_mmu(tlb);
 
-- 
2.25.4


^ permalink raw reply related

* [PATCH v4 1/6] asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather
From: Santosh Sivaraj @ 2020-05-20  8:30 UTC (permalink / raw)
  To: stable, linuxppc-dev
  Cc: Sasha Levin, Santosh Sivaraj, Peter Zijlstra, Will Deacon,
	Greg KH
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>

From: Peter Zijlstra <peterz@infradead.org>

commit 22a61c3c4f1379ef8b0ce0d5cb78baf3178950e2 upstream

Some architectures require different TLB invalidation instructions
depending on whether it is only the last-level of page table being
changed, or whether there are also changes to the intermediate
(directory) entries higher up the tree.

Add a new bit to the flags bitfield in struct mmu_gather so that the
architecture code can operate accordingly if it's the intermediate
levels being invalidated.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: prerequisite for tlbflush backports]
---
 include/asm-generic/tlb.h | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b3353e21f3b3..97306b32d8d2 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -97,12 +97,22 @@ struct mmu_gather {
 #endif
 	unsigned long		start;
 	unsigned long		end;
-	/* we are in the middle of an operation to clear
-	 * a full mm and can make some optimizations */
-	unsigned int		fullmm : 1,
-	/* we have performed an operation which
-	 * requires a complete flush of the tlb */
-				need_flush_all : 1;
+	/*
+	 * we are in the middle of an operation to clear
+	 * a full mm and can make some optimizations
+	 */
+	unsigned int		fullmm : 1;
+
+	/*
+	 * we have performed an operation which
+	 * requires a complete flush of the tlb
+	 */
+	unsigned int		need_flush_all : 1;
+
+	/*
+	 * we have removed page directories
+	 */
+	unsigned int		freed_tables : 1;
 
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
@@ -137,6 +147,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 		tlb->start = TASK_SIZE;
 		tlb->end = 0;
 	}
+	tlb->freed_tables = 0;
 }
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -278,6 +289,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pte_free_tlb(tlb, ptep, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->freed_tables = 1;			\
 		__pte_free_tlb(tlb, ptep, address);		\
 	} while (0)
 #endif
@@ -285,7 +297,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #ifndef pmd_free_tlb
 #define pmd_free_tlb(tlb, pmdp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->freed_tables = 1;			\
 		__pmd_free_tlb(tlb, pmdp, address);		\
 	} while (0)
 #endif
@@ -295,6 +308,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pud_free_tlb(tlb, pudp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->freed_tables = 1;			\
 		__pud_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
@@ -304,7 +318,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #ifndef p4d_free_tlb
 #define p4d_free_tlb(tlb, pudp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->freed_tables = 1;			\
 		__p4d_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
-- 
2.25.4


^ permalink raw reply related

* [PATCH v4 0/6] Memory corruption may occur due to incorrent tlb flush
From: Santosh Sivaraj @ 2020-05-20  8:30 UTC (permalink / raw)
  To: stable, linuxppc-dev; +Cc: Sasha Levin, Santosh Sivaraj, Greg KH

The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC
flushes) may result in random memory corruption. Any concurrent page-table walk
could end up with a Use-after-Free. Even on UP this might give issues, since
mmu_gather is preemptible these days. An interrupt or preempted task accessing
user pages might stumble into the free page if the hardware caches page
directories.

The series is a backport of the fix sent by Peter [1].

The first three patches are dependencies for the last patch (avoid potential
double flush). If the performance impact due to double flush is considered
trivial then the first three patches and last patch may be dropped.

This is only for v4.19 stable.
--
Changelog:
 v2: Send the patches with the correct format (commit sha1 upstream) for stable
 v3: Fix compilation for ppc44x_defconfig and mpc885_ads_defconfig
 v4: No change, Resend.

--
Aneesh Kumar K.V (1):
  powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case

Peter Zijlstra (4):
  asm-generic/tlb: Track freeing of page-table directories in struct
    mmu_gather
  asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
  mm/mmu_gather: invalidate TLB correctly on batch allocation failure
    and flush
  asm-generic/tlb: avoid potential double flush

Will Deacon (1):
  asm-generic/tlb: Track which levels of the page tables have been
    cleared

 arch/Kconfig                                 |   3 -
 arch/powerpc/Kconfig                         |   2 +-
 arch/powerpc/include/asm/book3s/32/pgalloc.h |   8 --
 arch/powerpc/include/asm/book3s/64/pgalloc.h |   2 -
 arch/powerpc/include/asm/nohash/32/pgalloc.h |   8 --
 arch/powerpc/include/asm/tlb.h               |  11 ++
 arch/powerpc/mm/pgtable-book3s64.c           |   7 --
 arch/sparc/include/asm/tlb_64.h              |   9 ++
 arch/x86/Kconfig                             |   1 -
 include/asm-generic/tlb.h                    | 103 ++++++++++++++++---
 mm/memory.c                                  |  20 ++--
 11 files changed, 122 insertions(+), 52 deletions(-)

-- 
2.25.4


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox