* Re: [PATCH v8 00/30] Initial Prefixed Instruction support
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
To: linuxppc-dev, Jordan Niethe; +Cc: alistair, naveen.n.rao, dja, npiggin, bala24
In-Reply-To: <20200506034050.24806-1-jniethe5@gmail.com>
On Wed, 6 May 2020 13:40:20 +1000, Jordan Niethe wrote:
> A future revision of the ISA will introduce prefixed instructions. A
> prefixed instruction is composed of a 4-byte prefix followed by a
> 4-byte suffix.
>
> All prefixes have the major opcode 1. A prefix will never be a valid
> word instruction. A suffix may be an existing word instruction or a
> new instruction.
>
> [...]
Applied to powerpc/next.
[01/30] powerpc/xmon: Remove store_inst() for patch_instruction()
https://git.kernel.org/powerpc/c/802268fd82676ffce432776f60b93a0b15e58e0c
[02/30] powerpc/xmon: Move breakpoint instructions to own array
https://git.kernel.org/powerpc/c/51c9ba11f17f25ace1ea6bbfd4586c59105432de
[03/30] powerpc/xmon: Move breakpoints to text section
https://git.kernel.org/powerpc/c/4eff2b4f32a309e2171bfe53db3e93b5614f77cb
[04/30] powerpc/xmon: Use bitwise calculations in_breakpoint_table()
https://git.kernel.org/powerpc/c/5a7fdcab54ef17c395fc47e73c828a1432e51683
[05/30] powerpc: Change calling convention for create_branch() et. al.
https://git.kernel.org/powerpc/c/7c95d8893fb55869882c9f68f4c94840dc43f18f
[06/30] powerpc: Use a macro for creating instructions from u32s
https://git.kernel.org/powerpc/c/753462512868674a788ecc77bb96752efb818785
[07/30] powerpc: Use an accessor for instructions
https://git.kernel.org/powerpc/c/777e26f0edf8dab58b8dd474d35d83bde0ac6d76
[08/30] powerpc: Use a function for getting the instruction op code
https://git.kernel.org/powerpc/c/8094892d1aff14269d3b7bfcd8b941217eecd81f
[09/30] powerpc: Use a function for byte swapping instructions
https://git.kernel.org/powerpc/c/aabd2233b6aefeee6d7a2f667076d8346be1d30a
[10/30] powerpc: Introduce functions for instruction equality
https://git.kernel.org/powerpc/c/217862d9b98bf08958d57fd7b31b9de0f1a9477d
[11/30] powerpc: Use a datatype for instructions
https://git.kernel.org/powerpc/c/94afd069d937d84fb4f696eb9a78db4084e43d21
[12/30] powerpc: Use a function for reading instructions
https://git.kernel.org/powerpc/c/f8faaffaa7d99028e457ef2d1dcb43a98f736938
[13/30] powerpc: Add a probe_user_read_inst() function
https://git.kernel.org/powerpc/c/7ba68b2172c19031fdc2a2caf37328edd146e299
[14/30] powerpc: Add a probe_kernel_read_inst() function
https://git.kernel.org/powerpc/c/95b980a00d1220ca67550a933166704db8bc5c14
[15/30] powerpc/kprobes: Use patch_instruction()
https://git.kernel.org/powerpc/c/a8646f43ba5046e7f5c4396125d5136bfcb17b49
[16/30] powerpc: Define and use get_user_instr() et. al.
https://git.kernel.org/powerpc/c/5249385ad7f0ac178433f0ae9cc5b64612c8ff77
[17/30] powerpc: Introduce a function for reporting instruction length
https://git.kernel.org/powerpc/c/622cf6f436a12338bbcfbb3474629755547fd112
[18/30] powerpc/xmon: Use a function for reading instructions
https://git.kernel.org/powerpc/c/6c7a4f0a9f66fc7fdc6e208559e5d562f53e0991
[19/30] powerpc/xmon: Move insertion of breakpoint for xol'ing
https://git.kernel.org/powerpc/c/7fccfcfba04f9cb46438f368755d368f6c57f3a0
[20/30] powerpc: Make test_translate_branch() independent of instruction length
https://git.kernel.org/powerpc/c/0b582db5490a1f250ef63337dd46d5c7599dae80
[21/30] powerpc: Enable Prefixed Instructions
https://git.kernel.org/powerpc/c/2aa6195e43b3740258ead93aee42ac719dd4c4b0
[22/30] powerpc: Define new SRR1 bits for a ISA v3.1
https://git.kernel.org/powerpc/c/b691505ef9232a6e82f1c160911afcb4cb20487b
[23/30] powerpc: Add prefixed instructions to instruction data type
https://git.kernel.org/powerpc/c/650b55b707fdfa764e9f2b81314d3eb4216fb962
[24/30] powerpc: Test prefixed code patching
https://git.kernel.org/powerpc/c/f77f8ff7f13e6411c2e0ba25bb7e012a5ae6c927
[25/30] powerpc: Test prefixed instructions in feature fixups
https://git.kernel.org/powerpc/c/785b79d1e02873c2088ee1301154c66dace66ce5
[26/30] powerpc/xmon: Don't allow breakpoints on suffixes
https://git.kernel.org/powerpc/c/c9c831aebd8663d0129bbcee4d76be889f0627fe
[27/30] powerpc/kprobes: Don't allow breakpoints on suffixes
https://git.kernel.org/powerpc/c/b4657f7650babc9bfb41ce875abe41b18604a105
[28/30] powerpc: Support prefixed instructions in alignment handler
https://git.kernel.org/powerpc/c/9409d2f9dad2f0679d67dc24d8116dd3e837b035
[29/30] powerpc sstep: Add support for prefixed load/stores
https://git.kernel.org/powerpc/c/50b80a12e4ccff46d53b93754d817acd98bc9ae0
[30/30] powerpc sstep: Add support for prefixed fixed-point arithmetic
https://git.kernel.org/powerpc/c/3920742b92f5ea19a220edb947b6f33c99f501da
cheers
^ permalink raw reply
* Re: [PATCH] powerpc: Replace zero-length array with flexible-array
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Gustavo A. R. Silva; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200507185749.GA14994@embeddedor>
On Thu, 7 May 2020 13:57:49 -0500, Gustavo A. R. Silva wrote:
> The current codebase makes use of the zero-length array language
> extension to the C90 standard, but the preferred mechanism to declare
> variable-length types such as these ones is a flexible array member[1][2],
> introduced in C99:
>
> struct foo {
> int stuff;
> struct boo array[];
> };
>
> [...]
Applied to powerpc/next.
[1/1] powerpc: Replace zero-length array with flexible-array
https://git.kernel.org/powerpc/c/0f6be41c60699fd8cdfa93e5e85a306cec1ac1d0
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/mm: Replace zero-length array with flexible-array
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
To: Michael Ellerman, Gustavo A. R. Silva
Cc: Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <20200507185755.GA15014@embeddedor>
On Thu, 7 May 2020 13:57:55 -0500, Gustavo A. R. Silva wrote:
> The current codebase makes use of the zero-length array language
> extension to the C90 standard, but the preferred mechanism to declare
> variable-length types such as these ones is a flexible array member[1][2],
> introduced in C99:
>
> struct foo {
> int stuff;
> struct boo array[];
> };
>
> [...]
Applied to powerpc/next.
[1/1] powerpc/mm: Replace zero-length array with flexible-array
https://git.kernel.org/powerpc/c/02bddf21c34d0a918acc8647195ba4507e3db8fc
cheers
^ permalink raw reply
* Re: [PATCH v2 0/9] powerpc + ps3 patches
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
To: Michael Ellerman, Jens Axboe, Geoff Levand, David S. Miller
Cc: Geert Uytterhoeven, linuxppc-dev, Markus Elfring,
Emmanuel Nicolet
In-Reply-To: <cover.1589049250.git.geoff@infradead.org>
On Sat, 09 May 2020 18:58:31 +0000, Geoff Levand wrote:
> This is a combined V2 of the two patch sets I sent out on March 27th,
> 'PS3 patches for v5.7' and 'powerpc: Minor updates to improve build debugging'.
>
> I've dropped these two patches that were in my 'PS3 patches for v5.7' set:
>
> powerpc/ps3: Add lv1_panic
> powerpc/ps3: Add udbg_panic
>
> [...]
Patches 1-6 and 8 applied to powerpc/next.
[1/9] powerpc/head_check: Automatic verbosity
https://git.kernel.org/powerpc/c/4c592a34391ea4987d29c1718f931b50416ca015
[2/9] powerpc/wrapper: Output linker map file
https://git.kernel.org/powerpc/c/f61200d3e3386e78d49677dfb3911c9d7c0dfe4b
[3/9] powerpc/head_check: Avoid broken pipe
https://git.kernel.org/powerpc/c/331aa46aaf51325d8532a4948f5127b2edc441a5
[4/9] drivers/ps3: Remove duplicate error messages
https://git.kernel.org/powerpc/c/6a8aa782cece2330322c33452a767f53f8ba38c9
[5/9] net/ps3_gelic_net: Remove duplicate error message
https://git.kernel.org/powerpc/c/7b27b95a894d6a85c076f8d1f00e35316739bf51
[6/9] ps3disk: use the default segment boundary
https://git.kernel.org/powerpc/c/720bc316690bd27dea9d71510b50f0cd698ffc32
[8/9] powerpc/ps3: Fix kexec shutdown hang
https://git.kernel.org/powerpc/c/126554465d93b10662742128918a5fc338cda4aa
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/8xx: Update email address in MAINTAINERS
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Michael Ellerman, Christophe Leroy,
Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <9fd0f9a827ebbeae64ad7a6f6c595d242f4dd5fc.1588747860.git.christophe.leroy@csgroup.eu>
On Wed, 6 May 2020 06:51:59 +0000 (UTC), Christophe Leroy wrote:
> Since 01 May 2020, our email adresses have changed to @csgroup.eu
>
> Update MAINTAINERS accordingly.
Applied to powerpc/next.
[1/1] powerpc/8xx: Update email address in MAINTAINERS
https://git.kernel.org/powerpc/c/679d74abc4e14cb369e46f080d90f4dc8c143e65
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
To: Christophe Leroy, erhard_f, Paul Mackerras,
Benjamin Herrenschmidt, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <2c50f3b1c9bbaa4217c9a98f3044bd2a36c46a4f.1586361277.git.christophe.leroy@c-s.fr>
On Wed, 8 Apr 2020 15:58:49 +0000 (UTC), Christophe Leroy wrote:
> When CONFIG_KASAN is selected, the stack usage is increased.
>
> In the same way as x86 and arm64 architectures, increase
> THREAD_SHIFT when CONFIG_KASAN is selected.
Applied to powerpc/next.
[1/1] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
https://git.kernel.org/powerpc/c/edbadaf0671072298e506074128b64e003c5812c
cheers
^ permalink raw reply
* Re: [PATCH 1/5] drivers/powerpc: Replace _ALIGN_UP() by ALIGN()
From: Michael Ellerman @ 2020-05-20 10:59 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Michael Ellerman, Christophe Leroy,
Paul Mackerras
Cc: linux-fbdev, kvm, alsa-devel, dri-devel, linux-kernel,
linuxppc-dev
In-Reply-To: <a5945463f86c984151962a475a3ee56a2893e85d.1587407777.git.christophe.leroy@c-s.fr>
On Mon, 20 Apr 2020 18:36:34 +0000 (UTC), Christophe Leroy wrote:
> _ALIGN_UP() is specific to powerpc
> ALIGN() is generic and does the same
>
> Replace _ALIGN_UP() by ALIGN()
Applied to powerpc/next.
[1/5] drivers/powerpc: Replace _ALIGN_UP() by ALIGN()
https://git.kernel.org/powerpc/c/7bfc3c84cbf5167d943cff9b3d2619dab0b7894c
[2/5] powerpc: Replace _ALIGN_DOWN() by ALIGN_DOWN()
https://git.kernel.org/powerpc/c/e96d904ede6756641563d27daa746875b478a6c8
[3/5] powerpc: Replace _ALIGN_UP() by ALIGN()
https://git.kernel.org/powerpc/c/b711531641038f3ff3723914f3d5ba79848d347e
[4/5] powerpc: Replace _ALIGN() by ALIGN()
https://git.kernel.org/powerpc/c/d3f3d3bf76cfb04e73436a15e3987d3573e7523a
[5/5] powerpc: Remove _ALIGN_UP(), _ALIGN_DOWN() and _ALIGN()
https://git.kernel.org/powerpc/c/4cdb2da654033d76e1b1cb4ac427d9193dce816b
cheers
^ permalink raw reply
* [PATCH] Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."
From: Christophe Leroy @ 2020-05-20 10:23 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Rui Salvaterra
Cc: linuxppc-dev, linux-kernel
This reverts commit 697ece78f8f749aeea40f2711389901f0974017a.
The implementation of SWAP on powerpc requires page protection
bits to not be one of the least significant PTE bits.
Until the SWAP implementation is changed and this requirement voids,
we have to keep at least _PAGE_RW outside of the 3 last bits.
For now, revert to previous PTE bits order. A further rework
may come later.
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Fixes: 697ece78f8f7 ("powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits.")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/book3s/32/hash.h | 8 ++++----
arch/powerpc/kernel/head_32.S | 9 ++++++---
arch/powerpc/mm/book3s32/hash_low.S | 14 ++++++++------
3 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/hash.h b/arch/powerpc/include/asm/book3s/32/hash.h
index 34a7215ae81e..2a0a467d2985 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -17,9 +17,9 @@
* updating the accessed and modified bits in the page table tree.
*/
-#define _PAGE_USER 0x001 /* usermode access allowed */
-#define _PAGE_RW 0x002 /* software: user write access allowed */
-#define _PAGE_PRESENT 0x004 /* software: pte contains a translation */
+#define _PAGE_PRESENT 0x001 /* software: pte contains a translation */
+#define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */
+#define _PAGE_USER 0x004 /* usermode access allowed */
#define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */
#define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */
#define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */
@@ -27,7 +27,7 @@
#define _PAGE_DIRTY 0x080 /* C: page changed */
#define _PAGE_ACCESSED 0x100 /* R: page referenced */
#define _PAGE_EXEC 0x200 /* software: exec allowed */
-#define _PAGE_HASHPTE 0x400 /* hash_page has made an HPTE for this pte */
+#define _PAGE_RW 0x400 /* software: user write access allowed */
#define _PAGE_SPECIAL 0x800 /* software: Special page */
#ifdef CONFIG_PTE_64BIT
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index daaa153950c2..97c887950c3c 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -348,7 +348,7 @@ BEGIN_MMU_FTR_SECTION
andis. r0, r5, (DSISR_BAD_FAULT_32S | DSISR_DABRMATCH)@h
#endif
bne handle_page_fault_tramp_2 /* if not, try to put a PTE */
- rlwinm r3, r5, 32 - 24, 30, 30 /* DSISR_STORE -> _PAGE_RW */
+ rlwinm r3, r5, 32 - 15, 21, 21 /* DSISR_STORE -> _PAGE_RW */
bl hash_page
b handle_page_fault_tramp_1
FTR_SECTION_ELSE
@@ -497,6 +497,7 @@ InstructionTLBMiss:
andc. r1,r1,r0 /* check access & ~permission */
bne- InstructionAddressInvalid /* return if access not permitted */
/* Convert linux-style PTE to low word of PPC-style PTE */
+ rlwimi r0,r0,32-2,31,31 /* _PAGE_USER -> PP lsb */
ori r1, r1, 0xe06 /* clear out reserved bits */
andc r1, r0, r1 /* PP = user? 1 : 0 */
BEGIN_FTR_SECTION
@@ -564,8 +565,9 @@ DataLoadTLBMiss:
* we would need to update the pte atomically with lwarx/stwcx.
*/
/* Convert linux-style PTE to low word of PPC-style PTE */
- rlwinm r1,r0,0,30,30 /* _PAGE_RW -> PP msb */
- rlwimi r0,r0,1,30,30 /* _PAGE_USER -> PP msb */
+ rlwinm r1,r0,32-9,30,30 /* _PAGE_RW -> PP msb */
+ rlwimi r0,r0,32-1,30,30 /* _PAGE_USER -> PP msb */
+ rlwimi r0,r0,32-1,31,31 /* _PAGE_USER -> PP lsb */
ori r1,r1,0xe04 /* clear out reserved bits */
andc r1,r0,r1 /* PP = user? rw? 1: 3: 0 */
BEGIN_FTR_SECTION
@@ -643,6 +645,7 @@ DataStoreTLBMiss:
* we would need to update the pte atomically with lwarx/stwcx.
*/
/* Convert linux-style PTE to low word of PPC-style PTE */
+ rlwimi r0,r0,32-2,31,31 /* _PAGE_USER -> PP lsb */
li r1,0xe06 /* clear out reserved bits & PP msb */
andc r1,r0,r1 /* PP = user? 1: 0 */
BEGIN_FTR_SECTION
diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index 6d236080cb1a..877d880890fe 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -35,7 +35,7 @@ mmu_hash_lock:
/*
* Load a PTE into the hash table, if possible.
* The address is in r4, and r3 contains an access flag:
- * _PAGE_RW (0x002) if a write.
+ * _PAGE_RW (0x400) if a write.
* r9 contains the SRR1 value, from which we use the MSR_PR bit.
* SPRG_THREAD contains the physical address of the current task's thread.
*
@@ -69,7 +69,7 @@ _GLOBAL(hash_page)
blt+ 112f /* assume user more likely */
lis r5, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
addi r5 ,r5 ,(swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
- rlwimi r3,r9,32-14,31,31 /* MSR_PR -> _PAGE_USER */
+ rlwimi r3,r9,32-12,29,29 /* MSR_PR -> _PAGE_USER */
112:
#ifndef CONFIG_PTE_64BIT
rlwimi r5,r4,12,20,29 /* insert top 10 bits of address */
@@ -94,7 +94,7 @@ _GLOBAL(hash_page)
#else
rlwimi r8,r4,23,20,28 /* compute pte address */
#endif
- rlwinm r0,r3,6,24,24 /* _PAGE_RW access -> _PAGE_DIRTY */
+ rlwinm r0,r3,32-3,24,24 /* _PAGE_RW access -> _PAGE_DIRTY */
ori r0,r0,_PAGE_ACCESSED|_PAGE_HASHPTE
/*
@@ -310,9 +310,11 @@ Hash_msk = (((1 << Hash_bits) - 1) * 64)
_GLOBAL(create_hpte)
/* Convert linux-style PTE (r5) to low word of PPC-style PTE (r8) */
+ rlwinm r8,r5,32-9,30,30 /* _PAGE_RW -> PP msb */
rlwinm r0,r5,32-6,30,30 /* _PAGE_DIRTY -> PP msb */
- and r8,r5,r0 /* writable if _RW & _DIRTY */
- rlwimi r5,r5,1,30,30 /* _PAGE_USER -> PP msb */
+ and r8,r8,r0 /* writable if _RW & _DIRTY */
+ rlwimi r5,r5,32-1,30,30 /* _PAGE_USER -> PP msb */
+ rlwimi r5,r5,32-2,31,31 /* _PAGE_USER -> PP lsb */
ori r8,r8,0xe04 /* clear out reserved bits */
andc r8,r5,r8 /* PP = user? (rw&dirty? 1: 3): 0 */
BEGIN_FTR_SECTION
@@ -564,7 +566,7 @@ _GLOBAL(flush_hash_pages)
33: lwarx r8,0,r5 /* fetch the pte flags word */
andi. r0,r8,_PAGE_HASHPTE
beq 8f /* done if HASHPTE is already clear */
- rlwinm r8,r8,0,~_PAGE_HASHPTE /* clear HASHPTE bit */
+ rlwinm r8,r8,0,31,29 /* clear HASHPTE bit */
stwcx. r8,0,r5 /* update the pte */
bne- 33b
--
2.25.0
^ permalink raw reply related
* Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-05-20 10:19 UTC (permalink / raw)
To: Aneesh Kumar K.V, linuxppc-dev, linux-nvdimm, linux-kernel; +Cc: Steven Rostedt
In-Reply-To: <87a723f5fs.fsf@linux.ibm.com>
Thanks for reviewing this patch Aneesh.
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
>
> ....
>
> +
>> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> +struct nd_pdsm_cmd_pkg {
>> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> + __u16 payload_offset; /* In: offset from start of struct */
>> + __u16 payload_version; /* In/Out: version of the payload */
>> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
>> +} __packed;
>
> that payload_offset can be avoided if we prevent userspace to user a
> different variant of nd_pdsm_cmd_pkg which different header. We can keep
> things simpler if we can always find payload at
> nd_pdsm_cmd_pkg->payload.
Had introduced this member to handle case where new fields are added to
'struct nd_pdsm_cmd_pkg' without having to break the ABI. But agree with
the point that you made later that this can be simplified by replacing
'payload_offset' with a set of reserved variables. Will address this in
next iteration of this patchset.
>
>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> + */
>> +enum papr_scm_pdsm {
>> + PAPR_SCM_PDSM_MIN = 0x0,
>> + PAPR_SCM_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> +{
>> + return (struct nd_pdsm_cmd_pkg *) cmd;
>> +}
>> +
>> +/* Return the payload pointer for a given pcmd */
>> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> +{
>> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> + return NULL;
>> + else
>> + return (void *)((__u8 *) pcmd + pcmd->payload_offset);
>> +}
>> +
>
> we need to make sure userspace is not passing a wrong payload_offset.
Agree, that this function should have more strict checking for
payload_offset field. However will be getting rid of
'payload_offset' all together in the next iteration as you previously
suggested.
> and in the next patch you do
>
> + /* Copy the health struct to the payload */
> + memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
> + pkg->hdr.nd_fw_size = copysize;
> +
Yes this is already being done in the patchset and changes proposed to
this pdsm_cmd_to_payload() should not impact other patches as
pdsm_cmd_to_payload() abstracts rest of the code from how to access the
payload.
> All this can be simplified if you can keep payload at
> nd_pdsm_cmd_pkg->payload.
>
> If you still want to have the ability to extend the header, then added a
> reserved field similar to nd_cmd_pkg.
>
Agree to this and will address this in V8.
>
> -aneesh
--
Cheers
~ Vaibhav
^ permalink raw reply
* [PATCH V3 2/2] tools/perf: Add perf tools support for extended register capability in powerpc
From: Athira Rajeev @ 2020-05-20 9:45 UTC (permalink / raw)
To: linuxppc-dev
Cc: ravi.bangoria, atrajeev, maddy, linux-kernel, acme, anju, jolsa
In-Reply-To: <1589967933-1503-1-git-send-email-atrajeev@linux.vnet.ibm.com>
From: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Add extended regs to sample_reg_mask in the tool side to use
with `-I?` option. Perf tools side uses extended mask to display
the platform supported register names (with -I? option) to the user
and also send this mask to the kernel to capture the extended registers
in each sample. Hence decide the mask value based on the processor
version.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
[Decide extended mask at run time based on platform]
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++++++-
tools/perf/arch/powerpc/include/perf_regs.h | 5 ++-
tools/perf/arch/powerpc/util/perf_regs.c | 55 +++++++++++++++++++++++++
3 files changed, 72 insertions(+), 2 deletions(-)
diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_DSISR,
PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MMCRA,
- PERF_REG_POWERPC_MAX,
+ /* Extended registers */
+ PERF_REG_POWERPC_MMCR0,
+ PERF_REG_POWERPC_MMCR1,
+ PERF_REG_POWERPC_MMCR2,
+ /* Max regs without the extended regs */
+ PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
};
+
+#define PERF_REG_PMU_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1)
+
+/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
+#define PERF_REG_PMU_MASK_300 (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
+ - PERF_REG_PMU_MASK)
+
#endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h
index e18a355..46ed00d 100644
--- a/tools/perf/arch/powerpc/include/perf_regs.h
+++ b/tools/perf/arch/powerpc/include/perf_regs.h
@@ -64,7 +64,10 @@
[PERF_REG_POWERPC_DAR] = "dar",
[PERF_REG_POWERPC_DSISR] = "dsisr",
[PERF_REG_POWERPC_SIER] = "sier",
- [PERF_REG_POWERPC_MMCRA] = "mmcra"
+ [PERF_REG_POWERPC_MMCRA] = "mmcra",
+ [PERF_REG_POWERPC_MMCR0] = "mmcr0",
+ [PERF_REG_POWERPC_MMCR1] = "mmcr1",
+ [PERF_REG_POWERPC_MMCR2] = "mmcr2",
};
static inline const char *perf_reg_name(int id)
diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c
index 0a52429..9179230 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -6,9 +6,14 @@
#include "../../../util/perf_regs.h"
#include "../../../util/debug.h"
+#include "../../../util/event.h"
+#include "../../../util/header.h"
+#include "../../../perf-sys.h"
#include <linux/kernel.h>
+#define PVR_POWER9 0x004E
+
const struct sample_reg sample_reg_masks[] = {
SMPL_REG(r0, PERF_REG_POWERPC_R0),
SMPL_REG(r1, PERF_REG_POWERPC_R1),
@@ -55,6 +60,9 @@
SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
SMPL_REG(sier, PERF_REG_POWERPC_SIER),
SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA),
+ SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0),
+ SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1),
+ SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2),
SMPL_REG_END
};
@@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
return SDT_ARG_VALID;
}
+
+uint64_t arch__intr_reg_mask(void)
+{
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_HARDWARE,
+ .config = PERF_COUNT_HW_CPU_CYCLES,
+ .sample_type = PERF_SAMPLE_REGS_INTR,
+ .precise_ip = 1,
+ .disabled = 1,
+ .exclude_kernel = 1,
+ };
+ int fd, ret;
+ char buffer[64];
+ u32 version;
+ u64 extended_mask = 0;
+
+ /* Get the PVR value to set the extended
+ * mask specific to platform
+ */
+ get_cpuid(buffer, sizeof(buffer));
+ ret = sscanf(buffer, "%u,", &version);
+
+ if (ret != 1) {
+ pr_debug("Failed to get the processor version, unable to output extended registers\n");
+ return PERF_REGS_MASK;
+ }
+
+ if (version == PVR_POWER9)
+ extended_mask = PERF_REG_PMU_MASK_300;
+ else
+ return PERF_REGS_MASK;
+
+ attr.sample_regs_intr = extended_mask;
+ attr.sample_period = 1;
+ event_attr_init(&attr);
+
+ /*
+ * check if the pmu supports perf extended regs, before
+ * returning the register mask to sample.
+ */
+ fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+ if (fd != -1) {
+ close(fd);
+ return (extended_mask | PERF_REGS_MASK);
+ }
+ return PERF_REGS_MASK;
+}
--
1.8.3.1
^ permalink raw reply related
* [PATCH V3 1/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs
From: Athira Rajeev @ 2020-05-20 9:45 UTC (permalink / raw)
To: linuxppc-dev
Cc: ravi.bangoria, atrajeev, maddy, linux-kernel, acme, anju, jolsa
In-Reply-To: <1589967933-1503-1-git-send-email-atrajeev@linux.vnet.ibm.com>
From: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Add support for perf extended register capability in powerpc.
The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the
PMU which support extended registers. The generic code define the mask
of extended registers as 0 for non supported architectures.
Patch adds extended regs support for power9 platform by
exposing MMCR0, MMCR1 and MMCR2 registers.
REG_RESERVED mask needs update to include extended regs.
`PERF_REG_EXTENDED_MASK`, contains mask value of the supported registers,
is defined at runtime in the kernel based on platform since the supported
registers may differ from one processor version to another and hence the
MASK value.
with patch
----------
available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11
r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26
r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe
trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2
PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0
... intr regs: mask 0xffffffffffff ABI 64-bit
.... r0 0xc00000000012b77c
.... r1 0xc000003fe5e03930
.... r2 0xc000000001b0e000
.... r3 0xc000003fdcddf800
.... r4 0xc000003fc7880000
.... r5 0x9c422724be
.... r6 0xc000003fe5e03908
.... r7 0xffffff63bddc8706
.... r8 0x9e4
.... r9 0x0
.... r10 0x1
.... r11 0x0
.... r12 0xc0000000001299c0
.... r13 0xc000003ffffc4800
.... r14 0x0
.... r15 0x7fffdd8b8b00
.... r16 0x0
.... r17 0x7fffdd8be6b8
.... r18 0x7e7076607730
.... r19 0x2f
.... r20 0xc00000001fc26c68
.... r21 0xc0002041e4227e00
.... r22 0xc00000002018fb60
.... r23 0x1
.... r24 0xc000003ffec4d900
.... r25 0x80000000
.... r26 0x0
.... r27 0x1
.... r28 0x1
.... r29 0xc000000001be1260
.... r30 0x6008010
.... r31 0xc000003ffebb7218
.... nip 0xc00000000012b910
.... msr 0x9000000000009033
.... orig_r3 0xc00000000012b86c
.... ctr 0xc0000000001299c0
.... link 0xc00000000012b77c
.... xer 0x0
.... ccr 0x28002222
.... softe 0x1
.... trap 0xf00
.... dar 0x0
.... dsisr 0x80000000000
.... sier 0x0
.... mmcra 0x80000000000
.... mmcr0 0x82008090
.... mmcr1 0x1e000000
.... mmcr2 0x0
... thread: perf:4784
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
[Defined PERF_REG_EXTENDED_MASK at run time to add support for different platforms ]
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/perf_event_server.h | 8 +++++++
arch/powerpc/include/uapi/asm/perf_regs.h | 14 +++++++++++-
arch/powerpc/perf/core-book3s.c | 1 +
arch/powerpc/perf/perf_regs.c | 34 +++++++++++++++++++++++++---
arch/powerpc/perf/power9-pmu.c | 6 +++++
5 files changed, 59 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3e9703f..1458e1a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -15,6 +15,9 @@
#define MAX_EVENT_ALTERNATIVES 8
#define MAX_LIMITED_HWCOUNTERS 2
+extern u64 mask_var;
+#define PERF_REG_EXTENDED_MASK mask_var
+
struct perf_event;
/*
@@ -55,6 +58,11 @@ struct power_pmu {
int *blacklist_ev;
/* BHRB entries in the PMU */
int bhrb_nr;
+ /*
+ * set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if
+ * the pmu supports extended perf regs capability
+ */
+ int capabilities;
};
/*
diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_DSISR,
PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MMCRA,
- PERF_REG_POWERPC_MAX,
+ /* Extended registers */
+ PERF_REG_POWERPC_MMCR0,
+ PERF_REG_POWERPC_MMCR1,
+ PERF_REG_POWERPC_MMCR2,
+ /* Max regs without the extended regs */
+ PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
};
+
+#define PERF_REG_PMU_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1)
+
+/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
+#define PERF_REG_PMU_MASK_300 (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
+ - PERF_REG_PMU_MASK)
+
#endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3dcfecf..f56b778 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2276,6 +2276,7 @@ int register_power_pmu(struct power_pmu *pmu)
power_pmu.attr_groups = ppmu->attr_groups;
+ power_pmu.capabilities |= (ppmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS);
#ifdef MSR_HV
/*
* Use FCHV to ignore kernel events if MSR.HV is set.
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
index a213a0a..f1dbbc5 100644
--- a/arch/powerpc/perf/perf_regs.c
+++ b/arch/powerpc/perf/perf_regs.c
@@ -13,9 +13,11 @@
#include <asm/ptrace.h>
#include <asm/perf_regs.h>
+u64 mask_var;
+
#define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
-#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1))
+#define REG_RESERVED (~(PERF_REG_EXTENDED_MASK | PERF_REG_PMU_MASK))
static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
PT_REGS_OFFSET(PERF_REG_POWERPC_R0, gpr[0]),
@@ -69,10 +71,26 @@
PT_REGS_OFFSET(PERF_REG_POWERPC_MMCRA, dsisr),
};
+/* Function to return the extended register values */
+static u64 get_ext_regs_value(int idx)
+{
+ switch (idx) {
+ case PERF_REG_POWERPC_MMCR0:
+ return mfspr(SPRN_MMCR0);
+ case PERF_REG_POWERPC_MMCR1:
+ return mfspr(SPRN_MMCR1);
+ case PERF_REG_POWERPC_MMCR2:
+ return mfspr(SPRN_MMCR2);
+ default: return 0;
+ }
+}
+
u64 perf_reg_value(struct pt_regs *regs, int idx)
{
- if (WARN_ON_ONCE(idx >= PERF_REG_POWERPC_MAX))
- return 0;
+ u64 PERF_REG_EXTENDED_MAX;
+
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ PERF_REG_EXTENDED_MAX = PERF_REG_POWERPC_MMCR2 + 1;
if (idx == PERF_REG_POWERPC_SIER &&
(IS_ENABLED(CONFIG_FSL_EMB_PERF_EVENT) ||
@@ -85,6 +103,16 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
IS_ENABLED(CONFIG_PPC32)))
return 0;
+ if (idx >= PERF_REG_POWERPC_MAX && idx < PERF_REG_EXTENDED_MAX)
+ return get_ext_regs_value(idx);
+
+ /*
+ * If the idx is referring to value beyond the
+ * supported registers, return 0 with a warning
+ */
+ if (WARN_ON_ONCE(idx >= PERF_REG_EXTENDED_MAX))
+ return 0;
+
return regs_get_register(regs, pt_regs_offset[idx]);
}
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 08c3ef7..4525090 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -90,6 +90,8 @@ enum {
#define POWER9_MMCRA_IFM3 0x00000000C0000000UL
#define POWER9_MMCRA_BHRB_MASK 0x00000000C0000000UL
+extern u64 mask_var;
+
/* Nasty Power9 specific hack */
#define PVR_POWER9_CUMULUS 0x00002000
@@ -434,6 +436,7 @@ static void power9_config_bhrb(u64 pmu_bhrb_filter)
.cache_events = &power9_cache_events,
.attr_groups = power9_pmu_attr_groups,
.bhrb_nr = 32,
+ .capabilities = PERF_PMU_CAP_EXTENDED_REGS,
};
int init_power9_pmu(void)
@@ -457,6 +460,9 @@ int init_power9_pmu(void)
}
}
+ /* Set the PERF_REG_EXTENDED_MASK here */
+ mask_var = PERF_REG_PMU_MASK_300;
+
rc = register_power_pmu(&power9_pmu);
if (rc)
return rc;
--
1.8.3.1
^ permalink raw reply related
* [PATCH V3 0/2] powerpc/perf: Add support for perf extended regs in powerpc
From: Athira Rajeev @ 2020-05-20 9:45 UTC (permalink / raw)
To: linuxppc-dev
Cc: ravi.bangoria, atrajeev, maddy, linux-kernel, acme, anju, jolsa
Patch set to add support for perf extended register capability in
powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to
indicate the PMU which support extended registers. The generic code
define the mask of extended registers as 0 for non supported architectures.
patch 1/2 defines this PERF_PMU_CAP_EXTENDED_REGS mask to output the
values of mmcr0,mmcr1,mmcr2 for POWER9. Defines `PERF_REG_EXTENDED_MASK`
at runtime which contains mask value of the supported registers under
extended regs.
Patch 2/2 adds extended regs to sample_reg_mask in the tool side to use
with `-I?` option.
Anju T Sudhakar (2):
powerpc/perf: Add support for outputting extended regs in perf
intr_regs
tools/perf: Add perf tools support for extended register capability in
powerpc
---
Changes from v2 -> v3
- Split kernel and tools side patches as suggested by Arnaldo
- Addressed review comment from Madhavan Srinivasn
Changes from v1 -> v2
- PERF_REG_EXTENDED_MASK` is defined at runtime in the kernel
based on platform. This will give flexibility in using extended
regs for all processor versions where the supported registers may differ.
- removed PERF_REG_EXTENDED_MASK from the perf tools side. Based on the
processor version(from PVR value), tool side will return the appropriate
extended mask
- Since tool changes can handle without a "PERF_REG_EXTENDED_MASK" macro,
dropped patch to set NO_AUXTRACE.
- Addressed review comments from Ravi Bangoria for V1
---
arch/powerpc/include/asm/perf_event_server.h | 8 ++++
arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++++++-
arch/powerpc/perf/core-book3s.c | 1 +
arch/powerpc/perf/perf_regs.c | 34 +++++++++++++--
arch/powerpc/perf/power9-pmu.c | 6 +++
tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++++++-
tools/perf/arch/powerpc/include/perf_regs.h | 5 ++-
tools/perf/arch/powerpc/util/perf_regs.c | 55 +++++++++++++++++++++++++
8 files changed, 131 insertions(+), 6 deletions(-)
--
1.8.3.1
^ permalink raw reply
* Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
From: Lucas Stach @ 2020-05-20 9:42 UTC (permalink / raw)
To: Shengjiu Wang
Cc: sumit.semwal, linaro-mm-sig, Linux-ALSA, linuxppc-dev,
linux-kernel, Timur Tabi, Xiubo Li, shawnguo, Shengjiu Wang,
Takashi Iwai, Liam Girdwood, dri-devel, perex, Nicolin Chen,
Mark Brown, linux-imx, kernel, Fabio Estevam, s.hauer,
linux-arm-kernel, linux-media
In-Reply-To: <CAA+D8APhHvA39wmCayeCsAEKmOJ0n7qOQiT1tZmFHr4+yASgTw@mail.gmail.com>
Am Mittwoch, den 20.05.2020, 16:20 +0800 schrieb Shengjiu Wang:
> Hi
>
> On Tue, May 19, 2020 at 6:04 PM Lucas Stach <l.stach@pengutronix.de> wrote:
> > Am Dienstag, den 19.05.2020, 17:41 +0800 schrieb Shengjiu Wang:
> > > There are two requirements that we need to move the request
> > > of dma channel from probe to open.
> >
> > How do you handle -EPROBE_DEFER return code from the channel request if
> > you don't do it in probe?
>
> I use the dma_request_slave_channel or dma_request_channel instead
> of dmaengine_pcm_request_chan_of. so there should be not -EPROBE_DEFER
> return code.
This is a pretty weak argument. The dmaengine device might probe after
you try to get the channel. Using a function to request the channel
that doesn't allow you to handle probe deferral is IMHO a bug and
should be fixed, instead of building even more assumptions on top of
it.
> > > - When dma device binds with power-domains, the power will
> > > be enabled when we request dma channel. If the request of dma
> > > channel happen on probe, then the power-domains will be always
> > > enabled after kernel boot up, which is not good for power
> > > saving, so we need to move the request of dma channel to .open();
> >
> > This is certainly something which could be fixed in the dmaengine
> > driver.
>
> Dma driver always call the pm_runtime_get_sync in
> device_alloc_chan_resources, the device_alloc_chan_resources is
> called when channel is requested. so power is enabled on channel
> request.
So why can't you fix the dmaengine driver to do that RPM call at a
later time when the channel is actually going to be used? This will
allow further power savings with other slave devices than the audio
PCM.
Regards,
Lucas
> > > - With FE-BE case, if the dma channel is requested in probe,
> > > then there will be below issue, which is caused by that the
> > > dma channel will be requested duplicately
> >
> > Why is this requested a second time? Is this just some missing cleanup
> > on a deferred probe path?
>
> Not relate with deferred probe. With DMA1->ASRC->DMA2->ESAI case,
> the DMA1->ASRC->DMA2 is in FE, ESAI is in BE. When ESAI drvier
> probe, DMA3 channel is created with ESAI's "dma:tx" (DMA3 channel
> is not used in this FE-BE case). When FE-BE startup, DMA2
> channel is created, it needs the ESAI's "dma:tx", so below warning
> comes out.
>
> > Regards,
> > Lucas
> >
> > > [ 638.906268] sysfs: cannot create duplicate filename '/devices/soc0/soc/2000000.bus/2000000.spba-bus/2024000.esai/dma:tx'
> > > [ 638.919061] CPU: 1 PID: 673 Comm: aplay Not tainted 5.7.0-rc1-12956-gfc64b2585593 #287
> > > [ 638.927113] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > > [ 638.933690] [<c0110dd8>] (unwind_backtrace) from [<c010b8ec>] (show_stack+0x10/0x14)
> > > [ 638.941464] [<c010b8ec>] (show_stack) from [<c0557fc0>] (dump_stack+0xe4/0x118)
> > > [ 638.948808] [<c0557fc0>] (dump_stack) from [<c032aeb4>] (sysfs_warn_dup+0x50/0x64)
> > > [ 638.956406] [<c032aeb4>] (sysfs_warn_dup) from [<c032b1a8>] (sysfs_do_create_link_sd+0xc8/0xd4)
> > > [ 638.965134] [<c032b1a8>] (sysfs_do_create_link_sd) from [<c05dc668>] (dma_request_chan+0xb0/0x210)
> > > [ 638.974120] [<c05dc668>] (dma_request_chan) from [<c05dc7d0>] (dma_request_slave_channel+0x8/0x14)
> > > [ 638.983111] [<c05dc7d0>] (dma_request_slave_channel) from [<c09d5548>] (fsl_asrc_dma_hw_params+0x1e0/0x438)
> > > [ 638.992881] [<c09d5548>] (fsl_asrc_dma_hw_params) from [<c09c1654>] (soc_pcm_hw_params+0x4a0/0x6a8)
> > > [ 639.001952] [<c09c1654>] (soc_pcm_hw_params) from [<c09c39d4>] (dpcm_fe_dai_hw_params+0x70/0xe4)
> > > [ 639.010765] [<c09c39d4>] (dpcm_fe_dai_hw_params) from [<c099b274>] (snd_pcm_hw_params+0x158/0x418)
> > > [ 639.019750] [<c099b274>] (snd_pcm_hw_params) from [<c099c5a0>] (snd_pcm_ioctl+0x734/0x183c)
> > > [ 639.028129] [<c099c5a0>] (snd_pcm_ioctl) from [<c029ff94>] (ksys_ioctl+0x2ac/0xb98)
> > > [ 639.035812] [<c029ff94>] (ksys_ioctl) from [<c0100080>] (ret_fast_syscall+0x0/0x28)
> > > [ 639.043490] Exception stack(0xec529fa8 to 0xec529ff0)
> > > [ 639.048565] 9fa0: bee84650 01321870 00000004 c25c4111 bee84650 0002000f
> > > [ 639.056766] 9fc0: bee84650 01321870 01321820 00000036 00001f40 00000000 0002c2f8 00000003
> > > [ 639.064964] 9fe0: b6f483fc bee8451c b6ee2655 b6e1dcf8
> > > [ 639.070339] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink
> > >
> > > Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
> > > ---
> > > sound/soc/fsl/imx-pcm-dma.c | 173 +++++++++++++++++++++++++++++++++---
> > > 1 file changed, 159 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/sound/soc/fsl/imx-pcm-dma.c b/sound/soc/fsl/imx-pcm-dma.c
> > > index 04a9bc749016..dae53b384df4 100644
> > > --- a/sound/soc/fsl/imx-pcm-dma.c
> > > +++ b/sound/soc/fsl/imx-pcm-dma.c
> > > @@ -11,6 +11,7 @@
> > > #include <linux/dmaengine.h>
> > > #include <linux/types.h>
> > > #include <linux/module.h>
> > > +#include <linux/dma-mapping.h>
> > >
> > > #include <sound/core.h>
> > > #include <sound/pcm.h>
> > > @@ -29,24 +30,168 @@ static bool filter(struct dma_chan *chan, void *param)
> > > return true;
> > > }
> > >
> > > -static const struct snd_dmaengine_pcm_config imx_dmaengine_pcm_config = {
> > > - .prepare_slave_config = snd_dmaengine_pcm_prepare_slave_config,
> > > - .compat_filter_fn = filter,
> > > -};
> > > +static int imx_pcm_hw_params(struct snd_soc_component *component,
> > > + struct snd_pcm_substream *substream,
> > > + struct snd_pcm_hw_params *params)
> > > +{
> > > + struct snd_pcm_runtime *runtime = substream->runtime;
> > > + struct snd_soc_pcm_runtime *rtd = substream->private_data;
> > > + struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0);
> > > + struct snd_dmaengine_dai_dma_data *dma_data;
> > > + struct dma_slave_config config;
> > > + struct dma_chan *chan;
> > > + int ret = 0;
> > >
> > > -int imx_pcm_dma_init(struct platform_device *pdev, size_t size)
> > > + snd_pcm_set_runtime_buffer(substream, &substream->dma_buffer);
> > > + runtime->dma_bytes = params_buffer_bytes(params);
> > > +
> > > + chan = snd_dmaengine_pcm_get_chan(substream);
> > > + if (!chan)
> > > + return -EINVAL;
> > > +
> > > + ret = snd_hwparams_to_dma_slave_config(substream, params, &config);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + dma_data = snd_soc_dai_get_dma_data(cpu_dai, substream);
> > > + if (!dma_data)
> > > + return -EINVAL;
> > > +
> > > + snd_dmaengine_pcm_set_config_from_dai_data(substream,
> > > + dma_data,
> > > + &config);
> > > + return dmaengine_slave_config(chan, &config);
> > > +}
> > > +
> > > +static int imx_pcm_hw_free(struct snd_soc_component *component,
> > > + struct snd_pcm_substream *substream)
> > > {
> > > - struct snd_dmaengine_pcm_config *config;
> > > + snd_pcm_set_runtime_buffer(substream, NULL);
> > > + return 0;
> > > +}
> > > +
> > > +static snd_pcm_uframes_t imx_pcm_pointer(struct snd_soc_component *component,
> > > + struct snd_pcm_substream *substream)
> > > +{
> > > + return snd_dmaengine_pcm_pointer(substream);
> > > +}
> > > +
> > > +static int imx_pcm_trigger(struct snd_soc_component *component,
> > > + struct snd_pcm_substream *substream, int cmd)
> > > +{
> > > + return snd_dmaengine_pcm_trigger(substream, cmd);
> > > +}
> > > +
> > > +static int imx_pcm_open(struct snd_soc_component *component,
> > > + struct snd_pcm_substream *substream)
> > > +{
> > > + struct snd_soc_pcm_runtime *rtd = substream->private_data;
> > > + bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
> > > + struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0);
> > > + struct snd_dmaengine_dai_dma_data *dma_data;
> > > + struct device *dev = component->dev;
> > > + struct snd_pcm_hardware hw;
> > > + struct dma_chan *chan;
> > > + int ret;
> > > +
> > > + ret = snd_pcm_hw_constraint_integer(substream->runtime,
> > > + SNDRV_PCM_HW_PARAM_PERIODS);
> > > + if (ret < 0) {
> > > + dev_err(dev, "failed to set pcm hw params periods\n");
> > > + return ret;
> > > + }
> > > +
> > > + dma_data = snd_soc_dai_get_dma_data(cpu_dai, substream);
> > > + if (!dma_data)
> > > + return -EINVAL;
> > > +
> > > + chan = dma_request_slave_channel(cpu_dai->dev, tx ? "tx" : "rx");
> > > + if (!chan) {
> > > + /* Try to request channel using compat_filter_fn */
> > > + chan = snd_dmaengine_pcm_request_channel(filter,
> > > + dma_data->filter_data);
> > > + if (!chan)
> > > + return -ENXIO;
> > > + }
> > >
> > > - config = devm_kzalloc(&pdev->dev,
> > > - sizeof(struct snd_dmaengine_pcm_config), GFP_KERNEL);
> > > - if (!config)
> > > - return -ENOMEM;
> > > - *config = imx_dmaengine_pcm_config;
> > > + ret = snd_dmaengine_pcm_open(substream, chan);
> > > + if (ret)
> > > + goto pcm_open_fail;
> > >
> > > - return devm_snd_dmaengine_pcm_register(&pdev->dev,
> > > - config,
> > > - SND_DMAENGINE_PCM_FLAG_COMPAT);
> > > + memset(&hw, 0, sizeof(hw));
> > > + hw.info = SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_MMAP_VALID |
> > > + SNDRV_PCM_INFO_INTERLEAVED;
> > > + hw.periods_min = 2;
> > > + hw.periods_max = UINT_MAX;
> > > + hw.period_bytes_min = 256;
> > > + hw.period_bytes_max = dma_get_max_seg_size(chan->device->dev);
> > > + hw.buffer_bytes_max = IMX_DEFAULT_DMABUF_SIZE;
> > > + hw.fifo_size = dma_data->fifo_size;
> > > +
> > > + /* Refine the hw according to caps of DMA. */
> > > + ret = snd_dmaengine_pcm_refine_runtime_hwparams(substream,
> > > + dma_data,
> > > + &hw,
> > > + chan);
> > > + if (ret < 0)
> > > + goto refine_runtime_hwparams_fail;
> > > +
> > > + snd_soc_set_runtime_hwparams(substream, &hw);
> > > +
> > > + /* Support allocate memory from IRAM */
> > > + ret = snd_dma_alloc_pages(SNDRV_DMA_TYPE_DEV_IRAM,
> > > + chan->device->dev,
> > > + hw.buffer_bytes_max,
> > > + &substream->dma_buffer);
> > > + if (ret < 0)
> > > + goto alloc_pagas_fail;
> > > +
> > > + return 0;
> > > +
> > > +alloc_pagas_fail:
> > > +refine_runtime_hwparams_fail:
> > > + snd_dmaengine_pcm_close(substream);
> > > +pcm_open_fail:
> > > + dma_release_channel(chan);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static int imx_pcm_close(struct snd_soc_component *component,
> > > + struct snd_pcm_substream *substream)
> > > +{
> > > + if (substream) {
> > > + snd_dma_free_pages(&substream->dma_buffer);
> > > + substream->dma_buffer.area = NULL;
> > > + substream->dma_buffer.addr = 0;
> > > + }
> > > +
> > > + return snd_dmaengine_pcm_close_release_chan(substream);
> > > +}
> > > +
> > > +static int imx_pcm_new(struct snd_soc_component *component,
> > > + struct snd_soc_pcm_runtime *rtd)
> > > +{
> > > + struct snd_card *card = rtd->card->snd_card;
> > > +
> > > + return dma_coerce_mask_and_coherent(card->dev, DMA_BIT_MASK(32));
> > > +}
> > > +
> > > +static const struct snd_soc_component_driver imx_pcm_component = {
> > > + .name = "imx-pcm-dma",
> > > + .pcm_construct = imx_pcm_new,
> > > + .open = imx_pcm_open,
> > > + .close = imx_pcm_close,
> > > + .hw_params = imx_pcm_hw_params,
> > > + .hw_free = imx_pcm_hw_free,
> > > + .trigger = imx_pcm_trigger,
> > > + .pointer = imx_pcm_pointer,
> > > +};
> > > +
> > > +int imx_pcm_dma_init(struct platform_device *pdev, size_t size)
> > > +{
> > > + return devm_snd_soc_register_component(&pdev->dev,
> > > + &imx_pcm_component, NULL, 0);
> > > }
> > > EXPORT_SYMBOL_GPL(imx_pcm_dma_init);
> > >
^ permalink raw reply
* Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
From: Jiri Slaby @ 2020-05-20 9:38 UTC (permalink / raw)
To: rananta, Greg KH; +Cc: andrew, linuxppc-dev, linux-kernel
In-Reply-To: <0ab0b49f19b824ac1c4db4c4937ed388@codeaurora.org>
[-- Attachment #1: Type: text/plain, Size: 1897 bytes --]
On 15. 05. 20, 1:22, rananta@codeaurora.org wrote:
> On 2020-05-13 00:04, Greg KH wrote:
>> On Tue, May 12, 2020 at 02:39:50PM -0700, rananta@codeaurora.org wrote:
>>> On 2020-05-12 01:25, Greg KH wrote:
>>> > On Tue, May 12, 2020 at 09:22:15AM +0200, Jiri Slaby wrote:
>>> > > commit bdb498c20040616e94b05c31a0ceb3e134b7e829
>>> > > Author: Jiri Slaby <jslaby@suse.cz>
>>> > > Date: Tue Aug 7 21:48:04 2012 +0200
>>> > >
>>> > > TTY: hvc_console, add tty install
>>> > >
>>> > > added hvc_install but did not move 'tty->driver_data = NULL;' from
>>> > > hvc_open's fail path to hvc_cleanup.
>>> > >
>>> > > IOW hvc_open now NULLs tty->driver_data even for another task which
>>> > > opened the tty earlier. The same holds for
>>> > > "tty_port_tty_set(&hp->port,
>>> > > NULL);" there. And actually "tty_port_put(&hp->port);" is also
>>> > > incorrect
>>> > > for the 2nd task opening the tty.
>>> > >
...
> These are the traces you get when the issue happens:
> [ 154.212291] hvc_install called for pid: 666
> [ 154.216705] hvc_open called for pid: 666
> [ 154.233657] hvc_open: request_irq failed with rc -22.
> [ 154.238934] hvc_open called for pid: 678
> [ 154.243012] Unable to handle kernel NULL pointer dereference at
> virtual address 00000000000000c4
> # hvc_install isn't called for pid: 678 as the file wasn't closed yet.
Nice. Does the attached help?
I wonder how comes the tty_port_put in hvc_open does not cause a UAF? I
would say hvc_open fails, tty_port_put is called. It decrements the
reference taken in hvc_install. So far so good.
Now, this should happen IMO:
tty_open
-> hvc_open (fails)
-> tty_port_put
-> tty_release
-> tty_release_struct
-> tty_kref_put
-> queue_release_one_tty
SCHEDULED WORKQUEUE
release_one_tty
-> hvc_cleanup
-> tty_port_put (should die terrible death now)
What am I missing?
thanks,
--
js
suse labs
[-- Attachment #2: 0001-hvc_console-fix-open.patch --]
[-- Type: text/x-patch, Size: 2381 bytes --]
From d891cdfcbd3b41eb23ddfc8d9e6cbe038ff8fb72 Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jslaby@suse.cz>
Date: Wed, 20 May 2020 11:29:25 +0200
Subject: [PATCH] hvc_console: fix open
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
drivers/tty/hvc/hvc_console.c | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)
diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 436cc51c92c3..cdcc64ea2554 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -371,15 +371,14 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
* tty fields and return the kref reference.
*/
if (rc) {
- tty_port_tty_set(&hp->port, NULL);
- tty->driver_data = NULL;
- tty_port_put(&hp->port);
printk(KERN_ERR "hvc_open: request_irq failed with rc %d.\n", rc);
- } else
+ } else {
/* We are ready... raise DTR/RTS */
if (C_BAUD(tty))
if (hp->ops->dtr_rts)
hp->ops->dtr_rts(hp, 1);
+ tty_port_set_initialized(&hp->port, true);
+ }
/* Force wakeup of the polling thread */
hvc_kick();
@@ -389,22 +388,12 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
static void hvc_close(struct tty_struct *tty, struct file * filp)
{
- struct hvc_struct *hp;
+ struct hvc_struct *hp = tty->driver_data;
unsigned long flags;
if (tty_hung_up_p(filp))
return;
- /*
- * No driver_data means that this close was issued after a failed
- * hvc_open by the tty layer's release_dev() function and we can just
- * exit cleanly because the kref reference wasn't made.
- */
- if (!tty->driver_data)
- return;
-
- hp = tty->driver_data;
-
spin_lock_irqsave(&hp->port.lock, flags);
if (--hp->port.count == 0) {
@@ -412,6 +401,9 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
/* We are done with the tty pointer now. */
tty_port_tty_set(&hp->port, NULL);
+ if (!tty_port_initialized(&hp->port))
+ return;
+
if (C_HUPCL(tty))
if (hp->ops->dtr_rts)
hp->ops->dtr_rts(hp, 0);
@@ -428,6 +420,7 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
* waking periodically to check chars_in_buffer().
*/
tty_wait_until_sent(tty, HVC_CLOSE_WAIT);
+ tty_port_set_initialized(&hp->port, false);
} else {
if (hp->port.count < 0)
printk(KERN_ERR "hvc_close %X: oops, count is %d\n",
--
2.26.2
^ permalink raw reply related
* [v2 1/2] dts: ppc: t4240rdb: remove interrupts property
From: Biwen Li @ 2020-05-20 9:15 UTC (permalink / raw)
To: leoyang.li, robh+dt, mpe, benh, a.zummo, alexandre.belloni
Cc: linux-rtc, devicetree, linuxppc-dev, linux-kernel, Biwen Li
From: Biwen Li <biwen.li@nxp.com>
This removes interrupts property to drop warning as follows:
- $ hwclock.util-linux
hwclock.util-linux: select() to /dev/rtc0
to wait for clock tick timed out
My case:
- RTC ds1374's INT pin is connected to VCC on T4240RDB,
then the RTC cannot inform cpu about the alarm interrupt
Signed-off-by: Biwen Li <biwen.li@nxp.com>
---
arch/powerpc/boot/dts/fsl/t4240rdb.dts | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
index a56a705d41f7..145896f2eef6 100644
--- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
@@ -144,7 +144,6 @@
rtc@68 {
compatible = "dallas,ds1374";
reg = <0x68>;
- interrupts = <0x1 0x1 0 0>;
};
};
--
2.17.1
^ permalink raw reply related
* [v2 2/2] dts: ppc: t1024rdb: remove interrupts property
From: Biwen Li @ 2020-05-20 9:15 UTC (permalink / raw)
To: leoyang.li, robh+dt, mpe, benh, a.zummo, alexandre.belloni
Cc: linux-rtc, devicetree, linuxppc-dev, linux-kernel, Biwen Li
In-Reply-To: <20200520091543.44692-1-biwen.li@oss.nxp.com>
From: Biwen Li <biwen.li@nxp.com>
This removes interrupts property to drop warning as follows:
- $ hwclock.util-linux
hwclock.util-linux: select() to /dev/rtc0
to wait for clock tick timed out
My case:
- RTC ds1339s INT pin isn't connected to cpus INT pin on T1024RDB,
then the RTC cannot inform cpu about alarm interrupt
How to fix it?
- remove IRQ line
Signed-off-by: Biwen Li <biwen.li@nxp.com>
---
arch/powerpc/boot/dts/fsl/t1024rdb.dts | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/powerpc/boot/dts/fsl/t1024rdb.dts b/arch/powerpc/boot/dts/fsl/t1024rdb.dts
index 645caff98ed1..605ceec66af3 100644
--- a/arch/powerpc/boot/dts/fsl/t1024rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t1024rdb.dts
@@ -161,7 +161,6 @@
rtc@68 {
compatible = "dallas,ds1339";
reg = <0x68>;
- interrupts = <0x1 0x1 0 0>;
};
};
--
2.17.1
^ permalink raw reply related
* [PATCH -next] scsi: ibmvscsi: Make some functions static
From: Chen Tao @ 2020-05-20 9:10 UTC (permalink / raw)
To: jejb, martin.petersen
Cc: tyreld, linux-scsi, linux-kernel, chentao107, paulus,
linuxppc-dev
Fix the following warning:
drivers/scsi/ibmvscsi/ibmvscsi.c:2387:12: warning: symbol
'ibmvscsi_module_init' was not declared. Should it be static?
drivers/scsi/ibmvscsi/ibmvscsi.c:2409:13: warning: symbol
'ibmvscsi_module_exit' was not declared. Should it be static?
Signed-off-by: Chen Tao <chentao107@huawei.com>
---
drivers/scsi/ibmvscsi/ibmvscsi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 59f0f1030c54..44e64aa21194 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2384,7 +2384,7 @@ static struct vio_driver ibmvscsi_driver = {
static struct srp_function_template ibmvscsi_transport_functions = {
};
-int __init ibmvscsi_module_init(void)
+static int __init ibmvscsi_module_init(void)
{
int ret;
@@ -2406,7 +2406,7 @@ int __init ibmvscsi_module_init(void)
return ret;
}
-void __exit ibmvscsi_module_exit(void)
+static void __exit ibmvscsi_module_exit(void)
{
vio_unregister_driver(&ibmvscsi_driver);
srp_release_transport(ibmvscsi_transport_template);
--
2.22.0
^ permalink raw reply related
* Re: [PATCH v2] tty: hvc: Fix data abort due to race in hvc_open
From: Jiri Slaby @ 2020-05-20 8:59 UTC (permalink / raw)
To: Raghavendra Rao Ananta, gregkh, andrew; +Cc: linuxppc-dev, linux-kernel, stable
In-Reply-To: <20200520064708.24278-1-rananta@codeaurora.org>
On 20. 05. 20, 8:47, Raghavendra Rao Ananta wrote:
> Potentially, hvc_open() can be called in parallel when two tasks calls
> open() on /dev/hvcX. In such a scenario, if the hp->ops->notifier_add()
> callback in the function fails, where it sets the tty->driver_data to
> NULL, the parallel hvc_open() can see this NULL and cause a memory abort.
> Hence, do a NULL check at the beginning, before proceeding ahead.
>
> The issue can be easily reproduced by launching two tasks simultaneously
> that does an open() call on /dev/hvcX.
> For example:
> $ cat /dev/hvc0 & cat /dev/hvc0 &
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Raghavendra Rao Ananta <rananta@codeaurora.org>
> ---
> drivers/tty/hvc/hvc_console.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> index 436cc51c92c3..80709f754cc8 100644
> --- a/drivers/tty/hvc/hvc_console.c
> +++ b/drivers/tty/hvc/hvc_console.c
> @@ -350,6 +350,9 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
> unsigned long flags;
> int rc = 0;
>
> + if (!hp)
> + return -ENODEV;
> +
This is still not fixing the bug properly. See:
https://lore.kernel.org/linuxppc-dev/0f7791f5-0a53-59f6-7277-247a789f30c2@suse.cz/
In particular, the paragraph starting "IOW".
thanks,
--
js
suse labs
^ permalink raw reply
* [PATCH v4 6/6] asm-generic/tlb: avoid potential double flush
From: Santosh Sivaraj @ 2020-05-20 8:30 UTC (permalink / raw)
To: stable, linuxppc-dev
Cc: Sasha Levin, Santosh Sivaraj, Peter Zijlstra, Greg KH,
Aneesh Kumar K . V
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>
From: Peter Zijlstra <peterz@infradead.org>
commit 0758cd8304942292e95a0f750c374533db378b32 upstream
Aneesh reported that:
tlb_flush_mmu()
tlb_flush_mmu_tlbonly()
tlb_flush() <-- #1
tlb_flush_mmu_free()
tlb_table_flush()
tlb_table_invalidate()
tlb_flush_mmu_tlbonly()
tlb_flush() <-- #2
does two TLBIs when tlb->fullmm, because __tlb_reset_range() will not
clear tlb->end in that case.
Observe that any caller to __tlb_adjust_range() also sets at least one of
the tlb->freed_tables || tlb->cleared_p* bits, and those are
unconditionally cleared by __tlb_reset_range().
Change the condition for actually issuing TLBI to having one of those bits
set, as opposed to having tlb->end != 0.
Link: http://lkml.kernel.org/r/20200116064531.483522-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: backported to 4.19 stable]
---
include/asm-generic/tlb.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 19934cdd143e..427a70c56ddd 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -179,7 +179,12 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
{
- if (!tlb->end)
+ /*
+ * Anything calling __tlb_adjust_range() also sets at least one of
+ * these bits.
+ */
+ if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
+ tlb->cleared_puds || tlb->cleared_p4ds))
return;
tlb_flush(tlb);
--
2.25.4
^ permalink raw reply related
* [PATCH v4 5/6] mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
From: Santosh Sivaraj @ 2020-05-20 8:30 UTC (permalink / raw)
To: stable, linuxppc-dev
Cc: Sasha Levin, Santosh Sivaraj, Peter Zijlstra, Greg KH,
Aneesh Kumar K . V
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>
From: Peter Zijlstra <peterz@infradead.org>
commit 0ed1325967ab5f7a4549a2641c6ebe115f76e228 upstream
Architectures for which we have hardware walkers of Linux page table
should flush TLB on mmu gather batch allocation failures and batch flush.
Some architectures like POWER supports multiple translation modes (hash
and radix) and in the case of POWER only radix translation mode needs the
above TLBI. This is because for hash translation mode kernel wants to
avoid this extra flush since there are no hardware walkers of linux page
table. With radix translation, the hardware also walks linux page table
and with that, kernel needs to make sure to TLB invalidate page walk cache
before page table pages are freed.
More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating
TLB caches for RCU_TABLE_FREE")
The changes to sparc are to make sure we keep the old behavior since we
are now removing HAVE_RCU_TABLE_NO_INVALIDATE. The default value for
tlb_needs_table_invalidate is to always force an invalidate and sparc can
avoid the table invalidate. Hence we define tlb_needs_table_invalidate to
false for sparc architecture.
Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com
Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: backported to 4.19 stable]
---
arch/Kconfig | 3 ---
arch/powerpc/Kconfig | 1 -
arch/powerpc/include/asm/tlb.h | 11 +++++++++++
arch/sparc/Kconfig | 1 -
arch/sparc/include/asm/tlb_64.h | 9 +++++++++
include/asm-generic/tlb.h | 15 +++++++++++++++
mm/memory.c | 16 ++++++++--------
7 files changed, 43 insertions(+), 13 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 061a12b8140e..3abbdb0cea44 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -363,9 +363,6 @@ config HAVE_ARCH_JUMP_LABEL
config HAVE_RCU_TABLE_FREE
bool
-config HAVE_RCU_TABLE_NO_INVALIDATE
- bool
-
config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1a00ce4b0040..e5bc0cfea2b1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,7 +216,6 @@ config PPC
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
- select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index f0e571b2dc7c..63418275f402 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -30,6 +30,17 @@
#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
extern void tlb_flush(struct mmu_gather *tlb);
+/*
+ * book3s:
+ * Hash does not use the linux page-tables, so we can avoid
+ * the TLB invalidate for page-table freeing, Radix otoh does use the
+ * page-tables and needs the TLBI.
+ *
+ * nohash:
+ * We still do TLB invalidate in the __pte_free_tlb routine before we
+ * add the page table pages to mmu gather table batch.
+ */
+#define tlb_needs_table_invalidate() radix_enabled()
/* Get the generic bits... */
#include <asm-generic/tlb.h>
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index d90d632868aa..e6f2a38d2e61 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,7 +64,6 @@ config SPARC64
select HAVE_KRETPROBES
select HAVE_KPROBES
select HAVE_RCU_TABLE_FREE if SMP
- select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_DYNAMIC_FTRACE
diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h
index a2f3fa61ee36..8cb8f3833239 100644
--- a/arch/sparc/include/asm/tlb_64.h
+++ b/arch/sparc/include/asm/tlb_64.h
@@ -28,6 +28,15 @@ void flush_tlb_pending(void);
#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
#define tlb_flush(tlb) flush_tlb_pending()
+/*
+ * SPARC64's hardware TLB fill does not use the Linux page-tables
+ * and therefore we don't need a TLBI when freeing page-table pages.
+ */
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define tlb_needs_table_invalidate() (false)
+#endif
+
#include <asm-generic/tlb.h>
#endif /* _SPARC64_TLB_H */
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index f2b9dc9cbaf8..19934cdd143e 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -61,8 +61,23 @@ struct mmu_table_batch {
extern void tlb_table_flush(struct mmu_gather *tlb);
extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+/*
+ * This allows an architecture that does not use the linux page-tables for
+ * hardware to skip the TLBI when freeing page tables.
+ */
+#ifndef tlb_needs_table_invalidate
+#define tlb_needs_table_invalidate() (true)
#endif
+#else
+
+#ifdef tlb_needs_table_invalidate
+#error tlb_needs_table_invalidate() requires HAVE_RCU_TABLE_FREE
+#endif
+
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
+
/*
* If we can't allocate a page to make a big batch of page pointers
* to work on, then just handle a few from the on-stack structure.
diff --git a/mm/memory.c b/mm/memory.c
index ba5689610c04..7daa7ae1b046 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -327,14 +327,14 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
*/
static inline void tlb_table_invalidate(struct mmu_gather *tlb)
{
-#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
- /*
- * Invalidate page-table caches used by hardware walkers. Then we still
- * need to RCU-sched wait while freeing the pages because software
- * walkers can still be in-flight.
- */
- tlb_flush_mmu_tlbonly(tlb);
-#endif
+ if (tlb_needs_table_invalidate()) {
+ /*
+ * Invalidate page-table caches used by hardware walkers. Then
+ * we still need to RCU-sched wait while freeing the pages
+ * because software walkers can still be in-flight.
+ */
+ tlb_flush_mmu_tlbonly(tlb);
+ }
}
static void tlb_remove_table_smp_sync(void *arg)
--
2.25.4
^ permalink raw reply related
* [PATCH v4 4/6] powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
From: Santosh Sivaraj @ 2020-05-20 8:30 UTC (permalink / raw)
To: stable, linuxppc-dev
Cc: Sasha Levin, Santosh Sivaraj, Aneesh Kumar K.V, Greg KH
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
commit 12e4d53f3f04e81f9e83d6fc10edc7314ab9f6b9 upstream
Patch series "Fixup page directory freeing", v4.
This is a repost of patch series from Peter with the arch specific changes
except ppc64 dropped. ppc64 changes are added here because we are redoing
the patch series on top of ppc64 changes. This makes it easy to backport
these changes. Only the first 2 patches need to be backported to stable.
The thing is, on anything SMP, freeing page directories should observe the
exact same order as normal page freeing:
1) unhook page/directory
2) TLB invalidate
3) free page/directory
Without this, any concurrent page-table walk could end up with a
Use-after-Free. This is esp. trivial for anything that has software
page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware
caches partial page-walks (ie. caches page directories).
Even on UP this might give issues since mmu_gather is preemptible these
days. An interrupt or preempted task accessing user pages might stumble
into the free page if the hardware caches page directories.
This patch series fixes ppc64 and add generic MMU_GATHER changes to
support the conversion of other architectures. I haven't added patches
w.r.t other architecture because they are yet to be acked.
This patch (of 9):
A followup patch is going to make sure we correctly invalidate page walk
cache before we free page table pages. In order to keep things simple
enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the
!SMP case differently in the followup patch
!SMP case is right now broken for radix translation w.r.t page walk
cache flush. We can get interrupted in between page table free and
that would imply we have page walk cache entries pointing to tables
which got freed already. Michael said "both our platforms that run on
Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a
problem for anyone in practice, unless they've hacked their kernel to
build it !SMP."
Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: backported for 4.19 stable]
---
arch/powerpc/Kconfig | 2 +-
arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 --------
arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 --
arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 --------
arch/powerpc/mm/pgtable-book3s64.c | 7 -------
5 files changed, 1 insertion(+), 26 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e09cfb109b8c..1a00ce4b0040 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -215,7 +215,7 @@ config PPC
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
- select HAVE_RCU_TABLE_FREE if SMP
+ select HAVE_RCU_TABLE_FREE
select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 82e44b1a00ae..79ba3fbb512e 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -110,7 +110,6 @@ static inline void pgtable_free(void *table, unsigned index_size)
#define check_pgt_cache() do { } while (0)
#define get_hugepd_cache_index(x) (x)
-#ifdef CONFIG_SMP
static inline void pgtable_free_tlb(struct mmu_gather *tlb,
void *table, int shift)
{
@@ -127,13 +126,6 @@ static inline void __tlb_remove_table(void *_table)
pgtable_free(table, shift);
}
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
- void *table, int shift)
-{
- pgtable_free(table, shift);
-}
-#endif
static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
unsigned long address)
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index f9019b579903..1013c0214213 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -47,9 +47,7 @@ extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
extern void pte_fragment_free(unsigned long *, int);
extern void pmd_fragment_free(unsigned long *);
extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
-#ifdef CONFIG_SMP
extern void __tlb_remove_table(void *_table);
-#endif
static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm)
{
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 8825953c225b..96eed46d5684 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -111,7 +111,6 @@ static inline void pgtable_free(void *table, unsigned index_size)
#define check_pgt_cache() do { } while (0)
#define get_hugepd_cache_index(x) (x)
-#ifdef CONFIG_SMP
static inline void pgtable_free_tlb(struct mmu_gather *tlb,
void *table, int shift)
{
@@ -128,13 +127,6 @@ static inline void __tlb_remove_table(void *_table)
pgtable_free(table, shift);
}
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
- void *table, int shift)
-{
- pgtable_free(table, shift);
-}
-#endif
static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
unsigned long address)
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 297db665d953..5b4e9fd8990c 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -432,7 +432,6 @@ static inline void pgtable_free(void *table, int index)
}
}
-#ifdef CONFIG_SMP
void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int index)
{
unsigned long pgf = (unsigned long)table;
@@ -449,12 +448,6 @@ void __tlb_remove_table(void *_table)
return pgtable_free(table, index);
}
-#else
-void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int index)
-{
- return pgtable_free(table, index);
-}
-#endif
#ifdef CONFIG_PROC_FS
atomic_long_t direct_pages_count[MMU_PAGE_COUNT];
--
2.25.4
^ permalink raw reply related
* [PATCH v4 3/6] asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
From: Santosh Sivaraj @ 2020-05-20 8:30 UTC (permalink / raw)
To: stable, linuxppc-dev
Cc: Sasha Levin, Peter Zijlstra, Santosh Sivaraj, Greg KH
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>
From: Peter Zijlstra <peterz@infradead.org>
commit 96bc9567cbe112e9320250f01b9c060c882e8619 upstream
Make issuing a TLB invalidate for page-table pages the normal case.
The reason is twofold:
- too many invalidates is safer than too few,
- most architectures use the linux page-tables natively
and would thus require this.
Make it an opt-out, instead of an opt-in.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: prerequisite for upcoming tlbflush backports]
---
arch/Kconfig | 2 +-
arch/powerpc/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/x86/Kconfig | 1 -
mm/memory.c | 2 +-
5 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index a336548487e6..061a12b8140e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -363,7 +363,7 @@ config HAVE_ARCH_JUMP_LABEL
config HAVE_RCU_TABLE_FREE
bool
-config HAVE_RCU_TABLE_INVALIDATE
+config HAVE_RCU_TABLE_NO_INVALIDATE
bool
config ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f475dc5829b..e09cfb109b8c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,6 +216,7 @@ config PPC
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE if SMP
+ select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index e6f2a38d2e61..d90d632868aa 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,6 +64,7 @@ config SPARC64
select HAVE_KRETPROBES
select HAVE_KPROBES
select HAVE_RCU_TABLE_FREE if SMP
+ select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_DYNAMIC_FTRACE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index af35f5caadbe..181d0d522977 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -181,7 +181,6 @@ config X86
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE if PARAVIRT
- select HAVE_RCU_TABLE_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR
diff --git a/mm/memory.c b/mm/memory.c
index 1832c5ed6ac0..ba5689610c04 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -327,7 +327,7 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
*/
static inline void tlb_table_invalidate(struct mmu_gather *tlb)
{
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
/*
* Invalidate page-table caches used by hardware walkers. Then we still
* need to RCU-sched wait while freeing the pages because software
--
2.25.4
^ permalink raw reply related
* [PATCH v4 2/6] asm-generic/tlb: Track which levels of the page tables have been cleared
From: Santosh Sivaraj @ 2020-05-20 8:30 UTC (permalink / raw)
To: stable, linuxppc-dev; +Cc: Sasha Levin, Santosh Sivaraj, Will Deacon, Greg KH
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>
From: Will Deacon <will.deacon@arm.com>
commit a6d60245d6d9b1caf66b0d94419988c4836980af upstream
It is common for architectures with hugepage support to require only a
single TLB invalidation operation per hugepage during unmap(), rather than
iterating through the mapping at a PAGE_SIZE increment. Currently,
however, the level in the page table where the unmap() operation occurs
is not stored in the mmu_gather structure, therefore forcing
architectures to issue additional TLB invalidation operations or to give
up and over-invalidate by e.g. invalidating the entire TLB.
Ideally, we could add an interval rbtree to the mmu_gather structure,
which would allow us to associate the correct mapping granule with the
various sub-mappings within the range being invalidated. However, this
is costly in terms of book-keeping and memory management, so instead we
approximate by keeping track of the page table levels that are cleared
and provide a means to query the smallest granule required for invalidation.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: prerequisite for upcoming tlbflush backports]
---
include/asm-generic/tlb.h | 58 +++++++++++++++++++++++++++++++++------
mm/memory.c | 4 ++-
2 files changed, 53 insertions(+), 9 deletions(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 97306b32d8d2..f2b9dc9cbaf8 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -114,6 +114,14 @@ struct mmu_gather {
*/
unsigned int freed_tables : 1;
+ /*
+ * at which levels have we cleared entries?
+ */
+ unsigned int cleared_ptes : 1;
+ unsigned int cleared_pmds : 1;
+ unsigned int cleared_puds : 1;
+ unsigned int cleared_p4ds : 1;
+
struct mmu_gather_batch *active;
struct mmu_gather_batch local;
struct page *__pages[MMU_GATHER_BUNDLE];
@@ -148,6 +156,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
tlb->end = 0;
}
tlb->freed_tables = 0;
+ tlb->cleared_ptes = 0;
+ tlb->cleared_pmds = 0;
+ tlb->cleared_puds = 0;
+ tlb->cleared_p4ds = 0;
}
static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -197,6 +209,25 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
}
#endif
+static inline unsigned long tlb_get_unmap_shift(struct mmu_gather *tlb)
+{
+ if (tlb->cleared_ptes)
+ return PAGE_SHIFT;
+ if (tlb->cleared_pmds)
+ return PMD_SHIFT;
+ if (tlb->cleared_puds)
+ return PUD_SHIFT;
+ if (tlb->cleared_p4ds)
+ return P4D_SHIFT;
+
+ return PAGE_SHIFT;
+}
+
+static inline unsigned long tlb_get_unmap_size(struct mmu_gather *tlb)
+{
+ return 1UL << tlb_get_unmap_shift(tlb);
+}
+
/*
* In the case of tlb vma handling, we can optimise these away in the
* case where we're doing a full MM flush. When we're doing a munmap,
@@ -230,13 +261,19 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define tlb_remove_tlb_entry(tlb, ptep, address) \
do { \
__tlb_adjust_range(tlb, address, PAGE_SIZE); \
+ tlb->cleared_ptes = 1; \
__tlb_remove_tlb_entry(tlb, ptep, address); \
} while (0)
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address) \
- do { \
- __tlb_adjust_range(tlb, address, huge_page_size(h)); \
- __tlb_remove_tlb_entry(tlb, ptep, address); \
+#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address) \
+ do { \
+ unsigned long _sz = huge_page_size(h); \
+ __tlb_adjust_range(tlb, address, _sz); \
+ if (_sz == PMD_SIZE) \
+ tlb->cleared_pmds = 1; \
+ else if (_sz == PUD_SIZE) \
+ tlb->cleared_puds = 1; \
+ __tlb_remove_tlb_entry(tlb, ptep, address); \
} while (0)
/**
@@ -250,6 +287,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define tlb_remove_pmd_tlb_entry(tlb, pmdp, address) \
do { \
__tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE); \
+ tlb->cleared_pmds = 1; \
__tlb_remove_pmd_tlb_entry(tlb, pmdp, address); \
} while (0)
@@ -264,6 +302,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define tlb_remove_pud_tlb_entry(tlb, pudp, address) \
do { \
__tlb_adjust_range(tlb, address, HPAGE_PUD_SIZE); \
+ tlb->cleared_puds = 1; \
__tlb_remove_pud_tlb_entry(tlb, pudp, address); \
} while (0)
@@ -289,7 +328,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define pte_free_tlb(tlb, ptep, address) \
do { \
__tlb_adjust_range(tlb, address, PAGE_SIZE); \
- tlb->freed_tables = 1; \
+ tlb->freed_tables = 1; \
+ tlb->cleared_pmds = 1; \
__pte_free_tlb(tlb, ptep, address); \
} while (0)
#endif
@@ -298,7 +338,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define pmd_free_tlb(tlb, pmdp, address) \
do { \
__tlb_adjust_range(tlb, address, PAGE_SIZE); \
- tlb->freed_tables = 1; \
+ tlb->freed_tables = 1; \
+ tlb->cleared_puds = 1; \
__pmd_free_tlb(tlb, pmdp, address); \
} while (0)
#endif
@@ -308,7 +349,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define pud_free_tlb(tlb, pudp, address) \
do { \
__tlb_adjust_range(tlb, address, PAGE_SIZE); \
- tlb->freed_tables = 1; \
+ tlb->freed_tables = 1; \
+ tlb->cleared_p4ds = 1; \
__pud_free_tlb(tlb, pudp, address); \
} while (0)
#endif
@@ -319,7 +361,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define p4d_free_tlb(tlb, pudp, address) \
do { \
__tlb_adjust_range(tlb, address, PAGE_SIZE); \
- tlb->freed_tables = 1; \
+ tlb->freed_tables = 1; \
__p4d_free_tlb(tlb, pudp, address); \
} while (0)
#endif
diff --git a/mm/memory.c b/mm/memory.c
index bbf0cc4066c8..1832c5ed6ac0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -267,8 +267,10 @@ void arch_tlb_finish_mmu(struct mmu_gather *tlb,
{
struct mmu_gather_batch *batch, *next;
- if (force)
+ if (force) {
+ __tlb_reset_range(tlb);
__tlb_adjust_range(tlb, start, end - start);
+ }
tlb_flush_mmu(tlb);
--
2.25.4
^ permalink raw reply related
* [PATCH v4 1/6] asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather
From: Santosh Sivaraj @ 2020-05-20 8:30 UTC (permalink / raw)
To: stable, linuxppc-dev
Cc: Sasha Levin, Santosh Sivaraj, Peter Zijlstra, Will Deacon,
Greg KH
In-Reply-To: <20200520083025.229011-1-santosh@fossix.org>
From: Peter Zijlstra <peterz@infradead.org>
commit 22a61c3c4f1379ef8b0ce0d5cb78baf3178950e2 upstream
Some architectures require different TLB invalidation instructions
depending on whether it is only the last-level of page table being
changed, or whether there are also changes to the intermediate
(directory) entries higher up the tree.
Add a new bit to the flags bitfield in struct mmu_gather so that the
architecture code can operate accordingly if it's the intermediate
levels being invalidated.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
[santosh: prerequisite for tlbflush backports]
---
include/asm-generic/tlb.h | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b3353e21f3b3..97306b32d8d2 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -97,12 +97,22 @@ struct mmu_gather {
#endif
unsigned long start;
unsigned long end;
- /* we are in the middle of an operation to clear
- * a full mm and can make some optimizations */
- unsigned int fullmm : 1,
- /* we have performed an operation which
- * requires a complete flush of the tlb */
- need_flush_all : 1;
+ /*
+ * we are in the middle of an operation to clear
+ * a full mm and can make some optimizations
+ */
+ unsigned int fullmm : 1;
+
+ /*
+ * we have performed an operation which
+ * requires a complete flush of the tlb
+ */
+ unsigned int need_flush_all : 1;
+
+ /*
+ * we have removed page directories
+ */
+ unsigned int freed_tables : 1;
struct mmu_gather_batch *active;
struct mmu_gather_batch local;
@@ -137,6 +147,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
tlb->start = TASK_SIZE;
tlb->end = 0;
}
+ tlb->freed_tables = 0;
}
static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -278,6 +289,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define pte_free_tlb(tlb, ptep, address) \
do { \
__tlb_adjust_range(tlb, address, PAGE_SIZE); \
+ tlb->freed_tables = 1; \
__pte_free_tlb(tlb, ptep, address); \
} while (0)
#endif
@@ -285,7 +297,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#ifndef pmd_free_tlb
#define pmd_free_tlb(tlb, pmdp, address) \
do { \
- __tlb_adjust_range(tlb, address, PAGE_SIZE); \
+ __tlb_adjust_range(tlb, address, PAGE_SIZE); \
+ tlb->freed_tables = 1; \
__pmd_free_tlb(tlb, pmdp, address); \
} while (0)
#endif
@@ -295,6 +308,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#define pud_free_tlb(tlb, pudp, address) \
do { \
__tlb_adjust_range(tlb, address, PAGE_SIZE); \
+ tlb->freed_tables = 1; \
__pud_free_tlb(tlb, pudp, address); \
} while (0)
#endif
@@ -304,7 +318,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
#ifndef p4d_free_tlb
#define p4d_free_tlb(tlb, pudp, address) \
do { \
- __tlb_adjust_range(tlb, address, PAGE_SIZE); \
+ __tlb_adjust_range(tlb, address, PAGE_SIZE); \
+ tlb->freed_tables = 1; \
__p4d_free_tlb(tlb, pudp, address); \
} while (0)
#endif
--
2.25.4
^ permalink raw reply related
* [PATCH v4 0/6] Memory corruption may occur due to incorrent tlb flush
From: Santosh Sivaraj @ 2020-05-20 8:30 UTC (permalink / raw)
To: stable, linuxppc-dev; +Cc: Sasha Levin, Santosh Sivaraj, Greg KH
The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC
flushes) may result in random memory corruption. Any concurrent page-table walk
could end up with a Use-after-Free. Even on UP this might give issues, since
mmu_gather is preemptible these days. An interrupt or preempted task accessing
user pages might stumble into the free page if the hardware caches page
directories.
The series is a backport of the fix sent by Peter [1].
The first three patches are dependencies for the last patch (avoid potential
double flush). If the performance impact due to double flush is considered
trivial then the first three patches and last patch may be dropped.
This is only for v4.19 stable.
--
Changelog:
v2: Send the patches with the correct format (commit sha1 upstream) for stable
v3: Fix compilation for ppc44x_defconfig and mpc885_ads_defconfig
v4: No change, Resend.
--
Aneesh Kumar K.V (1):
powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
Peter Zijlstra (4):
asm-generic/tlb: Track freeing of page-table directories in struct
mmu_gather
asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
mm/mmu_gather: invalidate TLB correctly on batch allocation failure
and flush
asm-generic/tlb: avoid potential double flush
Will Deacon (1):
asm-generic/tlb: Track which levels of the page tables have been
cleared
arch/Kconfig | 3 -
arch/powerpc/Kconfig | 2 +-
arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 --
arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 -
arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 --
arch/powerpc/include/asm/tlb.h | 11 ++
arch/powerpc/mm/pgtable-book3s64.c | 7 --
arch/sparc/include/asm/tlb_64.h | 9 ++
arch/x86/Kconfig | 1 -
include/asm-generic/tlb.h | 103 ++++++++++++++++---
mm/memory.c | 20 ++--
11 files changed, 122 insertions(+), 52 deletions(-)
--
2.25.4
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox