* [PATCH v2 00/10] powerpc: Modernize unhandled signals message
From: Murilo Opsfelder Araujo @ 2018-07-27 14:58 UTC
To: linux-kernel
Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
Eric W . Biederman, Michael Ellerman, Michael Neuling,
Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev
Hi, everyone.
This series was inspired by the need to modernize and display more
informative messages about unhandled signals.
The "unhandled signal NN" message is not very informative. We thought it
would be helpful to add a human-readable description of what the signal
number means, print the VMA address, and dump the instructions.
We can add more informative messages, such as explaining what each code of a
SIGSEGV signal means. We are open to suggestions.
Before this series:
pandafault[5815]: unhandled signal 11 at 00000000100007d0 nip 000000001000061c lr 00003fff87ff5100 code 2
After this series:
pandafault[10850]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fff9f3e5100 code 2 in pandafault[10000000+10000]
pandafault[10850]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
pandafault[10850]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
Link to v1:
https://lore.kernel.org/lkml/20180724192720.32417-1-muriloo@linux.ibm.com/
v1..v2:
- Broke patch 7 down into patches 7-9
- Added proper copyright in arch/powerpc/include/asm/stacktrace.h
- show_instructions(): prefixed lines with current->comm and current->pid
Cheers!
Murilo Opsfelder Araujo (10):
powerpc/traps: Print unhandled signals in a separate function
powerpc/traps: Return early in show_signal_msg()
powerpc/reg: Add REG_FMT definition
powerpc/traps: Use REG_FMT in show_signal_msg()
powerpc/traps: Print VMA for unhandled signals
powerpc/traps: Print signal name for unhandled signals
powerpc: Do not call __kernel_text_address() in show_instructions()
powerpc: Add stacktrace.h header
powerpc/traps: Show instructions on exceptions
powerpc/traps: Add line prefix in show_instructions()
arch/powerpc/include/asm/reg.h | 6 +++
arch/powerpc/include/asm/stacktrace.h | 13 +++++
arch/powerpc/kernel/process.c | 35 ++++++-------
arch/powerpc/kernel/traps.c | 73 +++++++++++++++++++++++----
4 files changed, 100 insertions(+), 27 deletions(-)
create mode 100644 arch/powerpc/include/asm/stacktrace.h
--
2.17.1
* Re: [PATCH resend] powerpc/64s: fix page table fragment refcount race vs speculative references
From: Nicholas Piggin @ 2018-07-27 14:29 UTC
To: Matthew Wilcox
Cc: linuxppc-dev, Andrew Morton, Linus Torvalds, Aneesh Kumar K . V,
linux-mm
In-Reply-To: <20180727134156.GA13348@bombadil.infradead.org>
On Fri, 27 Jul 2018 06:41:56 -0700
Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Jul 27, 2018 at 09:48:17PM +1000, Nicholas Piggin wrote:
> > The page table fragment allocator uses the main page refcount racily
> > with respect to speculative references. A customer observed a BUG due
> > to page table page refcount underflow in the fragment allocator. This
> > can be caused by the fragment allocator set_page_count stomping on a
> > speculative reference, and then the speculative failure handler
> > decrements the new reference, and the underflow eventually pops when
> > the page tables are freed.
>
> Oof. Can't you fix this instead by using page_ref_add() instead of
> set_page_count()?
It's ugly doing it that way. The problem is we have a page table
destructor and that would be missed if the spec ref was the last
put. In practice with RCU page table freeing maybe you can say
there will be no spec ref there (unless something changes), but
still it just seems much simpler doing this and avoiding any
complexity or relying on other synchronization.
>
> > Any objection to the struct page change to grab the arch specific
> > page table page word for powerpc to use? If not, then this should
> > go via powerpc tree because it's inconsequential for core mm.
>
> I want (eventually) to get to the point where every struct page carries
> a pointer to the struct mm that it belongs to. It's good for debugging
> as well as handling memory errors in page tables.
That doesn't seem like it should be a problem, there's some spare
words there for arch independent users.
Thanks,
Nick
* Re: [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE
From: Laurent Dufour @ 2018-07-27 14:10 UTC
To: linuxppc-dev, linux-kernel
In-Reply-To: <1532697739-4878-1-git-send-email-ldufour@linux.vnet.ibm.com>
Sorry for the noise, I forgot to CC people on this cover letter.
A whole new thread has been sent: https://lkml.org/lkml/2018/7/27/651
On 27/07/2018 15:22, Laurent Dufour wrote:
> On very large systems, we can see a soft lockup fire when a process is exiting:
>
> watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
> Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
> CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
> NIP: c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
> REGS: c00006b0645b7610 TRAP: 0901 Not tainted (4.17.0)
> MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22042082 XER: 00000000
> CFAR: 00000000006cf8f0 SOFTE: 0
> GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000
> GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480
> GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480
> GPR12: ffffffffffffffff c00000001e690800
> NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
> LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
> Call Trace:
> [c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
> [c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
> [c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
> [c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
> [c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
> [c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
> [c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
> [c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
> [c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
> [c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
> [c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
> Instruction dump:
> 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378
> e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
>
> This happens when PTEs are removed through the H_BULK_REMOVE hypervisor
> call. This call processes up to 4 PTEs but issues a tlbie for each PTE it
> processes. This can lead to a long time spent in the hypervisor (sometimes
> up to 4s) and to a soft lockup being raised, because the scheduler is not
> called in zap_pte_range().
>
> Since POWER7, the hypervisor has provided a new hcall, H_BLOCK_REMOVE,
> which can process up to 8 PTEs with a single tlbie. By limiting the number
> of tlbie instructions issued, this reduces the time spent invalidating
> the PTEs.
>
> This hcall requires that the pages are "all within the same naturally
> aligned 8 page virtual address block".
>
> With this patch series applied, I no longer saw any soft lockup on the
> victim LPAR I was running the test on.
>
> This series covers both normal pages and huge pages.
>
> Laurent Dufour (3):
> powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
> powerpc/pseries/mm: factorize PTE slot computation
> powerpc/pseries/mm: call H_BLOCK_REMOVE
>
> arch/powerpc/include/asm/firmware.h | 3 +-
> arch/powerpc/include/asm/hvcall.h | 1 +
> arch/powerpc/platforms/pseries/firmware.c | 1 +
> arch/powerpc/platforms/pseries/lpar.c | 250 ++++++++++++++++++++++++++----
> 4 files changed, 228 insertions(+), 27 deletions(-)
>
* [PATCH] powerpc: Wire up file system mount new syscalls
From: Breno Leitao @ 2018-07-27 13:55 UTC
To: linuxppc-dev; +Cc: linux-kernel, Breno Leitao, David Howells
Wire up the new file system mount syscalls for powerpc. The new syscalls
covered by this patch are those already available in the linux-next tree:
open_tree(), move_mount(), fsopen(), fsmount(), fspick() and fsinfo().
These system calls were tested with David Howells' test case[1], as well as
some others I wrote[2]. For testing purposes, I also used the new, but not yet
integrated, fsconfig() syscall.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=mount-context&id=2bae6fd63259c5244bd9de1b46cca706c4808438
[2] https://github.com/leitao/mcontext/
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
arch/powerpc/include/asm/systbl.h | 6 ++++++
arch/powerpc/include/asm/unistd.h | 2 +-
arch/powerpc/include/uapi/asm/unistd.h | 6 ++++++
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 01b5171ea189..68d8dc0f8338 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -394,3 +394,9 @@ SYSCALL(pkey_free)
SYSCALL(pkey_mprotect)
SYSCALL(rseq)
COMPAT_SYS(io_pgetevents)
+SYSCALL(open_tree)
+SYSCALL(move_mount)
+SYSCALL(fsopen)
+SYSCALL(fsmount)
+SYSCALL(fspick)
+SYSCALL(fsinfo)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index c19379f0a32e..e6cf3bb0ed2f 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
#include <uapi/asm/unistd.h>
-#define NR_syscalls 389
+#define NR_syscalls 395
#define __NR__exit __NR_exit
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index 985534d0b448..89b450c6a246 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -400,5 +400,11 @@
#define __NR_pkey_mprotect 386
#define __NR_rseq 387
#define __NR_io_pgetevents 388
+#define __NR_open_tree 389
+#define __NR_move_mount 390
+#define __NR_fsopen 391
+#define __NR_fsmount 392
+#define __NR_fspick 393
+#define __NR_fsinfo 394
#endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
--
2.16.3
* [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
From: Laurent Dufour @ 2018-07-27 13:51 UTC
To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin
In-Reply-To: <1532699493-10883-1-git-send-email-ldufour@linux.vnet.ibm.com>
This feature indicates whether the H_BLOCK_REMOVE hcall is available.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/firmware.h | 3 ++-
arch/powerpc/platforms/pseries/firmware.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 535add3f7791..360ba197f9d2 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -53,6 +53,7 @@
#define FW_FEATURE_PRRN ASM_CONST(0x0000000200000000)
#define FW_FEATURE_DRMEM_V2 ASM_CONST(0x0000000400000000)
#define FW_FEATURE_DRC_INFO ASM_CONST(0x0000000800000000)
+#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000)
#ifndef __ASSEMBLY__
@@ -70,7 +71,7 @@ enum {
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
- FW_FEATURE_DRC_INFO,
+ FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index a3bbeb43689e..1624501386f4 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
{FW_FEATURE_SET_MODE, "hcall-set-mode"},
{FW_FEATURE_BEST_ENERGY, "hcall-best-energy-1*"},
{FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
+ {FW_FEATURE_BLOCK_REMOVE, "hcall-block-remove"},
};
/* Build up the firmware features bitmask using the contents of
--
2.7.4
* [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
From: Laurent Dufour @ 2018-07-27 13:51 UTC
To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin
In-Reply-To: <1532699493-10883-1-git-send-email-ldufour@linux.vnet.ibm.com>
This hypervisor call allows removing up to 8 PTEs with only one call to tlbie.
The virtual pages must all be within the same naturally aligned 8 page
virtual address block and have the same page and segment size encodings.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/hvcall.h | 1 +
arch/powerpc/platforms/pseries/lpar.c | 223 +++++++++++++++++++++++++++++++---
2 files changed, 205 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 662c8347d699..e403d574651d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -278,6 +278,7 @@
#define H_COP 0x304
#define H_GET_MPP_X 0x314
#define H_SET_MODE 0x31C
+#define H_BLOCK_REMOVE 0x328
#define H_CLEAR_HPT 0x358
#define H_REQUEST_VMC 0x360
#define H_RESIZE_HPT_PREPARE 0x36C
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 96b8cd8a802d..41ed03245eb4 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
BUG_ON(lpar_rc != H_SUCCESS);
}
+
+/*
+ * As defined in the PAPR's section 14.5.4.1.8
+ * The control mask doesn't include the returned reference and change bit from
+ * the processed PTE.
+ */
+#define HBLKR_AVPN 0x0100000000000000UL
+#define HBLKR_CTRL_MASK 0xf800000000000000UL
+#define HBLKR_CTRL_SUCCESS 0x8000000000000000UL
+#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
+#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
+
+/**
+ * H_BLOCK_REMOVE caller.
+ * @idx should point to the latest @param entry set with a PTEX.
+ * If a PTE cannot be processed because another CPU has already locked that
+ * group, those entries are put back in @param starting at index 1.
+ * If entries have to be retried and @retry_busy is set to true, these entries
+ * are retried until success. If @retry_busy is set to false, the returned
+ * value is the number of entries yet to process.
+ */
+static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
+ bool retry_busy)
+{
+ unsigned long i, rc, new_idx;
+ unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+again:
+ new_idx = 0;
+ BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
+ if (idx < PLPAR_HCALL9_BUFSIZE)
+ param[idx] = HBR_END;
+
+ rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
+ param[0], /* AVA */
+ param[1], param[2], param[3], param[4], /* TS0-7 */
+ param[5], param[6], param[7], param[8]);
+ if (rc == H_SUCCESS)
+ return 0;
+
+ BUG_ON(rc != H_PARTIAL);
+
+ /* Check that the unprocessed entries were 'not found' or 'busy' */
+ for (i = 0; i < idx-1; i++) {
+ unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
+
+ if (ctrl == HBLKR_CTRL_ERRBUSY) {
+ param[++new_idx] = param[i+1];
+ continue;
+ }
+
+ BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
+ && ctrl != HBLKR_CTRL_ERRNOTFOUND);
+ }
+
+ /*
+ * If there were entries found busy, retry these entries if requested,
+ * or if all the entries have to be retried.
+ */
+ if (new_idx && (retry_busy || new_idx == (PLPAR_HCALL9_BUFSIZE-1))) {
+ idx = new_idx + 1;
+ goto again;
+ }
+
+ return new_idx;
+}
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
@@ -425,17 +492,59 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
*/
#define PPC64_HUGE_HPTE_BATCH 12
-static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
- unsigned long *vpn, int count,
- int psize, int ssize)
+static void hugepage_block_invalidate(unsigned long *slot, unsigned long *vpn,
+ int count, int psize, int ssize)
{
unsigned long param[PLPAR_HCALL9_BUFSIZE];
- int i = 0, pix = 0, rc;
- unsigned long flags = 0;
- int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+ unsigned long shift, current_vpgb, vpgb;
+ int i, pix = 0;
- if (lock_tlbie)
- spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+ shift = mmu_psize_defs[psize].shift;
+
+ for (i = 0; i < count; i++) {
+ /*
+ * Shift 3 more bits to the right to get an
+ * 8-page aligned virtual address.
+ */
+ vpgb = (vpn[i] >> (shift - VPN_SHIFT + 3));
+ if (!pix || vpgb != current_vpgb) {
+ /*
+ * Need to start a new 8 pages block, flush
+ * the current one if needed.
+ */
+ if (pix)
+ (void)call_block_remove(pix, param, true);
+ current_vpgb = vpgb;
+ param[0] = hpte_encode_avpn(vpn[i], psize, ssize);
+ pix = 1;
+ }
+
+ param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot[i];
+ if (pix == PLPAR_HCALL9_BUFSIZE) {
+ pix = call_block_remove(pix, param, false);
+ /*
+ * pix = 0 means that all the entries were
+ * removed, we can start a new block.
+ * Otherwise, this means that there are entries
+ * to retry, and pix points to latest one, so
+ * we should increment it and try to continue
+ * the same block.
+ */
+ if (!pix)
+ current_vpgb = 0;
+ else
+ pix++;
+ }
+ }
+ if (pix)
+ (void)call_block_remove(pix, param, true);
+}
+
+static void hugepage_bulk_invalidate(unsigned long *slot, unsigned long *vpn,
+ int count, int psize, int ssize)
+{
+ unsigned long param[PLPAR_HCALL9_BUFSIZE];
+ int i = 0, pix = 0, rc;
for (i = 0; i < count; i++) {
@@ -443,17 +552,6 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize, 0,
ssize, 0);
} else {
- param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
- param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
- pix += 2;
- if (pix == 8) {
- rc = plpar_hcall9(H_BULK_REMOVE, param,
- param[0], param[1], param[2],
- param[3], param[4], param[5],
- param[6], param[7]);
- BUG_ON(rc != H_SUCCESS);
- pix = 0;
- }
}
}
if (pix) {
@@ -463,6 +561,23 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
param[6], param[7]);
BUG_ON(rc != H_SUCCESS);
}
+}
+
+static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
+ unsigned long *vpn,
+ int count, int psize,
+ int ssize)
+{
+ unsigned long flags = 0;
+ int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+
+ if (lock_tlbie)
+ spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+
+ if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+ hugepage_block_invalidate(slot, vpn, count, psize, ssize);
+ else
+ hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
if (lock_tlbie)
spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
@@ -565,6 +680,70 @@ static inline unsigned long compute_slot(real_pte_t pte,
return slot;
}
+/**
+ * The hcall H_BLOCK_REMOVE requires that the virtual pages to process are
+ * "all within the same naturally aligned 8 page virtual address block".
+ */
+static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
+ unsigned long *param)
+{
+ unsigned long vpn;
+ unsigned long i, pix = 0;
+ unsigned long index, shift, slot, current_vpgb, vpgb;
+ real_pte_t pte;
+ int psize, ssize;
+
+ psize = batch->psize;
+ ssize = batch->ssize;
+
+ for (i = 0; i < number; i++) {
+ vpn = batch->vpn[i];
+ pte = batch->pte[i];
+ pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+ /*
+ * Shift 3 more bits to the right to get an
+ * 8-page aligned virtual address.
+ */
+ vpgb = (vpn >> (shift - VPN_SHIFT + 3));
+ if (!pix || vpgb != current_vpgb) {
+ /*
+ * Need to start a new 8 pages block, flush
+ * the current one if needed.
+ */
+ if (pix)
+ (void)call_block_remove(pix, param,
+ true);
+ current_vpgb = vpgb;
+ param[0] = hpte_encode_avpn(vpn, psize,
+ ssize);
+ pix = 1;
+ }
+
+ slot = compute_slot(pte, vpn, index, shift, ssize);
+ param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot;
+
+ if (pix == PLPAR_HCALL9_BUFSIZE) {
+ pix = call_block_remove(pix, param, false);
+ /*
+ * pix = 0 means that all the entries were
+ * removed, we can start a new block.
+ * Otherwise, this means that there are entries
+ * to retry, and pix points to latest one, so
+ * we should increment it and try to continue
+ * the same block.
+ */
+ if (!pix)
+ current_vpgb = 0;
+ else
+ pix++;
+ }
+ } pte_iterate_hashed_end();
+ }
+
+ if (pix > 1)
+ (void)call_block_remove(pix, param, true);
+}
+
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
@@ -584,6 +763,11 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
if (lock_tlbie)
spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+ if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+ do_block_remove(number, batch, param);
+ goto out;
+ }
+
psize = batch->psize;
ssize = batch->ssize;
pix = 0;
@@ -622,6 +806,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
BUG_ON(rc != H_SUCCESS);
}
+out:
if (lock_tlbie)
spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
}
--
2.7.4
* [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation
From: Laurent Dufour @ 2018-07-27 13:51 UTC
To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin
In-Reply-To: <1532699493-10883-1-git-send-email-ldufour@linux.vnet.ibm.com>
This code will also be called when dealing with H_BLOCK_REMOVE.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/lpar.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 52eeff1297f4..96b8cd8a802d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -547,6 +547,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
return 0;
}
+
+static inline unsigned long compute_slot(real_pte_t pte,
+ unsigned long vpn,
+ unsigned long index,
+ unsigned long shift,
+ int ssize)
+{
+ unsigned long slot, hash, hidx;
+
+ hash = hpt_hash(vpn, shift, ssize);
+ hidx = __rpte_to_hidx(pte, index);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+ return slot;
+}
+
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
@@ -559,7 +577,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
unsigned long param[PLPAR_HCALL9_BUFSIZE];
- unsigned long hash, index, shift, hidx, slot;
+ unsigned long index, shift, slot;
real_pte_t pte;
int psize, ssize;
@@ -573,12 +591,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
vpn = batch->vpn[i];
pte = batch->pte[i];
pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
- hash = hpt_hash(vpn, shift, ssize);
- hidx = __rpte_to_hidx(pte, index);
- if (hidx & _PTEIDX_SECONDARY)
- hash = ~hash;
- slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
- slot += hidx & _PTEIDX_GROUP_IX;
+ slot = compute_slot(pte, vpn, index, shift, ssize);
if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
/*
* lpar doesn't use the passed actual page size
--
2.7.4
* [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE
From: Laurent Dufour @ 2018-07-27 13:51 UTC
To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin
[Resending so that everyone gets the cover letter]
On very large systems, we can see a soft lockup fire when a process is exiting:
watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
NIP: c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
REGS: c00006b0645b7610 TRAP: 0901 Not tainted (4.17.0)
MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22042082 XER: 00000000
CFAR: 00000000006cf8f0 SOFTE: 0
GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000
GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480
GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480
GPR12: ffffffffffffffff c00000001e690800
NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
Call Trace:
[c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
[c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
[c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
[c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
[c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
[c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
[c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
[c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
[c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
[c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
[c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
Instruction dump:
60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378
e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
This happens when PTEs are removed through the H_BULK_REMOVE hypervisor
call. This call processes up to 4 PTEs but issues a tlbie for each PTE it
processes. This can lead to a long time spent in the hypervisor (sometimes
up to 4s) and to a soft lockup being raised, because the scheduler is not
called in zap_pte_range().
Since POWER7, the hypervisor has provided a new hcall, H_BLOCK_REMOVE,
which can process up to 8 PTEs with a single tlbie. By limiting the number
of tlbie instructions issued, this reduces the time spent invalidating
the PTEs.
This hcall requires that the pages are "all within the same naturally
aligned 8 page virtual address block".
With this patch series applied, I no longer saw any soft lockup on the
victim LPAR I was running the test on.
This series covers both normal pages and huge pages.
Laurent Dufour (3):
powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
powerpc/pseries/mm: factorize PTE slot computation
powerpc/pseries/mm: call H_BLOCK_REMOVE
arch/powerpc/include/asm/firmware.h | 3 +-
arch/powerpc/include/asm/hvcall.h | 1 +
arch/powerpc/platforms/pseries/firmware.c | 1 +
arch/powerpc/platforms/pseries/lpar.c | 250 ++++++++++++++++++++++++++----
4 files changed, 228 insertions(+), 27 deletions(-)
--
2.7.4
* Re: [PATCH resend] powerpc/64s: fix page table fragment refcount race vs speculative references
From: Matthew Wilcox @ 2018-07-27 13:41 UTC
To: Nicholas Piggin
Cc: linuxppc-dev, Andrew Morton, Linus Torvalds, Aneesh Kumar K . V,
linux-mm
In-Reply-To: <20180727114817.27190-1-npiggin@gmail.com>
On Fri, Jul 27, 2018 at 09:48:17PM +1000, Nicholas Piggin wrote:
> The page table fragment allocator uses the main page refcount racily
> with respect to speculative references. A customer observed a BUG due
> to page table page refcount underflow in the fragment allocator. This
> can be caused by the fragment allocator set_page_count stomping on a
> speculative reference, and then the speculative failure handler
> decrements the new reference, and the underflow eventually pops when
> the page tables are freed.
Oof. Can't you fix this instead by using page_ref_add() instead of
set_page_count()?
> Any objection to the struct page change to grab the arch specific
> page table page word for powerpc to use? If not, then this should
> go via powerpc tree because it's inconsequential for core mm.
I want (eventually) to get to the point where every struct page carries
a pointer to the struct mm that it belongs to. It's good for debugging
as well as handling memory errors in page tables.
* [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE
From: Laurent Dufour @ 2018-07-27 13:22 UTC
To: linuxppc-dev, linux-kernel
On very large systems, we can see a soft lockup fire when a process is exiting:
watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
NIP: c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
REGS: c00006b0645b7610 TRAP: 0901 Not tainted (4.17.0)
MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22042082 XER: 00000000
CFAR: 00000000006cf8f0 SOFTE: 0
GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000
GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480
GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480
GPR12: ffffffffffffffff c00000001e690800
NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
Call Trace:
[c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
[c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
[c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
[c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
[c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
[c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
[c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
[c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
[c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
[c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
[c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
Instruction dump:
60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378
e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
This happens when PTEs are removed through the H_BULK_REMOVE hypervisor
call. This call processes up to 4 PTEs but issues a tlbie for each PTE it
processes. This can lead to a long time spent in the hypervisor (sometimes
up to 4s) and to a soft lockup being raised, because the scheduler is not
called in zap_pte_range().
Since POWER7, the hypervisor has provided a new hcall, H_BLOCK_REMOVE,
which can process up to 8 PTEs with a single tlbie. By limiting the number
of tlbie instructions issued, this reduces the time spent invalidating
the PTEs.
This hcall requires that the pages are "all within the same naturally
aligned 8 page virtual address block".
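To make the alignment constraint concrete, here is a minimal sketch of how two virtual page numbers can be checked for membership in the same 8 page block. It mirrors the shift used later in the series, but the names block_of(), same_block() and SKETCH_VPN_SHIFT are made up for illustration, and the value 12 is an assumption (VPNs counted in 4K units).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: VPNs are expressed in 4K units, i.e. a VPN shift of 12. */
#define SKETCH_VPN_SHIFT 12

/* Index of the naturally aligned 8 page block that holds this VPN. */
static uint64_t block_of(uint64_t vpn, unsigned int page_shift)
{
	assert(page_shift >= SKETCH_VPN_SHIFT);
	/* Drop the in-page bits, then 3 more bits for the 8 pages of a block. */
	return vpn >> (page_shift - SKETCH_VPN_SHIFT + 3);
}

/* True when both VPNs could be passed to the same H_BLOCK_REMOVE call. */
static bool same_block(uint64_t a, uint64_t b, unsigned int page_shift)
{
	return block_of(a, page_shift) == block_of(b, page_shift);
}
```

With 4K pages (page_shift 12), VPNs 0 to 7 share a block and VPN 8 starts the next one. With 64K pages (page_shift 16) a block spans 128 of these 4K units.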
With this patch series applied, I couldn't see any soft lockup raised on
the victim LPAR I was running the test on.
This series covers both normal pages and huge pages.
Laurent Dufour (3):
powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
powerpc/pseries/mm: factorize PTE slot computation
powerpc/pseries/mm: call H_BLOCK_REMOVE
arch/powerpc/include/asm/firmware.h | 3 +-
arch/powerpc/include/asm/hvcall.h | 1 +
arch/powerpc/platforms/pseries/firmware.c | 1 +
arch/powerpc/platforms/pseries/lpar.c | 250 ++++++++++++++++++++++++++----
4 files changed, 228 insertions(+), 27 deletions(-)
--
2.7.4
^ permalink raw reply
* [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
From: Laurent Dufour @ 2018-07-27 13:22 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: Aneesh Kumar K.V, Nicholas Piggin, Michael Ellerman,
Paul Mackerras, Benjamin Herrenschmidt
In-Reply-To: <1532697739-4878-1-git-send-email-ldufour@linux.vnet.ibm.com>
This hypervisor call allows removing up to 8 PTEs with only one call to tlbie.
The virtual pages must be all within the same naturally aligned 8 page
virtual address block and have the same page and segment size encodings.
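As a rough illustration of the batching this implies, the sketch below counts how many calls a sequence of VPNs would need when each call takes at most 8 entries from a single block. It is not the patch's code: count_block_calls() and the SKETCH_* constants are hypothetical, the VPN shift of 12 is an assumption, and the busy-entry retry handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VPN_SHIFT   12 /* assumption: VPNs count 4K units */
#define SKETCH_MAX_ENTRIES 8  /* entries per call, per the description above */

/* Number of calls needed to remove vpn[0..n-1], walked in the given order. */
static size_t count_block_calls(const uint64_t *vpn, size_t n,
				unsigned int page_shift)
{
	size_t calls = 0, pending = 0;
	uint64_t current_block = 0;

	assert(page_shift >= SKETCH_VPN_SHIFT);

	for (size_t i = 0; i < n; i++) {
		uint64_t block = vpn[i] >> (page_shift - SKETCH_VPN_SHIFT + 3);

		/* An entry from a different block closes the batch being built. */
		if (pending && block != current_block) {
			calls++;
			pending = 0;
		}
		current_block = block;
		pending++;

		/* A full batch is issued straight away. */
		if (pending == SKETCH_MAX_ENTRIES) {
			calls++;
			pending = 0;
		}
	}

	/* Flush whatever is left. */
	if (pending)
		calls++;

	return calls;
}
```

Eight consecutive 4K pages starting on a block boundary need a single call, while entries that alternate between blocks need one call each, so the ordering of the batch matters.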
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/hvcall.h | 1 +
arch/powerpc/platforms/pseries/lpar.c | 223 +++++++++++++++++++++++++++++++---
2 files changed, 205 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 662c8347d699..e403d574651d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -278,6 +278,7 @@
#define H_COP 0x304
#define H_GET_MPP_X 0x314
#define H_SET_MODE 0x31C
+#define H_BLOCK_REMOVE 0x328
#define H_CLEAR_HPT 0x358
#define H_REQUEST_VMC 0x360
#define H_RESIZE_HPT_PREPARE 0x36C
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 96b8cd8a802d..41ed03245eb4 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
BUG_ON(lpar_rc != H_SUCCESS);
}
+
+/*
+ * As defined in the PAPR's section 14.5.4.1.8
+ * The control mask doesn't include the returned reference and change bit from
+ * the processed PTE.
+ */
+#define HBLKR_AVPN 0x0100000000000000UL
+#define HBLKR_CTRL_MASK 0xf800000000000000UL
+#define HBLKR_CTRL_SUCCESS 0x8000000000000000UL
+#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
+#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
+
+/**
+ * H_BLOCK_REMOVE caller.
+ * @idx should point to the latest @param entry set with a PTEX.
+ * If a PTE cannot be processed because another CPU has already locked that
+ * group, those entries are put back in @param starting at index 1.
+ * If entries have to be retried and @retry_busy is set to true, these entries
+ * are retried until success. If @retry_busy is set to false, the returned
+ * value is the number of entries yet to process.
+ */
+static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
+ bool retry_busy)
+{
+ unsigned long i, rc, new_idx;
+ unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+again:
+ new_idx = 0;
+ BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
+ if (idx < PLPAR_HCALL9_BUFSIZE)
+ param[idx] = HBR_END;
+
+ rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
+ param[0], /* AVA */
+ param[1], param[2], param[3], param[4], /* TS0-7 */
+ param[5], param[6], param[7], param[8]);
+ if (rc == H_SUCCESS)
+ return 0;
+
+ BUG_ON(rc != H_PARTIAL);
+
+ /* Check that the unprocessed entries were 'not found' or 'busy' */
+ for (i = 0; i < idx-1; i++) {
+ unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
+
+ if (ctrl == HBLKR_CTRL_ERRBUSY) {
+ param[++new_idx] = param[i+1];
+ continue;
+ }
+
+ BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
+ && ctrl != HBLKR_CTRL_ERRNOTFOUND);
+ }
+
+ /*
+ * If there were entries found busy, retry these entries if requested,
+ * or if all the entries have to be retried.
+ */
+ if (new_idx && (retry_busy || new_idx == (PLPAR_HCALL9_BUFSIZE-1))) {
+ idx = new_idx + 1;
+ goto again;
+ }
+
+ return new_idx;
+}
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
@@ -425,17 +492,59 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
*/
#define PPC64_HUGE_HPTE_BATCH 12
-static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
- unsigned long *vpn, int count,
- int psize, int ssize)
+static void hugepage_block_invalidate(unsigned long *slot, unsigned long *vpn,
+ int count, int psize, int ssize)
{
unsigned long param[PLPAR_HCALL9_BUFSIZE];
- int i = 0, pix = 0, rc;
- unsigned long flags = 0;
- int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+ unsigned long shift, current_vpgb, vpgb;
+ int i, pix = 0;
- if (lock_tlbie)
- spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+ shift = mmu_psize_defs[psize].shift;
+
+ for (i = 0; i < count; i++) {
+ /*
+ * Shifting 3 bits more on the right to get a
+ * 8 pages aligned virtual address.
+ */
+ vpgb = (vpn[i] >> (shift - VPN_SHIFT + 3));
+ if (!pix || vpgb != current_vpgb) {
+ /*
+ * Need to start a new 8 pages block, flush
+ * the current one if needed.
+ */
+ if (pix)
+ (void)call_block_remove(pix, param, true);
+ current_vpgb = vpgb;
+ param[0] = hpte_encode_avpn(vpn[i], psize, ssize);
+ pix = 1;
+ }
+
+ param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot[i];
+ if (pix == PLPAR_HCALL9_BUFSIZE) {
+ pix = call_block_remove(pix, param, false);
+ /*
+ * pix = 0 means that all the entries were
+ * removed, we can start a new block.
+ * Otherwise, this means that there are entries
+ * to retry, and pix points to latest one, so
+ * we should increment it and try to continue
+ * the same block.
+ */
+ if (!pix)
+ current_vpgb = 0;
+ else
+ pix++;
+ }
+ }
+ if (pix)
+ (void)call_block_remove(pix, param, true);
+}
+
+static void hugepage_bulk_invalidate(unsigned long *slot, unsigned long *vpn,
+ int count, int psize, int ssize)
+{
+ unsigned long param[PLPAR_HCALL9_BUFSIZE];
+ int i = 0, pix = 0, rc;
for (i = 0; i < count; i++) {
@@ -443,17 +552,6 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize, 0,
ssize, 0);
} else {
- param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
- param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
- pix += 2;
- if (pix == 8) {
- rc = plpar_hcall9(H_BULK_REMOVE, param,
- param[0], param[1], param[2],
- param[3], param[4], param[5],
- param[6], param[7]);
- BUG_ON(rc != H_SUCCESS);
- pix = 0;
- }
}
}
if (pix) {
@@ -463,6 +561,23 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
param[6], param[7]);
BUG_ON(rc != H_SUCCESS);
}
+}
+
+static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
+ unsigned long *vpn,
+ int count, int psize,
+ int ssize)
+{
+ unsigned long flags = 0;
+ int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+
+ if (lock_tlbie)
+ spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+
+ if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+ hugepage_block_invalidate(slot, vpn, count, psize, ssize);
+ else
+ hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
if (lock_tlbie)
spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
@@ -565,6 +680,70 @@ static inline unsigned long compute_slot(real_pte_t pte,
return slot;
}
+/**
+ * The hcall H_BLOCK_REMOVE implies that the virtual pages to be processed are
+ * "all within the same naturally aligned 8 page virtual address block".
+ */
+static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
+ unsigned long *param)
+{
+ unsigned long vpn;
+ unsigned long i, pix = 0;
+ unsigned long index, shift, slot, current_vpgb, vpgb;
+ real_pte_t pte;
+ int psize, ssize;
+
+ psize = batch->psize;
+ ssize = batch->ssize;
+
+ for (i = 0; i < number; i++) {
+ vpn = batch->vpn[i];
+ pte = batch->pte[i];
+ pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+ /*
+ * Shifting 3 bits more on the right to get a
+ * 8 pages aligned virtual address.
+ */
+ vpgb = (vpn >> (shift - VPN_SHIFT + 3));
+ if (!pix || vpgb != current_vpgb) {
+ /*
+ * Need to start a new 8 pages block, flush
+ * the current one if needed.
+ */
+ if (pix)
+ (void)call_block_remove(pix, param,
+ true);
+ current_vpgb = vpgb;
+ param[0] = hpte_encode_avpn(vpn, psize,
+ ssize);
+ pix = 1;
+ }
+
+ slot = compute_slot(pte, vpn, index, shift, ssize);
+ param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot;
+
+ if (pix == PLPAR_HCALL9_BUFSIZE) {
+ pix = call_block_remove(pix, param, false);
+ /*
+ * pix = 0 means that all the entries were
+ * removed, we can start a new block.
+ * Otherwise, this means that there are entries
+ * to retry, and pix points to latest one, so
+ * we should increment it and try to continue
+ * the same block.
+ */
+ if (!pix)
+ current_vpgb = 0;
+ else
+ pix++;
+ }
+ } pte_iterate_hashed_end();
+ }
+
+ if (pix > 1)
+ (void)call_block_remove(pix, param, true);
+}
+
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
@@ -584,6 +763,11 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
if (lock_tlbie)
spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+ if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+ do_block_remove(number, batch, param);
+ goto out;
+ }
+
psize = batch->psize;
ssize = batch->ssize;
pix = 0;
@@ -622,6 +806,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
BUG_ON(rc != H_SUCCESS);
}
+out:
if (lock_tlbie)
spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
}
--
2.7.4
^ permalink raw reply related
* [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation
From: Laurent Dufour @ 2018-07-27 13:22 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: Aneesh Kumar K.V, Nicholas Piggin, Michael Ellerman,
Paul Mackerras, Benjamin Herrenschmidt
In-Reply-To: <1532697739-4878-1-git-send-email-ldufour@linux.vnet.ibm.com>
This part of the code will also be called when dealing with H_BLOCK_REMOVE.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/lpar.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 52eeff1297f4..96b8cd8a802d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -547,6 +547,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
return 0;
}
+
+static inline unsigned long compute_slot(real_pte_t pte,
+ unsigned long vpn,
+ unsigned long index,
+ unsigned long shift,
+ int ssize)
+{
+ unsigned long slot, hash, hidx;
+
+ hash = hpt_hash(vpn, shift, ssize);
+ hidx = __rpte_to_hidx(pte, index);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+ return slot;
+}
+
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
@@ -559,7 +577,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
unsigned long param[PLPAR_HCALL9_BUFSIZE];
- unsigned long hash, index, shift, hidx, slot;
+ unsigned long index, shift, slot;
real_pte_t pte;
int psize, ssize;
@@ -573,12 +591,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
vpn = batch->vpn[i];
pte = batch->pte[i];
pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
- hash = hpt_hash(vpn, shift, ssize);
- hidx = __rpte_to_hidx(pte, index);
- if (hidx & _PTEIDX_SECONDARY)
- hash = ~hash;
- slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
- slot += hidx & _PTEIDX_GROUP_IX;
+ slot = compute_slot(pte, vpn, index, shift, ssize);
if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
/*
* lpar doesn't use the passed actual page size
--
2.7.4
^ permalink raw reply related
* [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
From: Laurent Dufour @ 2018-07-27 13:22 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: Aneesh Kumar K.V, Nicholas Piggin, Michael Ellerman,
Paul Mackerras, Benjamin Herrenschmidt
In-Reply-To: <1532697739-4878-1-git-send-email-ldufour@linux.vnet.ibm.com>
This feature tells if the hcall H_BLOCK_REMOVE is available.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/firmware.h | 3 ++-
arch/powerpc/platforms/pseries/firmware.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 535add3f7791..360ba197f9d2 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -53,6 +53,7 @@
#define FW_FEATURE_PRRN ASM_CONST(0x0000000200000000)
#define FW_FEATURE_DRMEM_V2 ASM_CONST(0x0000000400000000)
#define FW_FEATURE_DRC_INFO ASM_CONST(0x0000000800000000)
+#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000)
#ifndef __ASSEMBLY__
@@ -70,7 +71,7 @@ enum {
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
- FW_FEATURE_DRC_INFO,
+ FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index a3bbeb43689e..1624501386f4 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
{FW_FEATURE_SET_MODE, "hcall-set-mode"},
{FW_FEATURE_BEST_ENERGY, "hcall-best-energy-1*"},
{FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
+ {FW_FEATURE_BLOCK_REMOVE, "hcall-block-remove"},
};
/* Build up the firmware features bitmask using the contents of
--
2.7.4
^ permalink raw reply related
* [PATCH resend] powerpc/64s: fix page table fragment refcount race vs speculative references
From: Nicholas Piggin @ 2018-07-27 11:48 UTC (permalink / raw)
To: linuxppc-dev
Cc: Nicholas Piggin, Andrew Morton, Linus Torvalds,
Aneesh Kumar K . V, linux-mm
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.
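A single-threaded replay of one possible interleaving may help show why the shared counter goes wrong. This is only a sketch under assumptions: struct sketch_page and both functions are invented for illustration, plain ints stand in for the kernel's atomics, and the steps are ordered by hand rather than raced.

```c
#include <assert.h>

/* Hypothetical stand-in for struct page, reduced to the two counters. */
struct sketch_page {
	int refcount;      /* main page refcount, also touched by speculative users */
	int frag_refcount; /* dedicated fragment counter, as introduced by the fix */
};

/* Fragment count kept in the main refcount: ends below zero. */
static int shared_counter_final(int nr_frags)
{
	struct sketch_page page = { .refcount = 1, .frag_refcount = 0 };

	assert(nr_frags > 0);
	page.refcount++;           /* speculative reference taken */
	page.refcount = nr_frags;  /* allocator overwrites the count */
	page.refcount--;           /* speculative user backs off */
	for (int i = 0; i < nr_frags; i++)
		page.refcount--;   /* each fragment is freed */
	return page.refcount;
}

/* Fragment count kept in its own field: ends at exactly zero. */
static int dedicated_counter_final(int nr_frags)
{
	struct sketch_page page = { .refcount = 1, .frag_refcount = 0 };

	assert(nr_frags > 0);
	page.refcount++;               /* speculative reference taken */
	page.frag_refcount = nr_frags; /* allocator sets its own field */
	page.refcount--;               /* speculative user backs off */
	for (int i = 0; i < nr_frags; i++)
		page.frag_refcount--;  /* each fragment is freed */
	return page.frag_refcount;
}
```

In the shared case the speculative decrement lands on the overwritten value, so the count reaches zero one fragment too early and then drops to -1. In the dedicated case the speculative user never touches the fragment counter.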
Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Any objection to the struct page change to grab the arch specific
page table page word for powerpc to use? If not, then this should
go via powerpc tree because it's inconsequential for core mm.
Thanks,
Nick
arch/powerpc/mm/mmu_context_book3s64.c | 8 ++++----
arch/powerpc/mm/pgtable-book3s64.c | 17 +++++++++++------
include/linux/mm_types.h | 5 ++++-
3 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index f3d4b4a0e561..3bb5cec03d1f 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -200,9 +200,9 @@ static void pte_frag_destroy(void *pte_frag)
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
+ if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
pgtable_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
@@ -215,9 +215,9 @@ static void pmd_frag_destroy(void *pmd_frag)
/* drop all the pending references */
count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- if (page_ref_sub_and_test(page, PMD_FRAG_NR - count)) {
+ if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 4afbfbb64bfd..78d0b3d5ebad 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -270,6 +270,8 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
return NULL;
}
+ atomic_set(&page->pt_frag_refcount, 1);
+
ret = page_address(page);
/*
* if we support only one fragment just return the
@@ -285,7 +287,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
* count.
*/
if (likely(!mm->context.pmd_frag)) {
- set_page_count(page, PMD_FRAG_NR);
+ atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -308,9 +310,10 @@ void pmd_fragment_free(unsigned long *pmd)
{
struct page *page = virt_to_page(pmd);
- if (put_page_testzero(page)) {
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
@@ -352,6 +355,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
return NULL;
}
+ atomic_set(&page->pt_frag_refcount, 1);
ret = page_address(page);
/*
@@ -367,7 +371,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
* count.
*/
if (likely(!mm->context.pte_frag)) {
- set_page_count(page, PTE_FRAG_NR);
+ atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
mm->context.pte_frag = ret + PTE_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -390,10 +394,11 @@ void pte_fragment_free(unsigned long *table, int kernel)
{
struct page *page = virt_to_page(table);
- if (put_page_testzero(page)) {
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
if (!kernel)
pgtable_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 99ce070e7dcb..22651e124071 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -139,7 +139,10 @@ struct page {
unsigned long _pt_pad_1; /* compound_head */
pgtable_t pmd_huge_pte; /* protected by page->ptl */
unsigned long _pt_pad_2; /* mapping */
- struct mm_struct *pt_mm; /* x86 pgds only */
+ union {
+ struct mm_struct *pt_mm; /* x86 pgds only */
+ atomic_t pt_frag_refcount; /* powerpc */
+ };
#if ALLOC_SPLIT_PTLOCKS
spinlock_t *ptl;
#else
--
2.17.0
^ permalink raw reply related
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-07-27 11:31 UTC (permalink / raw)
To: Anshuman Khandual
Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
david
In-Reply-To: <4062dd48-2b5b-e454-e860-c6bfe321ebdc@linux.vnet.ibm.com>
On Wed, Jul 25, 2018 at 08:56:23AM +0530, Anshuman Khandual wrote:
> Results with and without the patches are similar.
Thanks! And another thing to try is virtio-net with
a fast NIC backend (40G and up). Unfortunately
at this point loopback tests stress the host
scheduler too much.
--
MST
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Anshuman Khandual @ 2018-07-27 10:58 UTC (permalink / raw)
To: Will Deacon
Cc: robh, srikar, mst, aik, jasowang, linuxram, linux-kernel,
virtualization, hch, jean-philippe.brucker, paulus, marc.zyngier,
joe, robin.murphy, linuxppc-dev, elfring, haren, david
In-Reply-To: <20180727095804.GA25592@arm.com>
On 07/27/2018 03:28 PM, Will Deacon wrote:
> Hi Anshuman,
>
> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>> This patch series is the follow up on the discussions we had before about
>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
>> were suggestions about doing away with two different paths of transactions
>> with the host/QEMU, first being the direct GPA and the other being the DMA
>> API based translations.
>>
>> First patch attempts to create a direct GPA mapping based DMA operations
>> structure called 'virtio_direct_dma_ops' with exact same implementation
>> of the direct GPA path which virtio core currently has but just wrapped in
>> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
>> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
>> existing semantics. The second patch does exactly that inside the function
>> virtio_finalize_features(). The third patch removes the default direct GPA
>> path from virtio core forcing it to use DMA API callbacks for all devices.
>> Now with that change, every device must have a DMA operations structure
>> associated with it. The fourth patch adds an additional hook which gives
>> the platform an opportunity to do yet another override if required. This
>> platform hook can be used on POWER Ultravisor based protected guests to
>> load up SWIOTLB DMA callbacks to do the required (as discussed previously
>> in the above mentioned thread how host is allowed to access only parts of
>> the guest GPA range) bounce buffering into the shared memory for all I/O
>> scatter gather buffers to be consumed on the host side.
>>
>> Please go through these patches and review whether this approach broadly
>> makes sense. I will appreciate suggestions, inputs, comments regarding
>> the patches or the approach in general. Thank you.
> I just wanted to say that this patch series provides a means for us to
> force the coherent DMA ops for legacy virtio devices on arm64, which in turn
> means that we can enable the SMMU with legacy devices in our fastmodel
> emulation platform (which is slowly being upgraded to virtio 1.0) without
> hanging during boot. Patch below.
>
> So:
>
> Acked-by: Will Deacon <will.deacon@arm.com>
> Tested-by: Will Deacon <will.deacon@arm.com>
Thanks Will.
^ permalink raw reply
* RE: [PATCH] Adds __init annotation at mmu_init_secondary func
From: Alexey Spirkov @ 2018-07-27 10:57 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
linuxppc-dev@lists.ozlabs.org
Cc: trivial@kernel.org, andrew@ncrmnt.org
In-Reply-To: <87tvolrm43.fsf@concordia.ellerman.id.au>
Without any additional option:
WARNING: modpost: Found 1 section mismatch(es).
If detailed debug is switched on then:
WARNING: vmlinux.o(.text+0x142ac): Section mismatch in reference from the function mmu_init_secondary() to the function .init.text:ppc44x_pin_tlb()
The function mmu_init_secondary() references
the function __init ppc44x_pin_tlb().
This is often because mmu_init_secondary lacks a __init
annotation or the annotation of ppc44x_pin_tlb is wrong.
Best regards,
Alexey Spirkov
-----Original Message-----
From: Michael Ellerman <mpe@ellerman.id.au>
Sent: Friday, July 27, 2018 1:48 PM
To: Alexey Spirkov <AlexeiS@astrosoft.ru>; Benjamin Herrenschmidt <benh@kernel.crashing.org>; Paul Mackerras <paulus@samba.org>; linuxppc-dev@lists.ozlabs.org
Cc: trivial@kernel.org; andrew@ncrmnt.org
Subject: Re: [PATCH] Adds __init annotation at mmu_init_secondary func
Alexey Spirkov <AlexeiS@astrosoft.ru> writes:
> mmu_init_secondary function called at initialization sequence but it
> misses __init annotation. As result modpost warning is generated.
> Some building systems sensitive to such kind of warnings.
What warning are you seeing?
AFAICS it's not called from, nor does it call, any __init code.
So I'm a bit confused.
cheers
> diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
> index 82b1ff7..12d9251 100644
> --- a/arch/powerpc/mm/44x_mmu.c
> +++ b/arch/powerpc/mm/44x_mmu.c
> @@ -229,7 +229,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> }
>
> #ifdef CONFIG_SMP
> -void mmu_init_secondary(int cpu)
> +void __init mmu_init_secondary(int cpu)
> {
> unsigned long addr;
> unsigned long memstart = memstart_addr & ~(PPC_PIN_SIZE - 1);
> --
> 2.9.5
^ permalink raw reply
* Re: [PATCH] Adds __init annotation at mmu_init_secondary func
From: Michael Ellerman @ 2018-07-27 10:47 UTC (permalink / raw)
To: Alexey Spirkov, Benjamin Herrenschmidt, Paul Mackerras,
linuxppc-dev@lists.ozlabs.org
Cc: trivial@kernel.org, andrew@ncrmnt.org
In-Reply-To: <AM5PR03MB28499BE4A0B6200F7DA83835C92B0@AM5PR03MB2849.eurprd03.prod.outlook.com>
Alexey Spirkov <AlexeiS@astrosoft.ru> writes:
> mmu_init_secondary function called at initialization sequence
> but it misses __init annotation. As result modpost warning is generated.
> Some building systems sensitive to such kind of warnings.
What warning are you seeing?
AFAICS it's not called from, nor does it call, any __init code.
So I'm a bit confused.
cheers
> diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
> index 82b1ff7..12d9251 100644
> --- a/arch/powerpc/mm/44x_mmu.c
> +++ b/arch/powerpc/mm/44x_mmu.c
> @@ -229,7 +229,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> }
>
> #ifdef CONFIG_SMP
> -void mmu_init_secondary(int cpu)
> +void __init mmu_init_secondary(int cpu)
> {
> unsigned long addr;
> unsigned long memstart = memstart_addr & ~(PPC_PIN_SIZE - 1);
> --
> 2.9.5
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Will Deacon @ 2018-07-27 9:58 UTC (permalink / raw)
To: Anshuman Khandual
Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
elfring, david, jasowang, benh, mpe, mst, hch, linuxram, haren,
paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180720035941.6844-1-khandual@linux.vnet.ibm.com>
Hi Anshuman,
On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, first being the direct GPA and the other being the DMA
> API based translations.
>
> First patch attempts to create a direct GPA mapping based DMA operations
> structure called 'virtio_direct_dma_ops' with exact same implementation
> of the direct GPA path which virtio core currently has but just wrapped in
> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> existing semantics. The second patch does exactly that inside the function
> virtio_finalize_features(). The third patch removes the default direct GPA
> path from virtio core forcing it to use DMA API callbacks for all devices.
> Now with that change, every device must have a DMA operations structure
> associated with it. The fourth patch adds an additional hook which gives
> the platform an opportunity to do yet another override if required. This
> platform hook can be used on POWER Ultravisor based protected guests to
> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> in the above mentioned thread how host is allowed to access only parts of
> the guest GPA range) bounce buffering into the shared memory for all I/O
> scatter gather buffers to be consumed on the host side.
>
> Please go through these patches and review whether this approach broadly
> makes sense. I will appreciate suggestions, inputs, comments regarding
> the patches or the approach in general. Thank you.
I just wanted to say that this patch series provides a means for us to
force the coherent DMA ops for legacy virtio devices on arm64, which in turn
means that we can enable the SMMU with legacy devices in our fastmodel
emulation platform (which is slowly being upgraded to virtio 1.0) without
hanging during boot. Patch below.
So:
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Thanks!
Will
--->8
From 4ef39e9de2c87c97bf046816ca762832f92e39b5 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon@arm.com>
Date: Fri, 27 Jul 2018 10:49:25 +0100
Subject: [PATCH] arm64: dma: Override DMA ops for legacy virtio devices
Virtio devices are always cache-coherent, so force use of the coherent
DMA ops for legacy virtio devices where the dma-coherent is known to
be omitted by QEMU for the MMIO transport.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/dma-mapping.h | 6 ++++++
arch/arm64/mm/dma-mapping.c | 19 +++++++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index b7847eb8a7bb..30aa8fb62dc3 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -44,6 +44,12 @@ void arch_teardown_dma_ops(struct device *dev);
#define arch_teardown_dma_ops arch_teardown_dma_ops
#endif
+#ifdef CONFIG_VIRTIO
+struct virtio_device;
+void platform_override_dma_ops(struct virtio_device *vdev);
+#define platform_override_dma_ops platform_override_dma_ops
+#endif
+
/* do not use this function in a driver */
static inline bool is_device_dma_coherent(struct device *dev)
{
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 61e93f0b5482..f9ca61b1b34d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -891,3 +891,22 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
}
#endif
}
+
+#ifdef CONFIG_VIRTIO
+#include <linux/virtio_config.h>
+
+void platform_override_dma_ops(struct virtio_device *vdev)
+{
+ struct device *dev = vdev->dev.parent;
+ const struct dma_map_ops *dma_ops = &arm64_swiotlb_dma_ops;
+
+ if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
+ return;
+
+ dev->archdata.dma_coherent = true;
+ if (iommu_get_domain_for_dev(dev))
+ dma_ops = &iommu_dma_ops;
+
+ set_dma_ops(dev, dma_ops);
+}
+#endif /* CONFIG_VIRTIO */
--
2.1.4
^ permalink raw reply related
* [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
From: Bharat Bhushan @ 2018-07-27 9:48 UTC (permalink / raw)
To: benh, paulus, mpe, oss, galak, mark.rutland, kstewart, gregkh,
devicetree, linuxppc-dev, linux-kernel
Cc: robh, keescook, tyreld, joe, Bharat Bhushan
In-Reply-To: <1532684881-19310-1-git-send-email-Bharat.Bhushan@nxp.com>
The MPIC on NXP (Freescale) P2020 supports the following irq
ranges:
> 0 - 11 (External interrupt)
> 16 - 79 (Internal interrupt)
> 176 - 183 (Messaging interrupt)
> 224 - 231 (Shared message signaled interrupt)
We have to remove "irq_count" from platform code as platform
is given precedence over device-tree, while I think device-tree
should have precedence.
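For illustration only, here is a small sketch of how such a range list could be consulted. It assumes the property cells are inclusive <first last> pairs, which matches the ranges listed above but is not confirmed by this patch alone. irq_supported() and p2020_irq_ranges are hypothetical names, not part of the MPIC driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cell values copied from the supported-irq-ranges property in this patch. */
static const uint32_t p2020_irq_ranges[] = { 0, 11, 16, 79, 176, 183, 224, 231 };

/*
 * Assumption: cells come in <first last> pairs and both ends are inclusive.
 * Returns true when irq falls inside one of the pairs.
 */
static bool irq_supported(const uint32_t *ranges, size_t ncells, uint32_t irq)
{
	assert(ncells % 2 == 0);

	for (size_t i = 0; i < ncells; i += 2) {
		if (irq >= ranges[i] && irq <= ranges[i + 1])
			return true;
	}
	return false;
}
```

Under that assumption, sources 12 to 15 and 80 to 175 fall in the gaps and would be rejected, while 0, 79 and 231 are accepted.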
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
---
arch/powerpc/boot/dts/fsl/p2020si-post.dtsi | 3 +++
arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 5 +++++
2 files changed, 8 insertions(+)
diff --git a/arch/powerpc/boot/dts/fsl/p2020si-post.dtsi b/arch/powerpc/boot/dts/fsl/p2020si-post.dtsi
index 884e01b..08e266b 100644
--- a/arch/powerpc/boot/dts/fsl/p2020si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p2020si-post.dtsi
@@ -192,6 +192,9 @@
/include/ "pq3-sec3.1-0.dtsi"
/include/ "pq3-mpic.dtsi"
/include/ "pq3-mpic-timer-B.dtsi"
+ pic@40000 {
+ supported-irq-ranges = <0 11 16 79 176 183 224 231>;
+ };
global-utilities@e0000 {
compatible = "fsl,p2020-guts";
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
index 1006950..49ff348 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
@@ -57,6 +57,11 @@ void __init mpc85xx_rdb_pic_init(void)
MPIC_BIG_ENDIAN |
MPIC_SINGLE_DEST_CPU,
0, 256, " OpenPIC ");
+ } else if (of_machine_is_compatible("fsl,P2020RDB-PC")) {
+ mpic = mpic_alloc(NULL, 0,
+ MPIC_BIG_ENDIAN |
+ MPIC_SINGLE_DEST_CPU,
+ 0, 0, " OpenPIC ");
} else {
mpic = mpic_alloc(NULL, 0,
MPIC_BIG_ENDIAN |
--
1.9.3
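For readers following the thread, here is a minimal userspace sketch of
how a flat cell list such as <0 11 16 79 176 183 224 231> splits into
inclusive (start, end) pairs, and how the last source number falls out
of them. It is an illustration only, not the code from this series: the
names irq_range, parse_irq_ranges() and last_irq_from_ranges() are
hypothetical stand-ins, and the sketch works on an already-decoded
array of cells rather than on a raw device-tree property.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct mpic_irq_range. */
struct irq_range {
	uint32_t start_irq;
	uint32_t end_irq;
};

/*
 * Split a flat cell array into inclusive (start, end) pairs.
 * Returns the number of pairs written, or -1 if the cell count is odd
 * or the pairs do not fit into the caller's buffer.
 */
int parse_irq_ranges(const uint32_t *cells, size_t ncells,
		     struct irq_range *out, size_t max_ranges)
{
	size_t i, count;

	assert(cells != NULL && out != NULL);

	if (ncells % 2 != 0)
		return -1;

	count = ncells / 2;
	if (count > max_ranges)
		return -1;

	for (i = 0; i < count; i++) {
		out[i].start_irq = cells[2 * i];
		out[i].end_irq = cells[2 * i + 1];
	}

	return (int)count;
}

/* Highest end_irq across all ranges; 0 when there are none. */
uint32_t last_irq_from_ranges(const struct irq_range *r, size_t n)
{
	uint32_t last = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (r[i].end_irq > last)
			last = r[i].end_irq;

	return last;
}
```

With the eight P2020 cells above, parse_irq_ranges() yields four pairs
and last_irq_from_ranges() reports 231 as the last source.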
^ permalink raw reply related
* [RFC 4/5] powerpc/mpic: Boot print supported interrupt ranges
From: Bharat Bhushan @ 2018-07-27 9:48 UTC (permalink / raw)
To: benh, paulus, mpe, oss, galak, mark.rutland, kstewart, gregkh,
devicetree, linuxppc-dev, linux-kernel
Cc: robh, keescook, tyreld, joe, Bharat Bhushan
In-Reply-To: <1532684881-19310-1-git-send-email-Bharat.Bhushan@nxp.com>
As the MPIC can have non-contiguous interrupt source ranges,
print them during boot.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
---
arch/powerpc/sysdev/mpic.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index cbf3a51..8df248f 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -155,6 +155,21 @@ struct bus_type mpic_subsys = {
#endif /* CONFIG_MPIC_WEIRD */
+static void mpic_show_irq_ranges(struct mpic *mpic)
+{
+ int i;
+
+ pr_info("mpic: Initializing for %d sources\n", mpic->num_sources);
+
+ if (mpic->num_ranges) {
+ pr_info(" Supported source of interrupt ranges\n");
+ for (i = 0; i < mpic->num_ranges; i++)
+ pr_info(" > %d - %d\n", mpic->irq_ranges[i].start_irq,
+ mpic->irq_ranges[i].end_irq);
+
+ }
+}
+
static int mpic_irq_source_invalid(struct mpic *mpic, unsigned int irq)
{
int i;
@@ -1646,8 +1661,7 @@ void __init mpic_init(struct mpic *mpic)
int num_timers = 4;
BUG_ON(mpic->num_sources == 0);
-
- printk(KERN_INFO "mpic: Initializing for %d sources\n", mpic->num_sources);
+ mpic_show_irq_ranges(mpic);
/* Set current processor priority to max */
mpic_cpu_write(MPIC_INFO(CPU_CURRENT_TASK_PRI), 0xf);
--
1.9.3
^ permalink raw reply related
* [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq ranges
From: Bharat Bhushan @ 2018-07-27 9:47 UTC (permalink / raw)
To: benh, paulus, mpe, oss, galak, mark.rutland, kstewart, gregkh,
devicetree, linuxppc-dev, linux-kernel
Cc: robh, keescook, tyreld, joe, Bharat Bhushan
In-Reply-To: <1532684881-19310-1-git-send-email-Bharat.Bhushan@nxp.com>
Freescale MPIC h/w may not support all interrupt sources reported
by the hardware, by "last-interrupt-source" or by the platform. On
these platforms a misconfigured device tree that assigns one of the
reserved interrupts leaves a non-functioning system without warning.
This patch adds a "supported-irq-ranges" property to the device tree
to describe the ranges of supported interrupt sources. If a reserved
interrupt is used, the driver no longer programs the h/w for it, as
it does currently, and throws a warning instead.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
---
.../devicetree/bindings/powerpc/fsl/mpic.txt | 8 ++
arch/powerpc/include/asm/mpic.h | 9 ++
arch/powerpc/sysdev/mpic.c | 113 +++++++++++++++++++--
3 files changed, 121 insertions(+), 9 deletions(-)
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt b/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt
index dc57446..bd6da54 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt
@@ -77,6 +77,14 @@ PROPERTIES
in the global feature registers. If specified, this field will
override the value read from MPIC_GREG_FEATURE_LAST_SRC.
+ - supported-irq-ranges
+ Usage: optional
+ Value type: <prop-encoded-array>
+ Definition: This encodes an arbitrary number of start-irq and end-irq
+ pairs, both inclusive. Interrupt sources supported by an MPIC
+ may not be contiguous; in that case this property is used
+ to pass the supported interrupt source ranges.
+
INTERRUPT SPECIFIER DEFINITION
Interrupt specifiers consists of 4 cells encoded as
diff --git a/arch/powerpc/include/asm/mpic.h b/arch/powerpc/include/asm/mpic.h
index fad8ddd..4080c98 100644
--- a/arch/powerpc/include/asm/mpic.h
+++ b/arch/powerpc/include/asm/mpic.h
@@ -252,6 +252,11 @@ struct mpic_irq_save {
#endif
};
+struct mpic_irq_range {
+ u32 start_irq;
+ u32 end_irq;
+};
+
/* The instance data of a given MPIC */
struct mpic
{
@@ -281,6 +286,10 @@ struct mpic
/* Number of sources */
unsigned int num_sources;
+ /* Supported source ranges */
+ unsigned int num_ranges;
+ struct mpic_irq_range *irq_ranges;
+
/* vector numbers used for internal sources (ipi/timers) */
unsigned int ipi_vecs[4];
unsigned int timer_vecs[8];
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index d503887..cbf3a51 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -155,6 +155,23 @@ struct bus_type mpic_subsys = {
#endif /* CONFIG_MPIC_WEIRD */
+static int mpic_irq_source_invalid(struct mpic *mpic, unsigned int irq)
+{
+ int i;
+
+ for (i = 0; i < mpic->num_ranges; i++) {
+ if ((irq >= mpic->irq_ranges[i].start_irq) &&
+ (irq <= mpic->irq_ranges[i].end_irq))
+ return 0;
+ }
+
+ /* if not supported irq-ranges then check for num_sources */
+ if (!mpic->num_ranges && irq < mpic->num_sources)
+ return 0;
+
+ return -EINVAL;
+}
+
static inline unsigned int mpic_processor_id(struct mpic *mpic)
{
unsigned int cpu = 0;
@@ -873,8 +890,10 @@ int mpic_set_irq_type(struct irq_data *d, unsigned int flow_type)
DBG("mpic: set_irq_type(mpic:@%p,virq:%d,src:0x%x,type:0x%x)\n",
mpic, d->irq, src, flow_type);
- if (src >= mpic->num_sources)
+ if (mpic_irq_source_invalid(mpic, src)) {
+ WARN(1, "mpic: Reserved IRQ source %d\n", src);
return -EINVAL;
+ }
vold = mpic_irq_read(src, MPIC_INFO(IRQ_VECTOR_PRI));
@@ -933,8 +952,10 @@ void mpic_set_vector(unsigned int virq, unsigned int vector)
DBG("mpic: set_vector(mpic:@%p,virq:%d,src:%d,vector:0x%x)\n",
mpic, virq, src, vector);
- if (src >= mpic->num_sources)
+ if (mpic_irq_source_invalid(mpic, src)) {
+ WARN(1, "mpic: Reserved IRQ source %d\n", src);
return;
+ }
vecpri = mpic_irq_read(src, MPIC_INFO(IRQ_VECTOR_PRI));
vecpri = vecpri & ~MPIC_INFO(VECPRI_VECTOR_MASK);
@@ -950,8 +971,10 @@ static void mpic_set_destination(unsigned int virq, unsigned int cpuid)
DBG("mpic: set_destination(mpic:@%p,virq:%d,src:%d,cpuid:0x%x)\n",
mpic, virq, src, cpuid);
- if (src >= mpic->num_sources)
+ if (mpic_irq_source_invalid(mpic, src)) {
+ WARN(1, "mpic: Reserved IRQ source %d\n", src);
return;
+ }
mpic_irq_write(src, MPIC_INFO(IRQ_DESTINATION), 1 << cpuid);
}
@@ -1038,7 +1061,7 @@ static int mpic_host_map(struct irq_domain *h, unsigned int virq,
if (mpic_map_error_int(mpic, virq, hw))
return 0;
- if (hw >= mpic->num_sources) {
+ if (mpic_irq_source_invalid(mpic, hw)) {
pr_warn("mpic: Mapping of source 0x%x failed, source out of range !\n",
(unsigned int)hw);
return -EINVAL;
@@ -1210,6 +1233,52 @@ u32 fsl_mpic_primary_get_version(void)
return 0;
}
+static u32 mpic_last_irq_from_ranges(struct mpic *mpic)
+{
+ int i;
+ u32 last_irq = 0;
+
+ for (i = 0; i < mpic->num_ranges; i++)
+ if (last_irq < mpic->irq_ranges[i].end_irq)
+ last_irq = mpic->irq_ranges[i].end_irq;
+
+ return last_irq;
+}
+
+static int __init mpic_init_irq_ranges(struct mpic *mpic)
+{
+ const u32 *irq_ranges;
+ u32 len, count;
+ int i;
+
+ irq_ranges = of_get_property(mpic->node, "supported-irq-ranges", &len);
+ if (irq_ranges == NULL) {
+ pr_info("%s : supported-irq-ranges not found in mpic(%p)\n",
+ __func__, mpic->node);
+ return -1;
+ }
+
+ if (len % (2 * sizeof(u32)) != 0) {
+ pr_info("%s : incorrect irq ranges in mpic(%p)\n",
+ __func__, mpic->node);
+ return -1;
+ }
+
+ count = len / (2 * sizeof(u32));
+ mpic->irq_ranges = kcalloc(count, sizeof(struct mpic_irq_range),
+ GFP_KERNEL);
+ if (mpic->irq_ranges == NULL)
+ return -1;
+
+ mpic->num_ranges = count;
+ for (i = 0; i < count; i++) {
+ mpic->irq_ranges[i].start_irq = *irq_ranges++;
+ mpic->irq_ranges[i].end_irq = *irq_ranges++;
+ }
+
+ return 0;
+}
+
static int mpic_get_last_irq_source(struct mpic *mpic,
unsigned int irq_count,
unsigned int isu_size)
@@ -1219,14 +1288,18 @@ static int mpic_get_last_irq_source(struct mpic *mpic,
/* Current priority order for getting last irq:
* 1) irq_count from platform
- * 2) "last-interrupt-source" from device tree
- * 3) isu_size from platform
- * 4) MPIC h/w GREG_FEATURE_0 register
+ * 2) "supported-irq-ranges" from device tree
+ * 3) "last-interrupt-source" from device tree
+ * 4) isu_size from platform
+ * 5) MPIC h/w GREG_FEATURE_0 register
*/
if (irq_count)
return (irq_count - 1);
+ if (!mpic_init_irq_ranges(mpic))
+ return mpic_last_irq_from_ranges(mpic);
+
if (!of_property_read_u32(mpic->node, "last-interrupt-source",
&last_irq)) {
return last_irq;
@@ -1632,6 +1705,10 @@ void __init mpic_init(struct mpic *mpic)
u32 vecpri = MPIC_VECPRI_MASK | i |
(8 << MPIC_VECPRI_PRIORITY_SHIFT);
+ /* Skip if source irq not valid */
+ if (mpic_irq_source_invalid(mpic, i))
+ continue;
+
/* check if protected */
if (mpic->protected && test_bit(i, mpic->protected))
continue;
@@ -1732,9 +1809,14 @@ void mpic_setup_this_cpu(void)
* values of irq_desc[].affinity in irq.c.
*/
if (distribute_irqs && !(mpic->flags & MPIC_SINGLE_DEST_CPU)) {
- for (i = 0; i < mpic->num_sources ; i++)
+ for (i = 0; i < mpic->num_sources ; i++) {
+ /* Skip if irq source is not valid */
+ if (mpic_irq_source_invalid(mpic, i))
+ continue;
+
mpic_irq_write(i, MPIC_INFO(IRQ_DESTINATION),
mpic_irq_read(i, MPIC_INFO(IRQ_DESTINATION)) | msk);
+ }
}
/* Set current processor priority to 0 */
@@ -1772,9 +1854,14 @@ void mpic_teardown_this_cpu(int secondary)
raw_spin_lock_irqsave(&mpic_lock, flags);
/* let the mpic know we don't want intrs. */
- for (i = 0; i < mpic->num_sources ; i++)
+ for (i = 0; i < mpic->num_sources ; i++) {
+ /* Skip if irq not valid */
+ if (mpic_irq_source_invalid(mpic, i))
+ continue;
+
mpic_irq_write(i, MPIC_INFO(IRQ_DESTINATION),
mpic_irq_read(i, MPIC_INFO(IRQ_DESTINATION)) & ~msk);
+ }
/* Set current processor priority to max */
mpic_cpu_write(MPIC_INFO(CPU_CURRENT_TASK_PRI), 0xf);
@@ -1958,6 +2045,10 @@ static void mpic_suspend_one(struct mpic *mpic)
int i;
for (i = 0; i < mpic->num_sources; i++) {
+ /* Skip if irq source not valid */
+ if (mpic_irq_source_invalid(mpic, i))
+ continue;
+
mpic->save_data[i].vecprio =
mpic_irq_read(i, MPIC_INFO(IRQ_VECTOR_PRI));
mpic->save_data[i].dest =
@@ -1982,6 +2073,10 @@ static void mpic_resume_one(struct mpic *mpic)
int i;
for (i = 0; i < mpic->num_sources; i++) {
+ /* Skip if irq source not valid */
+ if (mpic_irq_source_invalid(mpic, i))
+ continue;
+
mpic_irq_write(i, MPIC_INFO(IRQ_VECTOR_PRI),
mpic->save_data[i].vecprio);
mpic_irq_write(i, MPIC_INFO(IRQ_DESTINATION),
--
1.9.3
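For readers following the thread, the validity check this patch
describes can be sketched in plain userspace C. This is an illustration
under assumptions, not the kernel code: struct mpic_view and
irq_source_valid() are hypothetical names, and the sketch returns a
bool for "valid" where the patch's mpic_irq_source_invalid() returns 0
or -EINVAL. The logic mirrors the patch: a source is accepted if it
lies in any inclusive range; only when no ranges were supplied does the
old contiguous "irq < num_sources" test apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct mpic_irq_range. */
struct src_range {
	uint32_t start_irq;
	uint32_t end_irq;
};

/* Hypothetical, trimmed-down view of the struct mpic fields used here. */
struct mpic_view {
	unsigned int num_sources;
	size_t num_ranges;
	const struct src_range *irq_ranges;
};

bool irq_source_valid(const struct mpic_view *m, unsigned int irq)
{
	size_t i;

	assert(m != NULL);

	/* Accept the source if any inclusive range covers it. */
	for (i = 0; i < m->num_ranges; i++)
		if (irq >= m->irq_ranges[i].start_irq &&
		    irq <= m->irq_ranges[i].end_irq)
			return true;

	/* No ranges supplied: fall back to the contiguous check. */
	return m->num_ranges == 0 && irq < m->num_sources;
}
```

Note that once ranges are present, num_sources alone no longer makes a
source valid: with the P2020 ranges, source 12 is rejected even though
it is below the last source number.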
^ permalink raw reply related
* [RFC 2/5] powerpc/mpic: Rework last source irq calculation logic
From: Bharat Bhushan @ 2018-07-27 9:47 UTC (permalink / raw)
To: benh, paulus, mpe, oss, galak, mark.rutland, kstewart, gregkh,
devicetree, linuxppc-dev, linux-kernel
Cc: robh, keescook, tyreld, joe, Bharat Bhushan
In-Reply-To: <1532684881-19310-1-git-send-email-Bharat.Bhushan@nxp.com>
The last irq calculation logic uses the following priority order:
1) irq_count from platform
2) "last-interrupt-source" from device tree
3) isu_size from platform
4) MPIC h/w GREG_FEATURE_0 register
This patch reworks the last irq calculation logic, but the
functionality and priority order are the same as before.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
---
arch/powerpc/sysdev/mpic.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index b6803bc..d503887 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -1217,25 +1217,32 @@ static int mpic_get_last_irq_source(struct mpic *mpic,
u32 last_irq;
u32 greg_feature;
+ /* Current priority order for getting last irq:
+ * 1) irq_count from platform
+ * 2) "last-interrupt-source" from device tree
+ * 3) isu_size from platform
+ * 4) MPIC h/w GREG_FEATURE_0 register
+ */
+
+ if (irq_count)
+ return (irq_count - 1);
+
+ if (!of_property_read_u32(mpic->node, "last-interrupt-source",
+ &last_irq)) {
+ return last_irq;
+ }
+
+ if (isu_size)
+ return (isu_size * MPIC_MAX_ISU - 1);
+
/*
- * Read feature register. For non-ISU MPICs, num sources as well. On
+ * Read feature register. For non-ISU MPICs, num sources as well. On
* ISU MPICs, sources are counted as ISUs are added
*/
greg_feature = mpic_read(mpic->gregs, MPIC_INFO(GREG_FEATURE_0));
- /*
- * By default, the last source number comes from the MPIC, but the
- * device-tree and board support code can override it on buggy hw.
- * If we get passed an isu_size (multi-isu MPIC) then we use that
- * as a default instead of the value read from the HW.
- */
last_irq = (greg_feature & MPIC_GREG_FEATURE_LAST_SRC_MASK)
>> MPIC_GREG_FEATURE_LAST_SRC_SHIFT;
- if (isu_size)
- last_irq = isu_size * MPIC_MAX_ISU - 1;
- of_property_read_u32(mpic->node, "last-interrupt-source", &last_irq);
- if (irq_count)
- last_irq = irq_count - 1;
return last_irq;
}
--
1.9.3
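For readers following the thread, the priority order listed in this
patch can be sketched as a chain of early returns in userspace C. This
is a sketch under assumptions, not the kernel function: struct
last_irq_inputs and pick_last_irq() are hypothetical, the device-tree
lookup is reduced to a "present" flag plus a value, max_isu stands in
for MPIC_MAX_ISU, and hw_last stands in for the field decoded from
GREG_FEATURE_0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of the four possible sources of the last irq. */
struct last_irq_inputs {
	unsigned int irq_count;	/* from platform code, 0 if unset */
	bool has_dt_last;	/* "last-interrupt-source" present? */
	uint32_t dt_last;	/* its value when present */
	unsigned int isu_size;	/* from platform code, 0 if unset */
	unsigned int max_isu;	/* stand-in for MPIC_MAX_ISU */
	uint32_t hw_last;	/* stand-in for the GREG_FEATURE_0 field */
};

uint32_t pick_last_irq(const struct last_irq_inputs *in)
{
	assert(in != NULL);

	/* 1) irq_count from platform */
	if (in->irq_count)
		return in->irq_count - 1;

	/* 2) "last-interrupt-source" from device tree */
	if (in->has_dt_last)
		return in->dt_last;

	/* 3) isu_size from platform */
	if (in->isu_size)
		return in->isu_size * in->max_isu - 1;

	/* 4) value reported by the hardware */
	return in->hw_last;
}
```

The first source that is set wins, which gives the same result as the
older code where each later assignment overwrote the earlier one.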
^ permalink raw reply related
* [RFC 1/5] powerpc/mpic: move last irq logic to function
From: Bharat Bhushan @ 2018-07-27 9:47 UTC (permalink / raw)
To: benh, paulus, mpe, oss, galak, mark.rutland, kstewart, gregkh,
devicetree, linuxppc-dev, linux-kernel
Cc: robh, keescook, tyreld, joe, Bharat Bhushan
In-Reply-To: <1532684881-19310-1-git-send-email-Bharat.Bhushan@nxp.com>
This patch just moves the last-irq calculation logic
into a function, with no change in logic.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
---
arch/powerpc/sysdev/mpic.c | 52 +++++++++++++++++++++++++++++-----------------
1 file changed, 33 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index 353b439..b6803bc 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -1210,6 +1210,36 @@ u32 fsl_mpic_primary_get_version(void)
return 0;
}
+static int mpic_get_last_irq_source(struct mpic *mpic,
+ unsigned int irq_count,
+ unsigned int isu_size)
+{
+ u32 last_irq;
+ u32 greg_feature;
+
+ /*
+ * Read feature register. For non-ISU MPICs, num sources as well. On
+ * ISU MPICs, sources are counted as ISUs are added
+ */
+ greg_feature = mpic_read(mpic->gregs, MPIC_INFO(GREG_FEATURE_0));
+
+ /*
+ * By default, the last source number comes from the MPIC, but the
+ * device-tree and board support code can override it on buggy hw.
+ * If we get passed an isu_size (multi-isu MPIC) then we use that
+ * as a default instead of the value read from the HW.
+ */
+ last_irq = (greg_feature & MPIC_GREG_FEATURE_LAST_SRC_MASK)
+ >> MPIC_GREG_FEATURE_LAST_SRC_SHIFT;
+ if (isu_size)
+ last_irq = isu_size * MPIC_MAX_ISU - 1;
+ of_property_read_u32(mpic->node, "last-interrupt-source", &last_irq);
+ if (irq_count)
+ last_irq = irq_count - 1;
+
+ return last_irq;
+}
+
struct mpic * __init mpic_alloc(struct device_node *node,
phys_addr_t phys_addr,
unsigned int flags,
@@ -1451,25 +1481,7 @@ struct mpic * __init mpic_alloc(struct device_node *node,
0x1000);
}
- /*
- * Read feature register. For non-ISU MPICs, num sources as well. On
- * ISU MPICs, sources are counted as ISUs are added
- */
- greg_feature = mpic_read(mpic->gregs, MPIC_INFO(GREG_FEATURE_0));
-
- /*
- * By default, the last source number comes from the MPIC, but the
- * device-tree and board support code can override it on buggy hw.
- * If we get passed an isu_size (multi-isu MPIC) then we use that
- * as a default instead of the value read from the HW.
- */
- last_irq = (greg_feature & MPIC_GREG_FEATURE_LAST_SRC_MASK)
- >> MPIC_GREG_FEATURE_LAST_SRC_SHIFT;
- if (isu_size)
- last_irq = isu_size * MPIC_MAX_ISU - 1;
- of_property_read_u32(mpic->node, "last-interrupt-source", &last_irq);
- if (irq_count)
- last_irq = irq_count - 1;
+ last_irq = mpic_get_last_irq_source(mpic, irq_count, isu_size);
/* Initialize main ISU if none provided */
if (!isu_size) {
@@ -1495,6 +1507,8 @@ struct mpic * __init mpic_alloc(struct device_node *node,
if (mpic->irqhost == NULL)
return NULL;
+ greg_feature = mpic_read(mpic->gregs, MPIC_INFO(GREG_FEATURE_0));
+
/* Display version */
switch (greg_feature & MPIC_GREG_FEATURE_VERSION_MASK) {
case 1:
--
1.9.3
^ permalink raw reply related
* [RFC 0/5] powerpc/mpic: Add non-contiguous interrupt sources
From: Bharat Bhushan @ 2018-07-27 9:47 UTC (permalink / raw)
To: benh, paulus, mpe, oss, galak, mark.rutland, kstewart, gregkh,
devicetree, linuxppc-dev, linux-kernel
Cc: robh, keescook, tyreld, joe, Bharat Bhushan
Freescale MPIC h/w may not support all interrupt sources reported
by the hardware, by "last-interrupt-source" or by the platform. On
these platforms a misconfigured device tree that assigns one of the
reserved interrupts leaves a non-functioning system without warning.
The first patch just moves the last-irq calculation logic to a function.
The second patch reworks the same logic. I feel that the device tree
should get precedence over the platform-provided last-irq, but I have
not changed that order in this series.
The third and fourth patches add support for non-contiguous interrupt
sources. The fifth patch enables this for P2020RDB-PC for now.
Bharat Bhushan (5):
powerpc/mpic: move last irq logic to function
powerpc/mpic: Rework last source irq calculation logic
powerpc/mpic: Add support for non-contiguous irq ranges
powerpc/mpic: Boot print supported interrupt ranges
powerpc/fsl: Add supported-irq-ranges for P2020
.../devicetree/bindings/powerpc/fsl/mpic.txt | 8 +
arch/powerpc/boot/dts/fsl/p2020si-post.dtsi | 3 +
arch/powerpc/include/asm/mpic.h | 9 +
arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 5 +
arch/powerpc/sysdev/mpic.c | 184 ++++++++++++++++++---
5 files changed, 182 insertions(+), 27 deletions(-)
--
1.9.3
^ permalink raw reply