linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC v2 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs
@ 2017-06-22 10:06 Frederic Barrat
  2017-06-22 10:06 ` [RFC v2 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Frederic Barrat @ 2017-06-22 10:06 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: clombard

capi2 and opencapi require TLB invalidations for addresses used on the
cxl adapter or opencapi device to be global, because there is a
translation cache in the PSL (for capi2) or NPU (for opencapi). The
CAPP (for the PSL) and the NPU snoop the power bus.

This is not new: for the hash memory model, as soon as the cxl driver
is active, all local TLBIs become global. We need a similar mechanism
for the radix memory model. This series improves on that by flagging
the contexts that require global TLBIs, which limits the "upgrade" to
those contexts and leaves contexts not used by the card unaffected.

A longer-term goal is to modify the current implementation for hash to
follow the same direction, i.e. identify contexts needing global
TLBIs, but that will be for later. It would be required to support
hash for opencapi.
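
To illustrate the difference between the two policies described above,
here is a minimal userspace sketch. It is not kernel code: struct
model_ctx, scope_driver_wide() and scope_per_context() are made-up
names for this illustration, and the "single thread" test stands in
for the cpumask comparison done by mm_is_thread_local().

```c
#include <assert.h>
#include <stdbool.h>

enum tlbi_scope { TLBI_LOCAL, TLBI_GLOBAL };

/* Hypothetical stand-in for a memory context. */
struct model_ctx {
	bool single_thread;	/* context only ever ran on the current thread */
	bool used_by_device;	/* context is attached to a cxl/opencapi device */
};

/* Hash behaviour today: every context is upgraded once the driver is in use. */
enum tlbi_scope scope_driver_wide(const struct model_ctx *ctx, int driver_use_count)
{
	assert(ctx);
	if (driver_use_count > 0)
		return TLBI_GLOBAL;
	return ctx->single_thread ? TLBI_LOCAL : TLBI_GLOBAL;
}

/* Proposed radix behaviour: only flagged contexts are upgraded. */
enum tlbi_scope scope_per_context(const struct model_ctx *ctx)
{
	assert(ctx);
	if (ctx->used_by_device)
		return TLBI_GLOBAL;
	return ctx->single_thread ? TLBI_LOCAL : TLBI_GLOBAL;
}
```

With the driver in use, a single-threaded context that is not attached
to the card gets TLBI_GLOBAL from scope_driver_wide() but stays
TLBI_LOCAL with scope_per_context().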

Changelog:
From previous comments:
 - rename MM_CONTEXT_GLOBAL_TLBI -> MM_GLOBAL_TLBIE
 - add memory barriers to make sure the device doesn't miss any TLBI
 - also add barrier for the hash implementation to fix the same issue
 
Frederic Barrat (3):
  powerpc/mm: Add marker for contexts requiring global TLB invalidations
  cxl: Mark context requiring global TLBIs
  cxl: Add memory barrier to guarantee TLBI scope

 arch/powerpc/include/asm/book3s/64/mmu.h | 18 ++++++++++++++++++
 arch/powerpc/include/asm/tlb.h           | 23 +++++++++++++++++++++--
 arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
 drivers/misc/cxl/api.c                   | 12 ++++++++++--
 drivers/misc/cxl/file.c                  | 12 ++++++++++--
 include/misc/cxl-base.h                  | 18 +++++++++++++++++-
 6 files changed, 77 insertions(+), 7 deletions(-)

-- 
2.11.0


* [RFC v2 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations
  2017-06-22 10:06 [RFC v2 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
@ 2017-06-22 10:06 ` Frederic Barrat
  2017-06-22 10:06 ` [RFC v2 2/3] cxl: Mark context requiring global TLBIs Frederic Barrat
  2017-06-22 10:06 ` [RFC v2 3/3] cxl: Add memory barrier to guarantee TLBI scope Frederic Barrat
  2 siblings, 0 replies; 4+ messages in thread
From: Frederic Barrat @ 2017-06-22 10:06 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: clombard

Introduce a new 'flags' attribute per context and define its first bit
to be a marker requiring all TLBIs for that context to be broadcast
globally. Once that marker is set on a context, it cannot be removed.

Such a marker is useful for memory contexts used by devices behind the
NPU and CAPP/PSL. The NPU and the PSL keep their own translation cache
so they need to see all the TLBIs for those contexts.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
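Illustration only, not for merging: a userspace model of the set-once
marker and of the check added to mm_is_thread_local(). It assumes C11
atomics in place of set_bit()/test_bit(), a 64-bit mask in place of
the cpumask, and a seq_cst fence in place of smp_mb(). All model_*
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_GLOBAL_TLBIE 0	/* bit number, mirroring MM_GLOBAL_TLBIE */

struct model_context {
	atomic_ulong flags;
};

/* Set the marker. There is deliberately no function to clear it. */
void model_set_global_tlbi(struct model_context *ctx)
{
	assert(ctx);
	atomic_fetch_or(&ctx->flags, 1UL << MODEL_GLOBAL_TLBIE);
}

bool model_get_global_tlbi(struct model_context *ctx)
{
	assert(ctx);
	return (atomic_load(&ctx->flags) & (1UL << MODEL_GLOBAL_TLBIE)) != 0;
}

/*
 * A TLBI may stay local only if the context ran on this CPU alone
 * and the context is not marked as requiring global TLBIs.
 */
bool model_is_thread_local(struct model_context *ctx, uint64_t cpumask,
			   unsigned int this_cpu)
{
	bool local;

	assert(this_cpu < 64);
	local = (cpumask == (UINT64_C(1) << this_cpu));
	if (local) {
		/* Stand-in for smp_mb(): order the PTE update before the flag read. */
		atomic_thread_fence(memory_order_seq_cst);
		local = !model_get_global_tlbi(ctx);
	}
	return local;
}
```

Once model_set_global_tlbi() has been called on a context,
model_is_thread_local() returns false for it whatever the cpumask is.
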
 arch/powerpc/include/asm/book3s/64/mmu.h | 18 ++++++++++++++++++
 arch/powerpc/include/asm/tlb.h           | 23 +++++++++++++++++++++--
 arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 77529a3e3811..cd83f8eb6a3f 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -78,8 +78,12 @@ struct spinlock;
 /* Maximum possible number of NPUs in a system. */
 #define NV_MAX_NPUS 8
 
+/* Bits definition for the context flags */
+#define MM_GLOBAL_TLBIE	0	/* TLBI must be global */
+
 typedef struct {
 	mm_context_id_t id;
+	unsigned long flags;
 	u16 user_psize;		/* page size index */
 
 	/* NPU NMMU context */
@@ -164,5 +168,19 @@ extern void radix_init_pseries(void);
 static inline void radix_init_pseries(void) { };
 #endif
 
+/*
+ * Mark the memory context as requiring global TLBIs, when used by
+ * GPUs or CAPI accelerators managing their own TLB or ERAT.
+ */
+static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
+{
+	set_bit(MM_GLOBAL_TLBIE, &ctx->flags);
+}
+
+static inline bool mm_context_get_global_tlbi(mm_context_t *ctx)
+{
+	return test_bit(MM_GLOBAL_TLBIE, &ctx->flags);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 609557569f65..87d4ddcbf7f8 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -71,8 +71,27 @@ static inline int mm_is_core_local(struct mm_struct *mm)
 
 static inline int mm_is_thread_local(struct mm_struct *mm)
 {
-	return cpumask_equal(mm_cpumask(mm),
-			      cpumask_of(smp_processor_id()));
+	int rc;
+
+	rc = cpumask_equal(mm_cpumask(mm),
+			cpumask_of(smp_processor_id()));
+#ifdef CONFIG_PPC_BOOK3S_64
+	if (rc) {
+		/*
+		 * Check if context requires global TLBI.
+		 *
+		 * We need to make sure the PTE update is happening
+		 * before reading the context global flag. Otherwise,
+		 * reading the flag may be re-ordered and happen
+		 * first, and we could end up in a situation where the
+		 * old PTE was seen by a device, but the TLBI is not
+		 * global.
+		 */
+		smp_mb();
+		rc = !mm_context_get_global_tlbi(&mm->context);
+	}
+#endif
+	return rc;
 }
 
 #else
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index a3edf813d455..c32a3f729d81 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -156,6 +156,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 		return index;
 
 	mm->context.id = index;
+	mm->context.flags = 0;
 #ifdef CONFIG_PPC_ICSWX
 	mm->context.cop_lockp = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
 	if (!mm->context.cop_lockp) {
-- 
2.11.0


* [RFC v2 2/3] cxl: Mark context requiring global TLBIs
  2017-06-22 10:06 [RFC v2 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
  2017-06-22 10:06 ` [RFC v2 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
@ 2017-06-22 10:06 ` Frederic Barrat
  2017-06-22 10:06 ` [RFC v2 3/3] cxl: Add memory barrier to guarantee TLBI scope Frederic Barrat
  2 siblings, 0 replies; 4+ messages in thread
From: Frederic Barrat @ 2017-06-22 10:06 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: clombard

The PSL needs to see all TLBIs pertinent to the memory contexts used
on the cxl adapter. For the hash memory model, this is done by making
all TLBIs global as soon as the cxl driver is in use. For radix, we
need something similar, but we can refine it and make global only the
invalidations for contexts actually used by the device.

So mark the contexts being attached to the cxl adapter as requiring
global TLBIs.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 drivers/misc/cxl/api.c  | 12 ++++++++++--
 drivers/misc/cxl/file.c | 12 ++++++++++--
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 1a138c83f877..2cebe214195e 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -332,8 +332,17 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 		cxl_context_mm_count_get(ctx);
 
 		/* decrement the use count */
-		if (ctx->mm)
+		if (ctx->mm) {
 			mmput(ctx->mm);
+#ifdef CONFIG_PPC_BOOK3S_64
+			mm_context_set_global_tlbi(&ctx->mm->context);
+			/*
+			 * Barrier guarantees that the device will
+			 * receive all TLBIs from that point on
+			 */
+			smp_wmb();
+#endif
+		}
 	}
 
 	cxl_ctx_get();
@@ -347,7 +356,6 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 			cxl_context_mm_count_put(ctx);
 		goto out;
 	}
-
 	ctx->status = STARTED;
 out:
 	mutex_unlock(&ctx->status_mutex);
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 0761271d68c5..3ba6a60e0a6d 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -222,8 +222,17 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 	cxl_context_mm_count_get(ctx);
 
 	/* decrement the use count */
-	if (ctx->mm)
+	if (ctx->mm) {
 		mmput(ctx->mm);
+#ifdef CONFIG_PPC_BOOK3S_64
+		mm_context_set_global_tlbi(&ctx->mm->context);
+		/*
+		 * Barrier guarantees that the device will receive all
+		 * TLBIs from that point on
+		 */
+		smp_wmb();
+#endif
+	}
 
 	trace_cxl_attach(ctx, work.work_element_descriptor, work.num_interrupts, amr);
 
@@ -236,7 +245,6 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 		cxl_context_mm_count_put(ctx);
 		goto out;
 	}
-
 	ctx->status = STARTED;
 	rc = 0;
 out:
-- 
2.11.0


* [RFC v2 3/3] cxl: Add memory barrier to guarantee TLBI scope
  2017-06-22 10:06 [RFC v2 0/3] powerpc/mm: Mark memory contexts requiring global TLBIs Frederic Barrat
  2017-06-22 10:06 ` [RFC v2 1/3] powerpc/mm: Add marker for contexts requiring global TLB invalidations Frederic Barrat
  2017-06-22 10:06 ` [RFC v2 2/3] cxl: Mark context requiring global TLBIs Frederic Barrat
@ 2017-06-22 10:06 ` Frederic Barrat
  2 siblings, 0 replies; 4+ messages in thread
From: Frederic Barrat @ 2017-06-22 10:06 UTC (permalink / raw)
  To: mpe, aneesh.kumar, bsingharora, linuxppc-dev; +Cc: clombard

With the hash memory model, all TLBIs become global when the cxl
driver is active, i.e. as soon as one context is open.

It is theoretically possible to send a TLBI with the wrong scope:
there is currently no memory barrier between marking the driver as in
use and attaching a context to the device, so we are exposed to
re-ordering. This is highly unlikely, as the use count for the driver
is incremented on open() and the attachment to the device happens in
a different system call (ioctl).

Add memory barriers to close that window.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
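Illustration only, not for merging: a small model of the re-ordering
window described above. It enumerates every ordering of four events
and reports whether the bad outcome is reachable (the device caches
the old PTE while the CPU still picks a local TLBI). This is a
deliberately crude model: a missing barrier is treated as "the two
operations of that path may swap", which is much simpler than the real
semantics of smp_wmb()/smp_mb() on powerpc. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum event {
	EV_SET_FLAG,	/* attach path: mark driver/context as in use */
	EV_DEV_CACHE,	/* attach path: device starts caching the PTE */
	EV_PTE_UPDATE,	/* invalidation path: CPU updates the PTE */
	EV_READ_FLAG,	/* invalidation path: CPU reads the marker */
	EV_COUNT
};

/* Position of event ev in a complete ordering. */
static int pos(const enum event order[EV_COUNT], enum event ev)
{
	for (int i = 0; i < EV_COUNT; i++)
		if (order[i] == ev)
			return i;
	assert(!"event missing from ordering");
	return -1;
}

/* Device cached the old PTE and the CPU read the marker before it was set. */
static bool is_bad(const enum event order[EV_COUNT])
{
	return pos(order, EV_DEV_CACHE) < pos(order, EV_PTE_UPDATE) &&
	       pos(order, EV_READ_FLAG) < pos(order, EV_SET_FLAG);
}

/* A barrier on a path forces that path's two events to stay in program order. */
static bool allowed(const enum event order[EV_COUNT],
		    bool attach_barrier, bool inval_barrier)
{
	if (attach_barrier && pos(order, EV_SET_FLAG) > pos(order, EV_DEV_CACHE))
		return false;
	if (inval_barrier && pos(order, EV_PTE_UPDATE) > pos(order, EV_READ_FLAG))
		return false;
	return true;
}

/* Depth-first enumeration of all permutations of the four events. */
static bool search(enum event order[EV_COUNT], int depth, unsigned int used,
		   bool attach_barrier, bool inval_barrier)
{
	if (depth == EV_COUNT)
		return allowed(order, attach_barrier, inval_barrier) &&
		       is_bad(order);
	for (int ev = 0; ev < EV_COUNT; ev++) {
		if (used & (1u << ev))
			continue;
		order[depth] = (enum event)ev;
		if (search(order, depth + 1, used | (1u << ev),
			   attach_barrier, inval_barrier))
			return true;
	}
	return false;
}

/* True if some allowed ordering leaves the device with a stale translation. */
bool stale_translation_possible(bool attach_barrier, bool inval_barrier)
{
	enum event order[EV_COUNT];

	return search(order, 0, 0, attach_barrier, inval_barrier);
}
```

In this model the bad outcome is unreachable only when both paths keep
their order, which is why the patch adds a barrier on each side.
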
 include/misc/cxl-base.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/misc/cxl-base.h b/include/misc/cxl-base.h
index b2ebc91fe09a..dcb6d38ab3ad 100644
--- a/include/misc/cxl-base.h
+++ b/include/misc/cxl-base.h
@@ -25,12 +25,28 @@ extern atomic_t cxl_use_count;
 
 static inline bool cxl_ctx_in_use(void)
 {
-       return (atomic_read(&cxl_use_count) != 0);
+	/*
+	 * This is called when sending a TLBI, to know whether it
+	 * should be global or local.
+	 *
+	 * We need to make sure the PTE update is happening before
+	 * reading the driver use count. Otherwise, reading the
+	 * count may be re-ordered and happen first, and we could end
+	 * up in a situation where the old PTE is seen by the device,
+	 * but the TLBI is not global.
+	 */
+	smp_mb();
+	return (atomic_read(&cxl_use_count) != 0);
 }
 
 static inline void cxl_ctx_get(void)
 {
        atomic_inc(&cxl_use_count);
+	/*
+	 * Barrier guarantees that the device will receive all TLBIs
+	 * from that point on
+	 */
+	smp_wmb();
 }
 
 static inline void cxl_ctx_put(void)
-- 
2.11.0
