[PATCH] cxl: Fix DSI misses when the context owning task exits

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] cxl: Fix DSI misses when the context owning task exits
@ 2015-11-24 10:56 Vaibhav Jain
  2015-11-25  2:05 ` Ian Munsie
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Vaibhav Jain @ 2015-11-24 10:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: imunsie, Matthew R. Ochs, Frank Haverkamp, Vaibhav Jain

Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
store the pid of the current task_struct and use it to get pointer to
the mm_struct of the process, while processing page or segment faults
from the capi card. However this causes issues when the thread that had
originally issued the start-work ioctl exits in which case the stored
pid is no more valid and the cxl driver is unable to handle faults as
the mm_struct corresponding to process is no more accessible.

This patch fixes this issue by using the mm_struct of the next alive
task in the thread group. This is done by iterating over all the tasks
in the thread group starting from thread group leader and calling
get_task_mm on each one of them. When a valid mm_struct is obtained the
pid of the associated task is stored in the context replacing the
exiting one for handling future faults.

The patch introduces a new function named get_mem_context that checks if
the current task pointed to by ctx->pid is dead? If yes it performs the
steps described above. Also a new variable cxl_context.glpid is
introduced which stores the pid of the thread group leader associated
with the context owning task.

Reported-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reported-by: Frank Haverkamp <HAVERKAM@de.ibm.com>
Suggested-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
---
 drivers/misc/cxl/api.c     |   2 +-
 drivers/misc/cxl/context.c |   6 ++-
 drivers/misc/cxl/cxl.h     |   3 ++
 drivers/misc/cxl/fault.c   | 129 +++++++++++++++++++++++++++++++++------------
 drivers/misc/cxl/file.c    |   6 ++-
 5 files changed, 109 insertions(+), 37 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 103baf0..befac57 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -176,7 +176,7 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 
 	if (task) {
 		ctx->pid = get_task_pid(task, PIDTYPE_PID);
-		get_pid(ctx->pid);
+		ctx->glpid = get_task_pid(task->group_leader, PIDTYPE_PID);
 		kernel = false;
 	}
 
diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 2faa127..7b37ad9 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -42,7 +42,7 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
 	spin_lock_init(&ctx->sste_lock);
 	ctx->afu = afu;
 	ctx->master = master;
-	ctx->pid = NULL; /* Set in start work ioctl */
+	ctx->pid = ctx->glpid = NULL; /* Set in start work ioctl */
 	mutex_init(&ctx->mapping_lock);
 	ctx->mapping = mapping;
 
@@ -211,7 +211,11 @@ int __detach_context(struct cxl_context *ctx)
 	WARN_ON(cxl_detach_process(ctx) &&
 		cxl_adapter_link_ok(ctx->afu->adapter));
 	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
+
+	/* release the reference to the group leader and mm handling pid */
 	put_pid(ctx->pid);
+	put_pid(ctx->glpid);
+
 	cxl_ctx_put();
 	return 0;
 }
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 0cfb9c1..a312cd4 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -433,6 +433,9 @@ struct cxl_context {
 	unsigned int sst_size, sst_lru;
 
 	wait_queue_head_t wq;
+	/* pid of the group leader associated with the pid */
+	struct pid *glpid;
+	/* use mm context associated with this pid for ds faults */
 	struct pid *pid;
 	spinlock_t lock; /* Protects pending_irq_mask, pending_fault and fault_addr */
 	/* Only used in PR mode */
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 25a5418..81c3f75 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -166,13 +166,92 @@ static void cxl_handle_page_fault(struct cxl_context *ctx,
 	cxl_ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
 }
 
+/*
+ * Returns the mm_struct corresponding to the context ctx via ctx->pid
+ * In case the task has exited we use the task group leader accessible
+ * via ctx->glpid to find the next task in the thread group that has a
+ * valid  mm_struct associated with it. If a task with valid mm_struct
+ * is found the ctx->pid is updated to use the task struct for subsequent
+ * translations. In case no valid mm_struct is found in the task group to
+ * service the fault a NULL is returned.
+ */
+static struct mm_struct *get_mem_context(struct cxl_context *ctx)
+{
+	struct task_struct *task = NULL;
+	struct mm_struct *mm = NULL;
+	struct pid *old_pid = ctx->pid;
+
+	if (old_pid == NULL) {
+		pr_warn("%s: Invalid context for pe=%d\n",
+			 __func__, ctx->pe);
+		return NULL;
+	}
+
+	task = get_pid_task(old_pid, PIDTYPE_PID);
+
+	/*
+	 * pid_alive may look racy but this saves us from costly
+	 * get_task_mm when the task is a zombie. In worst case
+	 * we may think a task is alive, which is about to die
+	 * but get_task_mm will return NULL.
+	 */
+	if (task != NULL && pid_alive(task))
+		mm = get_task_mm(task);
+
+	/* release the task struct that was taken earlier */
+	if (task)
+		put_task_struct(task);
+	else
+		pr_devel("%s: Context owning pid=%i for pe=%i dead\n",
+			__func__, pid_nr(old_pid), ctx->pe);
+
+	/*
+	 * If we couldn't find the mm context then use the group
+	 * leader to iterate over the task group and find a task
+	 * that gives us mm_struct.
+	 */
+	if (unlikely(mm == NULL && ctx->glpid != NULL)) {
+
+		rcu_read_lock();
+		task = pid_task(ctx->glpid, PIDTYPE_PID);
+		if (task)
+			do {
+				mm = get_task_mm(task);
+				if (mm) {
+					ctx->pid = get_task_pid(task,
+								PIDTYPE_PID);
+					break;
+				}
+				task = next_thread(task);
+			} while (task && !thread_group_leader(task));
+		rcu_read_unlock();
+
+		/* check if we switched pid */
+		if (ctx->pid != old_pid) {
+			if (mm)
+				pr_devel("%s:pe=%i switch pid %i->%i\n",
+					 __func__, ctx->pe, pid_nr(old_pid),
+					 pid_nr(ctx->pid));
+			else
+				pr_devel("%s:Cannot find mm for pid=%i\n",
+					 __func__, pid_nr(old_pid));
+
+			/* drop the reference to older pid */
+			put_pid(old_pid);
+		}
+	}
+
+	return mm;
+}
+
+
+
 void cxl_handle_fault(struct work_struct *fault_work)
 {
 	struct cxl_context *ctx =
 		container_of(fault_work, struct cxl_context, fault_work);
 	u64 dsisr = ctx->dsisr;
 	u64 dar = ctx->dar;
-	struct task_struct *task = NULL;
 	struct mm_struct *mm = NULL;
 
 	if (cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An) != dsisr ||
@@ -195,17 +274,17 @@ void cxl_handle_fault(struct work_struct *fault_work)
 		"DSISR: %#llx DAR: %#llx\n", ctx->pe, dsisr, dar);
 
 	if (!ctx->kernel) {
-		if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
-			pr_devel("cxl_handle_fault unable to get task %i\n",
-				 pid_nr(ctx->pid));
+
+		mm = get_mem_context(ctx);
+		/* indicates all the thread in task group have exited */
+		if (mm == NULL) {
+			pr_devel("%s: unable to get mm for pe=%d pid=%i\n",
+				 __func__, ctx->pe, pid_nr(ctx->pid));
 			cxl_ack_ae(ctx);
 			return;
-		}
-		if (!(mm = get_task_mm(task))) {
-			pr_devel("cxl_handle_fault unable to get mm %i\n",
-				 pid_nr(ctx->pid));
-			cxl_ack_ae(ctx);
-			goto out;
+		} else {
+			pr_devel("Handling page fault for pe=%d pid=%i\n",
+				 ctx->pe, pid_nr(ctx->pid));
 		}
 	}
 
@@ -218,33 +297,22 @@ void cxl_handle_fault(struct work_struct *fault_work)
 
 	if (mm)
 		mmput(mm);
-out:
-	if (task)
-		put_task_struct(task);
 }
 
 static void cxl_prefault_one(struct cxl_context *ctx, u64 ea)
 {
-	int rc;
-	struct task_struct *task;
 	struct mm_struct *mm;
 
-	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
-		pr_devel("cxl_prefault_one unable to get task %i\n",
-			 pid_nr(ctx->pid));
-		return;
-	}
-	if (!(mm = get_task_mm(task))) {
+	mm = get_mem_context(ctx);
+	if (mm == NULL) {
 		pr_devel("cxl_prefault_one unable to get mm %i\n",
 			 pid_nr(ctx->pid));
-		put_task_struct(task);
 		return;
 	}
 
-	rc = cxl_fault_segment(ctx, mm, ea);
+	cxl_fault_segment(ctx, mm, ea);
 
 	mmput(mm);
-	put_task_struct(task);
 }
 
 static u64 next_segment(u64 ea, u64 vsid)
@@ -263,18 +331,13 @@ static void cxl_prefault_vma(struct cxl_context *ctx)
 	struct copro_slb slb;
 	struct vm_area_struct *vma;
 	int rc;
-	struct task_struct *task;
 	struct mm_struct *mm;
 
-	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
-		pr_devel("cxl_prefault_vma unable to get task %i\n",
-			 pid_nr(ctx->pid));
-		return;
-	}
-	if (!(mm = get_task_mm(task))) {
+	mm = get_mem_context(ctx);
+	if (mm == NULL) {
 		pr_devel("cxl_prefault_vm unable to get mm %i\n",
 			 pid_nr(ctx->pid));
-		goto out1;
+		return;
 	}
 
 	down_read(&mm->mmap_sem);
@@ -295,8 +358,6 @@ static void cxl_prefault_vma(struct cxl_context *ctx)
 	up_read(&mm->mmap_sem);
 
 	mmput(mm);
-out1:
-	put_task_struct(task);
 }
 
 void cxl_prefault(struct cxl_context *ctx, u64 wed)
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 7ccd299..7305104 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -198,8 +198,12 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
 	 * where a process (master, some daemon, etc) has opened the chardev on
 	 * behalf of another process, so the AFU's mm gets bound to the process
 	 * that performs this ioctl and not the process that opened the file.
+	 * Also we grab the PID of the group leader so that if the task that
+	 * has performed the attach operation exits the mm context of the
+	 * process is still accessible.
 	 */
-	ctx->pid = get_pid(get_task_pid(current, PIDTYPE_PID));
+	ctx->pid = get_task_pid(current, PIDTYPE_PID);
+	ctx->glpid = get_task_pid(current->group_leader, PIDTYPE_PID);
 
 	trace_cxl_attach(ctx, work.work_element_descriptor, work.num_interrupts, amr);
 
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] cxl: Fix DSI misses when the context owning task exits
  2015-11-24 10:56 [PATCH] cxl: Fix DSI misses when the context owning task exits Vaibhav Jain
@ 2015-11-25  2:05 ` Ian Munsie
  2015-11-25  8:57 ` Frederic Barrat
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Ian Munsie @ 2015-11-25  2:05 UTC (permalink / raw)
  To: Vaibhav Jain; +Cc: linuxppc-dev, Frank Haverkamp, Matthew R. Ochs

Thanks Vaibhav! This should definitely help people avoid pain in this
corner case :)

Acked-by: Ian Munsie <imunsie@au1.ibm.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] cxl: Fix DSI misses when the context owning task exits
  2015-11-24 10:56 [PATCH] cxl: Fix DSI misses when the context owning task exits Vaibhav Jain
  2015-11-25  2:05 ` Ian Munsie
@ 2015-11-25  8:57 ` Frederic Barrat
  2015-11-25 16:17 ` Matthew R. Ochs
  2016-01-11  9:14 ` Michael Ellerman
  3 siblings, 0 replies; 9+ messages in thread
From: Frederic Barrat @ 2015-11-25  8:57 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev; +Cc: Frank Haverkamp, Matthew R. Ochs, imunsie



Le 24/11/2015 11:56, Vaibhav Jain a écrit :
> Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
> store the pid of the current task_struct and use it to get pointer to
> the mm_struct of the process, while processing page or segment faults
> from the capi card. However this causes issues when the thread that had
> originally issued the start-work ioctl exits in which case the stored
> pid is no more valid and the cxl driver is unable to handle faults as
> the mm_struct corresponding to process is no more accessible.
>
> This patch fixes this issue by using the mm_struct of the next alive
> task in the thread group. This is done by iterating over all the tasks
> in the thread group starting from thread group leader and calling
> get_task_mm on each one of them. When a valid mm_struct is obtained the
> pid of the associated task is stored in the context replacing the
> exiting one for handling future faults.
>
> The patch introduces a new function named get_mem_context that checks if
> the current task pointed to by ctx->pid is dead? If yes it performs the
> steps described above. Also a new variable cxl_context.glpid is
> introduced which stores the pid of the thread group leader associated
> with the context owning task.
>
> Reported-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
> Reported-by: Frank Haverkamp <HAVERKAM@de.ibm.com>
> Suggested-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>

Looks good to me. Thanks!

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] cxl: Fix DSI misses when the context owning task exits
  2015-11-24 10:56 [PATCH] cxl: Fix DSI misses when the context owning task exits Vaibhav Jain
  2015-11-25  2:05 ` Ian Munsie
  2015-11-25  8:57 ` Frederic Barrat
@ 2015-11-25 16:17 ` Matthew R. Ochs
  2016-01-11  9:14 ` Michael Ellerman
  3 siblings, 0 replies; 9+ messages in thread
From: Matthew R. Ochs @ 2015-11-25 16:17 UTC (permalink / raw)
  To: Vaibhav Jain; +Cc: linuxppc-dev, imunsie, Frank Haverkamp

Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cxl: Fix DSI misses when the context owning task exits
  2015-11-24 10:56 [PATCH] cxl: Fix DSI misses when the context owning task exits Vaibhav Jain
                   ` (2 preceding siblings ...)
  2015-11-25 16:17 ` Matthew R. Ochs
@ 2016-01-11  9:14 ` Michael Ellerman
  2016-01-12 13:29   ` David Laight
  3 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2016-01-11  9:14 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev
  Cc: Frank Haverkamp, Matthew R. Ochs, imunsie, Vaibhav Jain

On Tue, 2015-24-11 at 10:56:18 UTC, Vaibhav Jain wrote:
> Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
> store the pid of the current task_struct and use it to get pointer to
> the mm_struct of the process, while processing page or segment faults
> from the capi card. However this causes issues when the thread that had
> originally issued the start-work ioctl exits in which case the stored
> pid is no more valid and the cxl driver is unable to handle faults as
> the mm_struct corresponding to process is no more accessible.
> 
> This patch fixes this issue by using the mm_struct of the next alive
> task in the thread group. This is done by iterating over all the tasks
> in the thread group starting from thread group leader and calling
> get_task_mm on each one of them. When a valid mm_struct is obtained the
> pid of the associated task is stored in the context replacing the
> exiting one for handling future faults.
> 
> The patch introduces a new function named get_mem_context that checks if
> the current task pointed to by ctx->pid is dead? If yes it performs the
> steps described above. Also a new variable cxl_context.glpid is
> introduced which stores the pid of the thread group leader associated
> with the context owning task.
> 
> Reported-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
> Reported-by: Frank Haverkamp <HAVERKAM@de.ibm.com>
> Suggested-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
> Acked-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7b8ad495d59280b634a7b546f4

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: cxl: Fix DSI misses when the context owning task exits
  2016-01-11  9:14 ` Michael Ellerman
@ 2016-01-12 13:29   ` David Laight
  2016-01-12 22:59     ` Michael Ellerman
  2016-01-14  7:50     ` Vaibhav Jain
  0 siblings, 2 replies; 9+ messages in thread
From: David Laight @ 2016-01-12 13:29 UTC (permalink / raw)
  To: 'Michael Ellerman', Vaibhav Jain,
	linuxppc-dev@lists.ozlabs.org
  Cc: Frank Haverkamp, Matthew R. Ochs, imunsie@au1.ibm.com,
	Vaibhav Jain

RnJvbTogTWljaGFlbCBFbGxlcm1hbg0KPiBTZW50OiAxMSBKYW51YXJ5IDIwMTYgMDk6MTQNCj4g
T24gVHVlLCAyMDE1LTI0LTExIGF0IDEwOjU2OjE4IFVUQywgVmFpYmhhdiBKYWluIHdyb3RlOg0K
PiA+IFByZXNlbnRseSB3aGVuIGEgdXNlci1zcGFjZSBwcm9jZXNzIGlzc3VlcyBDWExfSU9DVExf
U1RBUlRfV09SSyBpb2N0bCB3ZQ0KPiA+IHN0b3JlIHRoZSBwaWQgb2YgdGhlIGN1cnJlbnQgdGFz
a19zdHJ1Y3QgYW5kIHVzZSBpdCB0byBnZXQgcG9pbnRlciB0bw0KPiA+IHRoZSBtbV9zdHJ1Y3Qg
b2YgdGhlIHByb2Nlc3MsIHdoaWxlIHByb2Nlc3NpbmcgcGFnZSBvciBzZWdtZW50IGZhdWx0cw0K
PiA+IGZyb20gdGhlIGNhcGkgY2FyZC4gSG93ZXZlciB0aGlzIGNhdXNlcyBpc3N1ZXMgd2hlbiB0
aGUgdGhyZWFkIHRoYXQgaGFkDQo+ID4gb3JpZ2luYWxseSBpc3N1ZWQgdGhlIHN0YXJ0LXdvcmsg
aW9jdGwgZXhpdHMgaW4gd2hpY2ggY2FzZSB0aGUgc3RvcmVkDQo+ID4gcGlkIGlzIG5vIG1vcmUg
dmFsaWQgYW5kIHRoZSBjeGwgZHJpdmVyIGlzIHVuYWJsZSB0byBoYW5kbGUgZmF1bHRzIGFzDQo+
ID4gdGhlIG1tX3N0cnVjdCBjb3JyZXNwb25kaW5nIHRvIHByb2Nlc3MgaXMgbm8gbW9yZSBhY2Nl
c3NpYmxlLg0KPiA+DQo+ID4gVGhpcyBwYXRjaCBmaXhlcyB0aGlzIGlzc3VlIGJ5IHVzaW5nIHRo
ZSBtbV9zdHJ1Y3Qgb2YgdGhlIG5leHQgYWxpdmUNCj4gPiB0YXNrIGluIHRoZSB0aHJlYWQgZ3Jv
dXAuIFRoaXMgaXMgZG9uZSBieSBpdGVyYXRpbmcgb3ZlciBhbGwgdGhlIHRhc2tzDQo+ID4gaW4g
dGhlIHRocmVhZCBncm91cCBzdGFydGluZyBmcm9tIHRocmVhZCBncm91cCBsZWFkZXIgYW5kIGNh
bGxpbmcNCj4gPiBnZXRfdGFza19tbSBvbiBlYWNoIG9uZSBvZiB0aGVtLiBXaGVuIGEgdmFsaWQg
bW1fc3RydWN0IGlzIG9idGFpbmVkIHRoZQ0KPiA+IHBpZCBvZiB0aGUgYXNzb2NpYXRlZCB0YXNr
IGlzIHN0b3JlZCBpbiB0aGUgY29udGV4dCByZXBsYWNpbmcgdGhlDQo+ID4gZXhpdGluZyBvbmUg
Zm9yIGhhbmRsaW5nIGZ1dHVyZSBmYXVsdHMuDQoNCkkgZG9uJ3QgZXZlbiBjbGFpbSB0byB1bmRl
cnN0YW5kIHRoZSBsaW51eCBtb2RlbCBmb3IgaGFuZGxpbmcgcHJvY2Vzcw0KYWRkcmVzcyBtYXBz
LCBub3Igd2hhdCB0aGUgY3hsIGRyaXZlciBpcyBkb2luZywgYnV0IHRoZSBhYm92ZSBsb29rcw0K
bW9yZSB0aGFuIGRvZGd5Lg0KDQoJRGF2aWQNCg0K

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cxl: Fix DSI misses when the context owning task exits
  2016-01-12 13:29   ` David Laight
@ 2016-01-12 22:59     ` Michael Ellerman
  2016-01-14  7:50     ` Vaibhav Jain
  1 sibling, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2016-01-12 22:59 UTC (permalink / raw)
  To: David Laight, Vaibhav Jain, linuxppc-dev@lists.ozlabs.org
  Cc: Frank Haverkamp, Matthew R. Ochs, imunsie@au1.ibm.com

On Tue, 2016-01-12 at 13:29 +0000, David Laight wrote:
> From: Michael Ellerman
> > Sent: 11 January 2016 09:14
> > On Tue, 2015-24-11 at 10:56:18 UTC, Vaibhav Jain wrote:
> > > Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
> > > store the pid of the current task_struct and use it to get pointer to
> > > the mm_struct of the process, while processing page or segment faults
> > > from the capi card. However this causes issues when the thread that had
> > > originally issued the start-work ioctl exits in which case the stored
> > > pid is no more valid and the cxl driver is unable to handle faults as
> > > the mm_struct corresponding to process is no more accessible.
> > > 
> > > This patch fixes this issue by using the mm_struct of the next alive
> > > task in the thread group. This is done by iterating over all the tasks
> > > in the thread group starting from thread group leader and calling
> > > get_task_mm on each one of them. When a valid mm_struct is obtained the
> > > pid of the associated task is stored in the context replacing the
> > > exiting one for handling future faults.
> 
> I don't even claim to understand the linux model for handling process
> address maps, nor what the cxl driver is doing, but the above looks
> more than dodgy.

Thanks for reviewing it!

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: cxl: Fix DSI misses when the context owning task exits
  2016-01-12 13:29   ` David Laight
  2016-01-12 22:59     ` Michael Ellerman
@ 2016-01-14  7:50     ` Vaibhav Jain
  2016-01-14 10:11       ` David Laight
  1 sibling, 1 reply; 9+ messages in thread
From: Vaibhav Jain @ 2016-01-14  7:50 UTC (permalink / raw)
  To: David Laight, 'Michael Ellerman',
	linuxppc-dev@lists.ozlabs.org
  Cc: Frank Haverkamp, Matthew R. Ochs, imunsie@au1.ibm.com, vaibhav

Hi David,

David Laight <David.Laight@ACULAB.COM> writes:

> I don't even claim to understand the linux model for handling process
> address maps, nor what the cxl driver is doing, but the above looks
> more than dodgy.
>
> 	David

Thanks for reviewing the patch.

Yes, It does look dodgy but is needed to handle a very corner case wherein
the task_struct that had originally allocated the cxl context exit
without freeing it. In case of a multi threaded user-space process this
context can still be used by other threads.

A cxl_context holds pid of the task that allocated it since we need
access to the mm_struct associated with the task to handle storage
faults reported by the capi accelerator. Now in case of a MT process if
the thread that allocated the context exits the pid becomes invalid and
we are no more able to handle faults from the accelerator even though
the process is still alive.

In such a case the patch tries to use the task group leader pid to find
the next alive task in the task group which can be used to handle the
storage fault.

We could have instead used only the task group leader for handling the
faults but there is a possibility that even the task group leader has
exited in which case the mm_struct associated with it will also be NULL.

Hope this clarifies things,

Cheers,
~ Vaibhav

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: cxl: Fix DSI misses when the context owning task exits
  2016-01-14  7:50     ` Vaibhav Jain
@ 2016-01-14 10:11       ` David Laight
  0 siblings, 0 replies; 9+ messages in thread
From: David Laight @ 2016-01-14 10:11 UTC (permalink / raw)
  To: 'Vaibhav Jain', 'Michael Ellerman',
	linuxppc-dev@lists.ozlabs.org
  Cc: Frank Haverkamp, Matthew R. Ochs, imunsie@au1.ibm.com

From: Vaibhav Jain
> Sent: 14 January 2016 07:51
> David Laight <David.Laight@ACULAB.COM> writes:
>=20
> > I don't even claim to understand the linux model for handling process
> > address maps, nor what the cxl driver is doing, but the above looks
> > more than dodgy.
> >
> > 	David
>=20
> Thanks for reviewing the patch.
>=20
> Yes, It does look dodgy but is needed to handle a very corner case wherei=
n
> the task_struct that had originally allocated the cxl context exit
> without freeing it. In case of a multi threaded user-space process this
> context can still be used by other threads.

Right, and a non-threaded process might have done a fork+exit
with the fd open (etc).

> A cxl_context holds pid of the task that allocated it since we need
> access to the mm_struct associated with the task to handle storage
> faults reported by the capi accelerator. Now in case of a MT process if
> the thread that allocated the context exits the pid becomes invalid and
> we are no more able to handle faults from the accelerator even though
> the process is still alive.
>=20
> In such a case the patch tries to use the task group leader pid to find
> the next alive task in the task group which can be used to handle the
> storage fault.

Unless you are extremely careful whatever you are using for the 'pid'
can get reallocated.
=20
> We could have instead used only the task group leader for handling the
> faults but there is a possibility that even the task group leader has
> exited in which case the mm_struct associated with it will also be NULL.

It looks like you should be using mmap() to get the user pages accessible
to the driver - then you don't have a problem.

	David

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-01-14 10:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-24 10:56 [PATCH] cxl: Fix DSI misses when the context owning task exits Vaibhav Jain
2015-11-25  2:05 ` Ian Munsie
2015-11-25  8:57 ` Frederic Barrat
2015-11-25 16:17 ` Matthew R. Ochs
2016-01-11  9:14 ` Michael Ellerman
2016-01-12 13:29   ` David Laight
2016-01-12 22:59     ` Michael Ellerman
2016-01-14  7:50     ` Vaibhav Jain
2016-01-14 10:11       ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).