From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932893AbaICROe (ORCPT ); Wed, 3 Sep 2014 13:14:34 -0400 Received: from mail-oi0-f44.google.com ([209.85.218.44]:41024 "EHLO mail-oi0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753512AbaICROc (ORCPT ); Wed, 3 Sep 2014 13:14:32 -0400 From: Shawn Bohrer To: Roland Dreier Cc: Sean Hefty , hal.rosenstock@gmail.com, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, tomk@rgmadvisors.com, Yishai Hadas , Or Gerlitz , Haggai Eran , Shachar Raindel , Christoph Lameter , Shawn Bohrer Subject: [PATCH v3] ib_umem_release should decrement mm->pinned_vm from ib_umem_get Date: Wed, 3 Sep 2014 12:13:57 -0500 Message-Id: <1409764437-29699-1-git-send-email-shawn.bohrer@gmail.com> X-Mailer: git-send-email 1.9.3 In-Reply-To: <6B2A6E60C06CCC42AE31809BF572352B010E240634@MTLDAG02.mtl.com> References: <6B2A6E60C06CCC42AE31809BF572352B010E240634@MTLDAG02.mtl.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Shawn Bohrer In debugging an application that receives -ENOMEM from ib_reg_mr() I found that ib_umem_get() can fail because the pinned_vm count has wrapped causing it to always be larger than the lock limit even with RLIMIT_MEMLOCK set to RLIM_INFINITY. The wrapping of pinned_vm occurs because the process that calls ib_reg_mr() will have its mm->pinned_vm count incremented. Later a different process with a different mm_struct than the one that allocated the ib_umem struct ends up releasing it which results in decrementing the new processes mm->pinned_vm count past zero and wrapping. I'm not entirely sure what circumstances cause a different process to release the ib_umem than the one that allocated it but the kernel stack trace of the freeing process from my situation looks like the following: Call Trace: [] dump_stack+0x19/0x1b [] ib_umem_release+0x1f5/0x200 [ib_core] [] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib] [] ib_destroy_qp+0x12c/0x170 [ib_core] [] ib_uverbs_close+0x259/0x4e0 [ib_uverbs] [] __fput+0xba/0x240 [] ____fput+0xe/0x10 [] task_work_run+0xc4/0xe0 [] do_notify_resume+0x95/0xa0 [] int_signal+0x12/0x17 The following patch fixes the issue by storing the pid struct of the process that calls ib_umem_get() so that ib_umem_release and/or ib_umem_account() can properly decrement the pinned_vm count of the correct mm_struct. Signed-off-by: Shawn Bohrer --- v3 changes: * Fix resource leak with put_task_struct() v2 changes: * Updated to use get_task_pid to avoid keeping a reference to the mm drivers/infiniband/core/umem.c | 19 +++++++++++++------ include/rdma/ib_umem.h | 1 + 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index a3a2e9c..df0c4f6 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -105,6 +105,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, umem->length = size; umem->offset = addr & ~PAGE_MASK; umem->page_size = PAGE_SIZE; + umem->pid = get_task_pid(current, PIDTYPE_PID); /* * We ask for writable memory if any access flags other than * "remote read" are set. "Local write" and "remote write" @@ -198,6 +199,7 @@ out: if (ret < 0) { if (need_release) __ib_umem_release(context->device, umem, 0); + put_pid(umem->pid); kfree(umem); } else current->mm->pinned_vm = locked; @@ -230,15 +232,19 @@ void ib_umem_release(struct ib_umem *umem) { struct ib_ucontext *context = umem->context; struct mm_struct *mm; + struct task_struct *task; unsigned long diff; __ib_umem_release(umem->context->device, umem, 1); - mm = get_task_mm(current); - if (!mm) { - kfree(umem); - return; - } + task = get_pid_task(umem->pid, PIDTYPE_PID); + put_pid(umem->pid); + if (!task) + goto out; + mm = get_task_mm(task); + put_task_struct(task); + if (!mm) + goto out; diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT; @@ -262,9 +268,10 @@ void ib_umem_release(struct ib_umem *umem) } else down_write(&mm->mmap_sem); - current->mm->pinned_vm -= diff; + mm->pinned_vm -= diff; up_write(&mm->mmap_sem); mmput(mm); +out: kfree(umem); } EXPORT_SYMBOL(ib_umem_release); diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h index 1ea0b65..a2bf41e 100644 --- a/include/rdma/ib_umem.h +++ b/include/rdma/ib_umem.h @@ -47,6 +47,7 @@ struct ib_umem { int writable; int hugetlb; struct work_struct work; + struct pid *pid; struct mm_struct *mm; unsigned long diff; struct sg_table sg_head; -- 1.7.7.6