From: Yishai Hadas <yishaih-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
To: Devesh Sharma
<devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>,
dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
Majd Dibbiny <majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH V2] IB/uverbs: Fix race between uverbs_close and remove_one
Date: Mon, 7 Mar 2016 13:14:29 +0200 [thread overview]
Message-ID: <56DD6295.6000705@dev.mellanox.co.il> (raw)
In-Reply-To: <1457343873-14869-1-git-send-email-devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
On 3/7/2016 11:44 AM, Devesh Sharma wrote:
> Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
It fixes 036b10635739 (IB/uverbs: Enable device removal when there are
active user space applications) and not the commit that you pointed on.
>
> While testing ocrdma for disassociate_ucontext support following
> kernel panic was seen:
>
> BUG: unable to handle kernel paging request at ffffffffa07ccd7a
> [67139.981020] IP: [<ffffffffa07ccd7a>] 0xffffffffa07ccd7a
> [67139.987185] PGD 19c5067 PUD 19c6063 PMD 469d08067 PTE 0
> [67139.993370] Oops: 0010 [#1] SMP
> [67140.257286] Call Trace:
> [67140.260665] [<ffffffff810c16a0>] ? prepare_to_wait_event+0xf0/0xf0
> [67140.268337] [<ffffffffa04cabc3>] ? ib_dereg_mr+0x23/0x30 [ib_core]
> [67140.276009] [<ffffffffa03ee5f0>] ? ib_uverbs_cleanup_ucontext+0x320/0x440 [ib_uverbs]
> [67140.285550] [<ffffffffa03ee9e9>] ? ib_uverbs_close+0x59/0xb0 [ib_uverbs]
> [67140.293807] [<ffffffff811ff744>] ? __fput+0xe4/0x210
> [67140.300132] [<ffffffff811ff8ae>] ? ____fput+0xe/0x10
> [67140.306457] [<ffffffff8109b697>] ? task_work_run+0x77/0x90
> [67140.313388] [<ffffffff81081af2>] ? do_exit+0x2d2/0xab0
> [67140.319910] [<ffffffff8108234f>] ? do_group_exit+0x3f/0xa0
> [67140.326821] [<ffffffff8108d54c>] ? get_signal+0x1cc/0x5e0
> [67140.333635] [<ffffffff81017387>] ? do_signal+0x37/0x660
> [67140.340257] [<ffffffffa021669a>] ? ucma_write+0x7a/0xc0 [rdma_ucm]
> [67140.347949] [<ffffffff81079c37>] ? exit_to_usermode_loop+0x59/0xa2
> [67140.355651] [<ffffffff81003bad>] ? syscall_return_slowpath+0x8d/0xa0
> [67140.363554] [<ffffffff81680dcc>] ? int_ret_from_sys_call+0x25/0x8f
> [67140.371259] Code: Bad RIP value.
> [67140.375678] RIP [<ffffffffa07ccd7a>] 0xffffffffa07ccd7a
> [67140.382314] RSP <ffff880867727ad8>
> [67140.386894] CR2: ffffffffa07ccd7a
> [67140.393737] ---[ end trace 807b4472c30412d0 ]---
> [67141.682413] Kernel panic - not syncing: Fatal exception
> [67141.682431] Kernel Offset: disabled
> [67141.733934] ---[ end Kernel panic - not syncing: Fatal exception
>
> Root Cause:
>
> During rmmod <vendor-driver> "ib_uverbs_close()" context is
> still running, while "ib_uverbs_remove_one()" context completes and
> ends up freeing ib_dev pointer, thus causing a Kernel Panic.
Kernel Panic -> kernel panic.
>
> This patch fixes the race. ib_uverbs_close validates dev->ib_dev against NULL
> inside an srcu lock. If it is NULL, it waits for a completion and drops the srcu
> else continues with the normal flow.
Need to fix this description as of the expected code change, see below.
Please also describe why/how it solves the problem and not what it does.
>
> CC: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Devesh Sharma <devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> ---
> drivers/infiniband/core/uverbs.h | 1 +
> drivers/infiniband/core/uverbs_main.c | 18 ++++++++++++++++--
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
> index 612ccfd..94a7339 100644
> --- a/drivers/infiniband/core/uverbs.h
> +++ b/drivers/infiniband/core/uverbs.h
> @@ -121,6 +121,7 @@ struct ib_uverbs_file {
> struct ib_event_handler event_handler;
> struct ib_uverbs_event_file *async_file;
> struct list_head list;
> + struct completion fcomp;
> int is_closed;
> };
>
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 39680ae..9531168 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -928,6 +928,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
> file->async_file = NULL;
> kref_init(&file->ref);
> mutex_init(&file->mutex);
> + init_completion(&file->fcomp);
>
> filp->private_data = file;
> kobject_get(&dev->kobj);
> @@ -954,21 +955,33 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
> struct ib_uverbs_file *file = filp->private_data;
> struct ib_uverbs_device *dev = file->device;
> struct ib_ucontext *ucontext = NULL;
> + struct ib_device *ib_dev;
> + int srcu_key;
>
> - mutex_lock(&file->device->lists_mutex);
> + srcu_key = srcu_read_lock(&dev->disassociate_srcu);
> + ib_dev = srcu_dereference(dev->ib_dev,
> + &dev->disassociate_srcu);
> + if (!ib_dev)
You need to free the sruc lock, wait for completion then go to out.
Doing in the opposite order as you did below, might end up with a
dead-lock in case ib_uverbs_free_hw_resources is waiting for the
synchronize_srcu(disassociate_srcu) and can't mark the file as completed.
> + goto out;
> +
> + mutex_lock(&dev->lists_mutex);
> ucontext = file->ucontext;
> file->ucontext = NULL;
> if (!file->is_closed) {
> list_del(&file->list);
> file->is_closed = 1;
> }
> - mutex_unlock(&file->device->lists_mutex);
> + mutex_unlock(&dev->lists_mutex);
> if (ucontext)
> ib_uverbs_cleanup_ucontext(file, ucontext);
>
> if (file->async_file)
> kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
>
> + complete(&file->fcomp);
No need to do that in that flow, only in above flow.
> +out:
> + wait_for_completion(&file->fcomp);
> + srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
See above, might end-up with a dead-lock, should be done above in
opposite order.
> kref_put(&file->ref, ib_uverbs_release_file);
> kobject_put(&dev->kobj);
>
> @@ -1199,6 +1212,7 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
> }
>
> mutex_lock(&uverbs_dev->lists_mutex);
> + complete(&file->fcomp);
> kref_put(&file->ref, ib_uverbs_release_file);
> }
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2016-03-07 11:14 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-07 9:44 [PATCH V2] IB/uverbs: Fix race between uverbs_close and remove_one Devesh Sharma
[not found] ` <1457343873-14869-1-git-send-email-devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2016-03-07 11:14 ` Yishai Hadas [this message]
[not found] ` <56DD6295.6000705-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-08 9:49 ` Devesh Sharma
2016-03-07 19:08 ` Jason Gunthorpe
[not found] ` <20160307190833.GA1886-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-08 10:54 ` Devesh Sharma
[not found] ` <CANjDDBiYagKm79n5sWNsCnxruSzqDqZYREmw1mGBR_upapF4hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-08 14:33 ` Yishai Hadas
2016-03-08 17:53 ` Jason Gunthorpe
[not found] ` <20160308175334.GB10805-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-09 16:48 ` Yishai Hadas
[not found] ` <56E053C8.8050008-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-09 19:03 ` Jason Gunthorpe
[not found] ` <20160309190354.GD21139-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-10 9:04 ` Devesh Sharma
[not found] ` <CANjDDBj=F-LTSDMesD97CvvJQWOW6fecuDLY2a9sBZ220jMYMg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-10 15:25 ` Devesh Sharma
[not found] ` <CANjDDBhnJgic4QP-mL7_7cTAh-CH7xaTO147MNqat=aZ45B1nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-10 15:44 ` Yishai Hadas
[not found] ` <56E19676.4070805-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-10 15:57 ` Devesh Sharma
2016-03-10 11:26 ` Yishai Hadas
[not found] ` <56E159CC.3090805-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-10 21:05 ` Jason Gunthorpe
[not found] ` <20160310210535.GA9735-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-14 15:55 ` Yishai Hadas
[not found] ` <56E6DEEB.30904-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-14 17:29 ` Jason Gunthorpe
2016-03-10 8:16 ` Devesh Sharma
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56DD6295.6000705@dev.mellanox.co.il \
--to=yishaih-ldsdmyg8hgv8yrgs2mwiifqbs+8scbdb@public.gmane.org \
--cc=devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org \
--cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.