From: jackm <jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
To: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
List Linux RDMA Mailing
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Knut Omang <Knut.Omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
Jack Morgenstein <jackm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
Subject: Re: Crashes due to concurrent calls to ib_unmap_fmr()
Date: Wed, 19 Apr 2017 11:02:35 +0300
Message-ID: <20170419110235.00007e4e@dev.mellanox.co.il>
In-Reply-To: <20170418174430.GD14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
On Tue, 18 Apr 2017 20:44:30 +0300
Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Apr 17, 2017 at 01:45:24PM -0400, Chuck Lever wrote:
> >
> > > On Apr 15, 2017, at 5:55 AM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > > wrote:
> > >
> > > On Fri, Apr 14, 2017 at 11:51:39AM -0400, Chuck Lever wrote:
> > >> Howdy-
> > >>
> > >> I recently found a way to crash my HCA (and the whole system)
> > >> using a signal on an NFS/RDMA mount point that is using FMR.
> > >> I've documented the issue:
> > >>
> > >> https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
> > >>
> > >> And I have an NFS/RDMA fix I'm testing for v4.13. The fix is to
> > >> prevent simultaneous calls to ib_unmap_fmr with the same FMR.
> > >>
> > >> While working on the fix, I've been looking for any documentation
> > >> regarding serialization requirements for ib_unmap_fmr. Knut
> > >> Omang pointed out to me that
> > >> Documentation/infiniband/core-locking.txt makes this bold
> > >> statement:
> > >>> Reentrancy
> > >>>
> > >>> All of the methods in struct ib_device exported by a low-level
> > >>> driver must be fully reentrant. The low-level driver is
> > >>> required to perform all synchronization necessary to maintain
> > >>> consistency, even if multiple function calls using the same
> > >>> object are run simultaneously.
> > >>>
> > >>> The IB midlayer does not perform any serialization of function
> > >>> calls.
> > >>>
> > >>> Because low-level drivers are reentrant, upper level protocol
> > >>> consumers are not required to perform any serialization.
> > >>
> > >> Does this re-entrancy guarantee apply only when ib_unmap_fmr is
> > >> called concurrently with unique FMRs?
> > >
> > > > According to the description, it should apply to all operations
> > > > on ib_device without any exclusion.
> > >
> > >>
> > >> I've been told it is not possible for ib_unmap_fmr to detect
> > >> when it has been invoked in different threads with the same
> > >> FMR.
> > >
> > > Right, FMR management is implemented as direct writes to MPT and
> > > MTT tables. HW doesn't distinguish simultaneous calls to the TPT
> > > cache.
> > >> but apparently the user space equivalent does not have the same
> > >> vulnerability (I did not test this assertion).
> > >>
> > >> I'm wondering what is proper closure here (aside from merging the
> > >> NFS/RDMA fix).
> > >
> > > > Maybe serialize unmap_fmr (workqueue) from the driver side?
> >
> > Either correcting the documentation or a driver change is OK with
> > me.
> >
> > Claiming that "upper level protocol consumers are not required to
> > perform any serialization" seems like a stretch.
>
> Right,
>
> I added Jack to this thread, and we will need a couple of days to
> think internally about possible solutions.
>
> Thanks
>
Adding Majd
-Jack
> >
> > --
> > Chuck Lever
> >
> >
> >