From: jackm <jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
To: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	List Linux RDMA Mailing
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Knut Omang <Knut.Omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Jack Morgenstein <jackm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
Subject: Re: Crashes due to concurrent calls to ib_unmap_fmr()
Date: Wed, 19 Apr 2017 11:02:35 +0300
Message-ID: <20170419110235.00007e4e@dev.mellanox.co.il>
In-Reply-To: <20170418174430.GD14088-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>

On Tue, 18 Apr 2017 20:44:30 +0300
Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> On Mon, Apr 17, 2017 at 01:45:24PM -0400, Chuck Lever wrote:
> >  
> > > On Apr 15, 2017, at 5:55 AM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > > wrote:
> > >
> > > On Fri, Apr 14, 2017 at 11:51:39AM -0400, Chuck Lever wrote:  
> > >> Howdy-
> > >>
> > >> I recently found a way to crash my HCA (and the whole system)
> > >> using a signal on an NFS/RDMA mount point that is using FMR.
> > >> I've documented the issue:
> > >>
> > >> https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
> > >>
> > >> And I have an NFS/RDMA fix I'm testing for v4.13. The fix is to
> > >> prevent simultaneous calls to ib_unmap_fmr with the same FMR.
> > >>
> > >> While working on the fix, I've been looking for any documentation
> > >> regarding serialization requirements for ib_unmap_fmr. Knut
> > >> Omang pointed out to me that
> > >> Documentation/infiniband/core-locking.txt makes this bold
> > >> statement: 
> > >>> Reentrancy
> > >>>
> > >>>  All of the methods in struct ib_device exported by a low-level
> > >>>  driver must be fully reentrant.  The low-level driver is
> > >>>  required to perform all synchronization necessary to maintain
> > >>>  consistency, even if multiple function calls using the same
> > >>>  object are run simultaneously.
> > >>>
> > >>>  The IB midlayer does not perform any serialization of function
> > >>>  calls.
> > >>>
> > >>>  Because low-level drivers are reentrant, upper level protocol
> > >>>  consumers are not required to perform any serialization.
> > >>
> > >> Does this re-entrancy guarantee apply only when ib_unmap_fmr is
> > >> called concurrently with unique FMRs?  
> > >
> > > According to the description, it should apply to all operations
> > > on ib_device without any exclusion.
> > >  
> > >>
> > >> I've been told it is not possible for ib_unmap_fmr to detect
> > >> when it has been invoked in different threads with the same
> > >> FMR.  
> > >
> > > Right, FMR management is implemented as direct writes to MPT and
> > > MTT tables. HW doesn't distinguish simultaneous calls to the TPT
> > > cache. 
> > >> but apparently the user space equivalent does not have the same
> > >> vulnerability (I did not test this assertion).
> > >>
> > >> I'm wondering what is proper closure here (aside from merging the
> > >> NFS/RDMA fix).  
> > >
> > > Maybe serialize unmap_fmr (workqueue) from the driver side?
> >
> > Either correcting the documentation or a driver change is OK with
> > me.
> >
> > Claiming that "upper level protocol consumers are not required to
> > perform any serialization" seems like a stretch.  
> 
> Right,
> 
> I added Jack to this thread, and we will need a couple of days to
> think internally about possible solutions.
> 
> Thanks
> 
Adding Majd

-Jack
> >
> > --
> > Chuck Lever
> >
> >
> >  

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
