public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Alex Rosenbaum
	<rosenbaumalex-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [RFC] libibverbs IB device hotplug support
Date: Tue, 28 Feb 2017 10:48:02 -0500	[thread overview]
Message-ID: <1488296882.86943.214.camel@redhat.com> (raw)
In-Reply-To: <CAFgAxU9eQYwBad2+SJw1x+cQMMqzgkz0fpeumG7Rx=o_eQqo2g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 7064 bytes --]

On Mon, 2017-02-27 at 18:08 +0200, Alex Rosenbaum wrote:
> 
> 
> Today the IB device list is returned by ibv_get_device_list() from
> libibverbs.
> 
> This IB device list is created by scanning of the
> /sys/class/infiniband_verbs
> 
> [2]. The list is cached and never updated, no matter if there were
> hardware
> 
> changes in the system.

Correct.  This is something I've felt we've needed to update/fix for a
while now.

> Detection of hotplug events is not part of libibverbs functionality
> and thus
> 
> not part of this RFC. User space applications should monitor
> respectful changes
> 
> in the system according to its specific logic to detect plugout or
> plugin of
> 
> hardware devices. This can be does by means of: netlink, udev or
> other inputs.

On this point I disagree.  I see no reason that we should implement
hotplug detection in every application when the library can get it
right just once and make it available to everyone.

> Here I suggest to modify the implementation of ibv_get_device_list()
> so that
> 
> consecutive calls will re-scan the sysfs in the same manner as done
> today in
> 
> order to create a fresh ibv_device list each time. We will remove
> caching of
> 
> devices that support plugout, while keeping the ibv_device cache for
> devices
> 
> which do not support plugout.

I would argue that a better solution is to, on first call into
ibv_get_device_list() (which doubles as the library's init() entry), we
scan the system for whatever devices there are, build a permanent list
of these devices, then register the necessary hooks ourselves to get
updates on both plug out and plug in, and on those events we update the
list as appropriate (on plug out, notify the application via async
event, after the application has processed the event, if the device is
unused, remove the device entirely, if the device is still in use, then
move it to a defunct list and wait for the application to be able to
release all of its references to it, and only then remove it; on plug
in, add a new device to the device list and notify the application via
async event at which point the application can decide if it wants to
use the device).  This might actually involve a new async notifier
added to the libibverbs API that is not tied to a device and which an
application would need to be modified to register with the library.
 But since any applicaiton wanting to support this is going to need to
be modified anyway, a new entry point isn't the end of the world.

> 
> Applications Behavior
> 
> -------------------------------------------------------------------
> -------------
> 
> Applications use different logic to decide which ibv_device is the
> relevant
> 
> device they want to use. And each application has its own detection
> logic to
> 
> track such changes in device availability.
> 
> Few examples: librdmacm logic is based on GUID values. Socket
> acceleration
> 
> (libvma) maps an IB device to its corresponding net iface based on
> netlink and
> 
> sysfs. DPDK applications lookup the IB device PCI address. And most
> MPI
> 
> implementation want human specified IB device name in command line
> and will
> 
> probably not handle any hotplug (out or in) events.
> 
> 
> 
> It is the application's responsibility to check which ibv_device
> returned from
> 
> ibv_get_device_list() has changed from previous scan and which is of
> interest.

With an async event notifier, and with a static list of devices, it
would be possible to simply pass the new device in with the notifier.
 Makes things much simpler for the app IMO.

> 
> 
> Verbs can issue an IBV_EVENT_DEVICE_FATAL async event on an open user
> space
> 
> ibv_context for device's which support the ib_device-
> >disassociate_ucontext().
> 
> This event will indicate to the application that the device is no
> longer
> 
> operational. In addition, user space CQ channel fd’s blocking calls
> on recv(),
> 
> select(), poll() or epoll() will be released with EINTR errno.
> 
> 
> 
> Typical user space application will monitor hardware changes and/or
> call for
> 
> ibv_get_device_list() only from control path dedicated thread, and
> not from the
> 
> fast path threads.
> 
> 
> 
> Pitfall
> 
> -------------------------------------------------------------------
> -------------
> 
> If a legacy user space application did not follow the
> ibv_get_device_list()
> 
> man page definition, and it saved a private copy of an ibv_device
> pointer and
> 
> used it after releasing the device list (call to
> ibv_free_device_list()), then
> 
> ibv_open_device() might seg-fault based on this new suggestion.
> 
> We can work around this by moving the IB device re-scan logic to a
> new API
> 
> 'ibv_refresh_device_list()' so that only new application using this
> API will
> 
> have correct behavior as needed.

Following what I wrote above, this would no longer be a problem.  The
call to ibv_get_device_list can simply return the existing list (on
first call we init this list), and we can make ibv_free_device_list a
no-op.  The async events can notify of adds/removes, user can update
their own list accordingly, but we would also protect ourselves in all
of the ibv_ calls such that if the user space code calls into our code
with a defunct device we don't crash.  We might want to protect all of
our calls, even hot path calls, from invalid ibv_context pointers now
as a result of this support.


> 
> 
> Reference
> 
> -------------------------------------------------------------------
> -------------
> 
> [1] https://www.kernel.org/doc/pending/hotplug.txt
> 
> [2] https://github.com/linux-rdma/rdma-core/blob/master/Documentation
> /libibverbs.md
> 
> 
> 
> API changes
> 
> -------------------------------------------------------------------
> -------------
> 
> Signed-off-by: Alex Rosenbaum <alexr@xxxxxxxxxxxx>
> 
> --- a/libibverbs/verbs.h
> 
> +++ b/libibverbs/verbs.h
> 
> @@ -1336,6 +1336,7 @@ struct verbs_device {
> 
>                                 struct ibv_context *ctx, int cmd_fd);
> 
>         void    (*uninit_context)(struct verbs_device *device,
> 
>                                 struct ibv_context *ctx);
> 
> +       void    (*uninit_device)(struct verbs_device *device);
> 
> +       atomic_t         ref_count;
> 
>         /* future fields added here */
> 
> };
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" 
> in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  parent reply	other threads:[~2017-02-28 15:48 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-27 16:08 [RFC] libibverbs IB device hotplug support Alex Rosenbaum
     [not found] ` <CAFgAxU9eQYwBad2+SJw1x+cQMMqzgkz0fpeumG7Rx=o_eQqo2g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-27 18:59   ` Jason Gunthorpe
     [not found]     ` <20170227185912.GM5891-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-02-27 22:11       ` Alex Rosenbaum
     [not found]         ` <CAFgAxU9XJTXkaeL_VE7zHASPBM+j=TZd2L+McKBcJRThJtUN5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-27 22:36           ` Jason Gunthorpe
     [not found]             ` <20170227223600.GA1526-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-02-28  7:56               ` Alex Rosenbaum
     [not found]                 ` <CAFgAxU8KkrOS6aib4ykf8vj1sQFGK_oMW=fX3vZ2X2_r3ryDVA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-28 21:26                   ` Jason Gunthorpe
2017-02-28 15:48   ` Doug Ledford [this message]
     [not found]     ` <1488296882.86943.214.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-02-28 16:32       ` Liran Liss
     [not found]         ` <HE1PR0501MB28123E8F342607E2C2CA4DD9B1560-692Kmc8YnlIVrnpjwTCbp8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-02-28 21:47           ` Doug Ledford
     [not found]             ` <3389d831-c135-b326-4b96-5f2a746de5ac-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-01  0:44               ` Jason Gunthorpe
     [not found]                 ` <20170301004449.GA13114-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-01 18:00                   ` Hefty, Sean
     [not found]                     ` <1828884A29C6694DAF28B7E6B8A82373AB0ECDC6-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-03-01 18:07                       ` Jason Gunthorpe
2017-03-01 21:47                       ` Alex Rosenbaum
     [not found]                         ` <CAFgAxU9vObaW4O+byEJ5pV1Ofou4cd05HHWWPC7iTJshyk+LdA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-01 22:07                           ` Doug Ledford
     [not found]                             ` <d92df6b7-207f-f3a3-8bf5-b12cffe20684-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-01 22:32                               ` Alex Rosenbaum
     [not found]                                 ` <CAFgAxU_-jbAoOR2XMYfSgbDMn7FnrthudZJLgeNNTzb9GEUXrg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-02 12:48                                   ` Doug Ledford
     [not found]                                     ` <467c0560-8de7-42e5-14b9-178c367874d2-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-02 16:53                                       ` Jason Gunthorpe
2017-03-01 23:18                           ` Hefty, Sean
2017-03-01 21:56                       ` Doug Ledford
     [not found]                         ` <cad3d2c6-0c98-9f58-a3bb-81823103bd76-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-01 22:23                           ` Jason Gunthorpe
2017-03-01 22:34                           ` Alex Rosenbaum
2017-03-01  0:10   ` ira.weiny
     [not found]     ` <20170301001023.GA3001-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2017-03-01 21:21       ` Alex Rosenbaum
     [not found]         ` <CAFgAxU-Vf6RLEo8N=oya0X+WbneRHE1xa_-5GY+TEqo1wyzpow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-01 22:17           ` Weiny, Ira
     [not found]             ` <2807E5FD2F6FDA4886F6618EAC48510E67C5CA80-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-03-01 22:38               ` Alex Rosenbaum
2017-03-01 21:57       ` Alex Rosenbaum
     [not found]         ` <CAFgAxU_i+5FtBkdf=t5vdQAvBqDt9aKsFJbAKGN5YWQQ9w-c2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-01 22:15           ` Weiny, Ira
     [not found]             ` <2807E5FD2F6FDA4886F6618EAC48510E67C5CA68-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-03-01 22:42               ` Alex Rosenbaum
2017-03-01 23:07               ` Jason Gunthorpe
     [not found]                 ` <20170301230743.GC2820-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-02 13:22                   ` Leon Romanovsky
2017-03-02 16:08   ` Hefty, Sean
     [not found]     ` <1828884A29C6694DAF28B7E6B8A82373AB0ED3C6-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-03-03 12:31       ` Alex Rosenbaum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1488296882.86943.214.camel@redhat.com \
    --to=dledford-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
    --cc=leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=rosenbaumalex-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox