* [PATCH for-next 00/29] Add SRIOV support for IB interfaces
From: Jack Morgenstein @ 2012-06-11 10:45 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, liranl-VPRAkNaXOzVWk0Htik3J/w,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, Jack Morgenstein,
	dotanb-VPRAkNaXOzVWk0Htik3J/w, tziporet-VPRAkNaXOzVWk0Htik3J/w

This patch set adds SRIOV support for IB interfaces.

Patches 1-13 are "precondition" patches.
Patches 14-29 actually implement the feature.

This patch set introduces InfiniBand SRIOV support for ConnectX2 and ConnectX3
devices.  Each function presents itself as an independent vHCA (virtual HCA) to
the host while a single HCA is observable by the network, which is unaware of
the vHCAs.  No changes are required in the IB subsystem, ULPs, or apps to
support SRIOV, and vHCAs are interoperable with any existing (non-virtualized)
IB deployments.
 
We term this model for SRIOV implementation the shared-port model.

Sharing the same physical port(s) among multiple vHCAs is achieved as follows:
 
1. Each vHCA port presents its own virtual GID table.
 
Currently, the virtual GID table comprises a single entry (at index 0) that
maps to a unique index in the physical GID table.  The vHCA of the PF maps to
physical GID index 0. To obtain GIDs for other vHCAs, alias GUIDs are requested
from the SM.  These are GUIDs which the SM places, per port, in the port's guid
table after the 0'th slot (which is read-only and determined by the FW).
The host admin can assign GIDs to vHCAs using a sysfs interface (see below).
 
2. Each vHCA port presents its own virtual PKey table.
 
The virtual PKey table is a mapping of selected indexes of the physical PKey table.
The host admin can control which PKey indexes are mapped to which virtual indexes
using a sysfs interface (see below). Note that the physical PKey table may contain
both full and partial memberships of the same PKey to allow different membership
types in different virtual tables.
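
The index mapping described above can be sketched roughly as follows. This is
only an illustration, not code from the patch set: the table sizes, the
struct vhca_pkey_map type and both function names are hypothetical. The one
fact taken from the IB architecture is that bit 15 of a 16-bit P_Key is the
membership bit (set for full membership, clear for partial/limited).

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_PKEY_TBL_LEN 8	/* hypothetical size, for illustration only */
#define VIRT_PKEY_TBL_LEN 4	/* hypothetical size, for illustration only */

/* Bit 15 of a P_Key is the membership bit: 1 = full, 0 = partial (limited). */
#define PKEY_FULL_MEMBER 0x8000u

/* Hypothetical per-vHCA mapping; not a structure from the patches. */
struct vhca_pkey_map {
	/* virt_to_phys[v] is the physical table index behind virtual index v */
	uint8_t virt_to_phys[VIRT_PKEY_TBL_LEN];
};

/* Return the P_Key that a vHCA sees at virtual index v. */
uint16_t vhca_read_pkey(const uint16_t *phys_tbl,
			const struct vhca_pkey_map *map, unsigned int v)
{
	assert(v < VIRT_PKEY_TBL_LEN);
	assert(map->virt_to_phys[v] < PHYS_PKEY_TBL_LEN);
	return phys_tbl[map->virt_to_phys[v]];
}

/* Non-zero when the P_Key carries full membership. */
int pkey_is_full_member(uint16_t pkey)
{
	return (pkey & PKEY_FULL_MEMBER) != 0;
}
```

With a physical table holding both 0x8001 (full) and 0x0001 (partial), one
vHCA can map its virtual index to the first entry and another vHCA to the
second, so each sees a different membership type for the same PKey.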
 
3. Each vHCA port has its own virtual port state.
 
A vHCA port is up if the following conditions apply:
- The physical port is up
- The virtual GID table contains the GIDs requested by the host admin
- The SM has acknowledged the requested GIDs since the last time that
  the physical port came up
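
As a rough sketch, the three conditions above combine into a simple
conjunction. The struct and function names here are hypothetical and do not
come from the patches. Clearing the SM acknowledgement when the physical port
goes down is an assumption derived from the "since the last time that the
physical port came up" wording.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vHCA-port bookkeeping; not a structure from the patches. */
struct vhca_port_status {
	bool phys_port_up;		/* the physical port is up */
	bool gids_in_virtual_table;	/* admin-requested GIDs are present */
	bool sm_acked_since_port_up;	/* SM acknowledged them after last port-up */
};

/* A vHCA port is up only when all three conditions hold. */
bool vhca_port_is_up(const struct vhca_port_status *s)
{
	assert(s);
	return s->phys_port_up &&
	       s->gids_in_virtual_table &&
	       s->sm_acked_since_port_up;
}

/*
 * Assumption: an acknowledgement received before the port went down must not
 * count after the next port-up, so it is cleared here.
 */
void vhca_phys_port_down(struct vhca_port_status *s)
{
	assert(s);
	s->phys_port_up = false;
	s->sm_acked_since_port_up = false;
}
```

In this sketch a vHCA port stays down after the physical port comes back up
until the SM acknowledges the requested GIDs again.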
 
4. Other port attributes are shared, e.g., GID prefix, LID, SM LID, LMC mask.
 
5. Special QPs are para-virtualized.
 
vHCAs are not given direct access to QP0/1. Rather, these QPs are operated by a
special context hosted by the PF, which mediates access to/from vHCAs.
This is done by opening a “tunnel” per vHCA port per QP0/1. A tunnel comprises
a pair of UD QPs:  a “Tunnel QP” in the PF-context and a “Proxy QP” in the vHCA.
All vHCA MAD traffic must pass through the corresponding tunnel.
vHCA QPs cannot be assigned to VL15 and are denied the well-known QKey.
 
QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual) QP0’s,
but they never receive any SMPs and all SMPs sent are discarded.
QP1 traffic is allowed for all vHCAs, but special care is required to bridge
the gap between the host and network views.

Specifically:
- Transaction IDs are mapped to guarantee uniqueness among vHCAs
- CM para-virtualization
  o   Incoming requests are steered to the correct vHCA according to the embedded GID
  o   Local communication IDs are mapped to ensure uniqueness among vHCAs
- Multicast para-virtualization
  o   The PF context aggregates membership state from all vHCAs
  o   The SA is contacted only when the aggregate membership changes
  o   If the aggregate does not change, the PF context will provide the
       requesting vHCA with the proper response
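
The multicast aggregation rule in the list above can be illustrated with a
minimal sketch. It assumes a deliberately simplified model in which membership
is a single joined/not-joined flag per vHCA; real join states are richer, and
none of the names below (MAX_VHCAS, struct mcg_aggregate, mcg_join, mcg_leave)
come from the patches.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VHCAS 8	/* hypothetical limit, for illustration only */

/* Hypothetical aggregate state for one multicast group in the PF context. */
struct mcg_aggregate {
	bool joined[MAX_VHCAS];	/* per-vHCA membership */
	unsigned int members;	/* number of vHCAs currently joined */
};

/*
 * Record a join. Returns true when the aggregate changed (first member), so
 * the SA has to be contacted; false means the PF context can answer the
 * requesting vHCA locally.
 */
bool mcg_join(struct mcg_aggregate *g, unsigned int vhca)
{
	assert(vhca < MAX_VHCAS);
	if (g->joined[vhca])
		return false;
	g->joined[vhca] = true;
	return g->members++ == 0;
}

/* Record a leave. Returns true only when the last member has left. */
bool mcg_leave(struct mcg_aggregate *g, unsigned int vhca)
{
	assert(vhca < MAX_VHCAS);
	if (!g->joined[vhca])
		return false;
	g->joined[vhca] = false;
	return --g->members == 0;
}
```

Under this model only the first join and the last leave of a group reach the
SA; every request in between is answered by the PF context.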
 
Incoming MADs are steered according to:
- the DGID if a GRH is present
- the mapped transaction ID for response MADs
- the embedded GID in CM requests
- the remote communication ID in other CM messages
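
One way to read the steering rules above is as a small decision function. The
sketch below assumes the rules are tried in the order listed, which the text
does not state explicitly. struct mad_summary is a hypothetical pre-parsed
view of a MAD, not a kernel structure, and the enum and function names are
likewise made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Which field of an incoming MAD selects the destination vHCA. */
enum steer_key {
	STEER_BY_DGID,			/* GRH present: destination GID */
	STEER_BY_MAPPED_TID,		/* response MAD: mapped transaction ID */
	STEER_BY_CM_REQ_GID,		/* CM request: embedded GID */
	STEER_BY_CM_REMOTE_COMM_ID,	/* other CM message: remote comm ID */
	STEER_UNSPECIFIED		/* case not covered by the rules above */
};

/* Hypothetical summary of the relevant properties of one incoming MAD. */
struct mad_summary {
	bool has_grh;
	bool is_response;
	bool is_cm;
	bool is_cm_request;
};

/* Pick the steering key, assuming the rules apply in list order. */
enum steer_key mad_steer_key(const struct mad_summary *m)
{
	assert(m);
	if (m->has_grh)
		return STEER_BY_DGID;
	if (m->is_response)
		return STEER_BY_MAPPED_TID;
	if (m->is_cm)
		return m->is_cm_request ? STEER_BY_CM_REQ_GID
					: STEER_BY_CM_REMOTE_COMM_ID;
	return STEER_UNSPECIFIED;
}
```

The function only names the lookup key; resolving that key to a specific vHCA
is left out of the sketch.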

To allow the host admin to control the virtual GID and PKey tables of vHCAs,
a new sysfs ‘iov’ sub-tree has been added under the PF infiniband device.
Details on this mechanism can be found in the change log of:
   IB/mlx4: Add iov directory in sysfs under the ib device

Known Issues
------------
1. librdmacm does not currently support multiple VFs/PFs on the same host.
   This will be fixed in V1.
2. FMRs are not currently supported on slaves. This will be corrected in a
   future submission.
3. RoCE is not currently supported on slaves. This will be corrected in a
   future submission.
4. Due to a (correct) change in kernel IRQ management in kernel 3.5-rc1 (see
   commit 1c6c69525b40), the KVM module no longer succeeds in passing interrupts
   through to guests (see the discussion thread beginning at
   https://lkml.org/lkml/2012/6/1/261).  Until this KVM issue is fixed, anyone
   wishing to use SRIOV-IB (or SRIOV-Ethernet) with ConnectX2 or ConnectX3
   devices on guest O/Ses should revert commit 1c6c69525b40
   (as a TEMPORARY workaround) in order to enable the guests to operate the mlx4 driver.

   VFs may still be bound to the host (via setting the "probe_vf" mlx4_core
   module parameter to a non-zero value in a conf file under /etc/modprobe.d) 
   without reverting the commit mentioned above.

In addition, several of the patches have notations indicating things that
will be fixed in V1.

Amir Vadai (1):
  IB/mlx4: Add CM paravirtualization

Erez Shitrit (1):
  IB/sa: Add GuidInfoRecord query support.

Jack Morgenstein (26):
  net/mlx4_core: Pass an invalid PCI id number to VFs
  IB/mlx4: Mask out high order bit of port_num in mlx4_ib_create_ah
  IB/mlx4: Add run-time switchable error path debug output capability
  IB/core: change pkey table lookups to support full and partial
    membership for the same pkey
  IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey
    match
  IB/core: move macros from cm_msgs.h to ib_cm.h
  {NET,IB}/mlx4: Use port management change event instead of smp_snoop
  net/mlx4_core: For SRIOV, initialize ib port-capabilities for all
    slaves
  net/mlx4_core: Implement mechanism for reserved qkeys
  net/mlx4_core: Allow guests to support IB ports
  net/mlx4_core: place phys gid and pkey tbl sizes in mlx4_phys_caps
    struct and paravirtualize them
  IB/mlx4: SRIOV IB context objects and proxy/tunnel sqp support
  net/mlx4_core: Add proxy and tunnel QPs to the reserved QP area
  IB/mlx4: Initialize SRIOV IB support for slaves in master context
  {NET,IB}/mlx4: Implement QP paravirtualization
  IB/mlx4: SRIOV multiplex and demultiplex MADs
  {NET,IB}/mlx4: MAD_IFC paravirtualization
  net/mlx4_core: Add IB port-state machine, and port mgmt event
    propagation infrastructure
  {NET,IB}/mlx4: Add alias_guid mechanism
  IB/mlx4: Propagate pkey and guid change port management events to
    slaves
  IB/mlx4: Add iov directory in sysfs under the ib device
  net/mlx4_core: Adjustments to SET_PORT for SRIOV-IB
  IB/mlx4: Initialize guid-cache index 0 (default guid)
  net/mlx4_core: INIT/CLOSE port logic for IB ports in SRIOV mode
  IB/mlx4: Miscellaneous adjustments to SRIOV IB support
  {NET,IB}/mlx4: Activate SRIOV mode for IB

Oren Duer (1):
  IB/mlx4: Added Multicast Groups (MCG) para-virtualization for SRIOV

 drivers/infiniband/core/cache.c                    |   42 +-
 drivers/infiniband/core/cm_msgs.h                  |   12 -
 drivers/infiniband/core/device.c                   |   17 +-
 drivers/infiniband/core/sa_query.c                 |  133 ++
 drivers/infiniband/hw/mlx4/Makefile                |    2 +-
 drivers/infiniband/hw/mlx4/ah.c                    |    4 +-
 drivers/infiniband/hw/mlx4/alias_GUID.c            |  791 +++++++++
 drivers/infiniband/hw/mlx4/cm.c                    |  437 +++++
 drivers/infiniband/hw/mlx4/cq.c                    |   31 +-
 drivers/infiniband/hw/mlx4/mad.c                   | 1712 +++++++++++++++++++-
 drivers/infiniband/hw/mlx4/main.c                  |  284 +++-
 drivers/infiniband/hw/mlx4/mcg.c                   | 1254 ++++++++++++++
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |  368 +++++-
 drivers/infiniband/hw/mlx4/qp.c                    |  663 +++++++-
 drivers/infiniband/hw/mlx4/sysfs.c                 |  808 +++++++++
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |  179 ++-
 drivers/net/ethernet/mellanox/mlx4/en_main.c       |    5 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c            |  257 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |  235 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.h            |    3 +
 drivers/net/ethernet/mellanox/mlx4/intf.c          |    5 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |  103 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |  115 +-
 drivers/net/ethernet/mellanox/mlx4/port.c          |   21 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c            |   66 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  220 +++-
 include/linux/mlx4/device.h                        |  168 ++-
 include/linux/mlx4/driver.h                        |    5 +-
 include/linux/mlx4/qp.h                            |    3 +-
 include/rdma/ib_cache.h                            |   16 +
 include/rdma/ib_cm.h                               |   12 +
 include/rdma/ib_sa.h                               |   33 +
 32 files changed, 7653 insertions(+), 351 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx4/alias_GUID.c
 create mode 100644 drivers/infiniband/hw/mlx4/cm.c
 create mode 100644 drivers/infiniband/hw/mlx4/mcg.c
 create mode 100644 drivers/infiniband/hw/mlx4/sysfs.c

Cc: dotanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
Cc: tziporet-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH for-next 00/29] Add SRIOV support for IB interfaces
From: Yann Droneaud @ 2012-06-14 13:21 UTC
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, yevgenyp-VPRAkNaXOzVWk0Htik3J/w,
	Jack Morgenstein, dotanb-VPRAkNaXOzVWk0Htik3J/w,
	tziporet-VPRAkNaXOzVWk0Htik3J/w, ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA

Hi,

> This patch set adds SRIOV support for IB interfaces.
> Patches 1-13 are "precondition" patches.
> Patches 14-29 actually implement the feature.
> This patch set introduces Infiniband SRIOV support for ConnectX2 and ConnectX3
> devices.  Each function presents itself as an independent vHCA (virtual HCA) to
> the host while a single HCA is observable by the network, which is unaware of
> the vHCAs.  No changes are required by the IB subsystem, ULPs, and apps to
> support SRIOV, and vHCAs are interoperable with any existing (non-virtualized)
> IB deployments.

Please forgive me, I haven't tried the patches or read them in depth.

How will vHCAs/HCAs interact with Automatic Path Migration (APM) and
IPoIB bonding with fail-over (HA), with RDMA_CM and IP usage in mind?

Regards.

-- 
Yann Droneaud
OPTEYA




Thread overview: 36+ messages
2012-06-11 10:45 [PATCH for-next 00/29] Add SRIOV support for IB interfaces Jack Morgenstein
     [not found] ` <1339411570-4689-1-git-send-email-jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-06-11 10:45   ` [PATCH for-next 01/29] net/mlx4_core: Pass an invalid PCI id number to VFs Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 02/29] IB/mlx4: Mask out high order bit of port_num in mlx4_ib_create_ah Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 03/29] IB/mlx4: Add run-time switchable error path debug output capability Jack Morgenstein
     [not found]     ` <1339411570-4689-4-git-send-email-jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-06-12  7:00       ` Roland Dreier
2012-06-11 10:45   ` [PATCH for-next 04/29] IB/core: change pkey table lookups to support full and partial membership for the same pkey Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 05/29] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 06/29] IB/sa: Add GuidInfoRecord query support Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 07/29] IB/core: move macros from cm_msgs.h to ib_cm.h Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 08/29] {NET,IB}/mlx4: Use port management change event instead of smp_snoop Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 09/29] net/mlx4_core: For SRIOV, initialize ib port-capabilities for all slaves Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 10/29] net/mlx4_core: Implement mechanism for reserved qkeys Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 11/29] net/mlx4_core: Allow guests to support IB ports Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 12/29] net/mlx4_core: place phys gid and pkey tbl sizes in mlx4_phys_caps struct and paravirtualize them Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 13/29] IB/mlx4: SRIOV IB context objects and proxy/tunnel sqp support Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 14/29] net/mlx4_core: Add proxy and tunnel QPs to the reserved QP area Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 15/29] IB/mlx4: Initialize SRIOV IB support for slaves in master context Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 16/29] {NET,IB}/mlx4: Implement QP paravirtualization Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 17/29] IB/mlx4: SRIOV multiplex and demultiplex MADs Jack Morgenstein
2012-06-11 10:45   ` [PATCH for-next 18/29] {NET,IB}/mlx4: MAD_IFC paravirtualization Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 19/29] IB/mlx4: Added Multicast Groups (MCG) para-virtualization for SRIOV Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 20/29] IB/mlx4: Add CM paravirtualization Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 21/29] net/mlx4_core: Add IB port-state machine, and port mgmt event propagation infrastructure Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 22/29] {NET,IB}/mlx4: Add alias_guid mechanism Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 23/29] IB/mlx4: Propagate pkey and guid change port management events to slaves Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 24/29] IB/mlx4: Add iov directory in sysfs under the ib device Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 25/29] net/mlx4_core: Adjustments to SET_PORT for SRIOV-IB Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 26/29] IB/mlx4: Initialize guid-cache index 0 (default guid) Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 27/29] net/mlx4_core: INIT/CLOSE port logic for IB ports in SRIOV mode Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 28/29] IB/mlx4: Miscellaneous adjustments to SRIOV IB support Jack Morgenstein
2012-06-11 10:46   ` [PATCH for-next 29/29] {NET,IB}/mlx4: Activate SRIOV mode for IB Jack Morgenstein
2012-06-12  7:00   ` [PATCH for-next 00/29] Add SRIOV support for IB interfaces Roland Dreier
     [not found]     ` <CAL1RGDVsGUBHUBOaajnzO4NcFjA1xBZczLA=vxV3=oRKme+LrQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-06-12  7:22       ` Or Gerlitz
  -- strict thread matches above, loose matches on Subject: below --
2012-06-14 13:21 Yann Droneaud
     [not found] ` <9e8965c740ff0e889a097a722b9fff90.squirrel-2RFepEojUI2lDZmfZ6uX/xeHL2rgt/dS@public.gmane.org>
2012-06-17 14:39   ` Or Gerlitz
     [not found]     ` <4FDDEC16.9060702-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-06-18  9:43       ` Yann Droneaud
