public inbox for linux-rdma@vger.kernel.org
* [PATCH 00/11] Add new torus routing engine: torus-2QoS
From: Jim Schutt @ 2009-11-20 19:14 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

This patch series adds a new routing engine designed to handle large 
fabrics connected with a 2D/3D torus topology.

Patches 1-4 do some preparation to handle new SL-related features of
the routing engine, patches 5/6 add and enable the engine, and patches
7-11 have some fixups that only make sense in the presence of the new
engine.

So why a new torus routing engine?

Because I believe none of the existing routing engines can provide a
satisfactory operational experience on a large-scale torus, i.e. one
with hundreds of switches.

Generating routes for a torus that are free of credit loops requires
the use of multiple virtual lanes, and thus SLs on IB.  For IB fabrics
it also requires that _every_ application use path record queries - 
any application that uses an SL that was not obtained via a path record
query may cause credit loops.

In addition, if a fabric topology change (e.g. failed switch/link)
causes a change in the path SL values needed to prevent credit loops,
then _every_ application needs to repath for every path whose SL has
changed.  AFAIK there is as yet no good general way to do this.

Also, the requirement for path SL queries on every connection places a
heavy load on subnet administration, and the possibility that path SL
values can change makes caching as a performance enhancement more 
difficult.

Since multiple VL/SL values are required to prevent credit loops on a 
torus, supporting QoS means that QoS and routing need to share the small
pool of available SL values, and the even smaller pool of available VL 
values.

This patch series, and the routing engine it introduces, addresses these
issues for a 2D/3D torus fabric.  The torus-2QoS engine can provide the
following functionality on a 2D/3D torus:
- routing that is free of credit loops
- two levels of QoS, assuming switches support 8 data VLs
- ability to route around a single failed switch, and/or multiple failed
    links, without
    - introducing credit loops
    - changing path SL values
- very short run times, with good scaling properties as fabric size
    increases

The routing engine currently in opensm that is most functional for a
torus-connected fabric is LASH.  In comparison with torus-2QoS, LASH
has the following issues:
- LASH does not support QoS.
- changing inter-switch topology (adding or removing a switch, or
    removing all the links between two switches) can change many
    path SL values, potentially leading to credit loops if
    running applications do not repath.
- running time to calculate routes scales poorly with increasing 
    fabric size.

The basic algorithm used by torus-2QoS is DOR (dimension-order routing).
It also uses SL bits 0-2, one SL bit per torus dimension, to encode
whether a path crosses a dateline (where the coordinate value wraps to
zero) for each of the three dimensions, in order to avoid the credit
loops that otherwise result on a torus.  It uses SL bit 3 to distinguish
between two QoS levels.
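
A minimal sketch of that SL encoding, assuming dimensions are numbered
0/1/2 for x/y/z and that the dateline on each ring sits where the
coordinate wraps from its maximum back to zero.  The names
crosses_dateline() and torus_path_sl() are hypothetical and are not
taken from osm_ucast_torus.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* On one 1-D ring the dateline is where the coordinate wraps to zero.
 * Travelling from src to dst in the + direction (dir > 0) crosses it
 * exactly when dst < src; in the - direction, exactly when dst > src. */
static bool crosses_dateline(int src, int dst, int dir)
{
	return dir > 0 ? dst < src : dst > src;
}

/* Hypothetical helper: one dateline bit per dimension
 * (bit 0 = x, bit 1 = y, bit 2 = z) plus the QoS level in bit 3. */
static uint8_t torus_path_sl(const bool crossed[3], unsigned qos_level)
{
	uint8_t sl = 0;

	assert(qos_level < 2);
	for (unsigned d = 0; d < 3; d++)
		if (crossed[d])
			sl |= (uint8_t)(1u << d);
	return (uint8_t)(sl | (qos_level << 3));
}
```

For example, a QoS-level-0 path that wraps in x and z but not in y gets
SL 5; the same path at QoS level 1 gets SL 13.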

It uses the SL2VL tables to map those eight SL values per QoS level into
two VL values per QoS level, based on which coordinate direction a link
points.  For two QoS levels, this consumes four data VLs, where VL bit
0 encodes whether the path crosses the dateline for the coordinate
direction in which the link points, and VL bit 2 encodes QoS level.
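
Under the same numbering assumptions, the per-output-port SL2VL table
can be sketched as below.  torus_fill_sl2vl() is a hypothetical name;
real switch SL2VL tables are programmed per input/output port pair, and
this sketch ignores the input port:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the SL2VL table for a link pointing in dimension dim
 * (0 = x, 1 = y, 2 = z):
 *   VL bit 0 = SL bit dim (path crosses the dateline in this dimension)
 *   VL bit 2 = SL bit 3   (QoS level)
 * VL bit 1 is left clear. */
static void torus_fill_sl2vl(unsigned dim, uint8_t sl2vl[16])
{
	assert(dim < 3);
	for (unsigned sl = 0; sl < 16; sl++)
		sl2vl[sl] = (uint8_t)(((sl >> dim) & 1u) |
				      (((sl >> 3) & 1u) << 2));
}
```

With these rules SLs 0-7 land on VLs 0-1 and SLs 8-15 on VLs 4-5, i.e.
four data VLs for two QoS levels, with VL bit 1 still unused.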

In the event of link failure, it routes the long way around the 1-D ring
containing the failed link.  I.e. no turns are introduced into a path in
order to route around a failed link.  Note that due to this implementation, 
torus-2QoS cannot route a torus with link failures that break a 1-D ring
into two disjoint segments.
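
To make the "long way around" rule concrete, here is a sketch of
direction selection on a single 1-D ring.  It assumes an array link_up[]
where entry i describes the link between coordinates i and i+1 (mod the
ring radix), and it breaks ties toward the + direction; both are
assumptions, and ring_direction() is a hypothetical helper.  A return
value of 0 corresponds to the case above where failed links split the
ring into two disjoint segments:

```c
#include <assert.h>
#include <stdbool.h>

/* Returns +1 or -1, the direction to travel from src to dst on a ring
 * of radix n, or 0 when failed links block both directions. */
static int ring_direction(int n, int src, int dst, const bool *link_up)
{
	int fwd_hops, back_hops, c;
	bool fwd_ok = true, back_ok = true;

	assert(n > 1 && src != dst);
	assert(src >= 0 && src < n && dst >= 0 && dst < n);
	fwd_hops = ((dst - src) % n + n) % n;
	back_hops = n - fwd_hops;

	/* + direction: the hop c -> c+1 uses link c */
	c = src;
	for (int i = 0; i < fwd_hops; i++, c = (c + 1) % n)
		if (!link_up[c])
			fwd_ok = false;

	/* - direction: the hop c -> c-1 uses link c-1 */
	c = src;
	for (int i = 0; i < back_hops; i++, c = (c - 1 + n) % n)
		if (!link_up[(c - 1 + n) % n])
			back_ok = false;

	if (fwd_ok && back_ok)	/* nothing in the way: take the shorter arc */
		return fwd_hops <= back_hops ? +1 : -1;
	if (fwd_ok)
		return +1;
	if (back_ok)
		return -1;
	return 0;		/* ring is split; cannot route */
}
```

On an 8-switch ring with the link between 2 and 3 down, a path from 1
to 3 goes the long way (-1); if the link between 6 and 7 is also down,
neither direction works and the function returns 0.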

Under DOR routing in a torus with a failed switch, paths that would
otherwise turn at the failed switch cannot be routed without introducing
an "illegal" turn into the path.  Such turns are "illegal" in the
sense that allowing them can create credit loops, unless something is
done to prevent them.

The routes produced by torus-2QoS will introduce such "illegal" turns when
a switch fails.  It makes use of the input/output port dependence in the
SL2VL maps to set the otherwise unused VL bit 1 for the path hop following 
such an illegal turn.  This is enough to avoid credit loops in the 
presence of a single failed switch.
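
Making the SL2VL lookup depend on the input port as well as the output
port gives a sketch of the illegal-turn marking.  ASSUMPTION: dimensions
are numbered in DOR order (x = 0 first), and a turn counts as illegal
when the packet leaves in an earlier DOR dimension than the one it
arrived in.  torus_sl2vl_inout() is a hypothetical name, and
in_dim = -1 stands for traffic entering from a CA port:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-(input port, output port) SL2VL entry:
 *   VL bit 0 = SL bit out_dim (dateline crossing in the outgoing dimension)
 *   VL bit 1 = set on the hop right after an assumed-illegal turn
 *   VL bit 2 = SL bit 3 (QoS level) */
static uint8_t torus_sl2vl_inout(uint8_t sl, int in_dim, int out_dim)
{
	uint8_t vl;

	assert(sl < 16 && out_dim >= 0 && out_dim < 3 && in_dim < 3);
	vl = (uint8_t)((sl >> out_dim) & 1);
	vl |= (uint8_t)(((sl >> 3) & 1) << 2);
	if (in_dim > out_dim)	/* turned back to an earlier DOR dimension */
		vl |= 2;
	return vl;
}
```

A packet on SL 8 that arrives in y and leaves in x is placed on VL 6
rather than VL 4, so it cannot share buffers with traffic that reached
that link through legal turns only.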

As an example, consider the following 2D torus, and consider routes
from S to D, both when the switch at F is operational, and when it
has failed.  torus-2QoS will generate routes such that the path
S-F-D is followed if F is operational, and the path S-E-I-L-D
if F has failed:

    |    |    |    |    |    |    |
  --+----+----+----+----+----+----+--
    |    |    |    |    |    |    |
  --+----+----+----+----+----D----+--
    |    |    |    |    |    |    |
  --+----+----+----+----I----L----+--
    |    |    |    |    |    |    |
  --+----+----S----+----E----F----+--
    |    |    |    |    |    |    |
  --+----+----+----+----+----+----+--

The turn in S-E-I-L-D at switch I is the illegal turn introduced
into the path.  The turns at E and L are extra turns introduced
into the path that are legal in the sense that no credit loops
can be constructed using them.
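
The legal/illegal classification in the example can be checked
mechanically.  This sketch assigns coordinates read off the diagram
(x = column from the left, y = row from the bottom, both starting at 0),
so S = (2,1), E = (4,1), F = (5,1), I = (4,2), L = (5,2), D = (5,3).
Each path is described by its end and turn points only, and the same
assumed rule applies: DOR order is x then y, so a turn from y back to x
is illegal.  count_illegal_turns() is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct coord {
	int x, y;
};

/* Dimension of a straight segment a -> b: 0 for x, 1 for y. */
static unsigned segment_dim(struct coord a, struct coord b)
{
	/* exactly one coordinate may change along a segment */
	assert((a.x == b.x) != (a.y == b.y));
	return a.x != b.x ? 0u : 1u;
}

/* Count turns from a later DOR dimension back to an earlier one.
 * *first receives the index of the first illegal turn point, or -1. */
static int count_illegal_turns(const struct coord *pts, size_t n, int *first)
{
	int count = 0;

	*first = -1;
	for (size_t i = 1; i + 1 < n; i++) {
		unsigned in = segment_dim(pts[i - 1], pts[i]);
		unsigned out = segment_dim(pts[i], pts[i + 1]);

		if (out < in) {
			if (*first < 0)
				*first = (int)i;
			count++;
		}
	}
	return count;
}
```

For S-F-D it reports no illegal turns; for S-E-I-L-D it reports exactly
one, at index 2, which is switch I.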

The path hop after the turn at switch I has VL bit 1 set, which marks
it as a hop after an illegal turn.

I've used the latest development version of ibdmchk, which can use
path SL values and SL2VL tables, to check for credit loops in cases like
the above routed with torus-2QoS, and it finds none.

I've also looked for credit loops in a torus with multiple failed switches
routed with torus-2QoS, and learned that if and only if the failed switches
are adjacent in the last DOR dimension, there will be no credit loops.

Since torus-2QoS makes use of all available SL values when supporting
2 QoS levels, there are none left over on which to confine multicast.
It turns out there is a way to construct a spanning tree which can 
overlay a DOR-routed mesh, so that multicast and unicast can coexist
on the same SL/VL without causing credit loops.  I'm working on that but
don't have it implemented yet.

In the meantime, if you do not request QoS using opensm -Q, then
torus-2QoS will only use SLs 8-15, and thus VLs 4-7, leaving SL0/VL0
free for multicast.


Jim Schutt (11):
  opensm: Prepare for routing engine input to path record SL lookup and
    SL2VL map setup.
  opensm: Allow the routing engine to influence SL2VL calculations.
  opensm: Allow the routing engine to participate in path SL
    calculations.
  opensm: Track the minimum value in the fabric of data VLs supported.
  opensm: Add torus-2QoS routing engine.
  opensm: Enable torus-2QoS routing engine.
  opensm: Add opensm option to specify file name for extra torus-2QoS
    configuration information.
  opensm: Do not require -Q option for torus-2QoS routing engine.
  opensm: Make it possible to configure no fallback routing engine.
  opensm:  Avoid havoc in minhop caused by torus-2QoS persistent use of
    osm_port_t:priv.
  opensm: Update documentation to describe torus-2QoS.

 opensm/doc/current-routing.txt         |  154 +-
 opensm/include/opensm/osm_base.h       |   18 +
 opensm/include/opensm/osm_opensm.h     |   24 +-
 opensm/include/opensm/osm_subnet.h     |    7 +
 opensm/include/opensm/osm_ucast_lash.h |    3 -
 opensm/man/opensm.8.in                 |    9 +-
 opensm/opensm/Makefile.am              |    2 +-
 opensm/opensm/main.c                   |    8 +
 opensm/opensm/osm_console.c            |   10 +-
 opensm/opensm/osm_dump.c               |    3 +-
 opensm/opensm/osm_link_mgr.c           |   16 +-
 opensm/opensm/osm_opensm.c             |   54 +-
 opensm/opensm/osm_port_info_rcv.c      |   13 +-
 opensm/opensm/osm_qos.c                |   26 +-
 opensm/opensm/osm_sa_path_record.c     |   33 +-
 opensm/opensm/osm_state_mgr.c          |   10 +-
 opensm/opensm/osm_subnet.c             |   20 +-
 opensm/opensm/osm_ucast_lash.c         |   11 +-
 opensm/opensm/osm_ucast_mgr.c          |   44 +-
 opensm/opensm/osm_ucast_torus.c        | 8665 ++++++++++++++++++++++++++++++++
 20 files changed, 9038 insertions(+), 92 deletions(-)
 create mode 100644 opensm/opensm/osm_ucast_torus.c


