linux-rdma.vger.kernel.org archive mirror
From: Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
To: Reid O <hpc_reid-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Subnet management on non pure fat-tree network
Date: Mon, 29 Nov 2010 11:48:57 +0200
Message-ID: <4CF37709.3010008@dev.mellanox.co.il>
In-Reply-To: <SNT140-w638ED9C5E0455EBAEBC0FEF1230-MsuGFMq8XAE@public.gmane.org>

On 29-Nov-10 12:18 AM, Reid O wrote:
> 
> Hello,
>    We have an InfiniBand cluster in a fat tree configuration with 8 core switches and
> 12 leaf switches.  The compute nodes are all in enclosures connected to the 12
> leaf switches.  However, we have a number of non-compute nodes (admin,
> login and storage nodes) that we have connected directly to the core
> switches.  Initially, we were getting credit-loop issues so we switched
> from Min Hop to UPDN routing.  However, now 90% of our IB traffic seems
> to be routed through a single core switch.  I have tried adding a root
> guid file with the -a option, but that results in us getting this error:
> 
> Nov 28 16:47:19 319442 [45007960] 0x01 -> __osm_pr_rcv_get_path_parms:
> ERR 1F07: Dead end on path to LID 0x6F from switch for GUID
> 0x00066a00d9000ac8
> Nov 28 16:47:22 319469 [43C05960] 0x01 -> __osm_pr_rcv_get_path_parms:
> ERR 1F07: Dead end on path to LID 0x6F from switch for GUID
> 0x00066a00d9000ac8
> 
> Is there any way we can handle this hardware config via subnet management?

I'm only guessing, but here's what I understand from your description:
You have 8 spine switches and 12 leaf switches.
EACH of the spine switches is connected to ALL the leaf switches.
You have compute nodes connected to ALL the leaf switches.
You have some management/IO nodes connected to SEVERAL spine switches.

Am I right so far?

You get credit loops because of the traffic between management/IO nodes.
Up/Down routing with a root node list doesn't solve your problem. It
prevents credit loops, but only because it leaves those management/IO
nodes unconnected (hence the error that you see in the OSM log).

The real solution would be to change the topology.

If that's not an option, you can select a SINGLE leaf switch as the root
node and run Up/Down routing with a root GUID list that contains only
this leaf switch. This is bad for bandwidth, but it will solve the problem.
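
To make the two root choices concrete, here is a small Python sketch of the
idea. It is a toy model under stated assumptions, not OpenSM's actual code:
the fabric is the guessed 8 spine x 12 leaf full mesh, the switch names
(spine0, leaf0, ...) and all function names are made up for illustration,
and ranking is a plain BFS from the roots. A path is legal only if it never
turns back up after a down hop.

```python
from collections import deque

def build_fabric(n_spines=8, n_leaves=12):
    """Hypothetical fabric: every spine linked to every leaf."""
    spines = [f"spine{i}" for i in range(n_spines)]
    leaves = [f"leaf{i}" for i in range(n_leaves)]
    adj = {s: set(leaves) for s in spines}
    adj.update({l: set(spines) for l in leaves})
    return spines, leaves, adj

def rank_from_roots(adj, roots):
    """BFS hop count from the nearest root; a lower rank is 'higher up'."""
    rank = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in rank:
                rank[nbr] = rank[node] + 1
                queue.append(nbr)
    return rank

def updn_reachable(adj, rank, src):
    """Switches reachable via up hops followed by down hops only."""
    seen = {(src, False)}          # state: (switch, already went down?)
    queue = deque(seen)
    while queue:
        node, gone_down = queue.popleft()
        for nbr in adj[node]:
            going_up = rank[nbr] < rank[node]
            if going_up and gone_down:
                continue           # down-then-up turn is forbidden
            # Equal-rank links do not occur in this fabric; they would
            # be treated as down hops here.
            state = (nbr, gone_down or not going_up)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return {node for node, _ in seen}

def legal_two_hop_relays(adj, rank, a, b):
    """Switches that may legally relay traffic from a to b in two hops."""
    relays = []
    for mid in adj[a] & adj[b]:
        first_down = rank[mid] >= rank[a]
        second_up = rank[b] < rank[mid]
        if not (first_down and second_up):
            relays.append(mid)
    return sorted(relays)

spines, leaves, adj = build_fabric()
all_switches = set(adj)

# Case 1: all spines are roots. Spine-to-spine needs down-then-up.
rank_spine_roots = rank_from_roots(adj, spines)
reach_from_spine0 = updn_reachable(adj, rank_spine_roots, "spine0")
print("spine1" in reach_from_spine0)                                     # False
print(legal_two_hop_relays(adj, rank_spine_roots, "spine0", "spine1"))  # []

# Case 2: a single leaf is the only root. Everything is connected,
# but spine-to-spine traffic can only turn around at that leaf.
rank_leaf_root = rank_from_roots(adj, ["leaf0"])
fully_connected = all(
    updn_reachable(adj, rank_leaf_root, sw) == all_switches for sw in adj
)
print(fully_connected)                                                   # True
print(legal_two_hop_relays(adj, rank_leaf_root, "spine0", "spine1"))    # ['leaf0']
```

In this model, nodes hanging off different spines have no legal path when
the spines are the roots, which matches the "dead end" symptom. With one
leaf as the root everything is reachable, but all traffic between spines
funnels through that leaf, which is why it costs bandwidth.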

-- Yevgeny

> Thanks,
> 
> Reid O.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 