From: Hal Rosenstock <hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
To: Jens Domke <domke.j.aa@m.titech.ac.jp>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Torsten Hoefler <htor-gy3b+zu4XSAfv37vnLkPlQ@public.gmane.org>
Subject: Re: umad_send with service level higher than 0 does not work
Date: Fri, 14 Dec 2012 11:42:17 -0500
Message-ID: <50CB56E9.70900@dev.mellanox.co.il>
In-Reply-To: <53BC3D57-0D23-488F-A3A5-DFB2EEAB3016@m.titech.ac.jp>
Hi again,
On 12/14/2012 10:17 AM, Jens Domke wrote:
> Hello Hal,
>
> thank you for the fast response. I will try to clarify some points.
>
>>> d) OpenMPI runs are executed with "--mca btl_openib_ib_path_record_service_level 1"
>>
>> I'm not familiar with what DFSSSP does to figure out SLs exactly but
>> there should be no need to set this. The proper SL for querying the SA
>> for PathRecords, etc. is always in PortInfo.SMSL. In the case of DFSSSP
>> (and other QoS based routing algorithms), it calculates that and the SM
>> pushes this into each port. That should be used. It's possible that SL1
>> is not a valid SL for port <-> SA querying using DFSSSP.
> The OpenMPI parameter btl_openib_ib_path_record_service_level does not specify the SL for querying the PathRecords.
> It just enables the functionality. And the ompi processes use the PortInfo.SMSL to send the request.
> For the request "port -> SA" every 0<=SL<=7 was used in the test, and the SA received the requests.
>>
>>> e) kernel 2.6.32-220.13.1.el6.x86_64
>>>
>>> As far as I understand the whole system:
>>> 1. the OMPI processes are sending MAD requests (SubnAdmGet:PathRecord) to the OpenSM
>>> 2. the SA receives the request on QP1
>>
>> There is the SL in the query itself. This should be the SMSL that the SM
>> set for that port.
> Hmm, there you might have a point. I think I saw that the query itself had SL=0 specified.
> In fact, OpenMPI sets everything to 0 except for slid and dlid.
>>
>>> 3. SA asks the routing algorithm (like LASH, DFSSSP or Torus_2QoS) about a special service level for the slid/dlid path
>>
>> This is a (potentially) different SL (for MPI<->MPI port communication)
>> than the one the query used and is the one returned inside the
>> PathRecord attribute/data.
> Yes, it can be different, but DFSSSP sets the same SL, because the SM is running on a port which is also used for MPI comm.
With DFSSSP, are all SLs the same from a given source port to any destination?
>>
>>> 4. SA sends the PathRecord back to the OMPI process via umad_send in libvendor/osm_vendor_ibumad.c
>>
>> By the response reversibility rule, I think this is returned on the SL
>> of the original query but haven't verified this in the code base yet.
> Ok, I was not aware of that rule. But if this is true, then the SA should also be able to send via SL>0.
I double-checked, and indeed the SA response does use the SL that the
incoming request was received on.
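To make the reversibility rule concrete, here is a minimal sketch of how a
responder would derive the response address from the request it received.
The struct mad_addr type and the response_addr() helper are hypothetical
stand-ins for illustration only; they are not libibumad's ib_mad_addr_t or
OpenSM's actual code, and values are kept in host byte order for readability.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the address part of a umad (hypothetical type,
 * NOT libibumad's ib_mad_addr_t). Host byte order for readability. */
struct mad_addr {
    uint16_t lid;   /* LID of the remote port */
    uint32_t qpn;   /* QP number of the remote side */
    uint32_t qkey;  /* Q_Key to use towards the remote QP */
    uint8_t  sl;    /* service level, 0..15 */
};

/* Build the address for a response from the source address of the request:
 * same remote LID and QP and, per the reversibility rule, the same SL. */
static struct mad_addr response_addr(const struct mad_addr *req_src,
                                     uint32_t qkey)
{
    struct mad_addr rsp;

    assert(req_src->sl <= 15);   /* SL is a 4-bit field */
    rsp.lid  = req_src->lid;
    rsp.qpn  = req_src->qpn;
    rsp.qkey = qkey;
    rsp.sl   = req_src->sl;      /* reuse the SL the request arrived on */
    return rsp;
}
```

With this shape, a request that arrived on SL 6 always yields a response
addressed with SL 6, so the response can only get through if that SL is
usable in the reverse direction as well.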
>>
>>> The osm_vendor_send() function builds the MAD packet with the following attributes:
>>> /* GS classes */
>>> umad_set_addr_net(p_vw->umad, p_mad_addr->dest_lid,
>>> p_mad_addr->addr_type.gsi.remote_qp,
>>> p_mad_addr->addr_type.gsi.service_level,
>>> IB_QP1_WELL_KNOWN_Q_KEY);
>>> So, the SL is the same as the one which was used by the OMPI process. The Q_Key matches the Q_Key on the OMPI process, and remote_qp and dest_lid are correct, too.
>>> Afterwards umad_send(…) is used to send the reply with the PathRecord, and this send does not work (except for SL=0).
>>
>> By not working, what do you mean ? Do you mean it's not received at the
>> requester with no message in the OpenSM log or not received at the
>> OpenSM or something else ? It could be due to the wrong SL being used in
>> the original request (forcing it to SL 1). That could cause it not to be
>> received at the SM or the response not to make it back to the requester
>> from the SA if the SL used is not "reversible".
> By "not working" I mean that the MPI process does not receive any response from the SA.
> I get messages from the MPI process like the following:
> [rc011][[14851,1],1][connect/btl_openib_connect_sl.c:301:get_pathrecord_info] No response from SA after 20 retries
> The log of OpenSM shows that the SA received the PathRequest query, dumps the query into the log, and sends the reply back.
> And I think I saw some messages in the log about "…1 outstanding MAD…".
>>
>>> If I look into the MAD before it is sent, then it looks like this:
>>> Breakpoint 2, umad_send (fd=9, agentid=2, umad=0x7fffe8012530, length=120, timeout_ms=0, retries=3)
>>> at src/umad.c:791
>>> 791 if (umaddebug > 1)
>>> (gdb) p *mad
>>> $1 = {agent_id = 2, status = 0, timeout_ms = 0, retries = 3, length = 0, addr = {qpn = 1325427712, qkey = 384,
>>> lid = 4096, sl = 6 '\006', path_bits = 0 '\000', grh_present = 0 '\000', gid_index = 0 '\000',
>>> hop_limit = 0 '\000', traffic_class = 0 '\000', gid = '\000' <repeats 15 times>, flow_label = 0,
>>> pkey_index = 0, reserved = "\000\000\000\000\000"}, data = 0x7fffe8012530 "\002"}
>>
>> Is this the PathRecord query on the OpenMPI side or the response on the
>> OpenSM side ? SL is 6 rather than 1 here.
> This is the response on the OpenSM side (inside the umad_send function, right before it is written to the device with write(fd, …)).
> SL=6 indicates that the MPI process was sending the request on SL 6.
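As an aside on reading that dump: the qpn, qkey and lid values look odd in
decimal because, as far as I can tell, the umad address fields are stored in
network byte order while gdb prints them as host integers. A minimal sketch
for decoding them, assuming the dump was taken on a little-endian x86_64 host
(the helper names and struct decoded_addr are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Unconditional byte swaps: valid here only under the assumption that the
 * raw values were printed on a little-endian host. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Decoded (host order) view of the three multi-byte address fields. */
struct decoded_addr {
    uint32_t qpn;
    uint32_t qkey;
    uint16_t lid;
};

static struct decoded_addr decode_dump(uint32_t raw_qpn, uint32_t raw_qkey,
                                       uint16_t raw_lid)
{
    struct decoded_addr d;

    d.qpn  = swap32(raw_qpn);
    d.qkey = swap32(raw_qkey);
    d.lid  = swap16(raw_lid);
    assert(d.qpn <= 0x00FFFFFFu);  /* a QPN is a 24-bit value */
    return d;
}
```

Under that assumption, decode_dump(1325427712, 384, 4096) gives qpn 0x6C004F,
qkey 0x80010000 (the well-known QP1 Q_Key) and lid 16, so the addressing in
the dump looks sane apart from the SL question.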
What is the SMSL for the requester? Was it SL 6?
One would need to walk the SLToVLMappingTables from requester (OMPI
port) to SA and back to see whether SL6 would even have a chance of
working (not dropping) aside from whether it's really the correct SL to use.
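A rough sketch of that walk, under stated assumptions: each hop is reduced to
the SL-to-VL row that applies to the in/out port pair on the path plus the
number of operational data VLs on the output port, and a packet is taken to
be dropped when its SL maps to VL15 or to a VL that is not operational.
struct hop and both functions are hypothetical and not part of OpenSM or any
diagnostic tool; filling in the tables from the fabric is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VL15 15  /* mapping a data SL to VL15 means "discard" */

/* One hop on the path (hypothetical, simplified representation). */
struct hop {
    uint8_t sl2vl[16];  /* SL -> VL for the in/out port pair at this hop */
    uint8_t data_vls;   /* number of operational data VLs on the out port */
};

/* True if a packet sent on 'sl' is not dropped at any hop of one direction. */
static bool sl_survives(const struct hop *hops, size_t n, uint8_t sl)
{
    assert(sl <= 15);
    for (size_t i = 0; i < n; i++) {
        uint8_t vl = hops[i].sl2vl[sl];

        if (vl == VL15 || vl >= hops[i].data_vls)
            return false;  /* discarded at hop i */
    }
    return true;
}

/* Request path (requester -> SA) and response path (SA -> requester) must
 * both carry the SL, since the response reuses the SL of the request. */
static bool sl_round_trip_ok(const struct hop *fwd, size_t nfwd,
                             const struct hop *rev, size_t nrev, uint8_t sl)
{
    return sl_survives(fwd, nfwd, sl) && sl_survives(rev, nrev, sl);
}
```

If sl_round_trip_ok() came back false for SL 6 on the OMPI port <-> SA path,
that alone would explain a request that arrives and a response that never
does.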
-- Hal
>>
>>> Neither the OpenMPI output nor the OpenSM log file shows any useful information for this problem, even with higher debug levels.
>>
>> So nothing interesting logged relative to the PathRecord queries ?
> In the OpenSM log, only that it was received, what the request looks like, and that it was sent back.
> And a few "outstanding MADs" a few lines later in the log.
>>
>>> So, right now I'm stuck, and have no idea if there is an error in the kernel driver, the HCA firmware or something completely different. Or if umad_send basically does not support SL>0.
>>> A workaround for the moment is to set the SL in the umad_set_addr_net(...) call to 0.
>>
>> So SL 0 works between all nodes and SA for querying/responses. Wonder if
>> that's how SMSL is set by DFSSSP.
> No, the SMSL set by DFSSSP is different from 0, I have checked this. In our case (OpenSM running on a compute node), it sets the same SL, which is used
> for MPI<->MPI traffic, to ensure deadlock freedom.
>
> Regards
> Jens
>
> --------------------------------
> Dipl.-Math. Jens Domke
> Researcher - Tokyo Institute of Technology
> Satoshi MATSUOKA Laboratory
> Global Scientific Information and Computing Center
> 2-12-1-E2-7 Ookayama, Meguro-ku,
> Tokyo, 152-8550, JAPAN
> Tel/Fax: +81-3-5734-3876
> E-Mail: domke.j.aa@m.titech.ac.jp
> --------------------------------
>
>