From: "brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org" <brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Pramod Gunjikar
<pramod.gunjikar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected]
Date: Fri, 15 Mar 2013 01:26:34 +0000 [thread overview]
Message-ID: <514278CA.8010809@oracle.com> (raw)
In-Reply-To: <51427819.7000505-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
On 03/15/13 01:23 AM, brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote:
>
> So, here is the history...
>
> There is an oracle application used to monitor the health of a node
> by calling ib_resolve_portid_str_via(node GUID).
>
> We observed that a call to ib_resolve_portid_str_via() which specified
> the use of an unconnected port to issue the query was returning -1
> with errno as 0. The application interpreted this as the query succeeded
> but no paths to the specified node were found and assumed the node
> was dead. This obviously was not the case, we could not send the query
> because the port was down which is very different to the queried
> node being down. But the API does not provide a means to distinguish
> between these two failure modes. Hence the reason for updating
> errno to allow these two failure modes to be distinguished.
>
> Now in terms of the detail of the failure and the reason for the other
> changes. The call to ib_resolve_portid_str_via() did not specify an smid,
> so ib_resolve_portid_str_via() calls ib_resolve_guid_via() which first
> tries to resolve the SM LID using the unconnected port:
>
> if (!sm_id) {
> sm_id = &sm_portid;
> if (ib_resolve_smlid_via(sm_id, timeout, srcport) < 0)
> return -1;
> }
>
> The call to this succeeds (return value is not -1) and indicates that the
> SM LID is 0, which is of course wrong. Even though port 2 is not
> connected, it's SMA is still operational, It receives the get PORT_INFO
> SMP (sent as a result of the ib_resolve_smlid_via()) , and returns it
> successfully as the value it reads from the adapters PORT_INFO
> which happens to be 0. ib_resolve_smlid_via() assumes everything is OK,
> it does not bother to check the value of the SM LID returned. We then
> try and contact the SM at LID 0!
>
> if ((portid->lid =
> ib_path_query_via(srcport, selfgid, portid->gid, sm_id,
> buf)) < 0)
> return -1;
>
> And this obviously fails, and so ib_resolve_portid_str_via() returns -1
> errno 0, and the app thinks the node is down. You could argue that
> we should not specify an unconnected port in the call to
> ib_resolve_portid_str_via(),
> but the changes harden the code, and allow us to distinguish between
> no path to the node or could not issue the SA query.
>
> Rdgs
>
> Brendan
>
>
>
>
> On 03/13/13 08:29 PM, Boris Chiu wrote:
>> fyi,
>>
>> Boris
>>
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] libibmad: Fixes for failures when not all ports
>> of HCA are connected
>> Date: Wed, 13 Mar 2013 12:36:05 -0700
>> From: Ira Weiny <iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> To: Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> References: <513F83FD.1090106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>
>>
>>
>> Your commit message does not seem to have anything to do with the
>> patch. Could you explain how returning errno from these functions
>> "Fixes for failures when not all ports of HCA are connected"?
>>
>> Furthermore, I'm reluctant to modify errno in this library. It is not
>> documented and in general is poor form. I realize that the interface
>> does not currently allow for an alternative. :-(
>>
>> More comments below.
>>
>> On Tue, Mar 12, 2013 at 12:37 PM, Boris Chiu<boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>> > From: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> >
>> > Signed-off-by: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> > ---
>> > src/resolve.c | 22 ++++++++++++++++++----
>> > src/rpc.c | 1 +
>> > src/sa.c | 2 ++
>> > 3 files changed, 21 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/src/resolve.c b/src/resolve.c
>> > index f866bf4..ab24c79 100644
>> > --- a/src/resolve.c
>> > +++ b/src/resolve.c
>> > @@ -40,6 +40,7 @@
>> > #include<stdlib.h>
>> > #include<string.h>
>> > #include<arpa/inet.h>
>> > +#include<errno.h>
>> >
>> > #include<infiniband/umad.h>
>> > #include<infiniband/mad.h>
>> > @@ -57,10 +58,18 @@ int ib_resolve_smlid_via(ib_portid_t * sm_id, int
>> > timeout,
>> >
>> > memset(sm_id, 0, sizeof(*sm_id));
>> >
>> > - if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0,
>> > srcport))
>> > + if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0,
>> > srcport)) {
>> > + if (!errno)
>> > + errno = EIO;
>> > return -1;
>> > + }
>> >
>> > mad_decode_field(portinfo, IB_PORT_SMLID_F,&lid);
>> > + if (lid == 0) {
>> > + if (!errno)
>> > + errno = EIO;
>> > + return -1;
>> > + }
>>
>> This may not be an error. A port which is down only requires
>> PortState and PortPhyState to be valid.
>>
>> > mad_decode_field(portinfo, IB_PORT_SMSL_F,&sm_id->sl);
>> >
>> > return ib_portid_set(sm_id, lid, 0, 0);
>> > @@ -95,21 +104,26 @@ int ib_resolve_guid_via(ib_portid_t * portid, uint64_t
>> > * guid,
>> > ib_portid_t * sm_id, int timeout,
>> > const struct ibmad_port *srcport)
>> > {
>> > - ib_portid_t sm_portid;
>> > + ib_portid_t sm_portid = { 0 };
>> > uint8_t buf[IB_SA_DATA_SIZE] = { 0 };
>> > ib_portid_t self = { 0 };
>> > uint64_t selfguid, prefix;
>> > ibmad_gid_t selfgid;
>> > uint8_t nodeinfo[64];
>> >
>> > - if (!sm_id) {
>> > + if (!sm_id)
>> > sm_id =&sm_portid;
>> > +
>> > + if (!sm_id->lid) {
>> > if (ib_resolve_smlid_via(sm_id, timeout, srcport)< 0)
>> > return -1;
>> > }
>>
>> If you want ib_resolve_guid_via to resolve the SM for you, sm_id
>> should be set to NULL. I don't see a reason to support an "invalid"
>> sm_id port id.
>>
>> Another note is that I have converted the diags to use internal
>> functions which used umad* and SA queries to resolve GUID's. This is
>> more appropriate in the long term.
>>
>> Ira
>>
>> >
>> > - if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0,
>> > srcport))
>> > + if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0,
>> > srcport)) {
>> > + if (!errno)
>> > + errno = EIO;
>> > return -1;
>> > + }
>> > mad_decode_field(nodeinfo, IB_NODE_PORT_GUID_F,&selfguid);
>> > mad_set_field64(selfgid, 0, IB_GID_PREFIX_F,
>> > IB_DEFAULT_SUBN_PREFIX);
>> > mad_set_field64(selfgid, 0, IB_GID_GUID_F, selfguid);
>> > diff --git a/src/rpc.c b/src/rpc.c
>> > index 6312d42..cf2b60d 100644
>> > --- a/src/rpc.c
>> > +++ b/src/rpc.c
>> > @@ -178,6 +178,7 @@ _do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int
>> > agentid, int len,
>> > IB_MAD_TRID_F) != trid);
>> >
>> > status = umad_status(rcvbuf);
>> > + errno = status;
>> > if (!status)
>> > return length; /* done */
>> > if (status == ENOMEM)
>> > diff --git a/src/sa.c b/src/sa.c
>> > index 352ed9f..367da2a 100644
>> > --- a/src/sa.c
>> > +++ b/src/sa.c
>> > @@ -38,6 +38,7 @@
>> > #include<stdio.h>
>> > #include<stdlib.h>
>> > #include<string.h>
>> > +#include<errno.h>
>> >
>> > #include<infiniband/mad.h>
>> > #include "mad_internal.h"
>> > @@ -56,6 +57,7 @@ uint8_t *sa_rpc_call(const struct ibmad_port *ibmad_port,
>> > void *rcvbuf,
>> >
>> > if (portid->lid<= 0) {
>> > IBWARN("only lid routes are supported");
>> > + errno = EIO;
>> > return NULL;
>> > }
>> >
>> > --
>> > 1.7.1
>> >
>> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next parent reply other threads:[~2013-03-15 1:26 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <5140E1A3.9070706@oracle.com>
[not found] ` <51427819.7000505@oracle.com>
[not found] ` <51427819.7000505-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-15 1:26 ` brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA [this message]
[not found] ` <514278CA.8010809-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 1:27 ` [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB3F62-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-20 18:35 ` brendan doyle
[not found] ` <514A0156.2070009-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 19:02 ` Jason Gunthorpe
[not found] ` <20130320190208.GA23478-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 21:52 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4214-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-20 22:00 ` brendan doyle
[not found] ` <514A3169.7000501-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 22:24 ` Jason Gunthorpe
[not found] ` <20130320222422.GA30100-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 22:44 ` brendan doyle
[not found] ` <514A3BDF.2090105-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 23:19 ` Jason Gunthorpe
[not found] ` <20130320231923.GA32300-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 23:30 ` Hefty, Sean
2013-03-21 1:01 ` brendan doyle
[not found] ` <514A5C07.3080308-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 5:21 ` Jason Gunthorpe
[not found] ` <20130321052122.GB20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 20:37 ` brendan doyle
[not found] ` <514B6F74.9020707-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 21:27 ` Jason Gunthorpe
[not found] ` <20130321212703.GA8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 21:58 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AF5-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:07 ` Jason Gunthorpe
[not found] ` <20130321220751.GG8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:46 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B49-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:49 ` Weiny, Ira
2013-03-21 22:50 ` Jason Gunthorpe
[not found] ` <20130321225018.GA9749-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:53 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B7E-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:58 ` Jason Gunthorpe
[not found] ` <20130321225858.GA9924-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 23:05 ` Weiny, Ira
2013-03-21 23:04 ` brendan doyle
[not found] ` <514B9215.2000106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 23:06 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4BBF-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-26 23:22 ` brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA
2013-03-21 22:07 ` brendan doyle
2013-03-21 21:30 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AB2-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 21:47 ` Jason Gunthorpe
[not found] ` <20130321214725.GD8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:12 ` brendan doyle
2013-03-21 21:37 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4ACB-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 21:48 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=514278CA.8010809@oracle.com \
--to=brendan.doyle-qhclzuegtsvqt0dzr+alfa@public.gmane.org \
--cc=boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
--cc=iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=pramod.gunjikar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox