From: "brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org" <brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Pramod Gunjikar
<pramod.gunjikar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected]
Date: Fri, 15 Mar 2013 01:26:34 +0000 [thread overview]
Message-ID: <514278CA.8010809@oracle.com> (raw)
In-Reply-To: <51427819.7000505-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
On 03/15/13 01:23 AM, brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote:
>
> So, here is the history...
>
> There is an oracle application used to monitor the health of a node
> by calling ib_resolve_portid_str_via(node GUID).
>
> We observed that a call to ib_resolve_portid_str_via() which specified
> the use of an unconnected port to issue the query was returning -1
> with errno as 0. The application interpreted this as the query succeeded
> but no paths to the specified node were found and assumed the node
> was dead. This obviously was not the case, we could not send the query
> because the port was down which is very different to the queried
> node being down. But the API does not provide a means to distinguish
> between these two failure modes. Hence the reason for updating
> errno to allow these two failure modes to be distinguished.
>
> Now in terms of the detail of the failure and the reason for the other
> changes. The call to ib_resolve_portid_str_via() did not specify an smid,
> so ib_resolve_portid_str_via() calls ib_resolve_guid_via() which first
> tries to resolve the SM LID using the unconnected port:
>
> if (!sm_id) {
> sm_id = &sm_portid;
> if (ib_resolve_smlid_via(sm_id, timeout, srcport) < 0)
> return -1;
> }
>
> The call to this succeeds (return value is not -1) and indicates that the
> SM LID is 0, which is of course wrong. Even though port 2 is not
> connected, it's SMA is still operational, It receives the get PORT_INFO
> SMP (sent as a result of the ib_resolve_smlid_via()) , and returns it
> successfully as the value it reads from the adapters PORT_INFO
> which happens to be 0. ib_resolve_smlid_via() assumes everything is OK,
> it does not bother to check the value of the SM LID returned. We then
> try and contact the SM at LID 0!
>
> if ((portid->lid =
> ib_path_query_via(srcport, selfgid, portid->gid, sm_id,
> buf)) < 0)
> return -1;
>
> And this obviously fails, and so ib_resolve_portid_str_via() returns -1
> errno 0, and the app thinks the node is down. You could argue that
> we should not specify an unconnected port in the call to
> ib_resolve_portid_str_via(),
> but the changes harden the code, and allow us to distinguish between
> no path to the node or could not issue the SA query.
>
> Rdgs
>
> Brendan
>
>
>
>
> On 03/13/13 08:29 PM, Boris Chiu wrote:
>> fyi,
>>
>> Boris
>>
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] libibmad: Fixes for failures when not all ports
>> of HCA are connected
>> Date: Wed, 13 Mar 2013 12:36:05 -0700
>> From: Ira Weiny <iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> To: Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> References: <513F83FD.1090106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>
>>
>>
>> Your commit message does not seem to have anything to do with the
>> patch. Could you explain how returning errno from these functions
>> "Fixes for failures when not all ports of HCA are connected"?
>>
>> Furthermore, I'm reluctant to modify errno in this library. It is not
>> documented and in general is poor form. I realize that the interface
>> does not currently allow for an alternative. :-(
>>
>> More comments below.
>>
>> On Tue, Mar 12, 2013 at 12:37 PM, Boris Chiu<boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>> > From: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> >
>> > Signed-off-by: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> > ---
>> > src/resolve.c | 22 ++++++++++++++++++----
>> > src/rpc.c | 1 +
>> > src/sa.c | 2 ++
>> > 3 files changed, 21 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/src/resolve.c b/src/resolve.c
>> > index f866bf4..ab24c79 100644
>> > --- a/src/resolve.c
>> > +++ b/src/resolve.c
>> > @@ -40,6 +40,7 @@
>> > #include<stdlib.h>
>> > #include<string.h>
>> > #include<arpa/inet.h>
>> > +#include<errno.h>
>> >
>> > #include<infiniband/umad.h>
>> > #include<infiniband/mad.h>
>> > @@ -57,10 +58,18 @@ int ib_resolve_smlid_via(ib_portid_t * sm_id, int
>> > timeout,
>> >
>> > memset(sm_id, 0, sizeof(*sm_id));
>> >
>> > - if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0,
>> > srcport))
>> > + if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0,
>> > srcport)) {
>> > + if (!errno)
>> > + errno = EIO;
>> > return -1;
>> > + }
>> >
>> > mad_decode_field(portinfo, IB_PORT_SMLID_F,&lid);
>> > + if (lid == 0) {
>> > + if (!errno)
>> > + errno = EIO;
>> > + return -1;
>> > + }
>>
>> This may not be an error. A port which is down only requires
>> PortState and PortPhyState to be valid.
>>
>> > mad_decode_field(portinfo, IB_PORT_SMSL_F,&sm_id->sl);
>> >
>> > return ib_portid_set(sm_id, lid, 0, 0);
>> > @@ -95,21 +104,26 @@ int ib_resolve_guid_via(ib_portid_t * portid, uint64_t
>> > * guid,
>> > ib_portid_t * sm_id, int timeout,
>> > const struct ibmad_port *srcport)
>> > {
>> > - ib_portid_t sm_portid;
>> > + ib_portid_t sm_portid = { 0 };
>> > uint8_t buf[IB_SA_DATA_SIZE] = { 0 };
>> > ib_portid_t self = { 0 };
>> > uint64_t selfguid, prefix;
>> > ibmad_gid_t selfgid;
>> > uint8_t nodeinfo[64];
>> >
>> > - if (!sm_id) {
>> > + if (!sm_id)
>> > sm_id =&sm_portid;
>> > +
>> > + if (!sm_id->lid) {
>> > if (ib_resolve_smlid_via(sm_id, timeout, srcport)< 0)
>> > return -1;
>> > }
>>
>> If you want ib_resolve_guid_via to resolve the SM for you, sm_id
>> should be set to NULL. I don't see a reason to support an "invalid"
>> sm_id port id.
>>
>> Another note is that I have converted the diags to use internal
>> functions which used umad* and SA queries to resolve GUID's. This is
>> more appropriate in the long term.
>>
>> Ira
>>
>> >
>> > - if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0,
>> > srcport))
>> > + if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0,
>> > srcport)) {
>> > + if (!errno)
>> > + errno = EIO;
>> > return -1;
>> > + }
>> > mad_decode_field(nodeinfo, IB_NODE_PORT_GUID_F,&selfguid);
>> > mad_set_field64(selfgid, 0, IB_GID_PREFIX_F,
>> > IB_DEFAULT_SUBN_PREFIX);
>> > mad_set_field64(selfgid, 0, IB_GID_GUID_F, selfguid);
>> > diff --git a/src/rpc.c b/src/rpc.c
>> > index 6312d42..cf2b60d 100644
>> > --- a/src/rpc.c
>> > +++ b/src/rpc.c
>> > @@ -178,6 +178,7 @@ _do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int
>> > agentid, int len,
>> > IB_MAD_TRID_F) != trid);
>> >
>> > status = umad_status(rcvbuf);
>> > + errno = status;
>> > if (!status)
>> > return length; /* done */
>> > if (status == ENOMEM)
>> > diff --git a/src/sa.c b/src/sa.c
>> > index 352ed9f..367da2a 100644
>> > --- a/src/sa.c
>> > +++ b/src/sa.c
>> > @@ -38,6 +38,7 @@
>> > #include<stdio.h>
>> > #include<stdlib.h>
>> > #include<string.h>
>> > +#include<errno.h>
>> >
>> > #include<infiniband/mad.h>
>> > #include "mad_internal.h"
>> > @@ -56,6 +57,7 @@ uint8_t *sa_rpc_call(const struct ibmad_port *ibmad_port,
>> > void *rcvbuf,
>> >
>> > if (portid->lid<= 0) {
>> > IBWARN("only lid routes are supported");
>> > + errno = EIO;
>> > return NULL;
>> > }
>> >
>> > --
>> > 1.7.1
>> >
>> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next parent reply other threads:[~2013-03-15 1:26 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <5140E1A3.9070706@oracle.com>
[not found] ` <51427819.7000505@oracle.com>
[not found] ` <51427819.7000505-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-15 1:26 ` brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA [this message]
[not found] ` <514278CA.8010809-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 1:27 ` [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB3F62-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-20 18:35 ` brendan doyle
[not found] ` <514A0156.2070009-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 19:02 ` Jason Gunthorpe
[not found] ` <20130320190208.GA23478-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 21:52 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4214-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-20 22:00 ` brendan doyle
[not found] ` <514A3169.7000501-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 22:24 ` Jason Gunthorpe
[not found] ` <20130320222422.GA30100-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 22:44 ` brendan doyle
[not found] ` <514A3BDF.2090105-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 23:19 ` Jason Gunthorpe
[not found] ` <20130320231923.GA32300-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 23:30 ` Hefty, Sean
2013-03-21 1:01 ` brendan doyle
[not found] ` <514A5C07.3080308-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 5:21 ` Jason Gunthorpe
[not found] ` <20130321052122.GB20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 20:37 ` brendan doyle
[not found] ` <514B6F74.9020707-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 21:27 ` Jason Gunthorpe
[not found] ` <20130321212703.GA8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 21:58 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AF5-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:07 ` Jason Gunthorpe
[not found] ` <20130321220751.GG8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:46 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B49-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:49 ` Weiny, Ira
2013-03-21 22:50 ` Jason Gunthorpe
[not found] ` <20130321225018.GA9749-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:53 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B7E-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:58 ` Jason Gunthorpe
[not found] ` <20130321225858.GA9924-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 23:05 ` Weiny, Ira
2013-03-21 23:04 ` brendan doyle
[not found] ` <514B9215.2000106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 23:06 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4BBF-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-26 23:22 ` brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA
2013-03-21 22:07 ` brendan doyle
2013-03-21 21:30 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AB2-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 21:47 ` Jason Gunthorpe
[not found] ` <20130321214725.GD8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:12 ` brendan doyle
2013-03-21 21:37 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4ACB-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 21:48 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=514278CA.8010809@oracle.com \
--to=brendan.doyle-qhclzuegtsvqt0dzr+alfa@public.gmane.org \
--cc=boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
--cc=iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=pramod.gunjikar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.