public inbox for linux-rdma@vger.kernel.org
From: "brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org" <brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Pramod Gunjikar
	<pramod.gunjikar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected]
Date: Fri, 15 Mar 2013 01:26:34 +0000
Message-ID: <514278CA.8010809@oracle.com>
In-Reply-To: <51427819.7000505-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

On 03/15/13 01:23 AM, brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote:
>
> So, here is the history...
>
> There is an Oracle application used to monitor the health of a node
> by calling ib_resolve_portid_str_via(node GUID).
>
> We observed that a call to ib_resolve_portid_str_via() that specified
> an unconnected port to issue the query was returning -1 with errno
> set to 0. The application interpreted this as "the query succeeded but
> no paths to the specified node were found" and assumed the node was
> dead. This was obviously not the case: we could not send the query
> because the port was down, which is very different from the queried
> node being down. But the API does not provide a means to distinguish
> between these two failure modes, hence the change to set errno so
> that they can be told apart.
>
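As an illustration of the caller side, here is a minimal sketch of how an application might tell the two failure modes apart once errno is set. stub_resolve() is a hypothetical stand-in for ib_resolve_portid_str_via() (it is not the libibmad function), and the enum names are invented for the example. It assumes the library sets errno only when the query could not be issued at all.

```c
#include <errno.h>

/* Hypothetical outcomes a monitoring application might care about. */
enum probe_result { PROBE_OK, PROBE_NO_PATH, PROBE_LOCAL_FAILURE };

/* Hypothetical behaviours of the stand-in resolver below. */
enum stub_mode { STUB_SUCCESS, STUB_NO_PATH, STUB_PORT_DOWN };

/*
 * Stand-in for ib_resolve_portid_str_via(), NOT the real function.
 * It returns 0 on success and -1 on failure, and (by assumption) sets
 * errno only when the query could not be sent from the local port.
 */
static int stub_resolve(enum stub_mode mode)
{
	switch (mode) {
	case STUB_SUCCESS:
		return 0;
	case STUB_NO_PATH:
		return -1;		/* errno left untouched */
	case STUB_PORT_DOWN:
		errno = EIO;		/* could not issue the query */
		return -1;
	}
	return -1;
}

/* Clear errno before the call so a stale value cannot mislead us. */
static enum probe_result classify(enum stub_mode mode)
{
	errno = 0;
	if (stub_resolve(mode) == 0)
		return PROBE_OK;
	return errno ? PROBE_LOCAL_FAILURE : PROBE_NO_PATH;
}
```

With this pattern, only PROBE_NO_PATH would be treated as evidence that the remote node is down; PROBE_LOCAL_FAILURE points at the local port instead.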
> Now for the detail of the failure and the reason for the other
> changes. The call to ib_resolve_portid_str_via() did not specify an sm_id,
> so ib_resolve_portid_str_via() calls ib_resolve_guid_via(), which first
> tries to resolve the SM LID using the unconnected port:
>
> if (!sm_id) {
>         sm_id = &sm_portid;
>         if (ib_resolve_smlid_via(sm_id, timeout, srcport) < 0)
>                 return -1;
> }
>
> This call succeeds (the return value is not -1) and indicates that the
> SM LID is 0, which is of course wrong. Even though port 2 is not
> connected, its SMA is still operational. It receives the get PORT_INFO
> SMP (sent as a result of the ib_resolve_smlid_via() call) and returns
> successfully with the value it reads from the adapter's PORT_INFO,
> which happens to be 0. ib_resolve_smlid_via() assumes everything is OK;
> it does not bother to check the value of the SM LID returned. We then
> try to contact the SM at LID 0!
>
> if ((portid->lid =
>      ib_path_query_via(srcport, selfgid, portid->gid, sm_id, buf)) < 0)
>         return -1;
>
> This obviously fails, so ib_resolve_portid_str_via() returns -1 with
> errno 0, and the app thinks the node is down. You could argue that
> we should not specify an unconnected port in the call to
> ib_resolve_portid_str_via(), but the changes harden the code and
> allow us to distinguish between "no path to the node" and "could not
> issue the SA query".
>
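The hardening described above comes down to treating an SM LID of 0 as a failure rather than as a usable address. Below is a minimal sketch of that check, mirroring the `if (!errno) errno = EIO` pattern from the patch quoted below. check_sm_lid() is a hypothetical helper, not a libibmad function, and whether LID 0 should count as an error on a port that is down is open to debate.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libibmad): reject an SM LID of 0.
 * If errno is already set by an earlier step it is preserved;
 * otherwise it is set to EIO so the caller can see that the query
 * could not be issued.
 */
static int check_sm_lid(uint16_t sm_lid)
{
	if (sm_lid == 0) {
		if (!errno)
			errno = EIO;
		return -1;
	}
	return 0;
}
```

A caller would run this on the value decoded from PORT_INFO before using it as the destination for an SA query.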
> Rdgs
>
> Brendan
>
>
>
>
> On 03/13/13 08:29 PM, Boris Chiu wrote:
>> fyi,
>>
>> Boris
>>
>>
>> -------- Original Message --------
>> Subject: 	Re: [PATCH] libibmad: Fixes for failures when not all ports 
>> of HCA are connected
>> Date: 	Wed, 13 Mar 2013 12:36:05 -0700
>> From: 	Ira Weiny <iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> To: 	Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> CC: 	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> References: 	<513F83FD.1090106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>
>>
>>
>> Your commit message does not seem to have anything to do with the
>> patch.  Could you explain how returning errno from these functions
>> "Fixes for failures when not all ports of HCA are connected"?
>>
>> Furthermore, I'm reluctant to modify errno in this library.  It is not
>> documented and in general is poor form.  I realize that the interface
>> does not currently allow for an alternative.  :-(
>>
>> More comments below.
>>
>> On Tue, Mar 12, 2013 at 12:37 PM, Boris Chiu<boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>  wrote:
>> >  From: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> >
>> >  Signed-off-by: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> >  ---
>> >  src/resolve.c |   22 ++++++++++++++++++----
>> >  src/rpc.c     |    1 +
>> >  src/sa.c      |    2 ++
>> >  3 files changed, 21 insertions(+), 4 deletions(-)
>> >
>> >  diff --git a/src/resolve.c b/src/resolve.c
>> >  index f866bf4..ab24c79 100644
>> >  --- a/src/resolve.c
>> >  +++ b/src/resolve.c
>> >  @@ -40,6 +40,7 @@
>> >  #include<stdlib.h>
>> >  #include<string.h>
>> >  #include<arpa/inet.h>
>> >  +#include<errno.h>
>> >
>> >  #include<infiniband/umad.h>
>> >  #include<infiniband/mad.h>
>> >  @@ -57,10 +58,18 @@ int ib_resolve_smlid_via(ib_portid_t * sm_id, int
>> >  timeout,
>> >
>> >          memset(sm_id, 0, sizeof(*sm_id));
>> >
>> >  -       if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0,
>> >  srcport))
>> >  +       if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0,
>> >  srcport)) {
>> >  +               if (!errno)
>> >  +                       errno = EIO;
>> >                  return -1;
>> >  +       }
>> >
>> >          mad_decode_field(portinfo, IB_PORT_SMLID_F,&lid);
>> >  +       if (lid == 0) {
>> >  +               if (!errno)
>> >  +                       errno = EIO;
>> >  +               return -1;
>> >  +       }
>>
>> This may not be an error.  A port which is down only requires
>> PortState and PortPhyState to be valid.
>>
>> >          mad_decode_field(portinfo, IB_PORT_SMSL_F,&sm_id->sl);
>> >
>> >          return ib_portid_set(sm_id, lid, 0, 0);
>> >  @@ -95,21 +104,26 @@ int ib_resolve_guid_via(ib_portid_t * portid, uint64_t
>> >  * guid,
>> >                          ib_portid_t * sm_id, int timeout,
>> >                          const struct ibmad_port *srcport)
>> >  {
>> >  -       ib_portid_t sm_portid;
>> >  +       ib_portid_t sm_portid = { 0 };
>> >          uint8_t buf[IB_SA_DATA_SIZE] = { 0 };
>> >          ib_portid_t self = { 0 };
>> >          uint64_t selfguid, prefix;
>> >          ibmad_gid_t selfgid;
>> >          uint8_t nodeinfo[64];
>> >
>> >  -       if (!sm_id) {
>> >  +       if (!sm_id)
>> >                  sm_id =&sm_portid;
>> >  +
>> >  +       if (!sm_id->lid) {
>> >                  if (ib_resolve_smlid_via(sm_id, timeout, srcport)<  0)
>> >                          return -1;
>> >          }
>>
>> If you want ib_resolve_guid_via to resolve the SM for you, sm_id
>> should be set to NULL.  I don't see a reason to support an "invalid"
>> sm_id port id.
>>
>> Another note is that I have converted the diags to use internal
>> functions which used umad* and SA queries to resolve GUIDs.  This is
>> more appropriate in the long term.
>>
>> Ira
>>
>> >
>> >  -       if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0,
>> >  srcport))
>> >  +       if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0,
>> >  srcport)) {
>> >  +               if (!errno)
>> >  +                       errno = EIO;
>> >                  return -1;
>> >  +       }
>> >          mad_decode_field(nodeinfo, IB_NODE_PORT_GUID_F,&selfguid);
>> >          mad_set_field64(selfgid, 0, IB_GID_PREFIX_F,
>> >  IB_DEFAULT_SUBN_PREFIX);
>> >          mad_set_field64(selfgid, 0, IB_GID_GUID_F, selfguid);
>> >  diff --git a/src/rpc.c b/src/rpc.c
>> >  index 6312d42..cf2b60d 100644
>> >  --- a/src/rpc.c
>> >  +++ b/src/rpc.c
>> >  @@ -178,6 +178,7 @@ _do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int
>> >  agentid, int len,
>> >                                           IB_MAD_TRID_F) != trid);
>> >
>> >                  status = umad_status(rcvbuf);
>> >  +               errno = status;
>> >                  if (!status)
>> >                          return length;  /* done */
>> >                  if (status == ENOMEM)
>> >  diff --git a/src/sa.c b/src/sa.c
>> >  index 352ed9f..367da2a 100644
>> >  --- a/src/sa.c
>> >  +++ b/src/sa.c
>> >  @@ -38,6 +38,7 @@
>> >  #include<stdio.h>
>> >  #include<stdlib.h>
>> >  #include<string.h>
>> >  +#include<errno.h>
>> >
>> >  #include<infiniband/mad.h>
>> >  #include "mad_internal.h"
>> >  @@ -56,6 +57,7 @@ uint8_t *sa_rpc_call(const struct ibmad_port *ibmad_port,
>> >  void *rcvbuf,
>> >
>> >          if (portid->lid<= 0) {
>> >                  IBWARN("only lid routes are supported");
>> >  +               errno = EIO;
>> >                  return NULL;
>> >          }
>> >
>> >  --
>> >  1.7.1
>> >
>> >
>
