* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <51427819.7000505-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2013-03-15 1:26 ` brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA [not found] ` <514278CA.8010809-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA @ 2013-03-15 1:26 UTC (permalink / raw) To: Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: Pramod Gunjikar On 03/15/13 01:23 AM, brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote: > > So, here is the history... > > There is an oracle application used to monitor the health of a node > by calling ib_resolve_portid_str_via(node GUID). > > We observed that a call to ib_resolve_portid_str_via() which specified > the use of an unconnected port to issue the query was returning -1 > with errno as 0. The application interpreted this as the query succeeded > but no paths to the specified node were found and assumed the node > was dead. This obviously was not the case, we could not send the query > because the port was down which is very different to the queried > node being down. But the API does not provide a means to distinguish > between these two failure modes. Hence the reason for updating > errno to allow these two failure modes to be distinguished. > > Now in terms of the detail of the failure and the reason for the other > changes. The call to ib_resolve_portid_str_via() did not specify an smid, > so ib_resolve_portid_str_via() calls ib_resolve_guid_via() which first > tries to resolve the SM LID using the unconnected port: > > if (!sm_id) { > sm_id = &sm_portid; > if (ib_resolve_smlid_via(sm_id, timeout, srcport) < 0) > return -1; > } > > The call to this succeeds (return value is not -1) and indicates that the > SM LID is 0, which is of course wrong. Even though port 2 is not > connected, it's SMA is still operational, It receives the get PORT_INFO > SMP (sent as a result of the ib_resolve_smlid_via()) , and returns it > successfully as the value it reads from the adapters PORT_INFO > which happens to be 0. ib_resolve_smlid_via() assumes everything is OK, > it does not bother to check the value of the SM LID returned. We then > try and contact the SM at LID 0! > > if ((portid->lid = > ib_path_query_via(srcport, selfgid, portid->gid, sm_id, > buf)) < 0) > return -1; > > And this obviously fails, and so ib_resolve_portid_str_via() returns -1 > errno 0, and the app thinks the node is down. You could argue that > we should not specify an unconnected port in the call to > ib_resolve_portid_str_via(), > but the changes harden the code, and allow us to distinguish between > no path to the node or could not issue the SA query. > > Rdgs > > Brendan > > > > > On 03/13/13 08:29 PM, Boris Chiu wrote: >> fyi, >> >> Boris >> >> >> -------- Original Message -------- >> Subject: Re: [PATCH] libibmad: Fixes for failures when not all ports >> of HCA are connected >> Date: Wed, 13 Mar 2013 12:36:05 -0700 >> From: Ira Weiny <iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> >> To: Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> References: <513F83FD.1090106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >> >> >> >> Your commit message does not seem to have anything to do with the >> patch. Could you explain how returning errno from these functions >> "Fixes for failures when not all ports of HCA are connected"? >> >> Furthermore, I'm reluctant to modify errno in this library. It is not >> documented and in general is poor form. I realize that the interface >> does not currently allow for an alternative. :-( >> >> More comments below. >> >> On Tue, Mar 12, 2013 at 12:37 PM, Boris Chiu<boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote: >> > From: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >> > >> > Signed-off-by: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >> > --- >> > src/resolve.c | 22 ++++++++++++++++++---- >> > src/rpc.c | 1 + >> > src/sa.c | 2 ++ >> > 3 files changed, 21 insertions(+), 4 deletions(-) >> > >> > diff --git a/src/resolve.c b/src/resolve.c >> > index f866bf4..ab24c79 100644 >> > --- a/src/resolve.c >> > +++ b/src/resolve.c >> > @@ -40,6 +40,7 @@ >> > #include<stdlib.h> >> > #include<string.h> >> > #include<arpa/inet.h> >> > +#include<errno.h> >> > >> > #include<infiniband/umad.h> >> > #include<infiniband/mad.h> >> > @@ -57,10 +58,18 @@ int ib_resolve_smlid_via(ib_portid_t * sm_id, int >> > timeout, >> > >> > memset(sm_id, 0, sizeof(*sm_id)); >> > >> > - if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0, >> > srcport)) >> > + if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0, >> > srcport)) { >> > + if (!errno) >> > + errno = EIO; >> > return -1; >> > + } >> > >> > mad_decode_field(portinfo, IB_PORT_SMLID_F,&lid); >> > + if (lid == 0) { >> > + if (!errno) >> > + errno = EIO; >> > + return -1; >> > + } >> >> This may not be an error. A port which is down only requires >> PortState and PortPhyState to be valid. >> >> > mad_decode_field(portinfo, IB_PORT_SMSL_F,&sm_id->sl); >> > >> > return ib_portid_set(sm_id, lid, 0, 0); >> > @@ -95,21 +104,26 @@ int ib_resolve_guid_via(ib_portid_t * portid, uint64_t >> > * guid, >> > ib_portid_t * sm_id, int timeout, >> > const struct ibmad_port *srcport) >> > { >> > - ib_portid_t sm_portid; >> > + ib_portid_t sm_portid = { 0 }; >> > uint8_t buf[IB_SA_DATA_SIZE] = { 0 }; >> > ib_portid_t self = { 0 }; >> > uint64_t selfguid, prefix; >> > ibmad_gid_t selfgid; >> > uint8_t nodeinfo[64]; >> > >> > - if (!sm_id) { >> > + if (!sm_id) >> > sm_id =&sm_portid; >> > + >> > + if (!sm_id->lid) { >> > if (ib_resolve_smlid_via(sm_id, timeout, srcport)< 0) >> > return -1; >> > } >> >> If you want ib_resolve_guid_via to resolve the SM for you, sm_id >> should be set to NULL. I don't see a reason to support an "invalid" >> sm_id port id. >> >> Another note is that I have converted the diags to use internal >> functions which used umad* and SA queries to resolve GUID's. This is >> more appropriate in the long term. >> >> Ira >> >> > >> > - if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0, >> > srcport)) >> > + if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0, >> > srcport)) { >> > + if (!errno) >> > + errno = EIO; >> > return -1; >> > + } >> > mad_decode_field(nodeinfo, IB_NODE_PORT_GUID_F,&selfguid); >> > mad_set_field64(selfgid, 0, IB_GID_PREFIX_F, >> > IB_DEFAULT_SUBN_PREFIX); >> > mad_set_field64(selfgid, 0, IB_GID_GUID_F, selfguid); >> > diff --git a/src/rpc.c b/src/rpc.c >> > index 6312d42..cf2b60d 100644 >> > --- a/src/rpc.c >> > +++ b/src/rpc.c >> > @@ -178,6 +178,7 @@ _do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int >> > agentid, int len, >> > IB_MAD_TRID_F) != trid); >> > >> > status = umad_status(rcvbuf); >> > + errno = status; >> > if (!status) >> > return length; /* done */ >> > if (status == ENOMEM) >> > diff --git a/src/sa.c b/src/sa.c >> > index 352ed9f..367da2a 100644 >> > --- a/src/sa.c >> > +++ b/src/sa.c >> > @@ -38,6 +38,7 @@ >> > #include<stdio.h> >> > #include<stdlib.h> >> > #include<string.h> >> > +#include<errno.h> >> > >> > #include<infiniband/mad.h> >> > #include "mad_internal.h" >> > @@ -56,6 +57,7 @@ uint8_t *sa_rpc_call(const struct ibmad_port *ibmad_port, >> > void *rcvbuf, >> > >> > if (portid->lid<= 0) { >> > IBWARN("only lid routes are supported"); >> > + errno = EIO; >> > return NULL; >> > } >> > >> > -- >> > 1.7.1 >> > >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <514278CA.8010809-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <514278CA.8010809-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2013-03-20 1:27 ` Weiny, Ira [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB3F62-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Weiny, Ira @ 2013-03-20 1:27 UTC (permalink / raw) To: brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Cc: Pramod Gunjikar > -----Original Message----- > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org > Sent: Thursday, March 14, 2013 9:27 PM > To: Boris Chiu; iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Cc: Pramod Gunjikar > Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports > of HCA are connected] > > On 03/15/13 01:23 AM, brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote: > > > > So, here is the history... > > > > There is an oracle application used to monitor the health of a node by > > calling ib_resolve_portid_str_via(node GUID). > > > > We observed that a call to ib_resolve_portid_str_via() which specified > > the use of an unconnected port to issue the query was returning -1 > > with errno as 0. Again the library is not documented to set errno upon failure. While this is unfortunate for your application I don't really want to go down this path. > > The application interpreted this as the query > > succeeded but no paths to the specified node were found and assumed > > the node was dead. This obviously was not the case, we could not send > > the query because the port was down which is very different to the > > queried node being down. But the API does not provide a means to > > distinguish between these two failure modes. Hence the reason for > > updating errno to allow these two failure modes to be distinguished. There are other, better, ways to tell if a port is up. umad_get_port for example. > > > > Now in terms of the detail of the failure and the reason for the other > > changes. The call to ib_resolve_portid_str_via() did not specify an > > smid, so ib_resolve_portid_str_via() calls ib_resolve_guid_via() which > > first tries to resolve the SM LID using the unconnected port: > > > > if (!sm_id) { > > sm_id = &sm_portid; > > if (ib_resolve_smlid_via(sm_id, timeout, srcport) < 0) > > return -1; > > } > > > > The call to this succeeds (return value is not -1) and indicates that > > the SM LID is 0, which is of course wrong. Again this is not wrong. The fact there is no SM controlling that port is a side effect of you querying a port which is down and the unfortunate fact that this library makes generous use of SMP's where it should not. > > Even though port 2 is not > > connected, it's SMA is still operational, It receives the get > > PORT_INFO SMP (sent as a result of the ib_resolve_smlid_via()) , and > > returns it successfully as the value it reads from the adapters > > PORT_INFO which happens to be 0. ib_resolve_smlid_via() assumes > > everything is OK, it does not bother to check the value of the SM LID > > returned. We then try and contact the SM at LID 0! > > > > if ((portid->lid = > > ib_path_query_via(srcport, selfgid, portid->gid, sm_id, > > buf)) < 0) > > return -1; > > > > And this obviously fails, and so ib_resolve_portid_str_via() returns > > -1 errno 0, and the app thinks the node is down. You could argue that > > we should not specify an unconnected port in the call to > > ib_resolve_portid_str_via(), but the changes harden the code, and > > allow us to distinguish between no path to the node or could not issue > > the SA query. I feel bad that this code has lead you down this path. However, these calls were never intended to be used on ports which are down. The entire umad/mad stack attempts to chose the first active port for just this reason. I believe you should look at issuing SA NodeRecord queries rather than rely on the internals of libibmad to do this functionality. Sorry, Ira > > > > Rdgs > > > > Brendan > > > > > > > > > > On 03/13/13 08:29 PM, Boris Chiu wrote: > >> fyi, > >> > >> Boris > >> > >> > >> -------- Original Message -------- > >> Subject: Re: [PATCH] libibmad: Fixes for failures when not all ports > >> of HCA are connected > >> Date: Wed, 13 Mar 2013 12:36:05 -0700 > >> From: Ira Weiny <iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > >> To: Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> > >> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > >> References: <513F83FD.1090106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> > >> > >> > >> > >> Your commit message does not seem to have anything to do with the > >> patch. Could you explain how returning errno from these functions > >> "Fixes for failures when not all ports of HCA are connected"? > >> > >> Furthermore, I'm reluctant to modify errno in this library. It is > >> not documented and in general is poor form. I realize that the > >> interface does not currently allow for an alternative. :-( > >> > >> More comments below. > >> > >> On Tue, Mar 12, 2013 at 12:37 PM, Boris Chiu<boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> > wrote: > >> > From: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> > >> > > >> > Signed-off-by: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> > >> > --- > >> > src/resolve.c | 22 ++++++++++++++++++---- > >> > src/rpc.c | 1 + > >> > src/sa.c | 2 ++ > >> > 3 files changed, 21 insertions(+), 4 deletions(-) > >> > > >> > diff --git a/src/resolve.c b/src/resolve.c index f866bf4..ab24c79 > >> > 100644 > >> > --- a/src/resolve.c > >> > +++ b/src/resolve.c > >> > @@ -40,6 +40,7 @@ > >> > #include<stdlib.h> > >> > #include<string.h> > >> > #include<arpa/inet.h> > >> > +#include<errno.h> > >> > > >> > #include<infiniband/umad.h> > >> > #include<infiniband/mad.h> > >> > @@ -57,10 +58,18 @@ int ib_resolve_smlid_via(ib_portid_t * sm_id, > >> > int timeout, > >> > > >> > memset(sm_id, 0, sizeof(*sm_id)); > >> > > >> > - if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0, > >> > srcport)) > >> > + if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0, > >> > srcport)) { > >> > + if (!errno) > >> > + errno = EIO; > >> > return -1; > >> > + } > >> > > >> > mad_decode_field(portinfo, IB_PORT_SMLID_F,&lid); > >> > + if (lid == 0) { > >> > + if (!errno) > >> > + errno = EIO; > >> > + return -1; > >> > + } > >> > >> This may not be an error. A port which is down only requires > >> PortState and PortPhyState to be valid. > >> > >> > mad_decode_field(portinfo, IB_PORT_SMSL_F,&sm_id->sl); > >> > > >> > return ib_portid_set(sm_id, lid, 0, 0); @@ -95,21 +104,26 > >> > @@ int ib_resolve_guid_via(ib_portid_t * portid, uint64_t > >> > * guid, > >> > ib_portid_t * sm_id, int timeout, > >> > const struct ibmad_port *srcport) { > >> > - ib_portid_t sm_portid; > >> > + ib_portid_t sm_portid = { 0 }; > >> > uint8_t buf[IB_SA_DATA_SIZE] = { 0 }; > >> > ib_portid_t self = { 0 }; > >> > uint64_t selfguid, prefix; > >> > ibmad_gid_t selfgid; > >> > uint8_t nodeinfo[64]; > >> > > >> > - if (!sm_id) { > >> > + if (!sm_id) > >> > sm_id =&sm_portid; + > >> > + if (!sm_id->lid) { > >> > if (ib_resolve_smlid_via(sm_id, timeout, srcport)< 0) > >> > return -1; > >> > } > >> > >> If you want ib_resolve_guid_via to resolve the SM for you, sm_id > >> should be set to NULL. I don't see a reason to support an "invalid" > >> sm_id port id. > >> > >> Another note is that I have converted the diags to use internal > >> functions which used umad* and SA queries to resolve GUID's. This is > >> more appropriate in the long term. > >> > >> Ira > >> > >> > > >> > - if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0, > >> > srcport)) > >> > + if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0, > >> > srcport)) { > >> > + if (!errno) > >> > + errno = EIO; > >> > return -1; > >> > + } > >> > mad_decode_field(nodeinfo, IB_NODE_PORT_GUID_F,&selfguid); > >> > mad_set_field64(selfgid, 0, IB_GID_PREFIX_F, > >> > IB_DEFAULT_SUBN_PREFIX); > >> > mad_set_field64(selfgid, 0, IB_GID_GUID_F, selfguid); > >> > diff --git a/src/rpc.c b/src/rpc.c index 6312d42..cf2b60d 100644 > >> > --- a/src/rpc.c > >> > +++ b/src/rpc.c > >> > @@ -178,6 +178,7 @@ _do_madrpc(int port_id, void *sndbuf, void > >> > *rcvbuf, int agentid, int len, > >> > IB_MAD_TRID_F) != trid); > >> > > >> > status = umad_status(rcvbuf); > >> > + errno = status; > >> > if (!status) > >> > return length; /* done */ > >> > if (status == ENOMEM) diff --git a/src/sa.c > >> > b/src/sa.c index 352ed9f..367da2a 100644 > >> > --- a/src/sa.c > >> > +++ b/src/sa.c > >> > @@ -38,6 +38,7 @@ > >> > #include<stdio.h> > >> > #include<stdlib.h> > >> > #include<string.h> > >> > +#include<errno.h> > >> > > >> > #include<infiniband/mad.h> > >> > #include "mad_internal.h" > >> > @@ -56,6 +57,7 @@ uint8_t *sa_rpc_call(const struct ibmad_port > >> > *ibmad_port, void *rcvbuf, > >> > > >> > if (portid->lid<= 0) { > >> > IBWARN("only lid routes are supported"); > >> > + errno = EIO; > >> > return NULL; > >> > } > >> > > >> > -- > >> > 1.7.1 > >> > > >> > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the > body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at > http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2807E5FD2F6FDA4886F6618EAC48510EBB3F62-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB3F62-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2013-03-20 18:35 ` brendan doyle [not found] ` <514A0156.2070009-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: brendan doyle @ 2013-03-20 18:35 UTC (permalink / raw) To: Weiny, Ira Cc: Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On 20/03/2013 01:27, Weiny, Ira wrote: >> -----Original Message----- >> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- >> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org >> Sent: Thursday, March 14, 2013 9:27 PM >> To: Boris Chiu; iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> Cc: Pramod Gunjikar >> Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports >> of HCA are connected] >> >> On 03/15/13 01:23 AM, brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote: >>> So, here is the history... >>> >>> There is an oracle application used to monitor the health of a node by >>> calling ib_resolve_portid_str_via(node GUID). >>> >>> We observed that a call to ib_resolve_portid_str_via() which specified >>> the use of an unconnected port to issue the query was returning -1 >>> with errno as 0. > Again the library is not documented to set errno upon failure. While this is unfortunate for your application I don't really want to go down this path. As far as I can see the library is not documented at all, I can't find any man pages. So setting errno is not breaking the interface, and I would argue that if it adds value, which it does in this case, then what is the objection. Additionally I think if the approach is that errno is not set unless documented in a man page, then we should have some consistency, a quick grep of errno in libibmad reveals that errno is being set in other parts of the library, in libibumad too, we see errno set, but again it is not documented in the umad man pages. >>> The application interpreted this as the query >>> succeeded but no paths to the specified node were found and assumed >>> the node was dead. This obviously was not the case, we could not send >>> the query because the port was down which is very different to the >>> queried node being down. But the API does not provide a means to >>> distinguish between these two failure modes. Hence the reason for >>> updating errno to allow these two failure modes to be distinguished. > There are other, better, ways to tell if a port is up. umad_get_port for example. So the application is trying to determine if a remote node is up, I agree that the application could have been smarter, and not tried to use a local port that is down to query the SA. That said I think hardening the library against not so smart applications, can only be a good thing. >>> Now in terms of the detail of the failure and the reason for the other >>> changes. The call to ib_resolve_portid_str_via() did not specify an >>> smid, so ib_resolve_portid_str_via() calls ib_resolve_guid_via() which >>> first tries to resolve the SM LID using the unconnected port: >>> >>> if (!sm_id) { >>> sm_id = &sm_portid; >>> if (ib_resolve_smlid_via(sm_id, timeout, srcport) < 0) >>> return -1; >>> } >>> >>> The call to this succeeds (return value is not -1) and indicates that >>> the SM LID is 0, which is of course wrong. > Again this is not wrong. The fact there is no SM controlling that port is a side effect of you querying a port which is down and the So I would have to disagree, I think ib_resolve_smlid_via() returning that it successfully resolved the SM LID and that the SM LID is 0 is wrong. It did not resolve the SM LID, it couldn't because the port is down, so it should return a status that indicates it did not resolve the LID, and if it can not resolve the LID, then ib_resolve_guid_via() should fail. This is just hardening the library, which can only be a good thing, and avoids unexpected behavior and misleading results, remote node down , as opposed to we could not contact the SA to determine if the remote node is down. > unfortunate fact that this library makes generous use of SMP's where it should not. I won't disagree with that. >>> Even though port 2 is not >>> connected, it's SMA is still operational, It receives the get >>> PORT_INFO SMP (sent as a result of the ib_resolve_smlid_via()) , and >>> returns it successfully as the value it reads from the adapters >>> PORT_INFO which happens to be 0. ib_resolve_smlid_via() assumes >>> everything is OK, it does not bother to check the value of the SM LID >>> returned. We then try and contact the SM at LID 0! >>> >>> if ((portid->lid = >>> ib_path_query_via(srcport, selfgid, portid->gid, sm_id, >>> buf)) < 0) >>> return -1; >>> >>> And this obviously fails, and so ib_resolve_portid_str_via() returns >>> -1 errno 0, and the app thinks the node is down. You could argue that >>> we should not specify an unconnected port in the call to >>> ib_resolve_portid_str_via(), but the changes harden the code, and >>> allow us to distinguish between no path to the node or could not issue >>> the SA query. > I feel bad that this code has lead you down this path. However, these calls were never intended to be used on ports which are down. The entire umad/mad stack attempts to chose the first active port for just this reason. Maybe so, but I don't see that as a reason not to harden the library where we can, so that we can avoid issue like this. I can see that it adds goodness, I can't see that the proposed changes have any harmful affects, so I'm struggling to understand why we would not want to incorporate them. > I believe you should look at issuing SA NodeRecord queries rather than rely on the internals of libibmad to do this functionality. Yes, the application is being changed to do that, so we'll avoid this issue for *this* application, but some other application could run into the same issue, and some other person could end up spending hours trying to debug what has happened until eventually realizing where in libibmad the failure actually was. So again, I see the proposed changes as hardening the library and adding nothing but goodness, with no harmful side affects. So again I'm strugling to understand the objection. Rdgs Brendan > > Sorry, > Ira > >>> Rdgs >>> >>> Brendan >>> >>> >>> >>> >>> On 03/13/13 08:29 PM, Boris Chiu wrote: >>>> fyi, >>>> >>>> Boris >>>> >>>> >>>> -------- Original Message -------- >>>> Subject: Re: [PATCH] libibmad: Fixes for failures when not all ports >>>> of HCA are connected >>>> Date: Wed, 13 Mar 2013 12:36:05 -0700 >>>> From: Ira Weiny <iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> >>>> To: Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >>>> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >>>> References: <513F83FD.1090106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >>>> >>>> >>>> >>>> Your commit message does not seem to have anything to do with the >>>> patch. Could you explain how returning errno from these functions >>>> "Fixes for failures when not all ports of HCA are connected"? >>>> >>>> Furthermore, I'm reluctant to modify errno in this library. It is >>>> not documented and in general is poor form. I realize that the >>>> interface does not currently allow for an alternative. :-( >>>> >>>> More comments below. >>>> >>>> On Tue, Mar 12, 2013 at 12:37 PM, Boris Chiu<boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >> wrote: >>>>> From: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >>>>> >>>>> Signed-off-by: Brendan Doyle<brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> >>>>> --- >>>>> src/resolve.c | 22 ++++++++++++++++++---- >>>>> src/rpc.c | 1 + >>>>> src/sa.c | 2 ++ >>>>> 3 files changed, 21 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/src/resolve.c b/src/resolve.c index f866bf4..ab24c79 >>>>> 100644 >>>>> --- a/src/resolve.c >>>>> +++ b/src/resolve.c >>>>> @@ -40,6 +40,7 @@ >>>>> #include<stdlib.h> >>>>> #include<string.h> >>>>> #include<arpa/inet.h> >>>>> +#include<errno.h> >>>>> >>>>> #include<infiniband/umad.h> >>>>> #include<infiniband/mad.h> >>>>> @@ -57,10 +58,18 @@ int ib_resolve_smlid_via(ib_portid_t * sm_id, >>>>> int timeout, >>>>> >>>>> memset(sm_id, 0, sizeof(*sm_id)); >>>>> >>>>> - if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0, >>>>> srcport)) >>>>> + if (!smp_query_via(portinfo,&self, IB_ATTR_PORT_INFO, 0, 0, >>>>> srcport)) { >>>>> + if (!errno) >>>>> + errno = EIO; >>>>> return -1; >>>>> + } >>>>> >>>>> mad_decode_field(portinfo, IB_PORT_SMLID_F,&lid); >>>>> + if (lid == 0) { >>>>> + if (!errno) >>>>> + errno = EIO; >>>>> + return -1; >>>>> + } >>>> This may not be an error. A port which is down only requires >>>> PortState and PortPhyState to be valid. >>>> >>>>> mad_decode_field(portinfo, IB_PORT_SMSL_F,&sm_id->sl); >>>>> >>>>> return ib_portid_set(sm_id, lid, 0, 0); @@ -95,21 +104,26 >>>>> @@ int ib_resolve_guid_via(ib_portid_t * portid, uint64_t >>>>> * guid, >>>>> ib_portid_t * sm_id, int timeout, >>>>> const struct ibmad_port *srcport) { >>>>> - ib_portid_t sm_portid; >>>>> + ib_portid_t sm_portid = { 0 }; >>>>> uint8_t buf[IB_SA_DATA_SIZE] = { 0 }; >>>>> ib_portid_t self = { 0 }; >>>>> uint64_t selfguid, prefix; >>>>> ibmad_gid_t selfgid; >>>>> uint8_t nodeinfo[64]; >>>>> >>>>> - if (!sm_id) { >>>>> + if (!sm_id) >>>>> sm_id =&sm_portid; + >>>>> + if (!sm_id->lid) { >>>>> if (ib_resolve_smlid_via(sm_id, timeout, srcport)< 0) >>>>> return -1; >>>>> } >>>> If you want ib_resolve_guid_via to resolve the SM for you, sm_id >>>> should be set to NULL. I don't see a reason to support an "invalid" >>>> sm_id port id. >>>> >>>> Another note is that I have converted the diags to use internal >>>> functions which used umad* and SA queries to resolve GUID's. This is >>>> more appropriate in the long term. >>>> >>>> Ira >>>> >>>>> - if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0, >>>>> srcport)) >>>>> + if (!smp_query_via(nodeinfo,&self, IB_ATTR_NODE_INFO, 0, 0, >>>>> srcport)) { >>>>> + if (!errno) >>>>> + errno = EIO; >>>>> return -1; >>>>> + } >>>>> mad_decode_field(nodeinfo, IB_NODE_PORT_GUID_F,&selfguid); >>>>> mad_set_field64(selfgid, 0, IB_GID_PREFIX_F, >>>>> IB_DEFAULT_SUBN_PREFIX); >>>>> mad_set_field64(selfgid, 0, IB_GID_GUID_F, selfguid); >>>>> diff --git a/src/rpc.c b/src/rpc.c index 6312d42..cf2b60d 100644 >>>>> --- a/src/rpc.c >>>>> +++ b/src/rpc.c >>>>> @@ -178,6 +178,7 @@ _do_madrpc(int port_id, void *sndbuf, void >>>>> *rcvbuf, int agentid, int len, >>>>> IB_MAD_TRID_F) != trid); >>>>> >>>>> status = umad_status(rcvbuf); >>>>> + errno = status; >>>>> if (!status) >>>>> return length; /* done */ >>>>> if (status == ENOMEM) diff --git a/src/sa.c >>>>> b/src/sa.c index 352ed9f..367da2a 100644 >>>>> --- a/src/sa.c >>>>> +++ b/src/sa.c >>>>> @@ -38,6 +38,7 @@ >>>>> #include<stdio.h> >>>>> #include<stdlib.h> >>>>> #include<string.h> >>>>> +#include<errno.h> >>>>> >>>>> #include<infiniband/mad.h> >>>>> #include "mad_internal.h" >>>>> @@ -56,6 +57,7 @@ uint8_t *sa_rpc_call(const struct ibmad_port >>>>> *ibmad_port, void *rcvbuf, >>>>> >>>>> if (portid->lid<= 0) { >>>>> IBWARN("only lid routes are supported"); >>>>> + errno = EIO; >>>>> return NULL; >>>>> } >>>>> >>>>> -- >>>>> 1.7.1 >>>>> >>>>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the >> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at >> http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <514A0156.2070009-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <514A0156.2070009-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2013-03-20 19:02 ` Jason Gunthorpe [not found] ` <20130320190208.GA23478-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-20 19:02 UTC (permalink / raw) To: brendan doyle Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Wed, Mar 20, 2013 at 06:35:02PM +0000, brendan doyle wrote: > As far as I can see the library is not documented at all, I can't > find any man pages. So setting errno is not breaking the interface, > and I would argue that if it adds value, which it does in this case, > then what is the objection. Additionally I think if the approach is > that errno is not set unless documented in a man page, then we > should have some consistency, a quick grep of errno in libibmad > reveals that errno is being set in other parts of the library, in > libibumad too, we see errno set, but again it is not documented in > the umad man pages. Ideally we would have consistency amongst the IB libraries - try hard to return -ERRNO like verbs, and only use errno for cases where an int return is not possible. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130320190208.GA23478-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130320190208.GA23478-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-20 21:52 ` Weiny, Ira [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4214-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Weiny, Ira @ 2013-03-20 21:52 UTC (permalink / raw) To: Jason Gunthorpe, brendan doyle Cc: Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe > Sent: Wednesday, March 20, 2013 12:02 PM > To: brendan doyle > Cc: Weiny, Ira; Boris Chiu; iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; > Pramod Gunjikar > Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports > of HCA are connected] > > On Wed, Mar 20, 2013 at 06:35:02PM +0000, brendan doyle wrote: > > > As far as I can see the library is not documented at all, I can't find > > any man pages. So setting errno is not breaking the interface, and I > > would argue that if it adds value, which it does in this case, then > > what is the objection. Additionally I think if the approach is that > > errno is not set unless documented in a man page, then we should have > > some consistency, a quick grep of errno in libibmad reveals that errno > > is being set in other parts of the library, in libibumad too, we see > > errno set, but again it is not documented in the umad man pages. > > Ideally we would have consistency amongst the IB libraries - try hard to > return -ERRNO like verbs, and only use errno for cases where an int return is > not possible. Since these calls return int I thought about this. But I am worried about breaking users who may be explicitly checking for -1. OTOH nothing is documented. Ira > > Jason > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the > body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at > http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2807E5FD2F6FDA4886F6618EAC48510EBB4214-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4214-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2013-03-20 22:00 ` brendan doyle [not found] ` <514A3169.7000501-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: brendan doyle @ 2013-03-20 22:00 UTC (permalink / raw) To: Weiny, Ira Cc: Jason Gunthorpe, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On 20/03/2013 21:52, Weiny, Ira wrote: >> -----Original Message----- >> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- >> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe >> Sent: Wednesday, March 20, 2013 12:02 PM >> To: brendan doyle >> Cc: Weiny, Ira; Boris Chiu; iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; >> Pramod Gunjikar >> Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports >> of HCA are connected] >> >> On Wed, Mar 20, 2013 at 06:35:02PM +0000, brendan doyle wrote: >> >>> As far as I can see the library is not documented at all, I can't find >>> any man pages. So setting errno is not breaking the interface, and I >>> would argue that if it adds value, which it does in this case, then >>> what is the objection. Additionally I think if the approach is that >>> errno is not set unless documented in a man page, then we should have >>> some consistency, a quick grep of errno in libibmad reveals that errno >>> is being set in other parts of the library, in libibumad too, we see >>> errno set, but again it is not documented in the umad man pages. >> Ideally we would have consistency amongst the IB libraries - try hard to >> return -ERRNO like verbs, and only use errno for cases where an int return is >> not possible. > Since these calls return int I thought about this. But I am worried about breaking users who may be explicitly checking for -1. OTOH nothing is documented. Well there are man pages for umad and verbs and though some verbs man pages say errno is set for failure, they don't indicate what it is set to, and umad man pages say nothing of errno. In any case the proposed changes don't change the return value, and so won't break existing users, again I can only see goodness being added, and fail to see any negative implications? > > Ira > >> Jason >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the >> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at >> http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <514A3169.7000501-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <514A3169.7000501-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2013-03-20 22:24 ` Jason Gunthorpe [not found] ` <20130320222422.GA30100-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-20 22:24 UTC (permalink / raw) To: brendan doyle Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Wed, Mar 20, 2013 at 10:00:09PM +0000, brendan doyle wrote: > >>>As far as I can see the library is not documented at all, I can't find > >>>any man pages. So setting errno is not breaking the interface, and I > >>>would argue that if it adds value, which it does in this case, then > >>>what is the objection. Additionally I think if the approach is that > >>>errno is not set unless documented in a man page, then we should have > >>>some consistency, a quick grep of errno in libibmad reveals that errno > >>>is being set in other parts of the library, in libibumad too, we see > >>>errno set, but again it is not documented in the umad man pages. > >>Ideally we would have consistency amongst the IB libraries - try hard to > >>return -ERRNO like verbs, and only use errno for cases where an int return is > >>not possible. > >Since these calls return int I thought about this. But I am > >worried about breaking users who may be explicitly checking for -1. > >OTOH nothing is documented. > Well there are man pages for umad and verbs and though some verbs > man pages say errno is set for failure, they don't indicate what it > is set to, and umad man pages say nothing of errno. In any case the > proposed changes don't change the return value, and so won't break > existing users, again I can only see goodness being added, and fail > to see any negative implications? Patches adding inconsistency are not 'goodness'. People have to use this stuff and try to write correct applications. Playing wack a mole on 'how does this call return errors' is a special kind of nightmare. .. and look, I already got it wrong in my memory, verbs consistently uses the POSIX +ERRNO convention, not the kernel's -ERRNO convention. Ira, do you know any place that would break if +ERRNO is returned, or is this just a hypothetical? Even so, I'm not sure I have much sympathy for people checking against -1... that is only appropriate for FDs. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130320222422.GA30100-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130320222422.GA30100-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-20 22:44 ` brendan doyle [not found] ` <514A3BDF.2090105-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: brendan doyle @ 2013-03-20 22:44 UTC (permalink / raw) To: Jason Gunthorpe Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On 20/03/2013 22:24, Jason Gunthorpe wrote: > On Wed, Mar 20, 2013 at 10:00:09PM +0000, brendan doyle wrote: > >>>>> As far as I can see the library is not documented at all, I can't find >>>>> any man pages. So setting errno is not breaking the interface, and I >>>>> would argue that if it adds value, which it does in this case, then >>>>> what is the objection. Additionally I think if the approach is that >>>>> errno is not set unless documented in a man page, then we should have >>>>> some consistency, a quick grep of errno in libibmad reveals that errno >>>>> is being set in other parts of the library, in libibumad too, we see >>>>> errno set, but again it is not documented in the umad man pages. >>>> Ideally we would have consistency amongst the IB libraries - try hard to >>>> return -ERRNO like verbs, and only use errno for cases where an int return is >>>> not possible. >>> Since these calls return int I thought about this. But I am >>> worried about breaking users who may be explicitly checking for -1. >>> OTOH nothing is documented. > >> Well there are man pages for umad and verbs and though some verbs >> man pages say errno is set for failure, they don't indicate what it >> is set to, and umad man pages say nothing of errno. In any case the >> proposed changes don't change the return value, and so won't break >> existing users, again I can only see goodness being added, and fail >> to see any negative implications? > Patches adding inconsistency are not 'goodness'. Where is the inconsistency? failure is still returned as -1, and errno is still undefined, it is set in some parts of libibmad and not others, so nothing has changed from a consumer perspective? Also this is just one aspect of the proposed update the other hardens the lib from trying to use an invalid SM LID. > > People have to use this stuff and try to write correct > applications. Playing wack a mole on 'how does this call return > errors' is a special kind of nightmare. Again the patch does not break the existing API -1 is still failure. > > .. and look, I already got it wrong in my memory, verbs consistently > uses the POSIX +ERRNO convention, not the kernel's -ERRNO convention. > > Ira, do you know any place that would break if +ERRNO is returned, or > is this just a hypothetical? Even so, I'm not sure I have much > sympathy for people checking against -1... that is only appropriate > for FDs. > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <514A3BDF.2090105-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <514A3BDF.2090105-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2013-03-20 23:19 ` Jason Gunthorpe [not found] ` <20130320231923.GA32300-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-20 23:19 UTC (permalink / raw) To: brendan doyle Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Wed, Mar 20, 2013 at 10:44:47PM +0000, brendan doyle wrote: > Where is the inconsistency? There is no other case in other IB libraries or libibmad itself where errno is expected to be set by an int returning function. int returning functions should return +ERRNO. > errno is still undefined, it is set in some parts of libibmad and > not others, so nothing has changed from a consumer perspective? Consistency would be returning +ERRNO from int functions and properly propogating return codes from lower layers: if (smp_query_via(portinfo, &self, IB_ATTR_PORT_INFO, 0, 0, srcport)) return errno; Would be a better modification. > + if (!errno) > + errno = EIO; This is funny, it is not portable to expect that errno will ever be != 0. > @@ -178,6 +178,7 @@ int _do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int > status = umad_status(rcvbuf); > + errno = status; _do_madrpc already returns int, so this is not great either. > @@ -56,6 +57,7 @@ uint8_t *sa_rpc_call(const struct ibmad_port *ibmad_port, > void *rcvbuf, > > if (portid->lid <= 0) { > IBWARN("only lid routes are supported"); > + errno = EIO; > return NULL; This one is OK. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130320231923.GA32300-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130320231923.GA32300-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-20 23:30 ` Hefty, Sean 2013-03-21 1:01 ` brendan doyle 1 sibling, 0 replies; 31+ messages in thread From: Hefty, Sean @ 2013-03-20 23:30 UTC (permalink / raw) To: Jason Gunthorpe, brendan doyle Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > There is no other case in other IB libraries or libibmad itself where > errno is expected to be set by an int returning function. int > returning functions should return +ERRNO. The librdmacm always returns -1 on error with errno set appropriately. This includes calls where the librdmacm wraps around verb calls (rdma/rdma_verbs.h). I guess you could argue whether the librdmacm is an IB library or not... - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130320231923.GA32300-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2013-03-20 23:30 ` Hefty, Sean @ 2013-03-21 1:01 ` brendan doyle [not found] ` <514A5C07.3080308-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 31+ messages in thread From: brendan doyle @ 2013-03-21 1:01 UTC (permalink / raw) To: Jason Gunthorpe Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On 20/03/2013 23:19, Jason Gunthorpe wrote: > On Wed, Mar 20, 2013 at 10:44:47PM +0000, brendan doyle wrote: > >> Where is the inconsistency? > There is no other case in other IB libraries or libibmad itself where > errno is expected to be set by an int returning function. int > returning functions should return +ERRNO. And that is what this patch does, currently we have: # grep errno * register.c:#include <errno.h> rpc.c:#include <errno.h> rpc.c: errno = EINVAL; rpc.c: errno = ENODEV; rpc.c: errno = ENOMEM; rpc.c: if (!errno) rpc.c: errno = EIO; rpc.c: if (!errno) rpc.c: errno = EINVAL; The patch adds: resolve.c: if (!errno) resolve.c: errno = EIO; resolve.c: if (!errno) resolve.c: errno = EIO; resolve.c: if (!errno) resolve.c: errno = EIO; So where is the inconsistency? what is resolve.c doing that is different to what rpc.c is doing? > >> errno is still undefined, it is set in some parts of libibmad and >> not others, so nothing has changed from a consumer perspective? > Consistency would be returning +ERRNO from int functions and properly > propogating return codes from lower layers: > > if (smp_query_via(portinfo, &self, IB_ATTR_PORT_INFO, 0, 0, > srcport)) > return errno; > > Would be a better modification. The patch only sets errno, if it is not set, so if lower layers set it then it is propagated up. > >> + if (!errno) >> + errno = EIO; > This is funny, it is not portable to expect that errno will ever be != > 0. Well we have several examples of this exact code segment already in libibmad, and I hazard to say elsewhere in the OFED pkgs. Why is it not portable to expect that errno, *may* not be set on a failure? >> @@ -178,6 +178,7 @@ int _do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int >> status = umad_status(rcvbuf); >> + errno = status; > _do_madrpc already returns int, so this is not great either. OK I could concede on this, the Solaris kernel uses ib_user_mad_hdr.status to return an errno to indicate the nature of failure, where as linux does not. >> @@ -56,6 +57,7 @@ uint8_t *sa_rpc_call(const struct ibmad_port *ibmad_port, >> void *rcvbuf, >> >> if (portid->lid <= 0) { >> IBWARN("only lid routes are supported"); >> + errno = EIO; >> return NULL; > This one is OK. > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <514A5C07.3080308-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <514A5C07.3080308-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2013-03-21 5:21 ` Jason Gunthorpe [not found] ` <20130321052122.GB20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-21 5:21 UTC (permalink / raw) To: brendan doyle Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Thu, Mar 21, 2013 at 01:01:59AM +0000, brendan doyle wrote: > > On 20/03/2013 23:19, Jason Gunthorpe wrote: > >On Wed, Mar 20, 2013 at 10:44:47PM +0000, brendan doyle wrote: > > > >>Where is the inconsistency? > >There is no other case in other IB libraries or libibmad itself where > >errno is expected to be set by an int returning function. int > >returning functions should return +ERRNO. > > And that is what this patch does, currently we have: > > # grep errno * > register.c:#include <errno.h> > rpc.c:#include <errno.h> > rpc.c: errno = EINVAL; > rpc.c: errno = ENODEV; > rpc.c: errno = ENOMEM; > rpc.c: if (!errno) > rpc.c: errno = EIO; > rpc.c: if (!errno) > rpc.c: errno = EINVAL; I looked at all of these, they are OK because the functions do not return int, they are returning pointers. The 'new' POSIX convention is: - int function returns are 0 on success, -1 on 'not-an-errno' and +ERRNO to indicate error. - 'fd in an int' function returns are -1 on failure with errno set, everything else is a valid fd. (Code that checks for fd's < 0 is pedantically wrong) - pointer function returns are NULL on failure with errno set. - void returns can never fail The 'old' POSIX convention had int functions return 0 on success and set errno. Some function will restrict error returns to -1, others leave it open. > >>+ if (!errno) > >>+ errno = EIO; > >This is funny, it is not portable to expect that errno will ever be != > >0. > > Well we have several examples of this exact code segment already in > libibmad, and I hazard to say elsewhere in the OFED pkgs. Why is it > not portable to expect that errno, *may* not be set on a failure? Copying broken code from elsewhere doesn't make it any more right.. The specs say functions set errno on error return. They say nothing about clearing it on success return. Assuming errno is ever 0 is absolutely not portable, and IIRC, doesn't work on Linux anyhow. To be more specific, if a function doesn't explicitly set errno, then the value is simply whatever was left in errno before, ie the errno from the last failed system call. The only time it will be 0 is immediately after program start, assuming no failed system calls were made by the runtime. The first failed system call will set it to a value and it will never be 0'd by anything after that. > >>@@ -178,6 +178,7 @@ int _do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int > >> status = umad_status(rcvbuf); > >>+ errno = status; > >_do_madrpc already returns int, so this is not great either. > > OK I could concede on this, the Solaris kernel uses ib_user_mad_hdr.status > to return an errno to indicate the nature of failure, where as linux does > not. Hrm? Linux will return at least ETIMEDOUT in the .status field. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130321052122.GB20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130321052122.GB20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-21 20:37 ` brendan doyle [not found] ` <514B6F74.9020707-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 2013-03-21 21:37 ` Weiny, Ira 1 sibling, 1 reply; 31+ messages in thread From: brendan doyle @ 2013-03-21 20:37 UTC (permalink / raw) To: Jason Gunthorpe Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On 21/03/2013 05:21, Jason Gunthorpe wrote: > On Thu, Mar 21, 2013 at 01:01:59AM +0000, brendan doyle wrote: >> On 20/03/2013 23:19, Jason Gunthorpe wrote: >>> On Wed, Mar 20, 2013 at 10:44:47PM +0000, brendan doyle wrote: >>> >>>> Where is the inconsistency? >>> There is no other case in other IB libraries or libibmad itself where >>> errno is expected to be set by an int returning function. int >>> returning functions should return +ERRNO. >> And that is what this patch does, currently we have: >> >> # grep errno * >> register.c:#include <errno.h> >> rpc.c:#include <errno.h> >> rpc.c: errno = EINVAL; >> rpc.c: errno = ENODEV; >> rpc.c: errno = ENOMEM; >> rpc.c: if (!errno) >> rpc.c: errno = EIO; >> rpc.c: if (!errno) >> rpc.c: errno = EINVAL; > I looked at all of these, they are OK because the functions do not > return int, they are returning pointers. > > The 'new' POSIX convention is: > - int function returns are 0 on success, -1 on 'not-an-errno' and > +ERRNO to indicate error. > - 'fd in an int' function returns are -1 on failure with errno set, > everything else is a valid fd. (Code that checks for fd's < 0 is > pedantically wrong) > - pointer function returns are NULL on failure with errno set. > - void returns can never fail > > The 'old' POSIX convention had int functions return 0 on success and > set errno. Some function will restrict error returns to -1, others > leave it open. OK so then the inconsistency is further down the stack.. int ib_resolve_smlid_via() -> uint8_t *smp_query_via() -> void *mad_rpc() -> int mad_build_pkt() -> void *mad_encode() <- return NULL on fail but does not set errno -> static int _do_madrpc() -> umad_send() -> umad_recv) So if ib_resolve_smlid_via() is to conform to POSIX This: if (!smp_query_via(portinfo, &self, IB_ATTR_PORT_INFO, 0, 0, srcport)) return -1; Should be: if (!smp_query_via(portinfo, &self, IB_ATTR_PORT_INFO, 0, 0, srcport)) { return (errno); } But mad_encode() does not set errno, and also retuning +errno instead of -1 would break the existing API usage. It seems to me that there are many places in libibmad and libibumad that are not poisx compliant, and the libs could do with an overhaul in this regard, however that would likely break many of the library consumers. So from a pragmatic stand point the proposal was to not break the easting API and cause consumers to fail, but also to add the ability to distinguish between types of failures for a specific failure mode. And this was just one aspect of the patch the other was to harden the library. >>>> + if (!errno) >>>> + errno = EIO; >>> This is funny, it is not portable to expect that errno will ever be != >>> 0. >> Well we have several examples of this exact code segment already in >> libibmad, and I hazard to say elsewhere in the OFED pkgs. Why is it >> not portable to expect that errno, *may* not be set on a failure? > Copying broken code from elsewhere doesn't make it any more right.. Absolutely agree, and was not suggesting that, but was merely saying that we should have consistency. > The specs say functions set errno on error return. They say nothing > about clearing it on success return. Assuming errno is ever 0 is > absolutely not portable, and IIRC, doesn't work on Linux anyhow. Humm, again, take a look at umad, it sets errno to 0 before calling the write() system call. If we are to enforce strict posix lets do it everywhere, but that then would likely break many consumers of the libs. > To be more specific, if a function doesn't explicitly set errno, then > the value is simply whatever was left in errno before, ie the errno > from the last failed system call. The only time it will be 0 is > immediately after program start, assuming no failed system calls were > made by the runtime. The first failed system call will set it to a > value and it will never be 0'd by anything after that. See above. > >>>> @@ -178,6 +178,7 @@ int _do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int >>>> status = umad_status(rcvbuf); >>>> + errno = status; >>> _do_madrpc already returns int, so this is not great either. >> OK I could concede on this, the Solaris kernel uses ib_user_mad_hdr.status >> to return an errno to indicate the nature of failure, where as linux does >> not. > Hrm? Linux will return at least ETIMEDOUT in the .status field. Well Linux is not my bag, but I couldn't see where in the code this was the case. > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <514B6F74.9020707-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <514B6F74.9020707-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2013-03-21 21:27 ` Jason Gunthorpe [not found] ` <20130321212703.GA8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2013-03-21 21:30 ` Weiny, Ira 1 sibling, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-21 21:27 UTC (permalink / raw) To: brendan doyle Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Thu, Mar 21, 2013 at 08:37:08PM +0000, brendan doyle wrote: > >The 'old' POSIX convention had int functions return 0 on success and > >set errno. Some function will restrict error returns to -1, others > >leave it open. > OK so then the inconsistency is further down the stack.. No doubt there would be many places requiring various levels of fixing, I don't think anybody ever thought about how to do error return when this was written. > But mad_encode() does not set errno, and also retuning +errno > instead of -1 would break the existing API usage. The existing API is 0 on success != 0 on failure - if code tests for -1 it is wrong :) The question is, do we care? > break the easting API and cause consumers to fail, but also to add > the ability to distinguish between types of failures for a specific > failure mode. Well, somebody needs to decide how it would ideally look and migrate things in that direction 'new' vs 'old' POSIX style.. Once you introduce an 'old' style return, you are going to start using it and then we are stuck with this inconsistency forever. Pick a direction and go there, consistently :) 'new' POSIX is *much* safer to use, you don't have the risk of library code stomping errno, particularly during error unwind. > >The specs say functions set errno on error return. They say nothing > >about clearing it on success return. Assuming errno is ever 0 is > >absolutely not portable, and IIRC, doesn't work on Linux anyhow. > > Humm, again, take a look at umad, it sets errno to 0 before calling > the write() system call. Gah, that is so incredibly fragile and wrong. write isn't guaranteed to not kill errno - eg ERESTARTSYS might get written into errno in some cases (not sure on Linux, POSIX allows errno to be stomped though). Further, look up the unwind chain. There are *lots* of function calls. This is a classic errno mistake, read 'man errno': A common mistake is to do if (somecall() == -1) { printf("somecall() failed\n"); if (errno == ...) { ... } } where errno no longer needs to have the value it had upon return from somecall() (i.e., it may have been changed by the printf(3)). If the value of errno should be preserved across a library call, it must be saved: This is why the 'new' POSIX method is so desirable. It avoids people inadvertently creating these kinds of mistakes. :) Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130321212703.GA8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130321212703.GA8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-21 21:58 ` Weiny, Ira [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AF5-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 2013-03-21 22:07 ` brendan doyle 1 sibling, 1 reply; 31+ messages in thread From: Weiny, Ira @ 2013-03-21 21:58 UTC (permalink / raw) To: Jason Gunthorpe, brendan doyle Cc: Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] > Sent: Thursday, March 21, 2013 2:27 PM > To: brendan doyle > Cc: Weiny, Ira; Boris Chiu; iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; > Pramod Gunjikar > Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports > of HCA are connected] > > On Thu, Mar 21, 2013 at 08:37:08PM +0000, brendan doyle wrote: > > > >The 'old' POSIX convention had int functions return 0 on success and > > >set errno. Some function will restrict error returns to -1, others > > >leave it open. > > > OK so then the inconsistency is further down the stack.. > > No doubt there would be many places requiring various levels of fixing, I > don't think anybody ever thought about how to do error return when this > was written. > > > But mad_encode() does not set errno, and also retuning +errno instead > > of -1 would break the existing API usage. > > The existing API is 0 on success != 0 on failure - if code tests for > -1 it is wrong :) The question is, do we care? I don't. I would love for this library to go away... But as it looks like that is not going to happen... > > > break the easting API and cause consumers to fail, but also to add the > > ability to distinguish between types of failures for a specific > > failure mode. > > Well, somebody needs to decide how it would ideally look and migrate things > in that direction 'new' vs 'old' POSIX style.. That somebody is me... ;-) > > Once you introduce an 'old' style return, you are going to start using it and > then we are stuck with this inconsistency forever. Pick a direction and go > there, consistently :) > > 'new' POSIX is *much* safer to use, you don't have the risk of library code > stomping errno, particularly during error unwind. I really don't see how the new POSIX model prevents libraries from stomping errno? I have a different idea. I think the library should define a new call. int mad_get_error(const struct ibmad_port *srcport); This returns an errno compatible value which corresponds to the last mad_* call. Ibmad_port is opaque and we can add an error field which is set on error and retrieved with this call. This avoids the use of errno all together, preserves the current error return values and allows for additional error information to be queried by the app if they so desire. The current use of errno will be removed and all the calls which don't have an ibmad_port parameter will finally be removed from the library. They have been listed as deprecated for many years now. The only disadvantage to this is that ibmad_port is not thread safe. So if you are sharing an ibmad_port between threads you'll have issues. > > > >The specs say functions set errno on error return. They say nothing > > >about clearing it on success return. Assuming errno is ever 0 is > > >absolutely not portable, and IIRC, doesn't work on Linux anyhow. > > > > Humm, again, take a look at umad, it sets errno to 0 before calling > > the write() system call. > > Gah, that is so incredibly fragile and wrong. write isn't guaranteed to not kill > errno - eg ERESTARTSYS might get written into errno in some cases (not sure > on Linux, POSIX allows errno to be stomped though). > > Further, look up the unwind chain. There are *lots* of function calls. This is a > classic errno mistake, read 'man errno': > > A common mistake is to do > > if (somecall() == -1) { > printf("somecall() failed\n"); > if (errno == ...) { ... } > } > > where errno no longer needs to have the value it had upon return from > somecall() (i.e., it may have been changed by the printf(3)). If the value > of errno should be preserved across a library call, it must be saved: > > This is why the 'new' POSIX method is so desirable. It avoids people > inadvertently creating these kinds of mistakes. :) This is exactly my concern with using errno. So I propose not using it. Ira > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2807E5FD2F6FDA4886F6618EAC48510EBB4AF5-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AF5-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2013-03-21 22:07 ` Jason Gunthorpe [not found] ` <20130321220751.GG8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-21 22:07 UTC (permalink / raw) To: Weiny, Ira Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Thu, Mar 21, 2013 at 09:58:09PM +0000, Weiny, Ira wrote: > > Once you introduce an 'old' style return, you are going to start using it and > > then we are stuck with this inconsistency forever. Pick a direction and go > > there, consistently :) > > > > 'new' POSIX is *much* safer to use, you don't have the risk of library code > > stomping errno, particularly during error unwind. > > I really don't see how the new POSIX model prevents libraries from stomping errno? Most errors are returned as +ERRNO via 'int' so errno is not involved, this is the common case. The uncommon case degrades into the 'old' errno case, which people theoretically know how to program for. At the very least it works often enough for application writers. People working on library code have to be aware of proper errno use, and the cavets. This is true of all the IB libraries. > I have a different idea. I think the library should define a new call. > > int mad_get_error(const struct ibmad_port *srcport); > > This returns an errno compatible value which corresponds to the last > mad_* call. Ibmad_port is opaque and we can add an error field .. and now you have created a new, special, unfamiliar paradigm that doesn't avoid the main problems with errno, just reduces the scope, and breaks thread safety.. openssl did something very much like this, and I once went through a painful process to make sure that every call to openssl used the openssl error check idiom, and every C library call used the proper errno idiom, and so forth. Adding more error check idioms for application writers to follow isn't going to make their job easier or create more correct code. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130321220751.GG8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130321220751.GG8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-21 22:46 ` Weiny, Ira [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B49-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Weiny, Ira @ 2013-03-21 22:46 UTC (permalink / raw) To: Jason Gunthorpe Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] > > On Thu, Mar 21, 2013 at 09:58:09PM +0000, Weiny, Ira wrote: > > > > Once you introduce an 'old' style return, you are going to start > > > using it and then we are stuck with this inconsistency forever. Pick > > > a direction and go there, consistently :) > > > > > > 'new' POSIX is *much* safer to use, you don't have the risk of > > > library code stomping errno, particularly during error unwind. > > > > I really don't see how the new POSIX model prevents libraries from > stomping errno? > > Most errors are returned as +ERRNO via 'int' so errno is not involved, this is > the common case. > > The uncommon case degrades into the 'old' errno case, which people > theoretically know how to program for. At the very least it works often > enough for application writers. > > People working on library code have to be aware of proper errno use, and > the cavets. This is true of all the IB libraries. Ok, not counting simple *2str, dump functions and other "simple" calls, and removing the deprecated calls. There are 19 calls which return a pointer 17 calls which return int Of those returning int Ib_path_query_via returns the dlid on success mad_register_server_via returns the agent id on success mad_register_client_via returns the agent id on success mad_register_port_client returns the agent id on success mad_rpc_class_agent returns the agent on success mad_build_pkt returns the length of the packet built on success mad_print_field returns the length of the string printed on success That leaves 10 calls which return int and could be "fixed" by returning +ERRNO. Converting to the new POSIX errno fixes only 27% of the library calls and degrades the rest of them to use the old POSIX's errno handling which is broken... :-( And breaks those users who currently expect those calls to return -1 (or at least <0) on error. I'm not saying this is bad but it is not a complete solution and shows just how messed up this library is. > > > I have a different idea. I think the library should define a new call. > > > > int mad_get_error(const struct ibmad_port *srcport); > > > > This returns an errno compatible value which corresponds to the last > > mad_* call. Ibmad_port is opaque and we can add an error field > > .. and now you have created a new, special, unfamiliar paradigm that doesn't > avoid the main problems with errno, just reduces the scope, and breaks > thread safety.. > > openssl did something very much like this, and I once went through a painful > process to make sure that every call to openssl used the openssl error check > idiom, and every C library call used the proper errno idiom, and so forth. > Adding more error check idioms for application writers to follow isn't going to > make their job easier or create more correct code. I agree. But the only correct solution here is to throw out libibmad and start over. I am not going to do that work. Therefore, none of these solutions is good. At least with my proposed solution the semantics would be the same for every call. I am sorry that each library is different and application users may have to follow different idioms. However having 2 or 3 in the same library is even worse!!! <sigh> Brandon, if you want to convert the said 10 calls to return +ERRNO on failure I'll accept such a patch. I really don't have time to implement my solution anyway. Be aware that I am removing the DEPRECATED calls as we speak. At least that will clean up some of this library. Ira > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2807E5FD2F6FDA4886F6618EAC48510EBB4B49-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B49-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2013-03-21 22:49 ` Weiny, Ira 2013-03-21 22:50 ` Jason Gunthorpe 1 sibling, 0 replies; 31+ messages in thread From: Weiny, Ira @ 2013-03-21 22:49 UTC (permalink / raw) To: Weiny, Ira, Jason Gunthorpe Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Weiny, Ira [snip] > > Of those returning int > Ib_path_query_via returns the dlid on success mad_register_server_via returns the agent id on success mad_register_client_via returns the agent id on success mad_register_port_client returns the agent id on success mad_rpc_class_agent returns the agent on success mad_build_pkt returns the length of the packet built on success mad_print_field returns the length of the string printed on success Sorry about outlooks frelling line wrap crap... Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B49-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 2013-03-21 22:49 ` Weiny, Ira @ 2013-03-21 22:50 ` Jason Gunthorpe [not found] ` <20130321225018.GA9749-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 1 sibling, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-21 22:50 UTC (permalink / raw) To: Weiny, Ira Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Thu, Mar 21, 2013 at 10:46:06PM +0000, Weiny, Ira wrote: > have to follow different idioms. However having 2 or 3 in the same > library is even worse!!! Agreed, I think you made a compelling case that the new POSIX method is not appropriate for this library, so standardizing on the old method is the least bad option.. So Brandon, maybe redo your patch do get rid of the bogus errno != 0 stuff? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130321225018.GA9749-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130321225018.GA9749-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-21 22:53 ` Weiny, Ira [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B7E-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Weiny, Ira @ 2013-03-21 22:53 UTC (permalink / raw) To: Jason Gunthorpe Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] > Sent: Thursday, March 21, 2013 3:50 PM > To: Weiny, Ira > Cc: brendan doyle; Boris Chiu; iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux- > rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pramod Gunjikar > Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports > of HCA are connected] > > On Thu, Mar 21, 2013 at 10:46:06PM +0000, Weiny, Ira wrote: > > > have to follow different idioms. However having 2 or 3 in the same > > library is even worse!!! > > Agreed, I think you made a compelling case that the new POSIX method is > not appropriate for this library, so standardizing on the old method is the > least bad option.. > Wait what did I do? You think using errno is the "right" thing to do now? Ira > So Brandon, maybe redo your patch do get rid of the bogus errno != 0 stuff? > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2807E5FD2F6FDA4886F6618EAC48510EBB4B7E-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B7E-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2013-03-21 22:58 ` Jason Gunthorpe [not found] ` <20130321225858.GA9924-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2013-03-21 23:04 ` brendan doyle 1 sibling, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-21 22:58 UTC (permalink / raw) To: Weiny, Ira Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Thu, Mar 21, 2013 at 10:53:21PM +0000, Weiny, Ira wrote: > > From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] > > Sent: Thursday, March 21, 2013 3:50 PM > > To: Weiny, Ira > > Cc: brendan doyle; Boris Chiu; iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux- > > rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pramod Gunjikar > > Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports > > of HCA are connected] > > > > On Thu, Mar 21, 2013 at 10:46:06PM +0000, Weiny, Ira wrote: > > > > > have to follow different idioms. However having 2 or 3 in the same > > > library is even worse!!! > > > > Agreed, I think you made a compelling case that the new POSIX method is > > not appropriate for this library, so standardizing on the old method is the > > least bad option.. > > > > Wait what did I do? > > You think using errno is the "right" thing to do now? Well, I greatly prefer +ERRNO return for its enhanced safety and consistency with other libraries but I didn't do the analysis you did to see how inapplicable it is to the existing function signatures.. Usually it is > 50% :) Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130321225858.GA9924-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130321225858.GA9924-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-21 23:05 ` Weiny, Ira 0 siblings, 0 replies; 31+ messages in thread From: Weiny, Ira @ 2013-03-21 23:05 UTC (permalink / raw) To: Jason Gunthorpe Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] > > On Thu, Mar 21, 2013 at 10:53:21PM +0000, Weiny, Ira wrote: > > > From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] > > > > > > On Thu, Mar 21, 2013 at 10:46:06PM +0000, Weiny, Ira wrote: > > > > > > > have to follow different idioms. However having 2 or 3 in the > > > > same library is even worse!!! > > > > > > Agreed, I think you made a compelling case that the new POSIX method > > > is not appropriate for this library, so standardizing on the old > > > method is the least bad option.. > > > > > > > Wait what did I do? > > > > You think using errno is the "right" thing to do now? > > Well, I greatly prefer +ERRNO return for its enhanced safety and consistency > with other libraries but I didn't do the analysis you did to see how > inapplicable it is to the existing function signatures.. Usually it is > 50% :) > Did I mention this library should go away? :-P Ok, Brendan if you want to resubmit using errno that is fine (with Jasons comment). Also lets go ahead and clean up all 10 of the calls which currently either return 0 or -1 to return -1 on error with errno set. At least that will make everything consistent for most of the int return calls. We can clean up the other pointer return calls as we go. Again, don't worry about the deprecated calls. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B7E-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 2013-03-21 22:58 ` Jason Gunthorpe @ 2013-03-21 23:04 ` brendan doyle [not found] ` <514B9215.2000106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 31+ messages in thread From: brendan doyle @ 2013-03-21 23:04 UTC (permalink / raw) To: Weiny, Ira Cc: Jason Gunthorpe, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On 21/03/2013 22:53, Weiny, Ira wrote: >> -----Original Message----- >> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] >> Sent: Thursday, March 21, 2013 3:50 PM >> To: Weiny, Ira >> Cc: brendan doyle; Boris Chiu; iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux- >> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pramod Gunjikar >> Subject: Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports >> of HCA are connected] >> >> On Thu, Mar 21, 2013 at 10:46:06PM +0000, Weiny, Ira wrote: >> >>> have to follow different idioms. However having 2 or 3 in the same >>> library is even worse!!! >> Agreed, I think you made a compelling case that the new POSIX method is >> not appropriate for this library, so standardizing on the old method is the >> least bad option.. >> > Wait what did I do? > > You think using errno is the "right" thing to do now? I agree that it is the least bad option, I don't want to fix the int return APIs, as I think that will break infiniband-diags and who knows what else, and as you say we then end up with two or 3 idioms within the same library. I think it's too much of a stretch, at least for me to go redo the whole lib, so the best we can do is at least be consistent. I'm going to be traveling shortly I'll take another look at the patch next week. > > Ira > >> So Brandon, maybe redo your patch do get rid of the bogus errno != 0 stuff? >> >> Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <514B9215.2000106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <514B9215.2000106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2013-03-21 23:06 ` Weiny, Ira [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4BBF-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Weiny, Ira @ 2013-03-21 23:06 UTC (permalink / raw) To: brendan doyle Cc: Jason Gunthorpe, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: brendan doyle [mailto:brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org] > > > On 21/03/2013 22:53, Weiny, Ira wrote: > >> -----Original Message----- > >> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] > >> > >> On Thu, Mar 21, 2013 at 10:46:06PM +0000, Weiny, Ira wrote: > >> > >>> have to follow different idioms. However having 2 or 3 in the same > >>> library is even worse!!! > >> Agreed, I think you made a compelling case that the new POSIX method > >> is not appropriate for this library, so standardizing on the old > >> method is the least bad option.. > >> > > Wait what did I do? > > > > You think using errno is the "right" thing to do now? > > I agree that it is the least bad option, I don't want to fix the int return APIs, as > I think that will break infiniband-diags and who knows what else, and as you > say we then end up with two or 3 idioms within the same library. I think it's > too much of a stretch, at least for me to go redo the whole lib, so the best we > can do is at least be consistent. > > I'm going to be traveling shortly I'll take another look at the patch next week. > Sounds good, travel safe, Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2807E5FD2F6FDA4886F6618EAC48510EBB4BBF-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4BBF-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2013-03-26 23:22 ` brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA 0 siblings, 0 replies; 31+ messages in thread From: brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA @ 2013-03-26 23:22 UTC (permalink / raw) To: Weiny, Ira Cc: Jason Gunthorpe, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar OK Got home safely, thanks, made some changes, I've ask Boris to pick these up, rebuild the patch, test and resubmit, so expect something later in the week. On 03/21/13 11:06 PM, Weiny, Ira wrote: >> -----Original Message----- >> From: brendan doyle [mailto:brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org] >> >> >> On 21/03/2013 22:53, Weiny, Ira wrote: >>>> -----Original Message----- >>>> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] >>>> >>>> On Thu, Mar 21, 2013 at 10:46:06PM +0000, Weiny, Ira wrote: >>>> >>>>> have to follow different idioms. However having 2 or 3 in the same >>>>> library is even worse!!! >>>> Agreed, I think you made a compelling case that the new POSIX method >>>> is not appropriate for this library, so standardizing on the old >>>> method is the least bad option.. >>>> >>> Wait what did I do? >>> >>> You think using errno is the "right" thing to do now? >> I agree that it is the least bad option, I don't want to fix the int return APIs, as >> I think that will break infiniband-diags and who knows what else, and as you >> say we then end up with two or 3 idioms within the same library. I think it's >> too much of a stretch, at least for me to go redo the whole lib, so the best we >> can do is at least be consistent. >> >> I'm going to be traveling shortly I'll take another look at the patch next week. >> > Sounds good, travel safe, > Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130321212703.GA8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2013-03-21 21:58 ` Weiny, Ira @ 2013-03-21 22:07 ` brendan doyle 1 sibling, 0 replies; 31+ messages in thread From: brendan doyle @ 2013-03-21 22:07 UTC (permalink / raw) To: Jason Gunthorpe Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On 21/03/2013 21:27, Jason Gunthorpe wrote: > On Thu, Mar 21, 2013 at 08:37:08PM +0000, brendan doyle wrote: > >>> The 'old' POSIX convention had int functions return 0 on success and >>> set errno. Some function will restrict error returns to -1, others >>> leave it open. >> OK so then the inconsistency is further down the stack.. > No doubt there would be many places requiring various levels of > fixing, I don't think anybody ever thought about how to do error return > when this was written. > >> But mad_encode() does not set errno, and also retuning +errno >> instead of -1 would break the existing API usage. > The existing API is 0 on success != 0 on failure - if code tests for > -1 it is wrong :) The question is, do we care? Well, I'm not sure all the utilities in the diags pkg have if (ret != 0) fail, the point is that if any have if (ret <0) fail then fixing this function could cause diag utilities to fail, and fixing the library also involves fixing all our internal OFED consumers (diags, utils etc, ) and notifying our external consumers of the API change. And I'm not saying that should not be done, but I don't have the bandwidth to under take that task and I'd question what is the ROI. So I understand introducing another non POSIX function return/use of errno just further propagates non compliance, however unless we intend to fix the whole lib and its consumers then I don't see that this is too big a deal. This is the current way consumers use the API, so it won't break current usage, and if/when the whole lib is overhauled to be POISX compliant then this new non POISX compliant code can be fixed as part of that effort. Essentially we wanted to add something that is quick and useful to the consumer but we don't want to have to rebuild the whole house to do that. Brendan. >> break the easting API and cause consumers to fail, but also to add >> the ability to distinguish between types of failures for a specific >> failure mode. > Well, somebody needs to decide how it would ideally look and migrate > things in that direction 'new' vs 'old' POSIX style.. > > Once you introduce an 'old' style return, you are going to start using > it and then we are stuck with this inconsistency forever. Pick a > direction and go there, consistently :) > > 'new' POSIX is *much* safer to use, you don't have the risk of library > code stomping errno, particularly during error unwind. > >>> The specs say functions set errno on error return. They say nothing >>> about clearing it on success return. Assuming errno is ever 0 is >>> absolutely not portable, and IIRC, doesn't work on Linux anyhow. >> Humm, again, take a look at umad, it sets errno to 0 before calling >> the write() system call. > Gah, that is so incredibly fragile and wrong. write isn't guaranteed > to not kill errno - eg ERESTARTSYS might get written into errno in > some cases (not sure on Linux, POSIX allows errno to be stomped though). > > Further, look up the unwind chain. There are *lots* of function > calls. This is a classic errno mistake, read 'man errno': > > A common mistake is to do > > if (somecall() == -1) { > printf("somecall() failed\n"); > if (errno == ...) { ... } > } > > where errno no longer needs to have the value it had upon return from > somecall() (i.e., it may have been changed by the printf(3)). If the value > of errno should be preserved across a library call, it must be saved: > > This is why the 'new' POSIX method is so desirable. It avoids people > inadvertently creating these kinds of mistakes. :) > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <514B6F74.9020707-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 2013-03-21 21:27 ` Jason Gunthorpe @ 2013-03-21 21:30 ` Weiny, Ira [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AB2-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 1 sibling, 1 reply; 31+ messages in thread From: Weiny, Ira @ 2013-03-21 21:30 UTC (permalink / raw) To: brendan doyle, Jason Gunthorpe Cc: Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: brendan doyle [mailto:brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org] > > > On 21/03/2013 05:21, Jason Gunthorpe wrote: > > On Thu, Mar 21, 2013 at 01:01:59AM +0000, brendan doyle wrote: > >> On 20/03/2013 23:19, Jason Gunthorpe wrote: > >>> On Wed, Mar 20, 2013 at 10:44:47PM +0000, brendan doyle wrote: > >>> [snip] > > > >>>> @@ -178,6 +178,7 @@ int _do_madrpc(int port_id, void *sndbuf, void > *rcvbuf, int > >>>> status = umad_status(rcvbuf); > >>>> + errno = status; > >>> _do_madrpc already returns int, so this is not great either. > >> OK I could concede on this, the Solaris kernel uses > >> ib_user_mad_hdr.status to return an errno to indicate the nature of > >> failure, where as linux does not. > > Hrm? Linux will return at least ETIMEDOUT in the .status field. > > Well Linux is not my bag, but I couldn't see where in the code this was the > case. I'm a bit confused by what you guys are debating here. FWIW umad_status does not return MAD.status (IBTA 1.2.1 section 13.4.7) but rather ib_user_mad.status. ib_user_mad is the umad meta data to communicate with the kernel and ib_user_mad.status _is_ an errno value. I don't know what solaris does. Ira > > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2807E5FD2F6FDA4886F6618EAC48510EBB4AB2-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AB2-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2013-03-21 21:47 ` Jason Gunthorpe [not found] ` <20130321214725.GD8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-21 21:47 UTC (permalink / raw) To: Weiny, Ira Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Thu, Mar 21, 2013 at 09:30:07PM +0000, Weiny, Ira wrote: > > Well Linux is not my bag, but I couldn't see where in the code > > this was the case. > > I'm a bit confused by what you guys are debating here. FWIW > umad_status does not return MAD.status (IBTA 1.2.1 section 13.4.7) > but rather ib_user_mad.status. > > ib_user_mad is the umad meta data to communicate with the kernel and > ib_user_mad.status _is_ an errno value. I don't know what solaris > does. Yes, I think we all agree on this.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20130321214725.GD8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130321214725.GD8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2013-03-21 22:12 ` brendan doyle 0 siblings, 0 replies; 31+ messages in thread From: brendan doyle @ 2013-03-21 22:12 UTC (permalink / raw) To: Jason Gunthorpe Cc: Weiny, Ira, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On 21/03/2013 21:47, Jason Gunthorpe wrote: > On Thu, Mar 21, 2013 at 09:30:07PM +0000, Weiny, Ira wrote: > >>> Well Linux is not my bag, but I couldn't see where in the code >>> this was the case. >> I'm a bit confused by what you guys are debating here. FWIW >> umad_status does not return MAD.status (IBTA 1.2.1 section 13.4.7) >> but rather ib_user_mad.status. >> >> ib_user_mad is the umad meta data to communicate with the kernel and >> ib_user_mad.status _is_ an errno value. I don't know what solaris >> does. > Yes, I think we all agree on this.. Yes, and in Solaris we return errno here, I was just pointing out that I didn't see where errno was being returned here in the OFED kernel code, but then I didn't look too hard. > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <20130321052122.GB20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2013-03-21 20:37 ` brendan doyle @ 2013-03-21 21:37 ` Weiny, Ira [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4ACB-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 1 sibling, 1 reply; 31+ messages in thread From: Weiny, Ira @ 2013-03-21 21:37 UTC (permalink / raw) To: Jason Gunthorpe, brendan doyle Cc: Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar > -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] [snip] > > I looked at all of these, they are OK because the functions do not return int, > they are returning pointers. > > The 'new' POSIX convention is: > - int function returns are 0 on success, -1 on 'not-an-errno' and > +ERRNO to indicate error. > - 'fd in an int' function returns are -1 on failure with errno set, > everything else is a valid fd. (Code that checks for fd's < 0 is > pedantically wrong) > - pointer function returns are NULL on failure with errno set. > - void returns can never fail > Do you have a link to this? I have been searching all through this discussion and have not found a good reference. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2807E5FD2F6FDA4886F6618EAC48510EBB4ACB-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] [not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4ACB-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2013-03-21 21:48 ` Jason Gunthorpe 0 siblings, 0 replies; 31+ messages in thread From: Jason Gunthorpe @ 2013-03-21 21:48 UTC (permalink / raw) To: Weiny, Ira Cc: brendan doyle, Boris Chiu, iweiny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Pramod Gunjikar On Thu, Mar 21, 2013 at 09:37:41PM +0000, Weiny, Ira wrote: > > From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org] > > I looked at all of these, they are OK because the functions do not > > return int, they are returning pointers. > > > > The 'new' POSIX convention is: > > - int function returns are 0 on success, -1 on 'not-an-errno' and > > +ERRNO to indicate error. > > - 'fd in an int' function returns are -1 on failure with errno set, > > everything else is a valid fd. (Code that checks for fd's < 0 is > > pedantically wrong) > > - pointer function returns are NULL on failure with errno set. > > - void returns can never fail > > > > Do you have a link to this? I have been searching all through this > discussion and have not found a good reference. I've never seen it spelled out like this before, it is the convention they switched to for pthreads and many new things follow it now. Look at the pthread man pages for instance. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2013-03-26 23:22 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <5140E1A3.9070706@oracle.com>
[not found] ` <51427819.7000505@oracle.com>
[not found] ` <51427819.7000505-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-15 1:26 ` [Fwd: Re: [PATCH] libibmad: Fixes for failures when not all ports of HCA are connected] brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA
[not found] ` <514278CA.8010809-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 1:27 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB3F62-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-20 18:35 ` brendan doyle
[not found] ` <514A0156.2070009-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 19:02 ` Jason Gunthorpe
[not found] ` <20130320190208.GA23478-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 21:52 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4214-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-20 22:00 ` brendan doyle
[not found] ` <514A3169.7000501-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 22:24 ` Jason Gunthorpe
[not found] ` <20130320222422.GA30100-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 22:44 ` brendan doyle
[not found] ` <514A3BDF.2090105-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 23:19 ` Jason Gunthorpe
[not found] ` <20130320231923.GA32300-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-20 23:30 ` Hefty, Sean
2013-03-21 1:01 ` brendan doyle
[not found] ` <514A5C07.3080308-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 5:21 ` Jason Gunthorpe
[not found] ` <20130321052122.GB20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 20:37 ` brendan doyle
[not found] ` <514B6F74.9020707-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 21:27 ` Jason Gunthorpe
[not found] ` <20130321212703.GA8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 21:58 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AF5-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:07 ` Jason Gunthorpe
[not found] ` <20130321220751.GG8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:46 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B49-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:49 ` Weiny, Ira
2013-03-21 22:50 ` Jason Gunthorpe
[not found] ` <20130321225018.GA9749-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:53 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4B7E-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 22:58 ` Jason Gunthorpe
[not found] ` <20130321225858.GA9924-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 23:05 ` Weiny, Ira
2013-03-21 23:04 ` brendan doyle
[not found] ` <514B9215.2000106-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 23:06 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4BBF-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-26 23:22 ` brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA
2013-03-21 22:07 ` brendan doyle
2013-03-21 21:30 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4AB2-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 21:47 ` Jason Gunthorpe
[not found] ` <20130321214725.GD8431-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 22:12 ` brendan doyle
2013-03-21 21:37 ` Weiny, Ira
[not found] ` <2807E5FD2F6FDA4886F6618EAC48510EBB4ACB-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21 21:48 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox