* [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
@ 2017-12-12 14:08 Nicolas Morey-Chaisemartin
From: Nicolas Morey-Chaisemartin @ 2017-12-12 14:08 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg
When srp_daemon is running and the master SM host changes,
srp_daemon outputs these errors at every scan:
srp_daemon[25394]: No response to inform info registration
srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
running on fabric or IB port is down

This was introduced by commit 4952e5f Fix a memory leak.
A side effect of that commit was that create_ah was only called when the
port lid changed, which meant register_to_traps used an older, obsolete
version of sm_lid and failed to connect to it.

This patch fixes this behaviour by checking for both local lid changes and
SM lid changes, and calling create_ah on either of these events.

Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16
---
srp_daemon/srp_daemon.c | 10 ++++++----
srp_daemon/srp_daemon.h | 2 +-
srp_daemon/srp_handle_traps.c | 14 +++++++++++---
3 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index 2465ccd9..36df5c3b 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1103,7 +1103,7 @@ static int get_shared_pkeys(struct resources *res,
 	int i, num_pkeys = 0;
 	uint16_t pkey;
 	uint16_t local_port_lid = get_port_lid(res->ud_res->ib_ctx,
-					       config->port_num);
+					       config->port_num, NULL);

 	in_mad_buf = malloc(sizeof(struct ib_user_mad) +
 			    node_table_response_size);
@@ -2092,7 +2092,7 @@ int main(int argc, char *argv[])
 {
 	int ret;
 	struct resources *res;
-	uint16_t lid;
+	uint16_t lid, sm_lid;
 	uint16_t pkey;
 	union umad_gid gid;
 	struct target_details *target;
@@ -2196,8 +2196,10 @@ catas_start:

 		pr_debug("Starting a recalculation\n");
 		port_lid = get_port_lid(res->ud_res->ib_ctx,
-					config->port_num);
-		if (port_lid != res->ud_res->port_attr.lid) {
+					config->port_num, &sm_lid);
+		if (port_lid != res->ud_res->port_attr.lid ||
+		    sm_lid != res->ud_res->port_attr.sm_lid) {
+
 			if (res->ud_res->ah) {
 				ibv_destroy_ah(res->ud_res->ah);
 				res->ud_res->ah = NULL;
diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h
index 5d268ed3..864b3d42 100644
--- a/srp_daemon/srp_daemon.h
+++ b/srp_daemon/srp_daemon.h
@@ -299,7 +299,7 @@ void *run_thread_listen_to_events(void *res_in);
 int get_node(struct umad_resources *umad_res, uint16_t dlid, uint64_t *guid);
 int create_trap_resources(struct ud_resources *ud_res);
 int register_to_traps(struct resources *res, int subscribe);
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num);
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid);
 int create_ah(struct ud_resources *ud_res);
 void push_gid_to_list(struct sync_resources *res, union umad_gid *gid,
 		      uint16_t pkey);
diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index 6d94634e..25f2b9ab 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -340,12 +340,20 @@ int ud_resources_create(struct ud_resources *res)
 	return 0;
 }

-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num)
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid)
 {
 	struct ibv_port_attr port_attr;
+	int ret;
+
+	ret = ibv_query_port(ib_ctx, port_num, &port_attr);

-	return ibv_query_port(ib_ctx, port_num, &port_attr) == 0 ?
-		port_attr.lid : 0;
+	if (!ret) {
+		if (sm_lid)
+			*sm_lid = port_attr.sm_lid;
+		return port_attr.lid;
+	}
+
+	return 0;
 }

 int create_ah(struct ud_resources *ud_res)
--
2.15.1.272.g8e603414b
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
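
For readers following the thread, the logic of the patch above can be sketched in isolation. This is a minimal illustration under stated assumptions, not code from rdma-core: `struct port_state`, `query_lids()` and `ah_is_stale()` are hypothetical names, the structure stands in for the `lid` and `sm_lid` fields of `struct ibv_port_attr`, and a NULL `port` argument simulates a failed `ibv_query_port()` call. One deliberate difference from the posted patch: as posted, `get_port_lid()` leaves `*sm_lid` untouched when the query fails, whereas this sketch writes 0 so that the caller never compares an unset value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ibv_port_attr fields the patch compares. */
struct port_state {
	uint16_t lid;    /* local port LID */
	uint16_t sm_lid; /* LID of the current master SM */
};

/*
 * Same shape as the patched get_port_lid(): returns the port LID and, when
 * the caller passes a non-NULL pointer, also reports the SM LID.
 * A NULL port simulates a failed query (assumption of this sketch); in that
 * case 0 is returned and *sm_lid is set to 0, unlike the posted patch.
 */
static uint16_t query_lids(const struct port_state *port, uint16_t *sm_lid)
{
	if (!port) {
		if (sm_lid)
			*sm_lid = 0;
		return 0;
	}

	if (sm_lid)
		*sm_lid = port->sm_lid;
	return port->lid;
}

/*
 * Returns 1 when the cached address handle should be destroyed and
 * recreated: either the local LID or the SM LID differs from the cache.
 */
static int ah_is_stale(const struct port_state *cached,
		       const struct port_state *now)
{
	uint16_t sm_lid;
	uint16_t lid = query_lids(now, &sm_lid);

	return lid != cached->lid || sm_lid != cached->sm_lid;
}
```

With a cached state of `{ .lid = 4, .sm_lid = 1 }`, a current state of `{ .lid = 4, .sm_lid = 7 }` makes `ah_is_stale()` return 1 even though the local LID is unchanged, which is the case the pre-patch check missed. Callers that only need the port LID can pass NULL for `sm_lid`, as `get_shared_pkeys()` does in the patch.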
* Re: [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
@ 2017-12-12 14:38 Hal Rosenstock
From: Hal Rosenstock @ 2017-12-12 14:38 UTC
To: Nicolas Morey-Chaisemartin, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

On 12/12/2017 9:08 AM, Nicolas Morey-Chaisemartin wrote:
> When srp_daemon is running and the master SM host changes,
> srp_daemon outputs these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
> running on fabric or IB port is down
>
> This was introduced by commit 4952e5f Fix a memory leak.
> A side effect of that commit was that create_ah was only called when the
> port lid changed, which meant register_to_traps used an older, obsolete
> version of sm_lid and failed to connect to it.
>
> This patch fixes this behaviour by checking for both local lid changes and
> SM lid changes, and calling create_ah on either of these events.
>
> Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
> Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16

Reviewed-by: Hal Rosenstock <hal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

* Re: [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
@ 2017-12-12 17:10 Bart Van Assche
From: Bart Van Assche @ 2017-12-12 17:10 UTC
To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: bvanassche-HInyCGIudOg@public.gmane.org,
stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org,
hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org

On Tue, 2017-12-12 at 15:08 +0100, Nicolas Morey-Chaisemartin wrote:
> When srp_daemon is running and the master SM host changes,
> srp_daemon outputs these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
> running on fabric or IB port is down

Please include a cover letter when sending a patch series and set
sendemail.thread = true in your ~/.gitconfig so that e-mail clients that
support threading can keep track of a patch series.

Anyway, since this patch looks fine to me:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

* Re: [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
@ 2017-12-13 11:32 Dennis Dalessandro
From: Dennis Dalessandro @ 2017-12-13 11:32 UTC
To: Nicolas Morey-Chaisemartin, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

On 12/12/2017 9:08 AM, Nicolas Morey-Chaisemartin wrote:
> When srp_daemon is running and the master SM host changes,
> srp_daemon outputs these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
> running on fabric or IB port is down
>
> This was introduced by commit 4952e5f Fix a memory leak.
> A side effect of that commit was that create_ah was only called when the
> port lid changed, which meant register_to_traps used an older, obsolete
> version of sm_lid and failed to connect to it.
>
> This patch fixes this behaviour by checking for both local lid changes and
> SM lid changes, and calling create_ah on either of these events.
>
> Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
> Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16

You are probably going to want to add a proper Fixes tag here rather than
just mentioning it in the commit message.

Fixes: <12-char-of-SHA> ("Patch subject")

-Denny

* Re: [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
@ 2017-12-19 5:46 Leon Romanovsky
From: Leon Romanovsky @ 2017-12-19 5:46 UTC
To: Dennis Dalessandro, Nicolas Morey-Chaisemartin
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

On Wed, Dec 13, 2017 at 06:32:30AM -0500, Dennis Dalessandro wrote:
> On 12/12/2017 9:08 AM, Nicolas Morey-Chaisemartin wrote:
> > When srp_daemon is running and the master SM host changes,
> > srp_daemon outputs these errors at every scan:
> > srp_daemon[25394]: No response to inform info registration
> > srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
> > running on fabric or IB port is down
> >
> > This was introduced by commit 4952e5f Fix a memory leak.
> > A side effect of that commit was that create_ah was only called when the
> > port lid changed, which meant register_to_traps used an older, obsolete
> > version of sm_lid and failed to connect to it.
> >
> > This patch fixes this behaviour by checking for both local lid changes and
> > SM lid changes, and calling create_ah on either of these events.
> >
> > Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
> > Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16

There is no need to mention all versions; it is enough to write the first one.

Thanks