public inbox for linux-rdma@vger.kernel.org
* [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
@ 2017-12-12 14:08 Nicolas Morey-Chaisemartin
       [not found] ` <dba1097c-8ab9-7086-a976-46e6d3c4a165-IBi9RG/b67k@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-12 14:08 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

When srp_daemon is running and the master SM host changes,
 srp_daemon outputs these errors at every scan:
srp_daemon[25394]: No response to inform info registration
srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
 running on fabric or IB port is down

This was introduced by commit 4952e5f ("Fix a memory leak").
A side effect of that commit was that create_ah was only called when the
 port lid changed, which meant register_to_traps used an older, obsolete
 version of sm_lid and failed to connect to it.

This patch fixes this behaviour by checking for both local lid changes and
 SM lid changes, and calling create_ah on either of these events.

Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16
---
 srp_daemon/srp_daemon.c       | 10 ++++++----
 srp_daemon/srp_daemon.h       |  2 +-
 srp_daemon/srp_handle_traps.c | 14 +++++++++++---
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index 2465ccd9..36df5c3b 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1103,7 +1103,7 @@ static int get_shared_pkeys(struct resources *res,
 	int i, num_pkeys = 0;
 	uint16_t pkey;
 	uint16_t local_port_lid = get_port_lid(res->ud_res->ib_ctx,
-					       config->port_num);
+					       config->port_num, NULL);
 
 	in_mad_buf = malloc(sizeof(struct ib_user_mad) +
 			    node_table_response_size);
@@ -2092,7 +2092,7 @@ int main(int argc, char *argv[])
 {
 	int			ret;
 	struct resources       *res;
-	uint16_t 		lid;
+	uint16_t 		lid, sm_lid;
 	uint16_t 		pkey;
 	union umad_gid 		gid;
 	struct target_details  *target;
@@ -2196,8 +2196,10 @@ catas_start:
 
 			pr_debug("Starting a recalculation\n");
 			port_lid = get_port_lid(res->ud_res->ib_ctx,
-					   config->port_num);
-			if (port_lid != res->ud_res->port_attr.lid) {
+						config->port_num, &sm_lid);
+			if (port_lid != res->ud_res->port_attr.lid ||
+				sm_lid != res->ud_res->port_attr.sm_lid) {
+
 				if (res->ud_res->ah) {
 					ibv_destroy_ah(res->ud_res->ah);
 					res->ud_res->ah = NULL;
diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h
index 5d268ed3..864b3d42 100644
--- a/srp_daemon/srp_daemon.h
+++ b/srp_daemon/srp_daemon.h
@@ -299,7 +299,7 @@ void *run_thread_listen_to_events(void *res_in);
 int get_node(struct umad_resources *umad_res, uint16_t dlid, uint64_t *guid);
 int create_trap_resources(struct ud_resources *ud_res);
 int register_to_traps(struct resources *res, int subscribe);
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num);
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid);
 int create_ah(struct ud_resources *ud_res);
 void push_gid_to_list(struct sync_resources *res, union umad_gid *gid,
 		      uint16_t pkey);
diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index 6d94634e..25f2b9ab 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -340,12 +340,20 @@ int ud_resources_create(struct ud_resources *res)
 	return 0;
 }
 
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num)
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid)
 {
 	struct ibv_port_attr port_attr;
+	int ret;
+
+	ret = ibv_query_port(ib_ctx, port_num, &port_attr);
 
-	return ibv_query_port(ib_ctx, port_num, &port_attr) == 0 ?
-		port_attr.lid : 0;
+	if (!ret) {
+		if (sm_lid)
+			*sm_lid = port_attr.sm_lid;
+		return port_attr.lid;
+	}
+
+	return 0;
 }
 
 int create_ah(struct ud_resources *ud_res)
-- 
2.15.1.272.g8e603414b
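
Editorial sketch (not part of the patch): the check described in the commit
message can be modelled without libibverbs. Everything named below
(fake_port_attr, query_lids, ah_is_stale) is hypothetical and only mirrors
the shape of the patched get_port_lid() and of the comparison in main(); the
real code calls ibv_query_port() and compares against
res->ud_res->port_attr. Unlike the caller in the diff, the sketch initialises
sm_lid before the query, because the patched function leaves it untouched
when the query fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ibv_port_attr fields the patch reads. */
struct fake_port_attr {
	uint16_t lid;
	uint16_t sm_lid;
};

/*
 * Same shape as the patched get_port_lid(): returns the port lid and,
 * when sm_lid is non-NULL, also reports the SM lid. A NULL port stands
 * in for a failed ibv_query_port(); in that case 0 is returned and
 * *sm_lid is left as the caller set it.
 */
uint16_t query_lids(const struct fake_port_attr *port, uint16_t *sm_lid)
{
	if (!port)
		return 0;
	if (sm_lid)
		*sm_lid = port->sm_lid;
	return port->lid;
}

/*
 * Returns 1 when the cached address handle should be rebuilt, i.e. when
 * either the local port lid or the SM lid differs from the cached value.
 */
int ah_is_stale(const struct fake_port_attr *cached,
		const struct fake_port_attr *current)
{
	uint16_t sm_lid = 0;	/* defined value even if the query fails */
	uint16_t lid;

	assert(cached != NULL);
	lid = query_lids(current, &sm_lid);

	return lid != cached->lid || sm_lid != cached->sm_lid;
}
```

With a cached pair of lid 4 and sm_lid 1, a current pair of lid 4 and
sm_lid 7 makes ah_is_stale() return 1. That is the SM failover case the
pre-patch condition missed, since it compared the port lid only.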


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
       [not found] ` <dba1097c-8ab9-7086-a976-46e6d3c4a165-IBi9RG/b67k@public.gmane.org>
@ 2017-12-12 14:38   ` Hal Rosenstock
  2017-12-12 17:10   ` Bart Van Assche
  2017-12-13 11:32   ` Dennis Dalessandro
  2 siblings, 0 replies; 5+ messages in thread
From: Hal Rosenstock @ 2017-12-12 14:38 UTC (permalink / raw)
  To: Nicolas Morey-Chaisemartin, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

Reviewed-by: Hal Rosenstock <hal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

* Re: [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
       [not found] ` <dba1097c-8ab9-7086-a976-46e6d3c4a165-IBi9RG/b67k@public.gmane.org>
  2017-12-12 14:38   ` Hal Rosenstock
@ 2017-12-12 17:10   ` Bart Van Assche
  2017-12-13 11:32   ` Dennis Dalessandro
  2 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2017-12-12 17:10 UTC (permalink / raw)
  To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: bvanassche-HInyCGIudOg@public.gmane.org,
	stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org


On Tue, 2017-12-12 at 15:08 +0100, Nicolas Morey-Chaisemartin wrote:
> When srp_daemon was running and the master SM host changes,
>  srp_daemon output these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
>  running on fabric or IB port is down

Please include a cover letter when sending a patch series and set
sendemail.thread = true in your ~/.gitconfig such that e-mail clients that
support threading can keep track of a patch series.

Anyway, since this patch looks fine to me:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>


* Re: [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
       [not found] ` <dba1097c-8ab9-7086-a976-46e6d3c4a165-IBi9RG/b67k@public.gmane.org>
  2017-12-12 14:38   ` Hal Rosenstock
  2017-12-12 17:10   ` Bart Van Assche
@ 2017-12-13 11:32   ` Dennis Dalessandro
       [not found]     ` <0ca80873-0eb0-9c64-f813-dee94b82eea6-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 5+ messages in thread
From: Dennis Dalessandro @ 2017-12-13 11:32 UTC (permalink / raw)
  To: Nicolas Morey-Chaisemartin, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

On 12/12/2017 9:08 AM, Nicolas Morey-Chaisemartin wrote:
> When srp_daemon was running and the master SM host changes,
>   srp_daemon output these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
>   running on fabric or IB port is down
> 
> This was introduced by commit 4952e5f Fix a memory leak.
> A side effect of this patch was that create_ah was only called when the
>   port lid changes. Which meant register_to_traps used an older, obsolete,
>   version of sm_lid and failed to connect to it.
> 
> This patch fixes this behaviour by checking for both local lid changes and
>   SM lid changes, and calling create_ah on any of these events.
> 
> Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
> Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16

You are probably going to want to add a proper Fixes tag here rather
than just mentioning it in the commit message.

Fixes: <12-char-of-SHA> ("Patch subject")

-Denny

* Re: [PATCH rdma-core 1/2] srp_daemon: handle SM lid change
       [not found]     ` <0ca80873-0eb0-9c64-f813-dee94b82eea6-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2017-12-19  5:46       ` Leon Romanovsky
  0 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2017-12-19  5:46 UTC (permalink / raw)
  To: Dennis Dalessandro, Nicolas Morey-Chaisemartin
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg


On Wed, Dec 13, 2017 at 06:32:30AM -0500, Dennis Dalessandro wrote:
> On 12/12/2017 9:08 AM, Nicolas Morey-Chaisemartin wrote:
> > When srp_daemon was running and the master SM host changes,
> >   srp_daemon output these errors at every scan:
> > srp_daemon[25394]: No response to inform info registration
> > srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
> >   running on fabric or IB port is down
> >
> > This was introduced by commit 4952e5f Fix a memory leak.
> > A side effect of this patch was that create_ah was only called when the
> >   port lid changes. Which meant register_to_traps used an older, obsolete,
> >   version of sm_lid and failed to connect to it.
> >
> > This patch fixes this behaviour by checking for both local lid changes and
> >   SM lid changes, and calling create_ah on any of these events.
> >
> > Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
> > Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16

There is no need to mention all versions; it is enough to list the first one.

Thanks
