linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 rdma-core 0/2] srp_daemon fixes
@ 2017-12-13 12:30 Nicolas Morey-Chaisemartin
       [not found] ` <001cdcfd-8ace-261f-ab86-a09ae3582dd8-IBi9RG/b67k@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-13 12:30 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

This patch series fixes two issues of srp_daemon which occur when the master SM switches.
Both causes a failure to register to SM traps.

Changes since v1:
Patch#1
- Add Fixes tag to commit log
Patch#2
- Add comment on return codes of poll_cq_once
- Fix poll_cq behaviour when poll_cq_once returns an error

Nicolas Morey-Chaisemartin (2):
  srp_daemon: handle SM lid change
  srp_daemon: fix CQ handling

 srp_daemon/srp_daemon.c       | 10 ++++---
 srp_daemon/srp_daemon.h       |  2 +-
 srp_daemon/srp_handle_traps.c | 65 +++++++++++++++++++++++++++++++++----------
 3 files changed, 58 insertions(+), 19 deletions(-)

-- 
2.15.1.272.g8e603414b

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found] ` <001cdcfd-8ace-261f-ab86-a09ae3582dd8-IBi9RG/b67k@public.gmane.org>
@ 2017-12-13 12:31   ` Nicolas Morey-Chaisemartin
       [not found]     ` <be797b2f-0317-f9ac-d0ec-5284632790b9-IBi9RG/b67k@public.gmane.org>
  2017-12-13 12:32   ` [PATCHv2 rdma-core 2/2] srp_daemon: fix CQ handling Nicolas Morey-Chaisemartin
  2017-12-15 16:36   ` [PATCHv3 rdma-core 1/2] srp_daemon: handle SM lid change Nicolas Morey-Chaisemartin
  2 siblings, 1 reply; 12+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-13 12:31 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

When srp_daemon was running and the master SM host changes,
 srp_daemon output these errors at every scan:
srp_daemon[25394]: No response to inform info registration
srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
 running on fabric or IB port is down

This was introduced by commit 4952e5f Fix a memory leak.
A side effect of this patch was that create_ah was only called when the
 port lid changes. Which meant register_to_traps used an older, obsolete,
 version of sm_lid and failed to connect to it.

This patch fixes this behaviour by checking for both local lid changes and
 SM lid changes, and calling create_ah on any of these events.

Fixes: 4952e5f7 (Fix a memory leak)
Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16
---
 srp_daemon/srp_daemon.c       | 10 ++++++----
 srp_daemon/srp_daemon.h       |  2 +-
 srp_daemon/srp_handle_traps.c | 14 +++++++++++---
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index 2465ccd9..36df5c3b 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1103,7 +1103,7 @@ static int get_shared_pkeys(struct resources *res,
 	int i, num_pkeys = 0;
 	uint16_t pkey;
 	uint16_t local_port_lid = get_port_lid(res->ud_res->ib_ctx,
-					       config->port_num);
+					       config->port_num, NULL);
 
 	in_mad_buf = malloc(sizeof(struct ib_user_mad) +
 			    node_table_response_size);
@@ -2092,7 +2092,7 @@ int main(int argc, char *argv[])
 {
 	int			ret;
 	struct resources       *res;
-	uint16_t 		lid;
+	uint16_t 		lid, sm_lid;
 	uint16_t 		pkey;
 	union umad_gid 		gid;
 	struct target_details  *target;
@@ -2196,8 +2196,10 @@ catas_start:
 
 			pr_debug("Starting a recalculation\n");
 			port_lid = get_port_lid(res->ud_res->ib_ctx,
-					   config->port_num);
-			if (port_lid != res->ud_res->port_attr.lid) {
+						config->port_num, &sm_lid);
+			if (port_lid != res->ud_res->port_attr.lid ||
+				sm_lid != res->ud_res->port_attr.sm_lid) {
+
 				if (res->ud_res->ah) {
 					ibv_destroy_ah(res->ud_res->ah);
 					res->ud_res->ah = NULL;
diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h
index 5d268ed3..864b3d42 100644
--- a/srp_daemon/srp_daemon.h
+++ b/srp_daemon/srp_daemon.h
@@ -299,7 +299,7 @@ void *run_thread_listen_to_events(void *res_in);
 int get_node(struct umad_resources *umad_res, uint16_t dlid, uint64_t *guid);
 int create_trap_resources(struct ud_resources *ud_res);
 int register_to_traps(struct resources *res, int subscribe);
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num);
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid);
 int create_ah(struct ud_resources *ud_res);
 void push_gid_to_list(struct sync_resources *res, union umad_gid *gid,
 		      uint16_t pkey);
diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index 6d94634e..25f2b9ab 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -340,12 +340,20 @@ int ud_resources_create(struct ud_resources *res)
 	return 0;
 }
 
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num)
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid)
 {
 	struct ibv_port_attr port_attr;
+	int ret;
+
+	ret = ibv_query_port(ib_ctx, port_num, &port_attr);
 
-	return ibv_query_port(ib_ctx, port_num, &port_attr) == 0 ?
-		port_attr.lid : 0;
+	if (!ret) {
+		if (sm_lid)
+			*sm_lid = port_attr.sm_lid;
+		return port_attr.lid;
+	}
+
+	return 0;
 }
 
 int create_ah(struct ud_resources *ud_res)
-- 
2.15.1.272.g8e603414b


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv2 rdma-core 2/2] srp_daemon: fix CQ handling
       [not found] ` <001cdcfd-8ace-261f-ab86-a09ae3582dd8-IBi9RG/b67k@public.gmane.org>
  2017-12-13 12:31   ` [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change Nicolas Morey-Chaisemartin
@ 2017-12-13 12:32   ` Nicolas Morey-Chaisemartin
       [not found]     ` <7ed5d5af-0f53-c15f-e7f2-516f6786113f-IBi9RG/b67k@public.gmane.org>
  2017-12-15 16:36   ` [PATCHv3 rdma-core 1/2] srp_daemon: handle SM lid change Nicolas Morey-Chaisemartin
  2 siblings, 1 reply; 12+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-13 12:32 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

SM traps are polled through poll_cq which waited for a CQ event
before polling the CQ itself.
However it may happens that multiple completions are attached
to a single event. As stated by the ibv_get_cq_event man page,
it is required to poll the the CQ to get those completions
after the call to ibv_req_notify_cq.

As completions need to be handled one by one in an outer function,
start by polling the CQ and return the completion (if any) before
waiting for the next completion event.
This will allow emptying all pending completions, through multiple calls
to poll_cq,  before waiting for a new event.

The buggy use case seems to appear when the master SM is switched multiple
times between two nodes. As the number of ping-pong between the SMs increases,
the number of traps sent to notify that the SM just became master increases
too. This causes burst of completions linked to a single event.
Note that the race condition is also possible in other scenario.

Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16
---
 srp_daemon/srp_handle_traps.c | 51 +++++++++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index 25f2b9ab..77a47db3 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -496,6 +496,34 @@ static int stop_threads(struct sync_resources *sync_res)
 	return result;
 }
 
+/*****************************************************************************
+* Function: poll_cq_once
+* Poll a CQ once.
+* Returns the number of completion polled (0 or 1).
+* Returns a negative value on error.
+*****************************************************************************/
+static int poll_cq_once(struct sync_resources *sync_res, struct ibv_cq *cq,
+			struct ibv_wc *wc)
+{
+	int ret;
+
+	ret = ibv_poll_cq(cq, 1, wc);
+	if (ret < 0) {
+		pr_err("poll CQ failed\n");
+		return ret;
+	}
+
+	if (ret > 0 && wc->status != IBV_WC_SUCCESS) {
+		if (!stop_threads(sync_res))
+			pr_err("got bad completion with status: 0x%x\n",
+			       wc->status);
+		return -ret;
+	}
+
+	return ret;
+}
+
+
 static int poll_cq(struct sync_resources *sync_res, struct ibv_cq *cq,
 		   struct ibv_wc *wc, struct ibv_comp_channel *channel)
 {
@@ -504,6 +532,16 @@ static int poll_cq(struct sync_resources *sync_res, struct ibv_cq *cq,
 	void          *ev_ctx;
 
 	if (channel) {
+		/* There may be extra completions that
+		 * were associated to the previous event.
+		 * Only poll for the first one. If there are more than one,
+		 * they will be handled by later call to poll_cq */
+		ret = poll_cq_once(sync_res, cq, wc);
+		/* return directly if there was an error or
+		 * 1 completion polled */
+		if (ret)
+			return ret;
+
 		if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) {
 			pr_err("Failed to get cq_event\n");
 			return -1;
@@ -524,18 +562,9 @@ static int poll_cq(struct sync_resources *sync_res, struct ibv_cq *cq,
 	}
 
 	do {
-		ret = ibv_poll_cq(cq, 1, wc);
-		if (ret < 0) {
-			pr_err("poll CQ failed\n");
+		ret = poll_cq_once(sync_res, cq, wc);
+		if (ret < 0)
 			return ret;
-		}
-
-		if (ret > 0 && wc->status != IBV_WC_SUCCESS) {
-			if (!stop_threads(sync_res))
-				pr_err("got bad completion with status: 0x%x\n",
-				       wc->status);
-			return -ret;
-		}
 
 		if (ret == 0 && channel) {
 			pr_err("Weird poll returned no cqe after CQ event\n");
-- 
2.15.1.272.g8e603414b

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found]     ` <be797b2f-0317-f9ac-d0ec-5284632790b9-IBi9RG/b67k@public.gmane.org>
@ 2017-12-14 19:13       ` Bart Van Assche
       [not found]         ` <1513278813.2475.49.camel-Sjgp3cTcYWE@public.gmane.org>
  2017-12-15 11:40       ` Dennis Dalessandro
  1 sibling, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2017-12-14 19:13 UTC (permalink / raw)
  To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: bvanassche-HInyCGIudOg@public.gmane.org,
	stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org

On Wed, 2017-12-13 at 13:31 +0100, Nicolas Morey-Chaisemartin wrote:
> When srp_daemon was running and the master SM host changes,
>  srp_daemon output these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
>  running on fabric or IB port is down
> 
> This was introduced by commit 4952e5f Fix a memory leak.
> A side effect of this patch was that create_ah was only called when the
>  port lid changes. Which meant register_to_traps used an older, obsolete,
>  version of sm_lid and failed to connect to it.
> 
> This patch fixes this behaviour by checking for both local lid changes and
>  SM lid changes, and calling create_ah on any of these events.

Reported-by: Bart Van Assche <bart.vanassche@wdc.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 rdma-core 2/2] srp_daemon: fix CQ handling
       [not found]     ` <7ed5d5af-0f53-c15f-e7f2-516f6786113f-IBi9RG/b67k@public.gmane.org>
@ 2017-12-14 19:17       ` Bart Van Assche
  0 siblings, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2017-12-14 19:17 UTC (permalink / raw)
  To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: bvanassche-HInyCGIudOg@public.gmane.org,
	stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org

On Wed, 2017-12-13 at 13:32 +0100, Nicolas Morey-Chaisemartin wrote:
> SM traps are polled through poll_cq which waited for a CQ event
> before polling the CQ itself.
> However it may happens that multiple completions are attached
> to a single event. As stated by the ibv_get_cq_event man page,
> it is required to poll the the CQ to get those completions
> after the call to ibv_req_notify_cq.
> 
> As completions need to be handled one by one in an outer function,
> start by polling the CQ and return the completion (if any) before
> waiting for the next completion event.
> This will allow emptying all pending completions, through multiple calls
> to poll_cq,  before waiting for a new event.
> 
> The buggy use case seems to appear when the master SM is switched multiple
> times between two nodes. As the number of ping-pong between the SMs increases,
> the number of traps sent to notify that the SM just became master increases
> too. This causes burst of completions linked to a single event.
> Note that the race condition is also possible in other scenario.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found]         ` <1513278813.2475.49.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2017-12-14 19:18           ` Bart Van Assche
  0 siblings, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2017-12-14 19:18 UTC (permalink / raw)
  To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: bvanassche-HInyCGIudOg@public.gmane.org,
	stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org

On Thu, 2017-12-14 at 19:13 +0000, Bart Van Assche wrote:
> On Wed, 2017-12-13 at 13:31 +0100, Nicolas Morey-Chaisemartin wrote:
> > This patch fixes this behaviour by checking for both local lid changes and
> > SM lid changes, and calling create_ah on any of these events.
> 
> Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
  ^^^^^^^^^^^

This should have been: Reviewed-by.

Bart.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found]     ` <be797b2f-0317-f9ac-d0ec-5284632790b9-IBi9RG/b67k@public.gmane.org>
  2017-12-14 19:13       ` Bart Van Assche
@ 2017-12-15 11:40       ` Dennis Dalessandro
       [not found]         ` <a4b3d5e1-f611-9305-1ed2-d4a51ebe4611-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Dennis Dalessandro @ 2017-12-15 11:40 UTC (permalink / raw)
  To: Nicolas Morey-Chaisemartin, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

On 12/13/2017 7:31 AM, Nicolas Morey-Chaisemartin wrote:
> When srp_daemon was running and the master SM host changes,
>   srp_daemon output these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
>   running on fabric or IB port is down
> 
> This was introduced by commit 4952e5f Fix a memory leak.
> A side effect of this patch was that create_ah was only called when the
>   port lid changes. Which meant register_to_traps used an older, obsolete,
>   version of sm_lid and failed to connect to it.
> 
> This patch fixes this behaviour by checking for both local lid changes and
>   SM lid changes, and calling create_ah on any of these events.
> 
> Fixes: 4952e5f7 (Fix a memory leak)

Close. You are going to want 12 digits of the SHA, followed by the 
actual patch subject, the full thing in ("......").

I tried to look at the commit to provide you the actual example but get 
the following. Make sure the SHA you are using is correct.

$ git show 4952e5f7
fatal: ambiguous argument '4952e5f7': unknown revision or path not in 
the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found]         ` <a4b3d5e1-f611-9305-1ed2-d4a51ebe4611-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2017-12-15 12:41           ` Nicolas Morey-Chaisemartin
       [not found]             ` <5c12ee61-22f3-239d-8fbb-59c381b8ce13-IBi9RG/b67k@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-15 12:41 UTC (permalink / raw)
  To: Dennis Dalessandro, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg



Le 15/12/2017 à 12:40, Dennis Dalessandro a écrit :
> On 12/13/2017 7:31 AM, Nicolas Morey-Chaisemartin wrote:
>> When srp_daemon was running and the master SM host changes,
>>   srp_daemon output these errors at every scan:
>> srp_daemon[25394]: No response to inform info registration
>> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
>>   running on fabric or IB port is down
>>
>> This was introduced by commit 4952e5f Fix a memory leak.
>> A side effect of this patch was that create_ah was only called when the
>>   port lid changes. Which meant register_to_traps used an older, obsolete,
>>   version of sm_lid and failed to connect to it.
>>
>> This patch fixes this behaviour by checking for both local lid changes and
>>   SM lid changes, and calling create_ah on any of these events.
>>
>> Fixes: 4952e5f7 (Fix a memory leak)
>
> Close. You are going to want 12 digits of the SHA, followed by the actual patch subject, the full thing in ("......").
>
> I tried to look at the commit to provide you the actual example but get the following. Make sure the SHA you are using is correct.
>
> $ git show 4952e5f7
> fatal: ambiguous argument '4952e5f7': unknown revision or path not in the working tree.
> Use '--' to separate paths from revisions, like this:
> 'git <command> [<revision>...] -- [<file>...]'

Weird it works for me...
$ git show 4952e5f7
commit 4952e5f7df0c93d6f3972975106c5e06623a301d
Author: Roi Dayan <roid-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date:   Thu Mar 21 17:38:11 2013 +0200

    Fix a memory leak

I used this command  to generate the line
$ git  log -n1 --pretty='%h (%s)' 4952e5f7df0c93d6f3972975106c5e06623a301d
4952e5f7 (Fix a memory leak)

Isn't the abbreviated hash supposed to be enough ?

Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found]             ` <5c12ee61-22f3-239d-8fbb-59c381b8ce13-IBi9RG/b67k@public.gmane.org>
@ 2017-12-15 16:18               ` Bart Van Assche
  2017-12-15 17:33               ` Jason Gunthorpe
  1 sibling, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2017-12-15 16:18 UTC (permalink / raw)
  To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: bvanassche-HInyCGIudOg@public.gmane.org,
	stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org

On Fri, 2017-12-15 at 13:41 +0100, Nicolas Morey-Chaisemartin wrote:
> Isn't the abbreviated hash supposed to be enough ?

Most kernel developers use the following in their ~/.gitconfig:

[core]
        abbrev = 12

See also https://public-inbox.org/git/CA+55aFy0_pwtFOYS1Tmnxipw9ZkRNCQHmoYyegO00pjMiZQfbg@mail.gmail.com/.

Bart.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv3 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found] ` <001cdcfd-8ace-261f-ab86-a09ae3582dd8-IBi9RG/b67k@public.gmane.org>
  2017-12-13 12:31   ` [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change Nicolas Morey-Chaisemartin
  2017-12-13 12:32   ` [PATCHv2 rdma-core 2/2] srp_daemon: fix CQ handling Nicolas Morey-Chaisemartin
@ 2017-12-15 16:36   ` Nicolas Morey-Chaisemartin
       [not found]     ` <6ea765f9-e770-74a3-bbe7-19b7ebc76ebe-IBi9RG/b67k@public.gmane.org>
  2 siblings, 1 reply; 12+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-15 16:36 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

When srp_daemon was running and the master SM host changes,
 srp_daemon output these errors at every scan:
srp_daemon[25394]: No response to inform info registration
srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
 running on fabric or IB port is down

This was introduced by commit 4952e5f Fix a memory leak.
A side effect of this patch was that create_ah was only called when the
 port lid changes. Which meant register_to_traps used an older, obsolete,
 version of sm_lid and failed to connect to it.

This patch fixes this behaviour by checking for both local lid changes and
 SM lid changes, and calling create_ah on any of these events.

Fixes: 4952e5f7df0c (Fix a memory leak)
Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
Cc: stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org # v14, v15, v16
---

Since v2, expand abbrev sha1 of Fixes:... to 12B

 srp_daemon/srp_daemon.c       | 10 ++++++----
 srp_daemon/srp_daemon.h       |  2 +-
 srp_daemon/srp_handle_traps.c | 14 +++++++++++---
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index cec36db2e0f1..38501886110a 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1103,7 +1103,7 @@ static int get_shared_pkeys(struct resources *res,
 	int i, num_pkeys = 0;
 	uint16_t pkey;
 	uint16_t local_port_lid = get_port_lid(res->ud_res->ib_ctx,
-					       config->port_num);
+					       config->port_num, NULL);
 
 	in_mad_buf = malloc(sizeof(struct ib_user_mad) +
 			    node_table_response_size);
@@ -2092,7 +2092,7 @@ int main(int argc, char *argv[])
 {
 	int			ret;
 	struct resources       *res;
-	uint16_t 		lid;
+	uint16_t 		lid, sm_lid;
 	uint16_t 		pkey;
 	union umad_gid 		gid;
 	struct target_details  *target;
@@ -2196,8 +2196,10 @@ catas_start:
 
 			pr_debug("Starting a recalculation\n");
 			port_lid = get_port_lid(res->ud_res->ib_ctx,
-					   config->port_num);
-			if (port_lid != res->ud_res->port_attr.lid) {
+						config->port_num, &sm_lid);
+			if (port_lid != res->ud_res->port_attr.lid ||
+				sm_lid != res->ud_res->port_attr.sm_lid) {
+
 				if (res->ud_res->ah) {
 					ibv_destroy_ah(res->ud_res->ah);
 					res->ud_res->ah = NULL;
diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h
index 5d268ed395e1..864b3d42fb46 100644
--- a/srp_daemon/srp_daemon.h
+++ b/srp_daemon/srp_daemon.h
@@ -299,7 +299,7 @@ void *run_thread_listen_to_events(void *res_in);
 int get_node(struct umad_resources *umad_res, uint16_t dlid, uint64_t *guid);
 int create_trap_resources(struct ud_resources *ud_res);
 int register_to_traps(struct resources *res, int subscribe);
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num);
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid);
 int create_ah(struct ud_resources *ud_res);
 void push_gid_to_list(struct sync_resources *res, union umad_gid *gid,
 		      uint16_t pkey);
diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index 6b36b15cc84c..8c428756a379 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -340,12 +340,20 @@ int ud_resources_create(struct ud_resources *res)
 	return 0;
 }
 
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num)
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid)
 {
 	struct ibv_port_attr port_attr;
+	int ret;
+
+	ret = ibv_query_port(ib_ctx, port_num, &port_attr);
 
-	return ibv_query_port(ib_ctx, port_num, &port_attr) == 0 ?
-		port_attr.lid : 0;
+	if (!ret) {
+		if (sm_lid)
+			*sm_lid = port_attr.sm_lid;
+		return port_attr.lid;
+	}
+
+	return 0;
 }
 
 int create_ah(struct ud_resources *ud_res)
-- 
2.15.1.272.g8e603414b


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCHv3 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found]     ` <6ea765f9-e770-74a3-bbe7-19b7ebc76ebe-IBi9RG/b67k@public.gmane.org>
@ 2017-12-15 16:53       ` Bart Van Assche
  0 siblings, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2017-12-15 16:53 UTC (permalink / raw)
  To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: bvanassche-HInyCGIudOg@public.gmane.org,
	stable-Xl5UnYtxxKxKUA01WzcqbQ@public.gmane.org,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org

On Fri, 2017-12-15 at 17:36 +0100, Nicolas Morey-Chaisemartin wrote:
> Since v2, expand abbrev sha1 of Fixes:... to 12B

Hello Nicolas,

Please keep Reviewed-by tags when making small changes like this.

BTW, have you already uploaded these patches to github and have you already
sent a pull request to https://github.com/linux-rdma/rdma-core? That is how
rdma-core patches usually end up upstream. You will see that after you have
created a pull request that several validation checks (Travis CI) are started
automatically.

Bart.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change
       [not found]             ` <5c12ee61-22f3-239d-8fbb-59c381b8ce13-IBi9RG/b67k@public.gmane.org>
  2017-12-15 16:18               ` Bart Van Assche
@ 2017-12-15 17:33               ` Jason Gunthorpe
  1 sibling, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2017-12-15 17:33 UTC (permalink / raw)
  To: Nicolas Morey-Chaisemartin
  Cc: Dennis Dalessandro, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	stable-Xl5UnYtxxKxKUA01WzcqbQ, bvanassche-HInyCGIudOg

On Fri, Dec 15, 2017 at 01:41:12PM +0100, Nicolas Morey-Chaisemartin wrote:

> I used this command  to generate the line
> $ git  log -n1 --pretty='%h (%s)' 4952e5f7df0c93d6f3972975106c5e06623a301d
> 4952e5f7 (Fix a memory leak)
> 
> Isn't the abbreviated hash supposed to be enough ?

You can put this alias in your .gitconfig to generate teh correct
format:

[alias]
        fixes = log --abbrev=12 -1 --format='Fixes: %h (\"%s\")'


And your git is older so you need to add

[core]
	abbrev = 12

To your config too.

The 8 character abbrev is known to have collisions already for
linux-kernel (but probably not in rdma-core)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-12-15 17:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-13 12:30 [PATCHv2 rdma-core 0/2] srp_daemon fixes Nicolas Morey-Chaisemartin
     [not found] ` <001cdcfd-8ace-261f-ab86-a09ae3582dd8-IBi9RG/b67k@public.gmane.org>
2017-12-13 12:31   ` [PATCHv2 rdma-core 1/2] srp_daemon: handle SM lid change Nicolas Morey-Chaisemartin
     [not found]     ` <be797b2f-0317-f9ac-d0ec-5284632790b9-IBi9RG/b67k@public.gmane.org>
2017-12-14 19:13       ` Bart Van Assche
     [not found]         ` <1513278813.2475.49.camel-Sjgp3cTcYWE@public.gmane.org>
2017-12-14 19:18           ` Bart Van Assche
2017-12-15 11:40       ` Dennis Dalessandro
     [not found]         ` <a4b3d5e1-f611-9305-1ed2-d4a51ebe4611-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-12-15 12:41           ` Nicolas Morey-Chaisemartin
     [not found]             ` <5c12ee61-22f3-239d-8fbb-59c381b8ce13-IBi9RG/b67k@public.gmane.org>
2017-12-15 16:18               ` Bart Van Assche
2017-12-15 17:33               ` Jason Gunthorpe
2017-12-13 12:32   ` [PATCHv2 rdma-core 2/2] srp_daemon: fix CQ handling Nicolas Morey-Chaisemartin
     [not found]     ` <7ed5d5af-0f53-c15f-e7f2-516f6786113f-IBi9RG/b67k@public.gmane.org>
2017-12-14 19:17       ` Bart Van Assche
2017-12-15 16:36   ` [PATCHv3 rdma-core 1/2] srp_daemon: handle SM lid change Nicolas Morey-Chaisemartin
     [not found]     ` <6ea765f9-e770-74a3-bbe7-19b7ebc76ebe-IBi9RG/b67k@public.gmane.org>
2017-12-15 16:53       ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).