* [BUG] SRP daemon and SM migration
@ 2017-12-04 11:08 Nicolas Morey-Chaisemartin
[not found] ` <bd47faf6-eae0-d520-42ce-eb3d6bce4376-l3A5Bk7waGM@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-04 11:08 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hi
A bug was reported to SUSE concerning the srp_daemon.
When it's running and the matser SM changes host (host shutdown, or new higher priority SM started), srp_daemon outputs these errors at every scan:
srp_daemon[25394]: No response to inform info registration
srp_daemon[25394]: Fail to register to traps, maybe there is no opensm running on fabric or IB port is down
It seems this was introduced by this commit:
commit 4952e5f7df0c93d6f3972975106c5e06623a301d
Author: Roi Dayan <roid-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Thu Mar 21 17:38:11 2013 +0200
Fix a memory leak
Avoid leaking one IB AH per rescan. Only allocate a new AH if the
port LID changed or after a LID has been assigned by the SM.
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Signed-off-by: Roi Dayan <roid-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
One of the side effect of the leak fix is that create_ah is only called when the local port lid changes.
And register_to_traps uses the sm_id from ud_res which is filled by create_ah.
Thus if the SM lid changes but not the local LID, it keeps trying to contact the previous LID.
I tried fixing it by getting get_port_lid to also return the SM lid and calling create_ah on local lid OR SM lid changes.
It seems to be working at first (at least the call is always done to the right lid).
But after a while (doing ping pong between 2 SM by changing the priority) I still end up getting the error above.
Even through the LID is right this time.
It may not be the same bug though. Is there some calls to do to unregister from the previous SM before registering to the new one ?
Any idea on what could cause this ? I don't seem to get any more infos in all the logs I've checked...
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG] SRP daemon and SM migration
[not found] ` <bd47faf6-eae0-d520-42ce-eb3d6bce4376-l3A5Bk7waGM@public.gmane.org>
@ 2017-12-04 13:50 ` Hal Rosenstock
[not found] ` <1978d0ad-909f-ed8b-5b54-e8c465d3641f-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Hal Rosenstock @ 2017-12-04 13:50 UTC (permalink / raw)
To: Nicolas Morey-Chaisemartin,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hi Nicolas,
On 12/4/2017 6:08 AM, Nicolas Morey-Chaisemartin wrote:
> Hi
>
> A bug was reported to SUSE concerning the srp_daemon.
> When it's running and the matser SM changes host (host shutdown, or new higher priority SM started), srp_daemon outputs these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm running on fabric or IB port is down
>
> It seems this was introduced by this commit:
> commit 4952e5f7df0c93d6f3972975106c5e06623a301d
> Author: Roi Dayan <roid-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Date: Thu Mar 21 17:38:11 2013 +0200
>
> Fix a memory leak
>
> Avoid leaking one IB AH per rescan. Only allocate a new AH if the
> port LID changed or after a LID has been assigned by the SM.
>
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Signed-off-by: Roi Dayan <roid-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
>
> One of the side effect of the leak fix is that create_ah is only called when the local port lid changes.
> And register_to_traps uses the sm_id from ud_res which is filled by create_ah.
>
> Thus if the SM lid changes but not the local LID, it keeps trying to contact the previous LID.
>
> I tried fixing it by getting get_port_lid to also return the SM lid and calling create_ah on local lid OR SM lid changes.
Yes, client should reregister for traps with SA on local LID or SM LID
change. Note that SMSL could change as well.
> It seems to be working at first (at least the call is always done to the right lid).
> But after a while (doing ping pong between 2 SM by changing the priority) I still end up getting the error above.
> Even through the LID is right this time.
> It may not be the same bug though.
In srp_handle_traps.c:register_to_trap, try increasing counter to
something more than 3:
} while (rc == 0 && ++counter < 3);
if (counter==3) {
It may be that SM/SA handover/failover takes longer than 3 seconds in
some cases.
> Is there some calls to do to unregister from the previous SM before registering to the new one ?
This is complex subject. Although deregistration would eliminate stale
registrations (events/traps, multicast, services) in SA which
potentially can cause timed out event notifications (SA Reports), once
SA is no longer master, it will not respond to SA client requests and
client has no way to control when SM/SA master transition occurs.
-- Hal
> Any idea on what could cause this ? I don't seem to get any more infos in all the logs I've checked...
>
> Nicolas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC rdma-core] srp_daemon: handle SM lid change
[not found] ` <1978d0ad-909f-ed8b-5b54-e8c465d3641f-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2017-12-08 9:36 ` Nicolas Morey-Chaisemartin
[not found] ` <82bd1d9a-7e6a-324e-1e00-43fcf130faf0-IBi9RG/b67k@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-08 9:36 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
When srp_daemon was running and the master SM host changes,
srp_daemon output these errors at every scan:
srp_daemon[25394]: No response to inform info registration
srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
running on fabric or IB port is down
This was introduced by commit 4952e5f Fix a memory leak.
A side effect of this patch was that create_ah was only called when the
port lid changes. Which meant register_to_traps used an older, obsolete,
version of sm_lid and failed to connect to it.
This patch fixes this behaviour by checking for both local lid changes and
SM lid changes, and calling create_ah on any of these events.
Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
---
This fixes the bug for me but I'm waiting for feedback from the customer to make sure it fixes the issue for him too.
And not just part of it.
I'll repost a proper patch (non RFC) once he confirms it.
srp_daemon/srp_daemon.c | 10 ++++++----
srp_daemon/srp_daemon.h | 2 +-
srp_daemon/srp_handle_traps.c | 16 ++++++++++++----
3 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index 2465ccd9..36df5c3b 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1103,7 +1103,7 @@ static int get_shared_pkeys(struct resources *res,
int i, num_pkeys = 0;
uint16_t pkey;
uint16_t local_port_lid = get_port_lid(res->ud_res->ib_ctx,
- config->port_num);
+ config->port_num, NULL);
in_mad_buf = malloc(sizeof(struct ib_user_mad) +
node_table_response_size);
@@ -2092,7 +2092,7 @@ int main(int argc, char *argv[])
{
int ret;
struct resources *res;
- uint16_t lid;
+ uint16_t lid, sm_lid;
uint16_t pkey;
union umad_gid gid;
struct target_details *target;
@@ -2196,8 +2196,10 @@ catas_start:
pr_debug("Starting a recalculation\n");
port_lid = get_port_lid(res->ud_res->ib_ctx,
- config->port_num);
- if (port_lid != res->ud_res->port_attr.lid) {
+ config->port_num, &sm_lid);
+ if (port_lid != res->ud_res->port_attr.lid ||
+ sm_lid != res->ud_res->port_attr.sm_lid) {
+
if (res->ud_res->ah) {
ibv_destroy_ah(res->ud_res->ah);
res->ud_res->ah = NULL;
diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h
index 5d268ed3..864b3d42 100644
--- a/srp_daemon/srp_daemon.h
+++ b/srp_daemon/srp_daemon.h
@@ -299,7 +299,7 @@ void *run_thread_listen_to_events(void *res_in);
int get_node(struct umad_resources *umad_res, uint16_t dlid, uint64_t *guid);
int create_trap_resources(struct ud_resources *ud_res);
int register_to_traps(struct resources *res, int subscribe);
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num);
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid);
int create_ah(struct ud_resources *ud_res);
void push_gid_to_list(struct sync_resources *res, union umad_gid *gid,
uint16_t pkey);
diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index 6d94634e..e546d491 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -340,12 +340,20 @@ int ud_resources_create(struct ud_resources *res)
return 0;
}
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num)
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid)
{
struct ibv_port_attr port_attr;
+ int ret;
+
+ ret = ibv_query_port(ib_ctx, port_num, &port_attr);
- return ibv_query_port(ib_ctx, port_num, &port_attr) == 0 ?
- port_attr.lid : 0;
+ if (!ret) {
+ if (sm_lid)
+ *sm_lid = port_attr.sm_lid;
+ return port_attr.lid;
+ }
+
+ return 0;
}
int create_ah(struct ud_resources *ud_res)
@@ -641,7 +649,7 @@ static int register_to_trap(struct sync_resources *sync_res,
pthread_mutex_unlock(res->mad_buffer_mutex);
} while (rc == 2); // while old response.
- } while (rc == 0 && ++counter < 3);
+ } while (rc == 0 && ++counter < 5);
if (counter==3) {
pr_err("No response to inform info registration\n");
--
2.15.1.272.g8e603414b
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC rdma-core] srp_daemon: handle SM lid change
[not found] ` <82bd1d9a-7e6a-324e-1e00-43fcf130faf0-IBi9RG/b67k@public.gmane.org>
@ 2017-12-08 13:33 ` Hal Rosenstock
[not found] ` <82dc0912-3ac3-9548-2c3b-703c6e1c9c95-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Hal Rosenstock @ 2017-12-08 13:33 UTC (permalink / raw)
To: Nicolas Morey-Chaisemartin, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 12/8/2017 4:36 AM, Nicolas Morey-Chaisemartin wrote:
> When srp_daemon was running and the master SM host changes,
> srp_daemon output these errors at every scan:
> srp_daemon[25394]: No response to inform info registration
> srp_daemon[25394]: Fail to register to traps, maybe there is no opensm
> running on fabric or IB port is down
>
> This was introduced by commit 4952e5f Fix a memory leak.
> A side effect of this patch was that create_ah was only called when the
> port lid changes. Which meant register_to_traps used an older, obsolete,
> version of sm_lid and failed to connect to it.
>
> This patch fixes this behaviour by checking for both local lid changes and
> SM lid changes, and calling create_ah on any of these events.
>
> Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
> ---
>
> This fixes the bug for me but I'm waiting for feedback from the customer to make sure it fixes the issue for him too.
> And not just part of it.
> I'll repost a proper patch (non RFC) once he confirms it.
>
> srp_daemon/srp_daemon.c | 10 ++++++----
> srp_daemon/srp_daemon.h | 2 +-
> srp_daemon/srp_handle_traps.c | 16 ++++++++++++----
> 3 files changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
> index 2465ccd9..36df5c3b 100644
> --- a/srp_daemon/srp_daemon.c
> +++ b/srp_daemon/srp_daemon.c
> @@ -1103,7 +1103,7 @@ static int get_shared_pkeys(struct resources *res,
> int i, num_pkeys = 0;
> uint16_t pkey;
> uint16_t local_port_lid = get_port_lid(res->ud_res->ib_ctx,
> - config->port_num);
> + config->port_num, NULL);
>
> in_mad_buf = malloc(sizeof(struct ib_user_mad) +
> node_table_response_size);
> @@ -2092,7 +2092,7 @@ int main(int argc, char *argv[])
> {
> int ret;
> struct resources *res;
> - uint16_t lid;
> + uint16_t lid, sm_lid;
> uint16_t pkey;
> union umad_gid gid;
> struct target_details *target;
> @@ -2196,8 +2196,10 @@ catas_start:
>
> pr_debug("Starting a recalculation\n");
> port_lid = get_port_lid(res->ud_res->ib_ctx,
> - config->port_num);
> - if (port_lid != res->ud_res->port_attr.lid) {
> + config->port_num, &sm_lid);
> + if (port_lid != res->ud_res->port_attr.lid ||
> + sm_lid != res->ud_res->port_attr.sm_lid) {
> +
> if (res->ud_res->ah) {
> ibv_destroy_ah(res->ud_res->ah);
> res->ud_res->ah = NULL;
> diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h
> index 5d268ed3..864b3d42 100644
> --- a/srp_daemon/srp_daemon.h
> +++ b/srp_daemon/srp_daemon.h
> @@ -299,7 +299,7 @@ void *run_thread_listen_to_events(void *res_in);
> int get_node(struct umad_resources *umad_res, uint16_t dlid, uint64_t *guid);
> int create_trap_resources(struct ud_resources *ud_res);
> int register_to_traps(struct resources *res, int subscribe);
> -uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num);
> +uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid);
> int create_ah(struct ud_resources *ud_res);
> void push_gid_to_list(struct sync_resources *res, union umad_gid *gid,
> uint16_t pkey);
> diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
> index 6d94634e..e546d491 100644
> --- a/srp_daemon/srp_handle_traps.c
> +++ b/srp_daemon/srp_handle_traps.c
> @@ -340,12 +340,20 @@ int ud_resources_create(struct ud_resources *res)
> return 0;
> }
>
> -uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num)
> +uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid)
> {
> struct ibv_port_attr port_attr;
> + int ret;
> +
> + ret = ibv_query_port(ib_ctx, port_num, &port_attr);
>
> - return ibv_query_port(ib_ctx, port_num, &port_attr) == 0 ?
> - port_attr.lid : 0;
> + if (!ret) {
> + if (sm_lid)
> + *sm_lid = port_attr.sm_lid;
> + return port_attr.lid;
> + }
> +
> + return 0;
> }
>
> int create_ah(struct ud_resources *ud_res)
> @@ -641,7 +649,7 @@ static int register_to_trap(struct sync_resources *sync_res,
> pthread_mutex_unlock(res->mad_buffer_mutex);
> } while (rc == 2); // while old response.
>
> - } while (rc == 0 && ++counter < 3);
> + } while (rc == 0 && ++counter < 5);
>
> if (counter==3) {
This should be changed from 3 to 5 also. Change both to some define ?
> pr_err("No response to inform info registration\n");
>
Other than issue above, looks fine to me.
Reviewed-by: Hal Rosenstock <hal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC rdma-core] srp_daemon: handle SM lid change
[not found] ` <82dc0912-3ac3-9548-2c3b-703c6e1c9c95-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2017-12-08 13:56 ` Dennis Dalessandro
[not found] ` <dce98958-a6e2-b289-2a58-609d1914affb-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-12-08 17:35 ` Nicolas Morey-Chaisemartin
2017-12-11 14:46 ` [RFC rdma-core 2/2] srp_daemon: fix CQ handling Nicolas Morey-Chaisemartin
2 siblings, 1 reply; 10+ messages in thread
From: Dennis Dalessandro @ 2017-12-08 13:56 UTC (permalink / raw)
To: Hal Rosenstock, Nicolas Morey-Chaisemartin,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 12/8/2017 8:33 AM, Hal Rosenstock wrote:
>> int create_ah(struct ud_resources *ud_res)
>> @@ -641,7 +649,7 @@ static int register_to_trap(struct sync_resources *sync_res,
>> pthread_mutex_unlock(res->mad_buffer_mutex);
>> } while (rc == 2); // while old response.
>>
>> - } while (rc == 0 && ++counter < 3);
>> + } while (rc == 0 && ++counter < 5);
>>
>> if (counter==3) {
>
> This should be changed from 3 to 5 also. Change both to some define ?
Yes please change these to a #define with a some what descriptive name.
That would make the code much better to read.
-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC rdma-core] srp_daemon: handle SM lid change
[not found] ` <dce98958-a6e2-b289-2a58-609d1914affb-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2017-12-08 16:41 ` Bart Van Assche
0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2017-12-08 16:41 UTC (permalink / raw)
To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1099 bytes --]
On Fri, 2017-12-08 at 08:56 -0500, Dennis Dalessandro wrote:
> On 12/8/2017 8:33 AM, Hal Rosenstock wrote:
> > > int create_ah(struct ud_resources *ud_res)
> > > @@ -641,7 +649,7 @@ static int register_to_trap(struct sync_resources *sync_res,
> > > pthread_mutex_unlock(res->mad_buffer_mutex);
> > > } while (rc == 2); // while old response.
> > >
> > > - } while (rc == 0 && ++counter < 3);
> > > + } while (rc == 0 && ++counter < 5);
> > >
> > > if (counter==3) {
> >
> > This should be changed from 3 to 5 also. Change both to some define ?
>
> Yes please change these to a #define with a some what descriptive name.
> That would make the code much better to read.
It would be even better to change the loop such that it counts down instead
of up. Then the if-statement after the loop can test whether the counter has
reached zero and the constant that represents the number of iterations will
only occur once.
Bart.N§²æìr¸yúèØb²X¬¶Ç§vØ^)Þº{.nÇ+·¥{±Ù{ayº\x1dÊÚë,j\a¢f£¢·h»öì\x17/oSc¾Ú³9uÀ¦æåÈ&jw¨®\x03(éÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þàþf£¢·h§~m
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC rdma-core] srp_daemon: handle SM lid change
[not found] ` <82dc0912-3ac3-9548-2c3b-703c6e1c9c95-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2017-12-08 13:56 ` Dennis Dalessandro
@ 2017-12-08 17:35 ` Nicolas Morey-Chaisemartin
2017-12-11 14:46 ` [RFC rdma-core 2/2] srp_daemon: fix CQ handling Nicolas Morey-Chaisemartin
2 siblings, 0 replies; 10+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-08 17:35 UTC (permalink / raw)
To: Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Le 08/12/2017 à 14:33, Hal Rosenstock a écrit :
> On 12/8/2017 4:36 AM, Nicolas Morey-Chaisemartin wrote:
>>
>> int create_ah(struct ud_resources *ud_res)
>> @@ -641,7 +649,7 @@ static int register_to_trap(struct sync_resources *sync_res,
>> pthread_mutex_unlock(res->mad_buffer_mutex);
>> } while (rc == 2); // while old response.
>>
>> - } while (rc == 0 && ++counter < 3);
>> + } while (rc == 0 && ++counter < 5);
>>
>> if (counter==3) {
> This should be changed from 3 to 5 also. Change both to some define ?
I thought I changed that back to 3 during my tests.
No I have a big doubt that the issue is fully resolved, or just hidden because on timeout, the pr_err is never reached.
I'll retest that in Monday.
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC rdma-core 2/2] srp_daemon: fix CQ handling
[not found] ` <82dc0912-3ac3-9548-2c3b-703c6e1c9c95-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2017-12-08 13:56 ` Dennis Dalessandro
2017-12-08 17:35 ` Nicolas Morey-Chaisemartin
@ 2017-12-11 14:46 ` Nicolas Morey-Chaisemartin
[not found] ` <f35bfdad-9027-d2be-9706-b5a0edd9a778-IBi9RG/b67k@public.gmane.org>
2 siblings, 1 reply; 10+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-11 14:46 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
SM traps are polled through poll_cq which waited for a CQ event
before polling the CQ itself.
However it may happens that multiple completions are attached
to a single event. As stated by the ibv_get_cq_event, it is required
to poll the the CQ to get those event after the call to
ibv_req_notify_cq.
As completion need to be handled one by one in an outer function,
start by polling the CQ and return the event (if any) before waiting
for the next completion event.
The buggy use case seem to appear when the master SM is switched multiple
times between two nodes. As the number of ping-pong between the SM increases,
the number of traps sent to notify that the SM just became master increases
too. This causes burst of completions linked to a single event.
Note that the race condition is also possible in other scenario.
Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin-IBi9RG/b67k@public.gmane.org>
---
srp_daemon/srp_handle_traps.c | 40 ++++++++++++++++++++++++++++------------
This patch goes on top of the one proposed in the thread.
1 file changed, 28 insertions(+), 12 deletions(-)
diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index b1d051c6..eaa57979 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -496,6 +496,27 @@ static int stop_threads(struct sync_resources *sync_res)
return result;
}
+tatic int poll_cq_once(struct sync_resources *sync_res, struct ibv_cq *cq,
+ struct ibv_wc *wc)
+{
+ int ret;
+ ret = ibv_poll_cq(cq, 1, wc);
+ if (ret < 0) {
+ pr_err("poll CQ failed\n");
+ return ret;
+ }
+
+ if (ret > 0 && wc->status != IBV_WC_SUCCESS) {
+ if (!stop_threads(sync_res))
+ pr_err("got bad completion with status: 0x%x\n",
+ wc->status);
+ return -ret;
+ }
+
+ return ret;
+}
+
+
static int poll_cq(struct sync_resources *sync_res, struct ibv_cq *cq,
struct ibv_wc *wc, struct ibv_comp_channel *channel)
{
@@ -504,6 +525,12 @@ static int poll_cq(struct sync_resources *sync_res, struct ibv_cq *cq,
void *ev_ctx;
if (channel) {
+ /* Poll CQ once. There may be extra completion that
+ * were associated to the previous event */
+ ret = poll_cq_once(sync_res, cq, wc);
+ if (ret)
+ return ret;
+
if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) {
pr_err("Failed to get cq_event\n");
return -1;
@@ -524,18 +551,7 @@ static int poll_cq(struct sync_resources *sync_res, struct ibv_cq *cq,
}
do {
- ret = ibv_poll_cq(cq, 1, wc);
- if (ret < 0) {
- pr_err("poll CQ failed\n");
- return ret;
- }
-
- if (ret > 0 && wc->status != IBV_WC_SUCCESS) {
- if (!stop_threads(sync_res))
- pr_err("got bad completion with status: 0x%x\n",
- wc->status);
- return -ret;
- }
+ ret = poll_cq_once(sync_res, cq, wc);
if (ret == 0 && channel) {
pr_err("Weird poll returned no cqe after CQ event\n");
--
2.15.1.272.g8e603414b
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC rdma-core 2/2] srp_daemon: fix CQ handling
[not found] ` <f35bfdad-9027-d2be-9706-b5a0edd9a778-IBi9RG/b67k@public.gmane.org>
@ 2017-12-11 16:30 ` Bart Van Assche
[not found] ` <1513009807.2747.11.camel-Sjgp3cTcYWE@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-12-11 16:30 UTC (permalink / raw)
To: nmoreychaisemartin-IBi9RG/b67k@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 763 bytes --]
On Mon, 2017-12-11 at 15:46 +0100, Nicolas Morey-Chaisemartin wrote:
> +tatic int poll_cq_once(struct sync_resources *sync_res, struct ibv_cq *cq,
> + struct ibv_wc *wc)
Has this patch been tested? I don't think that "tatic" is a valid keyword?
> + /* Poll CQ once. There may be extra completion that
> + * were associated to the previous event */
> + ret = poll_cq_once(sync_res, cq, wc);
> + if (ret)
> + return ret;
How many extra completions can be associated with an event? Is polling once
sufficient? If so, please update the comment ("an" is missing / "were" ->
"was").
Thanks,
Bart.N§²æìr¸yúèØb²X¬¶Ç§vØ^)Þº{.nÇ+·¥{±Ù{ayº\x1dÊÚë,j\a¢f£¢·h»öì\x17/oSc¾Ú³9uÀ¦æåÈ&jw¨®\x03(éÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þàþf£¢·h§~m
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC rdma-core 2/2] srp_daemon: fix CQ handling
[not found] ` <1513009807.2747.11.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2017-12-11 16:38 ` Nicolas Morey-Chaisemartin
0 siblings, 0 replies; 10+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2017-12-11 16:38 UTC (permalink / raw)
To: Bart Van Assche,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org
Le 11/12/2017 à 17:30, Bart Van Assche a écrit :
> On Mon, 2017-12-11 at 15:46 +0100, Nicolas Morey-Chaisemartin wrote:
>> +tatic int poll_cq_once(struct sync_resources *sync_res, struct ibv_cq *cq,
>> + struct ibv_wc *wc)
> Has this patch been tested? I don't think that "tatic" is a valid keyword?
Typo during the rebase. But the overall fix was tested yes. I'm looking for insight to make sure my understanding of the issue (and the way to fix it) is valid.
>
>> + /* Poll CQ once. There may be extra completion that
>> + * were associated to the previous event */
>> + ret = poll_cq_once(sync_res, cq, wc);
>> + if (ret)
>> + return ret;
> How many extra completions can be associated with an event? Is polling once
> sufficient? If so, please update the comment ("an" is missing / "were" ->
> "was").
There can be more than one.
This should be enough becausethey would all be consumed through the call to poll_cq.
As long as there as completion in the CQ, they will be returned directly without waiting for an event.
Once there are no more, the code will wait for the next event again.
I'll fix the typo and the log.
Thanks
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-12-11 16:38 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-04 11:08 [BUG] SRP daemon and SM migration Nicolas Morey-Chaisemartin
[not found] ` <bd47faf6-eae0-d520-42ce-eb3d6bce4376-l3A5Bk7waGM@public.gmane.org>
2017-12-04 13:50 ` Hal Rosenstock
[not found] ` <1978d0ad-909f-ed8b-5b54-e8c465d3641f-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2017-12-08 9:36 ` [RFC rdma-core] srp_daemon: handle SM lid change Nicolas Morey-Chaisemartin
[not found] ` <82bd1d9a-7e6a-324e-1e00-43fcf130faf0-IBi9RG/b67k@public.gmane.org>
2017-12-08 13:33 ` Hal Rosenstock
[not found] ` <82dc0912-3ac3-9548-2c3b-703c6e1c9c95-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2017-12-08 13:56 ` Dennis Dalessandro
[not found] ` <dce98958-a6e2-b289-2a58-609d1914affb-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-12-08 16:41 ` Bart Van Assche
2017-12-08 17:35 ` Nicolas Morey-Chaisemartin
2017-12-11 14:46 ` [RFC rdma-core 2/2] srp_daemon: fix CQ handling Nicolas Morey-Chaisemartin
[not found] ` <f35bfdad-9027-d2be-9706-b5a0edd9a778-IBi9RG/b67k@public.gmane.org>
2017-12-11 16:30 ` Bart Van Assche
[not found] ` <1513009807.2747.11.camel-Sjgp3cTcYWE@public.gmane.org>
2017-12-11 16:38 ` Nicolas Morey-Chaisemartin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox