* CQ overrun with ib_send_bw
@ 2010-08-13 18:44 Sumeet Lahorani
[not found] ` <4C659288.4030402-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Sumeet Lahorani @ 2010-08-13 18:44 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi,
If I run ib_send_bw with the -a option, I get CQ overrun errors.
Server :
[root@dscbad01 ~]# ib_send_bw
------------------------------------------------------------------
Send BW Test
Connection type : RC
Inline data is used up to 1 bytes message
local address: LID 0x24, QPN 0x1c004c, PSN 0x85c292
remote address: LID 0x2a, QPN 0x14004a, PSN 0x858358
Mtu : 2048
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
------------------------------------------------------------------
Client :
[root@dscbad03 ~]# ib_send_bw -a dscbad01
------------------------------------------------------------------
Send BW Test
Connection type : RC
Inline data is used up to 1 bytes message
local address: LID 0x2a, QPN 0x14004a, PSN 0x858358
remote address: LID 0x24, QPN 0x1c004c, PSN 0x85c292
Mtu : 2048
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
2 1000 5.99 5.45
Completion wth error at client:
Failed status 12: wr_id 1 syndrom 0x81
scnt=600, ccnt=300
and on the client console
mlx4_core 0000:13:00.0: CQ overrun on CQN 000086
mlx4_core 0000:13:00.0: Internal error detected:
mlx4_core 0000:13:00.0: buf[00]: 00328f6f
mlx4_core 0000:13:00.0: buf[01]: 00000000
mlx4_core 0000:13:00.0: buf[02]: 20070000
mlx4_core 0000:13:00.0: buf[03]: 00000000
mlx4_core 0000:13:00.0: buf[04]: 00328f3c
mlx4_core 0000:13:00.0: buf[05]: 0014004a
mlx4_core 0000:13:00.0: buf[06]: 00340000
mlx4_core 0000:13:00.0: buf[07]: 00000044
mlx4_core 0000:13:00.0: buf[08]: 00000804
mlx4_core 0000:13:00.0: buf[09]: 00000804
mlx4_core 0000:13:00.0: buf[0a]: 00000000
mlx4_core 0000:13:00.0: buf[0b]: 00000000
mlx4_core 0000:13:00.0: buf[0c]: 00000000
mlx4_core 0000:13:00.0: buf[0d]: 00000000
mlx4_core 0000:13:00.0: buf[0e]: 00000000
mlx4_core 0000:13:00.0: buf[0f]: 00000000
This is with OFED 1.5.1 but it also happens with OFED 1.4.2. Sometimes,
the node crashes because it runs out of memory but most of the time, I
see just the above errors. What could be wrong?
- Sumeet
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <4C659288.4030402-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: CQ overrun with ib_send_bw
[not found] ` <4C659288.4030402-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2010-08-13 19:06 ` Ralph Campbell
[not found] ` <1281726396.2313.44.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Ralph Campbell @ 2010-08-13 19:06 UTC (permalink / raw)
To: Sumeet Lahorani; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

I know there is a bug with "ib_send_bw -b" (bi-directional) since it
doesn't create a CQ that is large enough for all the posted sends *and*
receives. I have tried several times to get the following patch applied
but I never got a reply and nothing was done.

diff --git a/send_bw.c b/send_bw.c
index ddd2b73..e3f644a 100644
--- a/send_bw.c
+++ b/send_bw.c
@@ -746,6 +746,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev,
 	if (user_parm->use_mcg && !user_parm->servername) {
 		cq_rx_depth *= user_parm->num_of_clients_mcg;
 	}
+	if (user_parm->duplex)
+		cq_rx_depth += ctx->tx_depth;
 	ctx->cq = ibv_create_cq(ctx->context,cq_rx_depth, NULL, ctx->channel, 0);
 	if (!ctx->cq) {
 		fprintf(stderr, "Couldn't create CQ\n");

There should be enough CQEs in the normal case though.

^ permalink raw reply related [flat|nested] 8+ messages in thread
[parent not found: <1281726396.2313.44.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>]
* RE: CQ overrun with ib_send_bw
[not found] ` <1281726396.2313.44.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
@ 2010-08-13 19:14 ` Hefty, Sean
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25A96887A2-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2010-08-13 19:14 UTC (permalink / raw)
To: Ralph Campbell, Sumeet Lahorani
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> I know there is a bug with "ib_send_bw -b" (bi-directional)
> since it doesn't create a CQ that is large enough for all the
> posted sends *and* receives. I have tried several times to get the
> following patch applied but I never got a reply and nothing was
> done.

Who's the maintainer of these tests?

^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <CF9C39F99A89134C9CF9C4CCB68B8DDF25A96887A2-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* RE: CQ overrun with ib_send_bw
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25A96887A2-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-08-13 19:21 ` Ralph Campbell
[not found] ` <1281727297.2313.47.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Ralph Campbell @ 2010-08-13 19:21 UTC (permalink / raw)
To: Hefty, Sean
Cc: Sumeet Lahorani, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, 2010-08-13 at 12:14 -0700, Hefty, Sean wrote:
> > I know there is a bug with "ib_send_bw -b" (bi-directional)
> > since it doesn't create a CQ that is large enough for all the
> > posted sends *and* receives. I have tried several times to get the
> > following patch applied but I never got a reply and nothing was
> > done.
>
> Who's the maintainer of these tests?

I believe it is:

Ido Shamai <idos-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

git://git.openfabrics.org/~shamoya/perftest.git

^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1281727297.2313.47.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>]
* RE: CQ overrun with ib_send_bw
[not found] ` <1281727297.2313.47.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
@ 2010-08-17 11:19 ` Tziporet Koren
[not found] ` <E113D394D7C5DB4F8FF691FA7EE9DB443B5668DE17-WQlSmcKwN8Te+A/uUDamNg@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Tziporet Koren @ 2010-08-17 11:19 UTC (permalink / raw)
To: Ralph Campbell, Hefty, Sean, Ido Shamay, Amir Ancel
Cc: Sumeet Lahorani, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 8/13/2010 10:21 PM, Ralph Campbell wrote:
> On Fri, 2010-08-13 at 12:14 -0700, Hefty, Sean wrote:
>>> I know there is a bug with "ib_send_bw -b" (bi-directional)
>>> since it doesn't create a CQ that is large enough for all the
>>> posted sends *and* receives. I have tried several times to get the
>>> following patch applied but I never got a reply and nothing was
>>> done.
>>
>> Who's the maintainer of these tests?
>
> I believe it is:
>
> Ido Shamai <idos-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
>
> git://git.openfabrics.org/~shamoya/perftest.git

Yes, Ido is the maintainer; however, he is on vacation till Sep.
I am adding Amir, who may be able to help for now.

Tziporet

^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <E113D394D7C5DB4F8FF691FA7EE9DB443B5668DE17-WQlSmcKwN8Te+A/uUDamNg@public.gmane.org>]
* RE: CQ overrun with ib_send_bw
[not found] ` <E113D394D7C5DB4F8FF691FA7EE9DB443B5668DE17-WQlSmcKwN8Te+A/uUDamNg@public.gmane.org>
@ 2010-08-17 11:36 ` Amir Ancel
[not found] ` <1EEC75D0B27041449A1EEA2927D1B145380145A7DA-WQlSmcKwN8Te+A/uUDamNg@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Amir Ancel @ 2010-08-17 11:36 UTC (permalink / raw)
To: Tziporet Koren, Ralph Campbell, Hefty, Sean, Ido Shamay
Cc: Sumeet Lahorani, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Raz Baussi

Hi Sean,

We've seen this issue as well.

Can you send the patch directly to us?

I have added Raz from my team, who is replacing Ido while he is OOO.

Thanks,

Amir Ancel
Performance Team Manager
Mellanox Technologies

^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1EEC75D0B27041449A1EEA2927D1B145380145A7DA-WQlSmcKwN8Te+A/uUDamNg@public.gmane.org>]
* RE: CQ overrun with ib_send_bw
[not found] ` <1EEC75D0B27041449A1EEA2927D1B145380145A7DA-WQlSmcKwN8Te+A/uUDamNg@public.gmane.org>
@ 2010-08-17 18:59 ` Ralph Campbell
[not found] ` <1282071547.2313.100.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Ralph Campbell @ 2010-08-17 18:59 UTC (permalink / raw)
To: Amir Ancel
Cc: Tziporet Koren, Hefty, Sean, Ido Shamay, Sumeet Lahorani, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Raz Baussi

[-- Attachment #1: Type: text/plain, Size: 1318 bytes --]

The patch is attached.

On Tue, 2010-08-17 at 04:36 -0700, Amir Ancel wrote:
> Hi Sean,
>
> We've seen this issue as well.
>
> Can you send the patch directly to us ?
>
> Added Raz from my team which replaces Ido while he is OOO.

[-- Attachment #2: send_bw.patch --]
[-- Type: text/x-patch, Size: 491 bytes --]

diff --git a/send_bw.c b/send_bw.c
index ddd2b73..e3f644a 100644
--- a/send_bw.c
+++ b/send_bw.c
@@ -746,6 +746,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev,
 	if (user_parm->use_mcg && !user_parm->servername) {
 		cq_rx_depth *= user_parm->num_of_clients_mcg;
 	}
+	if (user_parm->duplex)
+		cq_rx_depth += ctx->tx_depth;
 	ctx->cq = ibv_create_cq(ctx->context,cq_rx_depth, NULL, ctx->channel, 0);
 	if (!ctx->cq) {
 		fprintf(stderr, "Couldn't create CQ\n");

^ permalink raw reply related [flat|nested] 8+ messages in thread
[parent not found: <1282071547.2313.100.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>]
* RE: CQ overrun with ib_send_bw
[not found] ` <1282071547.2313.100.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
@ 2010-08-17 19:08 ` Amir Ancel
0 siblings, 0 replies; 8+ messages in thread
From: Amir Ancel @ 2010-08-17 19:08 UTC (permalink / raw)
To: Ralph Campbell
Cc: Tziporet Koren, Hefty, Sean, Ido Shamay, Sumeet Lahorani, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Raz Baussi

Thanks,
We'll apply it soon.

Amir Ancel
Performance Team Manager
Mellanox Technologies

^ permalink raw reply [flat|nested] 8+ messages in thread