public inbox for linux-rdma@vger.kernel.org
From: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Bart Van Assche <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Subject: Re: [PATCH 8/8] IB/srp: Drain the send queue before destroying a QP
Date: Mon, 13 Feb 2017 08:54:53 -0500 (EST)
Message-ID: <1633827327.30531404.1486994093828.JavaMail.zimbra@redhat.com>
In-Reply-To: <1630482470.30208948.1486955693106.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, hch-jcswGhMUV9g@public.gmane.org, maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
> dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> Sent: Sunday, February 12, 2017 10:14:53 PM
> Subject: Re: [PATCH 8/8] IB/srp: Drain the send queue before destroying a QP
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > To: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > Cc: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, hch-jcswGhMUV9g@public.gmane.org, maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
> > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
> > dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > Sent: Sunday, February 12, 2017 9:07:16 PM
> > Subject: Re: [PATCH 8/8] IB/srp: Drain the send queue before destroying a
> > QP
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > > Cc: hch-jcswGhMUV9g@public.gmane.org, maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
> > > linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > > Sent: Sunday, February 12, 2017 3:05:16 PM
> > > Subject: Re: [PATCH 8/8] IB/srp: Drain the send queue before destroying a
> > > QP
> > > 
> > > On Sun, 2017-02-12 at 13:02 -0500, Laurence Oberman wrote:
> > > > [  861.143141] WARNING: CPU: 27 PID: 1103 at
> > > > drivers/infiniband/core/verbs.c:1959 __ib_drain_sq+0x1bb/0x1c0
> > > > [ib_core]
> > > > [  861.202208] IB_POLL_DIRECT poll_ctx not supported for drain
> > > 
> > > Hello Laurence,
> > > 
> > > > That warning has been removed by patch 7/8 of this series. Please
> > > > double-check whether all eight patches have been applied properly.
> > > 
> > > > Bart.
> > 
> > Hello
> > Just a heads up, working with Bart on this patch series.
> > We have stability issues with my tests in my MLX5 EDR-100 test bed.
> > Thanks
> > Laurence
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> I went back to Linus' latest tree for a baseline and we fail the same way.
> This has none of the latest 8 patches applied so we will
> have to figure out what broke this.
> 
> Don't forget that I tested all this recently with Bart's dma patch series
> and it's solid.
> 
> Will come back to this tomorrow and see what recently made it into Linus's
> tree by checking back with Doug.
> 
> [  183.779175] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff880bd4270eb0
> [  183.853047] 00000000 00000000 00000000 00000000
> [  183.878425] 00000000 00000000 00000000 00000000
> [  183.903243] 00000000 00000000 00000000 00000000
> [  183.928518] 00000000 0f007806 2500002a ad9fafd1
> [  198.538593] scsi host1: ib_srp: reconnect succeeded
> [  198.573141] mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
> [  198.603037] 00000000 00000000 00000000 00000000
> [  198.628884] 00000000 00000000 00000000 00000000
> [  198.653961] 00000000 00000000 00000000 00000000
> [  198.680021] 00000000 0f007806 25000032 00105dd0
> [  198.705985] scsi host1: ib_srp: failed FAST REG status memory management
> operation error (6) for CQE ffff880b92860138
> [  213.532848] scsi host1: ib_srp: reconnect succeeded
> [  213.568828] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  227.579684] scsi host1: ib_srp: reconnect succeeded
> [  227.616175] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  242.633925] scsi host1: ib_srp: reconnect succeeded
> [  242.668160] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  257.127715] scsi host1: ib_srp: reconnect succeeded
> [  257.165623] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  272.225762] scsi host1: ib_srp: reconnect succeeded
> [  272.262570] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  286.350226] scsi host1: ib_srp: reconnect succeeded
> [  286.386160] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  301.109365] scsi host1: ib_srp: reconnect succeeded
> [  301.144930] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  315.910860] scsi host1: ib_srp: reconnect succeeded
> [  315.944594] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  330.551052] scsi host1: ib_srp: reconnect succeeded
> [  330.584552] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  344.998448] scsi host1: ib_srp: reconnect succeeded
> [  345.032115] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  359.866731] scsi host1: ib_srp: reconnect succeeded
> [  359.902114] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> ..
> ..
> [  373.113045] scsi host1: ib_srp: reconnect succeeded
> [  373.149511] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  388.401469] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> [  388.589517] scsi host1: ib_srp: reconnect succeeded
> [  388.623462] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  403.086893] scsi host1: ib_srp: reconnect succeeded
> [  403.120876] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> ffff8817f2234c30
> [  403.140401] mlx5_0:dump_cqe:262:(pid 749): dump error cqe
> [  403.140402] 00000000 00000000 00000000 00000000
> [  403.140402] 00000000 00000000 00000000 00000000
> [  403.140403] 00000000 00000000 00000000 00000000
> [  403.140403] 00
> 
> 
Hello

Let me summarize where we are and how we got here.

The last kernel I tested with mlx5 and ib_srp was vmlinuz-4.10.0-rc4 with Bart's DMA patches.
All tests passed.

I then pulled Linus's tree and applied all 8 patches of the above series, and we failed in the
"failed FAST REG status memory management operation error" area.

Next I applied only 7 of the 8 patches to Linus's tree, because Bart and I thought patch 6 of the
series may have been the catalyst.

This also failed.

Building from Bart's tree, which is based on 4.10.0-rc7, failed as well.

That made me decide to baseline Linus's tree at 4.10.0-rc7 with no patches applied, and it fails too.

So something that has crept into 4.10.0-rc7 is affecting mlx5 with ib_srp.
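
To compare runs across these kernels, the ib_srp events in a saved dmesg capture can be tallied by type. This is only a minimal sketch, assuming the log lines look like the ones quoted above; the function name and the regular expressions are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   [  198.705985] scsi host1: ib_srp: failed FAST REG status memory management
# The status text may wrap onto the next line in quoted mail, so only the
# operation name (RECV, FAST REG, ...) is captured.
FAILED_RE = re.compile(r"ib_srp: failed (.+?) status ")
RECONNECT_RE = re.compile(r"ib_srp: reconnect succeeded")


def tally_srp_events(lines):
    """Count ib_srp failures by operation, plus successful reconnects."""
    counts = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            counts["failed " + m.group(1)] += 1
        elif RECONNECT_RE.search(line):
            counts["reconnect succeeded"] += 1
    return counts
```

It could be called as `tally_srp_events(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical capture file. A run dominated by "failed RECV" against one that also shows "failed FAST REG" would then be easy to tell apart.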

Thanks
Laurence

