From: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Bart Van Assche <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
hch-jcswGhMUV9g@public.gmane.org,
maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Subject: Re: v4.10-rc SRP + mlx5 regression
Date: Mon, 13 Feb 2017 21:19:54 -0500 (EST)
Message-ID: <568916592.30910570.1487038794766.JavaMail.zimbra@redhat.com>
In-Reply-To: <1487022735.2719.7.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
----- Original Message -----
> From: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> Cc: hch-jcswGhMUV9g@public.gmane.org, maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> Sent: Monday, February 13, 2017 4:52:28 PM
> Subject: Re: v4.10-rc SRP + mlx5 regression
>
> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
> > I will have to run through this again and see where the bisect went wrong.
>
> Hello Laurence,
>
> If you would be considering to repeat the bisect, did you know that a bisect
> can be sped up by specifying the names of the files and/or directories that
> are suspected? An example:
>
> git bisect start */infiniband */net
>
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Hello Bart,
Much better news this time :). I worked late on this but got it figured out.
OK, so the bisect landed on this commit, which makes a lot more sense and is right in the area where we are having issues.
I must have given a wrong answer at one of the steps the first time I did the bisect.
I reverted it in the master tree at rc8 and rebuilt the kernel.
Now all tests pass on Linus's tree - 4.10.0_rc8+.
The interesting point here is that this commit is in rc5, but rc5 was not failing, so we have an interoperability issue with this commit.
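As an aside on why a single wrong answer can send a bisect to an unrelated commit: on a linear history, bisection behaves like a binary search for the first bad commit. The sketch below is a simplified model under that assumption, not git's implementation. The commit indices and the names bisect_model, first_bad and lie_at_step are hypothetical.

```c
/*
 * Simplified model of bisection over a linear history (not git's code).
 * Commits are numbered 0..n-1. Commit 0 is known good, commit n-1 is
 * known bad, and every commit at or after first_bad fails the test.
 * lie_at_step (hypothetical) flips the answer given at that step to
 * model a mistaken "good"/"bad" reply; pass -1 for no mistake.
 * Returns the index reported as the first bad commit.
 */
int bisect_model(int n, int first_bad, int lie_at_step)
{
    int good = 0;       /* newest commit known to be good */
    int bad = n - 1;    /* oldest commit known to be bad */
    int step = 0;

    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        int is_bad = (mid >= first_bad);

        if (step == lie_at_step)
            is_bad = !is_bad;   /* the tester answered wrongly */

        if (is_bad)
            bad = mid;
        else
            good = mid;
        step++;
    }
    return bad;
}
```

With 16 commits and the first bad commit at index 11, bisect_model(16, 11, -1) reports 11, while a wrong answer at the very first step, bisect_model(16, 11, 0), reports 7. The search never revisits the half it discarded, which is why the whole bisect has to be repeated.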
[loberman@ibclient linux]$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation when the device supports sg gaps
[loberman@ibclient linux]$ git show ad8e66b4a80182174f73487ed25fd2140cf43361
commit ad8e66b4a80182174f73487ed25fd2140cf43361
Author: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Wed Dec 28 12:48:28 2016 +0200
IB/srp: fix mr allocation when the device supports sg gaps
If the device support arbitrary sg list mapping (device cap
IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
IB_MR_TYPE_SG_GAPS.
Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 4.7+
Signed-off-by: Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8ddc071..0f67cf9 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
 	struct srp_fr_desc *d;
 	struct ib_mr *mr;
 	int i, ret = -EINVAL;
+	enum ib_mr_type mr_type;
 
 	if (pool_size <= 0)
 		goto err;
@@ -384,9 +385,13 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
 	spin_lock_init(&pool->lock);
 	INIT_LIST_HEAD(&pool->free_list);
 
+	if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
+		mr_type = IB_MR_TYPE_SG_GAPS;
+	else
+		mr_type = IB_MR_TYPE_MEM_REG;
+
 	for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
-		mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
-				 max_page_list_len);
+		mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
 		if (IS_ERR(mr)) {
 			ret = PTR_ERR(mr);
 			if (ret == -ENOMEM)
(END)
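To make the effect of that hunk easier to see in isolation, here is a minimal userspace model of the selection it adds. It is only a sketch: the model_ and MODEL_ names are hypothetical stand-ins for the kernel's enum ib_mr_type and IB_DEVICE_SG_GAPS_REG, and the flag's bit position is a placeholder, not the value from the kernel headers.

```c
/*
 * Userspace model of the MR type selection added by the commit above.
 * Names and values are placeholders chosen for this sketch only.
 */
enum model_mr_type {
    MODEL_MR_MEM_REG,   /* stands in for IB_MR_TYPE_MEM_REG */
    MODEL_MR_SG_GAPS    /* stands in for IB_MR_TYPE_SG_GAPS */
};

/* Placeholder bit standing in for IB_DEVICE_SG_GAPS_REG. */
#define MODEL_CAP_SG_GAPS_REG (1ULL << 0)

/* Pick the MR type the way the patched srp_create_fr_pool() does. */
enum model_mr_type model_pick_mr_type(unsigned long long device_cap_flags)
{
    if (device_cap_flags & MODEL_CAP_SG_GAPS_REG)
        return MODEL_MR_SG_GAPS;
    return MODEL_MR_MEM_REG;
}
```

In this model a device that advertises the capability gets MODEL_MR_SG_GAPS and any other device gets MODEL_MR_MEM_REG. Before the commit, and again after a revert, every device takes the MODEL_MR_MEM_REG path.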
So here is the revert patch, but you need to decide how you want to deal with this.
Revert "IB/srp: fix mr allocation when the device supports sg gaps"
Laurence Oberman
Traced after bisection to a cause for this failure
Tested-by: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
commit 90d169d312a173d5350c1bb36d6daab04c592127
Author: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date: Mon Feb 13 20:33:32 2017 -0500
Revert "IB/srp: fix mr allocation when the device supports sg gaps"
Laurence Oberman
Traced after bisection to a cause for this failure
[ 130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
[ 130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f0edbfb0
[ 130.510899] 00000000 00000000 00000000 00000000
[ 130.536455] 00000000 00000000 00000000 00000000
[ 130.561878] 00000000 00000000 00000000 00000000
[ 130.585904] 00000000 0f007806 2500002a db0ec4d0
[ 145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
[ 146.530439] scsi host1: ib_srp: reconnect succeeded
[ 146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
[ 146.597635] 00000000 00000000 00000000 00000000
[ 146.623545] 00000000 00000000 00000000 00000000
[ 146.649599] 00000000 00000000 00000000 00000000
[ 146.673938] 00000000 0f007806 25000032 000c46d0
[ 146.697969] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff88
[ 162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
[ 162.256337] scsi host1: ib_srp: reconnect succeeded
[ 162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f0412ef0
This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 79bf484..01338c8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
 	struct srp_fr_desc *d;
 	struct ib_mr *mr;
 	int i, ret = -EINVAL;
-	enum ib_mr_type mr_type;
 
 	if (pool_size <= 0)
 		goto err;
@@ -385,13 +384,9 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
 	spin_lock_init(&pool->lock);
 	INIT_LIST_HEAD(&pool->free_list);
 
-	if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
-		mr_type = IB_MR_TYPE_SG_GAPS;
-	else
-		mr_type = IB_MR_TYPE_MEM_REG;
-
 	for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
-		mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
+		mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
+				 max_page_list_len);
 		if (IS_ERR(mr)) {
 			ret = PTR_ERR(mr);
 			if (ret == -ENOMEM)
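The log excerpt in the revert message prints each work-completion status as a string followed by its number. As a reading aid, the sketch below maps the two values that appear there. The numbers and strings are taken from the log itself; the symbolic names are modeled on what I recall of the kernel's enum ib_wc_status (IB_WC_WR_FLUSH_ERR, IB_WC_MW_BIND_ERR) and should be checked against include/rdma/ib_verbs.h. The model_ helpers are hypothetical.

```c
/*
 * Decoder for the two completion statuses seen in the log excerpt.
 * Values and strings come from the log; names are assumptions.
 */
enum model_wc_status {
    MODEL_WC_WR_FLUSH_ERR = 5,  /* "WR flushed" */
    MODEL_WC_MW_BIND_ERR = 6    /* "memory management operation error" */
};

const char *model_wc_status_str(int status)
{
    switch (status) {
    case MODEL_WC_WR_FLUSH_ERR:
        return "WR flushed";
    case MODEL_WC_MW_BIND_ERR:
        return "memory management operation error";
    default:
        return "not covered by this sketch";
    }
}

/*
 * Flushed completions are reported for work requests still outstanding
 * once a queue pair has entered the error state, so they usually point
 * back to an earlier failure rather than being the root cause.
 */
int model_is_secondary(int status)
{
    return status == MODEL_WC_WR_FLUSH_ERR;
}
```

Read this way, the FAST REG completion with status 6 is the entry that points at memory registration, which fits a commit that changes the MR type. The RECV completions with status 5 are most likely fallout from the queue pair entering the error state.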
Now moving on to what got me here in the first place.
Bart, let me know whether 7 of the 8 patches in your most recent series are all still valid after this revert.
Otherwise let me know which ones you want me to apply.
Patch 6 - I am thinking it is no longer valid.
"
If a HCA supports the SG_GAPS_REG feature then a single memory
region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
reduces the number of memory regions that is allocated per SRP
session.
"
Thanks
Laurence