From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Bart Van Assche
<bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>,
Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Israel Rukshin <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
Artemy Kovalyov
<artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
"# 4 . 7+" <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v2 1/8] IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
Date: Wed, 15 Feb 2017 17:18:12 +0200
Message-ID: <20170215151812.GS6989@mtr-leonro.local>
In-Reply-To: <242820990.31706010.1487170436012.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Wed, Feb 15, 2017 at 09:53:56AM -0500, Laurence Oberman wrote:
>
>
> ----- Original Message -----
> > From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Doug Ledford"
> > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Mark Bloch"
> > <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Yuval Shaia" <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>, "Artemy Kovalyov" <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "# 4 . 7+"
> > <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > Sent: Wednesday, February 15, 2017 8:47:07 AM
> > Subject: Re: [PATCH v2 1/8] IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
> >
> > On Wed, Feb 15, 2017 at 08:43:38AM -0500, Laurence Oberman wrote:
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > To: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > > Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Max Gurtovoy"
> > > > <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Doug Ledford"
> > > > <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, "Israel Rukshin"
> > > > <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Mark Bloch"
> > > > <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Yuval Shaia" <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>, "Artemy
> > > > Kovalyov" <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "# 4 . 7+"
> > > > <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > > > Sent: Wednesday, February 15, 2017 8:33:35 AM
> > > > Subject: Re: [PATCH v2 1/8] IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Leon Romanovsky" <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > > > To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Max Gurtovoy"
> > > > > <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > > > Cc: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
> > > > > "Israel Rukshin" <israelr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Mark
> > > > > Bloch" <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Yuval Shaia" <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
> > > > > "Artemy Kovalyov" <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "# 4
> > > > > . 7+" <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> > > > > Sent: Wednesday, February 15, 2017 3:19:45 AM
> > > > > Subject: Re: [PATCH v2 1/8] IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
> > > > >
> > > > > On Wed, Feb 15, 2017 at 09:14:49AM +0200, Leon Romanovsky wrote:
> > > > > > On Tue, Feb 14, 2017 at 10:56:29AM -0800, Bart Van Assche wrote:
> > > > > > > Tests have shown that the following error message is reported when
> > > > > > > using SG-GAPS registration with an mlx5 adapter:
> > > > > > >
> > > > > > > > scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0
> > > > > > > 00000000 00000000 00000000 00000000
> > > > > > > 00000000 00000000 00000000 00000000
> > > > > > > 00000000 00000000 00000000 00000000
> > > > > > > 00000000 0f007806 2500002a ad9fafd1
> > > > > > > scsi host1: ib_srp: reconnect succeeded
> > > > > > > mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
> > > > > > > 00000000 00000000 00000000 00000000
> > > > > > > 00000000 00000000 00000000 00000000
> > > > > > > 00000000 00000000 00000000 00000000
> > > > > > > 00000000 0f007806 25000032 00105dd0
> > > > > > > > scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138
> > > > > > >
> > > > > > > > Hence avoid using SG-GAPS memory registrations. Additionally,
> > > > > > > > always configure blk_queue_virt_boundary() to avoid triggering
> > > > > > > > a mapping failure when using adapters that support SG-GAPS (e.g.
> > > > > > > > mlx5).
> > > > > >
> > > > > > > According to the error dump, we have an issue with the
> > > > > > > max_page_list_len value supplied and/or with the internal
> > > > > > > calculations that derive the UMR byte count from that value.
> > > > >
> > > > > Hi Bart,
> > > > >
> > > > > > Would you mind trying your test on my rdma-next branch [1] with
> > > > > > the following fixup?
> > > > >
> > > > > > diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> > > > > > index 3c1f483d003f..3e59dce10d5e 100644
> > > > > > --- a/drivers/infiniband/hw/mlx5/mr.c
> > > > > > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > > > > > @@ -1045,8 +1045,9 @@ int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
> > > > > >  	for (pages_mapped = 0;
> > > > > >  	     pages_mapped < pages_to_map && !err;
> > > > > >  	     pages_mapped += pages_iter, idx += pages_iter) {
> > > > > > +		npages = min_t(int, pages_iter, pages_to_map - pages_mapped);
> > > > > >  		dma_sync_single_for_cpu(ddev, dma, size, DMA_TO_DEVICE);
> > > > > > -		npages = populate_xlt(mr, idx, pages_iter, xlt,
> > > > > > +		npages = populate_xlt(mr, idx, npages, xlt,
> > > > > >  				      page_shift, size, flags);
> > > > > >
> > > > > >  		dma_sync_single_for_device(ddev, dma, size, DMA_TO_DEVICE);
> > > > >
> > > > > [1]
> > > > > https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=rdma-next
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > > > > Hello Leon,
> > > > > I replied earlier but I don't know if my reply made it.
> > > > > I will have to test this.
> > > > >
> > > > > Is this repo
> > > > > https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=rdma-next
> > > > > already patched with the change you want?
> > > > > If not, can I just take the patch and apply it to my earlier tree,
> > > > > which is based just on Linus's tree and where I reverted the patch?
> > > >
> > > > Thanks
> > > > Laurence
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > >
> > >
> > > > I guess not. I looked at applying this patch to Linus's tree, but the
> > > > function mlx5_ib_update_xlt() is not in there yet.
> >
> > > This is why I asked you to use my tree: there is a chance that the wrong
> > > calculation predates the introduction of the mlx5_ib_update_xlt() function.
> >
> > >
> > > I will see if I can get your tree staged and test this for you.
> >
> > Thanks a lot.
> >
> > >
> > > Thanks
> > > Laurence
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> Hello Leon,
> I pulled your tree and ran git checkout rdma-next.
> I applied the patch manually, built the kernel and started the tests.
>
> Pretty soon I ran into the following:
>
> ibclient login: [ 132.640142] mlx5_1:dump_cqe:262:(pid 11417): dump error cqe
> [ 132.640185] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd86a97b0
> [ 132.718106] 00000000 00000000 00000000 00000000
> [ 132.743616] 00000000 00000000 00000000 00000000
> [ 132.767790] 00000000 00000000 00000000 00000000
> [ 132.793626] 00000000 0f007806 2500002a 8ad015d1
> [ 136.181512] systemd-readahead[701]: open(/var/tmp/dracut.ZR8CYG/initramfs/usr/bin/loginctl) failed: Too many levels of symbolic links
> [ 136.250779] systemd-readahead[701]: open(/var/tmp/dracut.ZR8CYG/initramfs/usr/lib/systemd/system/dracut-emergency.service) failed: Too many levels of symbolic links
> [ 147.791028] scsi host2: ib_srp: reconnect succeeded
> [ 147.827012] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bf5e71330
> [ 162.908764] scsi host2: ib_srp: reconnect succeeded
> [ 162.944244] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bf5e71330
> [ 166.409523] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f23ec570
> [ 177.292496] scsi host2: ib_srp: reconnect succeeded
> [ 177.334396] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
> [ 177.363793] 00000000 00000000 00000000 00000000
> [ 177.388434] 00000000 00000000 00000000 00000000
> [ 177.413918] 00000000 00000000 00000000 00000000
> [ 177.438911] 00000000 0f007806 25000042 00102dd0
> [ 177.465048] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880bd7c90138
> [ 181.386124] scsi host1: ib_srp: reconnect succeeded
> [ 181.422892] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f23ec5b0
> [ 192.827678] fast_io_fail_tmo expired for SRP port-2:1 / host2.
> [ 193.230036] scsi host2: ib_srp: reconnect succeeded
> [ 193.329290] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
> [ 193.356245] 00000000 00000000 00000000 00000000
> [ 193.379488] 00000000 00000000 00000000 00000000
> [ 193.404853] 00000000 00000000 00000000 00000000
> [ 193.429633] 00000000 0f007806 2500004a 006c46d0
> [ 193.455183] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880bd7c906f8
> [ 196.178111] scsi host1: ib_srp: reconnect succeeded
> [ 196.215197] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f23ec5b0
> [ 208.277396] scsi host2: ib_srp: reconnect succeeded
> [ 208.318454] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
> [ 208.349012] 00000000 00000000 00000000 00000000
> [ 208.374873] 00000000 00000000 00000000 00000000
> [ 208.401070] 00000000 00000000 00000000 00000000
> [ 208.426873] 00000000 0f007806 25000052 00103dd0
> [ 208.452788] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880bd7c90138
> [ 211.066945] scsi host1: ib_srp: reconnect succeeded
> [ 211.101645] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f23ec5b0
> [ 223.296126] scsi host2: ib_srp: reconnect succeeded
> [ 223.396340] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
> [ 223.423310] 00000000 00000000 00000000 00000000
> [ 223.448590] 00000000 00000000 00000000 00000000
> [ 223.473520] 00000000 00000000 00000000 00000000
> [ 223.499008] 00000000 0f007806 2500005a 006c56d0
> [ 223.524787] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880bd7c906f8
> [ 225.463312] scsi host1: ib_srp: reconnect succeeded
> [ 225.499915] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f23ec5b0
> [ 238.912126] fast_io_fail_tmo expired for SRP port-2:1 / host2.
> [ 239.254388] scsi host2: ib_srp: reconnect succeeded
> [ 239.291843] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bf5e712f0
> [ 240.619914] scsi host1: ib_srp: reconnect succeeded
> [ 240.654445] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f23ec5b0
> [ 254.620679] scsi host2: ib_srp: reconnect succeeded
> [ 254.658021] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bf5e712f0
> [ 255.798616] scsi host1: ib_srp: reconnect succeeded
> [ 255.832897] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f23ec5b0
> [ 269.228783] scsi host2: ib_srp: reconnect succeeded
> [ 269.272055] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
> [ 269.300655] 00000000 00000000 00000000 00000000
> [ 269.325885] 00000000 00000000 00000000 00000000
> [ 269.350457] 00000000 00000000 00000000 00000000
> [ 269.375757] 00000000 0f007806 25000072 005847d0
> [ 269.402013] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880bd7c905b8
> [ 269.856464] scsi host1: ib_srp: reconnect succeeded
> [ 269.893203] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f23ec5b0
>
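
As a reading aid for the dumps above, here is a minimal sketch that pulls
the interesting fields out of the last 16-byte row of a dumped error CQE.
The offsets are an assumption based on my reading of struct mlx5_err_cqe
(vendor syndrome, syndrome, WQE opcode + QPN, WQE counter, op_own at the
tail of the 64-byte entry); err_cqe_tail and decode_err_cqe_tail() are
hypothetical names, not driver code.

```c
#include <stdint.h>

/* Fields from the last 16 bytes of a dumped 64-byte error CQE.
 * ASSUMPTION: offsets follow my reading of struct mlx5_err_cqe;
 * verify against the driver headers before relying on them. */
struct err_cqe_tail {
	uint8_t  vendor_err_synd; /* vendor-specific syndrome */
	uint8_t  syndrome;        /* completion error syndrome */
	uint8_t  wqe_opcode;      /* opcode of the failed send WQE */
	uint32_t qpn;             /* 24-bit QP number */
	uint16_t wqe_counter;     /* index of the failed WQE */
	uint8_t  cqe_opcode;      /* high nibble of op_own */
};

/* w[] holds the four 32-bit words of the last dump row, exactly as
 * printed in the log (the dump prints them in big-endian byte order). */
static struct err_cqe_tail decode_err_cqe_tail(const uint32_t w[4])
{
	struct err_cqe_tail t;

	t.vendor_err_synd = (w[1] >> 8) & 0xff;
	t.syndrome        = w[1] & 0xff;
	t.wqe_opcode      = w[2] >> 24;
	t.qpn             = w[2] & 0xffffff;
	t.wqe_counter     = w[3] >> 16;
	t.cqe_opcode      = (w[3] >> 4) & 0xf;
	return t;
}
```

Feeding it the row "00000000 0f007806 25000042 00102dd0" gives syndrome
0x06, which lines up with the "memory management operation error (6)"
message that follows that dump, and a WQE opcode of 0x25, which, if I
remember the opcode numbering correctly, is a UMR work request.
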
Thanks a lot for your effort. We tried a shortcut to debug this, but it
didn't work :(.
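
For the record, the idea behind the fixup quoted above was only to clamp
the last chunk of the translation-table update to the number of pages
still left. A minimal userspace sketch of that pattern, with hypothetical
helper names (next_chunk, walk_clamped, walk_unclamped are not kernel
functions) and assuming chunk_max is greater than zero:

```c
#include <stddef.h>

/* Hypothetical helper: number of entries to handle in the next chunk,
 * never more than what is left. Mirrors the min_t() in the fixup. */
static size_t next_chunk(size_t chunk_max, size_t total, size_t done)
{
	size_t left = total - done;

	return left < chunk_max ? left : chunk_max;
}

/* Walk `total` entries in chunks of at most `chunk_max` and return how
 * many entries were touched; with the clamp this equals `total`. */
static size_t walk_clamped(size_t chunk_max, size_t total)
{
	size_t done = 0, touched = 0;

	while (done < total) {
		size_t n = next_chunk(chunk_max, total, done);

		touched += n;	/* stands in for populate_xlt(mr, idx, n, ...) */
		done += n;
	}
	return touched;
}

/* Same walk without the clamp: the final chunk is always full-sized,
 * so more entries are touched than were requested. */
static size_t walk_unclamped(size_t chunk_max, size_t total)
{
	size_t done = 0, touched = 0;

	while (done < total) {
		touched += chunk_max;
		done += chunk_max;
	}
	return touched;
}
```

With 1300 pages and chunks of 512, the clamped walk touches 1300 entries
while the unclamped one touches 1536. As the log above shows, clamping
alone did not make the registration error go away, so the root cause has
to be elsewhere.
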
> Thanks
> Laurence
>