From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Erez Shitrit <erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
Amir Vadai <amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Eyal Perry <eyalpe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH FIX For-3.19 v5 00/10] Fix ipoib regressions
Date: Mon, 26 Jan 2015 14:30:21 -0500
Message-ID: <1422300621.2854.38.camel@redhat.com>
In-Reply-To: <54C6400E.30607-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
On Mon, 2015-01-26 at 15:24 +0200, Erez Shitrit wrote:
> On 1/26/2015 2:51 PM, Doug Ledford wrote:
> > On Mon, 2015-01-26 at 12:27 +0200, Erez Shitrit wrote:
> >
> >> New (and full) dmesg attached, (after modprobe ib_ipoib, with all debug
> >> flags set) it is all there.
> > Thank you, I know what's going on here now. Will correct shortly.
>
> welcome -:)
I munged my opensm configuration so that I could forcibly replicate the
situation here (I intentionally took several well-known multicast groups
and forbade their creation).
I was able to first replicate Erez's problem.
Then I installed a new ib_ipoib module with my proposed fix for Erez's
problem and it worked exactly as expected. It was a mistake in one of
my earlier patches (the third in the series). When I added delayed
queuing of the multicast task, I didn't use a separate work struct and
instead tried to queue the same work struct twice. I reworked it so
that the work struct is only ever queued once and if the multicast task
gets to the end of its run and there are delayed entries waiting still,
it will queue itself to run again when the shortest delay has expired.
I'll send that through.
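The requeue logic described above can be modeled in user space. This is only a sketch of the idea, not the driver code: `struct mcast_entry`, `struct mcast_task`, `task_queue()` and `task_run()` are hypothetical names, time is a plain integer, and a `forbidden` flag stands in for the subnet manager refusing to create a group. It shows one work item that is never queued twice, ready entries that are not held up by delayed ones, and a self-requeue for the shortest remaining delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these names are not taken from ipoib. */
struct mcast_entry {
    bool joined;       /* join has completed */
    bool forbidden;    /* assumption: models the SM rejecting this group */
    long delay_until;  /* earliest time a join may be attempted */
};

struct mcast_task {
    bool queued;       /* the single work item is pending */
    long run_at;       /* time at which the pending work item fires */
};

/*
 * Queue the one work item. If it is already pending, only pull its
 * deadline earlier; it is never queued a second time.
 */
static void task_queue(struct mcast_task *t, long when)
{
    assert(t != NULL);
    if (t->queued) {
        if (when < t->run_at)
            t->run_at = when;
        return;
    }
    t->queued = true;
    t->run_at = when;
}

/*
 * One run of the task at time `now`. Entries whose delay has expired are
 * attempted; a failed attempt backs the entry off by `backoff`. Entries
 * still waiting do not block the others. If any entry is still delayed
 * at the end of the run, the task requeues itself for the soonest one.
 * Returns the number of entries joined during this run.
 */
static int task_run(struct mcast_task *t, struct mcast_entry *list,
                    size_t n, long now, long backoff)
{
    int joined = 0;
    bool have_delayed = false;
    long soonest = 0;

    t->queued = false; /* the work item is running and may be requeued */

    for (size_t i = 0; i < n; i++) {
        struct mcast_entry *e = &list[i];

        if (e->joined)
            continue;
        if (e->delay_until <= now) {
            if (!e->forbidden) {
                e->joined = true;
                joined++;
                continue;
            }
            e->delay_until = now + backoff; /* join failed, retry later */
        }
        if (!have_delayed || e->delay_until < soonest) {
            soonest = e->delay_until;
            have_delayed = true;
        }
    }

    if (have_delayed)
        task_queue(t, soonest);
    return joined;
}
```

In this model a group that keeps failing only pushes its own retry time out; the task wakes for whichever delayed entry expires first and leaves the rest alone.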
Here's the log of the attempt:
[root@rdma-master linus (firewall/for-rc)]$ dmesg | tail -10
[337072.429488] mlx4_ib0: successfully joined all multicast groups
[337073.856932] mlx4_ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0002, starting sendonly join
[337073.869686] mlx4_ib0: sendonly multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[337073.882754] mlx4_ib0: successfully joined all multicast groups
[337088.480082] mlx4_ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[337088.492789] mlx4_ib0: sendonly multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[337088.505819] mlx4_ib0: successfully joined all multicast groups
[337089.897041] mlx4_ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0002, starting sendonly join
[337089.909870] mlx4_ib0: sendonly multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[337089.922893] mlx4_ib0: successfully joined all multicast groups
[root@rdma-master linus (firewall/for-rc)]$ ping6 -I mlx4_ib0 fe80::211:7500:77:d3cc
PING fe80::211:7500:77:d3cc(fe80::211:7500:77:d3cc) from fe80::f652:1403:7b:cba1 mlx4_ib0: 56 data bytes
64 bytes from fe80::211:7500:77:d3cc: icmp_seq=1 ttl=64 time=77.6 ms
64 bytes from fe80::211:7500:77:d3cc: icmp_seq=2 ttl=64 time=0.159 ms
64 bytes from fe80::211:7500:77:d3cc: icmp_seq=3 ttl=64 time=0.125 ms
64 bytes from fe80::211:7500:77:d3cc: icmp_seq=4 ttl=64 time=0.128 ms
^C
--- fe80::211:7500:77:d3cc ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 0.125/19.503/77.600/33.542 ms
[root@rdma-master linus (firewall/for-rc)]$ dmesg | tail -10
[337120.632427] mlx4_ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0016, starting sendonly join
[337120.645166] mlx4_ib0: sendonly multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[337120.658292] mlx4_ib0: successfully joined all multicast groups
[337121.977733] mlx4_ib0: no multicast record for ff12:601b:ffff:0000:0000:0000:0000:0002, starting sendonly join
[337121.990478] mlx4_ib0: sendonly multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[337122.003589] mlx4_ib0: successfully joined all multicast groups
[337130.410559] mlx4_ib0: setting up send only multicast group for ff12:601b:ffff:0000:0000:0001:ff77:d3cc
[337130.423203] mlx4_ib0: no multicast record for ff12:601b:ffff:0000:0000:0001:ff77:d3cc, starting sendonly join
[337130.436327] mlx4_ib0: MGID ff12:601b:ffff:0000:0000:0001:ff77:d3cc AV ffff882027235f00, LID 0xc01e, SL 0
[337130.448970] mlx4_ib0: successfully joined all multicast groups
[root@rdma-master linus (firewall/for-rc)]$
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD