From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
Yevgeny Petrilin
<yevgenyp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH 0/8] IPoIB: Fix multiple race conditions
Date: Wed, 03 Sep 2014 14:12:17 -0400 [thread overview]
Message-ID: <1409767937.26762.10.camel@firewall.xsintricity.com> (raw)
In-Reply-To: <54071D14.9040404-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 4642 bytes --]
On Wed, 2014-09-03 at 16:52 +0300, Or Gerlitz wrote:
> On 8/13/2014 2:38 AM, Doug Ledford wrote:
> > Locking of multicast joins/leaves in the IPoIB layer have been problematic
> > for a while. There have been recent changes to try and make things better,
> > including these changes:
> >
> > bea1e22 IPoIB: Fix use-after-free of multicast object
> > a9c8ba5 IPoIB: Fix usage of uninitialized multicast objects
> >
> > Unfortunately, the following test still fails (miserably) on a plain
> > upstream kernel:
> >
> > pass=0
> > ifdown ib0
> > while true; do
> > ifconfig ib0 up
> > ifconfig ib0 down
> > echo "Pass $pass"
> > let pass++
> > done
> >
> > This usually fails within 10 to 20 passes, although I did have a lucky
> > run make it to 300 or so. If you happen to have a P_Key child interface,
> > it fails even quicker.
>
> Hi Doug,
>
> Thanks for looking on that. I wasn't aware we're doing so badly... I
> checked here and the plan is that Erez Shitrit from Mellanox will also
> be providing feedback on the series.
>
> Anyway, just to make sure we're on the same page, people agree that
> picking such series is too heavy for post merge window, right? so we are
> talking on 3.18, agree?
Given how easy the problem is to reproduce, and how this patch set
positively solves the reproducer, I would have preferred 3.17, and I had
it in to Roland before the merge window closed, but Roland chose to put
it off.
/me boggles.
> Or.
>
> >
> > In tracking down the rtnl deadlock in the multicast code, I ended up
> > re-designing the flag usage and clean up just the race conditions in
> > the various tasks. Even that wasn't enough to resolve the issue entirely
> > though, as if you ran the test above on multiple interfaces simultaneously,
> > it could still deadlock. So in the end I re-did the workqueue usage too
> > so that we now use a workqueue per device (and that includes P_Key devices
> > have dedicated workqueues) as well as one global workqueue that does
> > nothing but flush tasks. This turns out to be a much more elegant way
> > of handling the workqueues and in fact enabled me to remove all of the
> > klunky passing around of flush parameters to tell various functions not
> > to flush the workqueue if it would deadlock the workqueue we are running
> > from.
> >
> > Here is my test setup:
> >
> > 2 InfiniBand physical fabrics: ib0 and ib1
> > 2 P_Keys on each fabric: default and 0x8002 on ib0 and 0x8003 on ib1
> > 4 total IPoIB devices that I have named mlx4_ib0, mlx4_ib0.8002,
> > mlx4_ib1, and mlx4_ib1.8003
> >
> > In order to test my final patch set, I logged into my test machine on
> > four different virtual terminals, I used ifdown on all of the above
> > interfaces to get things in a consistent state, and then I ran the above
> > loop, one per terminal per interface simultaneously.
> >
> > It's worth noting here that when you ifconfig a base interface up, it
> > automatically brings up the child interface too, so the ifconfig mlx4_ib0
> > up is in fact racing with both ups and downs of mlx4_ib0.8002. The same
> > is true for the mlx4_ib1 interface and its child. With my patch set in
> > place, these loops are currently running without a problem and have passed
> > 15,000 up/down cycles per interface.
> >
> > Doug Ledford (8):
> > IPoIB: Consolidate rtnl_lock tasks in workqueue
> > IPoIB: Make the carrier_on_task race aware
> > IPoIB: fix MCAST_FLAG_BUSY usage
> > IPoIB: fix mcast_dev_flush/mcast_restart_task race
> > IPoIB: change init sequence ordering
> > IPoIB: Use dedicated workqueues per interface
> > IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
> > IPoIB: No longer use flush as a parameter
> >
> > drivers/infiniband/ulp/ipoib/ipoib.h | 19 +-
> > drivers/infiniband/ulp/ipoib/ipoib_cm.c | 18 +-
> > drivers/infiniband/ulp/ipoib/ipoib_ib.c | 27 ++-
> > drivers/infiniband/ulp/ipoib/ipoib_main.c | 49 +++--
> > drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 239 ++++++++++++++++---------
> > drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 22 ++-
> > 6 files changed, 240 insertions(+), 134 deletions(-)
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
prev parent reply other threads:[~2014-09-03 18:12 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-12 23:38 [PATCH 0/8] IPoIB: Fix multiple race conditions Doug Ledford
[not found] ` <cover.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-12 23:38 ` [PATCH 1/8] IPoIB: Consolidate rtnl_lock tasks in workqueue Doug Ledford
[not found] ` <2394730ce5ae1d46522dca04066293dd842edf16.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-15 22:11 ` Wendy Cheng
2014-09-04 14:28 ` Erez Shitrit
2014-08-12 23:38 ` [PATCH 2/8] IPoIB: Make the carrier_on_task race aware Doug Ledford
[not found] ` <d05cdce70f1312f35f8be2d14bafd2a06809b137.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-18 23:26 ` Wendy Cheng
[not found] ` <CABgxfbE6edfZZ58=mTvhGqWSkCxsik0XuQPR0L-yayze=803cg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-19 20:32 ` Doug Ledford
[not found] ` <2CC1794B-10CD-49A2-8F5D-0C66A6684DBC-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-19 22:05 ` Wendy Cheng
2014-09-04 12:13 ` Erez Shitrit
[not found] ` <5408576C.7040609-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-09-09 7:17 ` Doug Ledford
2014-08-12 23:38 ` [PATCH 3/8] IPoIB: fix MCAST_FLAG_BUSY usage Doug Ledford
[not found] ` <a410e80dc5ca7cfa64229bbbf50c1337317e3bd8.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-19 18:08 ` Wendy Cheng
[not found] ` <CABgxfbH-Dt3CpxJKwCAZeHTUyupaA9y_WXVXuxgiPMet26PTQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-19 20:28 ` Doug Ledford
[not found] ` <902D5BF2-159A-4B31-A87F-7491F3C8057F-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-25 19:51 ` Wendy Cheng
[not found] ` <CABgxfbHOD75vLdZ0TtWZbk8ne3kHd_eWObxPHmoJ-D8DjE0bkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-25 20:03 ` Doug Ledford
[not found] ` <E3EFCBAC-2D6E-49D3-A556-DBD40701CC5F-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-26 18:04 ` Wendy Cheng
2014-08-12 23:38 ` [PATCH 4/8] IPoIB: fix mcast_dev_flush/mcast_restart_task race Doug Ledford
[not found] ` <e1dbcfc25d8930b281aad12699ebf8fa82485b0e.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-29 21:53 ` Wendy Cheng
[not found] ` <CABgxfbHDuUrdHuLJT2oD07Cy3Ys4_rj-bJ6eR=9+uv0CuPH7_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-30 15:39 ` Wendy Cheng
[not found] ` <CABgxfbGJOdmAn1sokEtisDdnA=r_4mfP=PfqZVsP0cd_oL50dA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-03 18:06 ` Doug Ledford
[not found] ` <1409767614.26762.7.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
2014-09-03 19:45 ` Wendy Cheng
2014-09-03 17:49 ` Doug Ledford
2014-08-12 23:38 ` [PATCH 5/8] IPoIB: change init sequence ordering Doug Ledford
[not found] ` <ead9800512c1cb412b86cb1de3868c40f07c72be.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-09-04 12:36 ` Erez Shitrit
2014-08-12 23:38 ` [PATCH 6/8] IPoIB: Use dedicated workqueues per interface Doug Ledford
[not found] ` <f7af9c251d722675a549e4a673f46c0f31dfa266.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-20 15:01 ` Estrin, Alex
2014-09-04 6:49 ` Erez Shitrit
[not found] ` <54080B6B.8050707-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-09-09 7:09 ` Doug Ledford
[not found] ` <540EA7B2.2050805-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-09-10 17:27 ` Erez Shitrit
2014-08-12 23:38 ` [PATCH 7/8] IPoIB: Make ipoib_mcast_stop_thread flush the workqueue Doug Ledford
[not found] ` <ae3912431eeacd81d920a405a6bdeb3853791b1a.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-20 15:03 ` Estrin, Alex
2014-08-12 23:38 ` [PATCH 8/8] IPoIB: No longer use flush as a parameter Doug Ledford
[not found] ` <ad7bb2b8da52f187cf2978e6a1c77ead32b60de3.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-20 15:04 ` Estrin, Alex
2014-08-15 22:08 ` [PATCH 0/8] IPoIB: Fix multiple race conditions Wendy Cheng
[not found] ` <CABgxfbEGfiNGUKT4NJi1GoDRouFznxpgogDt5yr47TLfwDB7hA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-03 16:26 ` Wendy Cheng
2014-09-03 13:52 ` Or Gerlitz
[not found] ` <54071D14.9040404-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-09-03 18:12 ` Doug Ledford [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1409767937.26762.10.camel@firewall.xsintricity.com \
--to=dledford-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=yevgenyp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox