From: Erez Shitrit <erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH 7/9] IB/ipoib: fix MCAST_FLAG_BUSY usage
Date: Tue, 03 Mar 2015 11:53:21 +0200 [thread overview]
Message-ID: <54F58491.6020407@dev.mellanox.co.il> (raw)
In-Reply-To: <1425310036.2354.24.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On 3/2/2015 5:27 PM, Doug Ledford wrote:
> On Sun, 2015-03-01 at 11:31 +0200, Erez Shitrit wrote:
>> On 2/22/2015 2:27 AM, Doug Ledford wrote:
>>> Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast
>>> objects") added a new flag MCAST_JOIN_STARTED, but was not very strict
>>> in how it was used. We didn't always initialize the completion struct
>>> before we set the flag, and we didn't always call complete on the
>>> completion struct from all paths that complete it. And when we did
>>> complete it, sometimes we continued to touch the mcast entry after
>>> the completion, opening us up to possible use after free issues.
>>>
>>> This made it less than totally effective, and certainly made its use
>>> confusing. And in the flush function we would use the presence of this
>>> flag to signal that we should wait on the completion struct, but we never
>>> cleared this flag, ever.
>>>
>>> In order to make things clearer and aid in resolving the rtnl deadlock
>>> bug I've been chasing, I cleaned this up a bit.
>>>
>>> 1) Remove the MCAST_JOIN_STARTED flag entirely
>>> 2) Change MCAST_FLAG_BUSY so it now only means a join is in-flight
>>> 3) Test mcast->mc directly to see if we have completed
>>> ib_sa_join_multicast (using IS_ERR_OR_NULL)
>>> 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
>>> the mcast->done completion struct
>>> 5) Make sure that before calling complete(&mcast->done), we always clear
>>> the MCAST_FLAG_BUSY bit
>>> 6) Take the mcast_mutex before we call ib_sa_multicast_join and also
>>> take the mutex in our join callback. This forces
>>> ib_sa_multicast_join to return and set mcast->mc before we process
>>> the callback. This way, our callback can safely clear mcast->mc
>>> if there is an error on the join and we will do the right thing as
>>> a result in mcast_dev_flush.
>>> 7) Because we need the mutex to synchronize mcast->mc, we can no
>>> longer call mcast_sendonly_join directly from mcast_send and
>>> instead must add sendonly join processing to the mcast_join_task
>>> 8) Make MCAST_RUN mean that we have a working mcast subsystem, not that
>>> we have a running task. We know when we need to reschedule our
>>> join task thread and don't need a flag to tell us.
>>> 9) Add a helper for rescheduling the join task thread
>>>
>>> A number of different races are resolved with these changes. These
>>> races existed with the old MCAST_FLAG_BUSY usage, the
>>> MCAST_JOIN_STARTED flag was an attempt to address them, and while it
>>> helped, a determined effort could still trip things up.
>>>
>>> One race looks something like this:
>>>
>>> Thread 1 Thread 2
>>> ib_sa_join_multicast (as part of running restart mcast task)
>>> alloc member
>>> call callback
>>> ifconfig ib0 down
>>> wait_for_completion
>>> callback call completes
>>> wait_for_completion in
>>> mcast_dev_flush completes
>>> mcast->mc is PTR_ERR_OR_NULL
>>> so we skip ib_sa_leave_multicast
>>> return from callback
>>> return from ib_sa_join_multicast
>>> set mcast->mc = return from ib_sa_multicast
>>>
>>> We now have a permanently unbalanced join/leave issue that trips up the
>>> refcounting in core/multicast.c
>>>
>>> Another like this:
>>>
>>> Thread 1 Thread 2 Thread 3
>>> ib_sa_multicast_join
>>> ifconfig ib0 down
>>> priv->broadcast = NULL
>>> join_complete
>>> wait_for_completion
>>> mcast->mc is not yet set, so don't clear
>>> return from ib_sa_join_multicast and set mcast->mc
>>> complete
>>> return -EAGAIN (making mcast->mc invalid)
>>> call ib_sa_multicast_leave
>>> on invalid mcast->mc, hang
>>> forever
>>>
>>> By holding the mutex around ib_sa_multicast_join and taking the mutex
>>> early in the callback, we force mcast->mc to be valid at the time we
>>> run the callback. This allows us to clear mcast->mc if there is an
>>> error and the join is going to fail. We do this before we complete
>>> the mcast. In this way, mcast_dev_flush always sees consistent state
>>> in regards to mcast->mc membership at the time that the
>>> wait_for_completion() returns.
>>>
>>> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> ---
>>> drivers/infiniband/ulp/ipoib/ipoib.h | 11 +-
>>> drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 355 ++++++++++++++++---------
>>> 2 files changed, 238 insertions(+), 128 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
>>> index 9ef432ae72e..c79dcd5ee8a 100644
>>> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
>>> @@ -98,9 +98,15 @@ enum {
>>>
>>> IPOIB_MCAST_FLAG_FOUND = 0, /* used in set_multicast_list */
>>> IPOIB_MCAST_FLAG_SENDONLY = 1,
>>> - IPOIB_MCAST_FLAG_BUSY = 2, /* joining or already joined */
>>> + /*
>>> + * For IPOIB_MCAST_FLAG_BUSY
>>> + * When set, in flight join and mcast->mc is unreliable
>>> + * When clear and mcast->mc IS_ERR_OR_NULL, need to restart or
>>> + * haven't started yet
>>> + * When clear and mcast->mc is valid pointer, join was successful
>>> + */
>>> + IPOIB_MCAST_FLAG_BUSY = 2,
>>> IPOIB_MCAST_FLAG_ATTACHED = 3,
>>> - IPOIB_MCAST_JOIN_STARTED = 4,
>>>
>>> MAX_SEND_CQE = 16,
>>> IPOIB_CM_COPYBREAK = 256,
>>> @@ -148,6 +154,7 @@ struct ipoib_mcast {
>>>
>>> unsigned long created;
>>> unsigned long backoff;
>>> + unsigned long delay_until;
>>>
>>> unsigned long flags;
>>> unsigned char logcount;
>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
>>> index bb1b69904f9..277e7ac7c4d 100644
>>> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
>>> @@ -66,6 +66,48 @@ struct ipoib_mcast_iter {
>>> unsigned int send_only;
>>> };
>>>
>>> +/*
>>> + * This should be called with the mcast_mutex held
>>> + */
>>> +static void __ipoib_mcast_schedule_join_thread(struct ipoib_dev_priv *priv,
>>> + struct ipoib_mcast *mcast,
>>> + bool delay)
>>> +{
>>> + if (!test_bit(IPOIB_MCAST_RUN, &priv->flags))
>> You don't need the flag IPOIB_MCAST_RUN, it is duplicated of
>> IPOIB_FLAG_OPER_UP
>> probably, need to be taken from all places (including ipoib.h file).
> This is probably true, however I skipped it for this series of patches.
> It wasn't a requirement of proper operation, and depending on where in
> the patch series you tried to inject this change, it had unintended
> negative consequences. In particular, up until patch 7/9,
> mcast_restart_task used to do a hand rolled stop thread and a matching
> start_thread and so couldn't use FLAG_OPER_UP because we needed
> FLAG_OPER_UP to tell us whether or not to restart the thread.
OK, sounds reasonable. we can send a patch after that that fix that.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2015-03-03 9:53 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-22 0:26 [PATCH 0/9] IB/ipoib: fixup multicast locking issues Doug Ledford
[not found] ` <cover.1424562072.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-02-22 0:26 ` [PATCH 1/9] IB/ipoib: factor out ah flushing Doug Ledford
[not found] ` <b06eb720c2f654f5ecdb72c66f4e89149d1c24ec.1424562072.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-02-26 13:28 ` Erez Shitrit
[not found] ` <54EF1F67.4000001-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-02-26 16:27 ` Doug Ledford
[not found] ` <1424968046.2543.18.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-01 6:47 ` Erez Shitrit
[not found] ` <54F2B61C.9080308-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-03-02 15:09 ` Doug Ledford
[not found] ` <1425308967.2354.19.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-03 9:59 ` Erez Shitrit
[not found] ` <54F585E9.7070704-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-03-13 8:39 ` Or Gerlitz
[not found] ` <CAJ3xEMgxxHu5BQdADaRe-Grtf4rm1LMfsCRiDyF6ToPdV_62OA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-03-15 18:42 ` Doug Ledford
[not found] ` <3A0A417D-BFE4-475C-BAB3-C3FB1D313022-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-16 15:24 ` Erez Shitrit
[not found] ` <5506F5B2.1080900-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-03-16 16:06 ` Doug Ledford
[not found] ` <ADC46FD9-3179-4182-949D-1884C9D31757-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-16 16:51 ` Erez Shitrit
2015-03-16 18:00 ` Doug Ledford
2015-02-22 0:27 ` [PATCH 2/9] IB/ipoib: change init sequence ordering Doug Ledford
2015-02-22 0:27 ` [PATCH 3/9] IB/ipoib: Consolidate rtnl_lock tasks in workqueue Doug Ledford
2015-02-22 0:27 ` [PATCH 4/9] IB/ipoib: Make the carrier_on_task race aware Doug Ledford
2015-02-22 0:27 ` [PATCH 5/9] IB/ipoib: Use dedicated workqueues per interface Doug Ledford
[not found] ` <1cfdf15058cea312f07c2907490a1d7300603c40.1424562072.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-02-23 16:48 ` Or Gerlitz
2015-02-22 0:27 ` [PATCH 6/9] IB/ipoib: No longer use flush as a parameter Doug Ledford
2015-02-22 0:27 ` [PATCH 7/9] IB/ipoib: fix MCAST_FLAG_BUSY usage Doug Ledford
[not found] ` <9d657f64ee961ee3b3233520d8b499b234a42bcd.1424562072.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-01 9:31 ` Erez Shitrit
[not found] ` <54F2DC81.304-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-03-02 15:27 ` Doug Ledford
[not found] ` <1425310036.2354.24.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-03 9:53 ` Erez Shitrit [this message]
2015-02-22 0:27 ` [PATCH 8/9] IB/ipoib: deserialize multicast joins Doug Ledford
[not found] ` <a24ade295dfdd1369aac47a978003569ec190952.1424562072.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-01 13:58 ` Erez Shitrit
[not found] ` <54F31AEC.3010001-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-03-02 15:29 ` Doug Ledford
[not found] ` <1425310145.2354.26.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-03 9:54 ` Erez Shitrit
2015-02-22 0:27 ` [PATCH 9/9] IB/ipoib: drop mcast_mutex usage Doug Ledford
[not found] ` <767f4c41779db63ce8c6dbba04b21959aba70ef9.1424562072.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-02-23 16:56 ` Or Gerlitz
[not found] ` <CAJ3xEMgLPF9pCwQDy9QyL9fAERJXJRXN2gBj3nhuXUCcbfCMPg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-02-23 17:41 ` Doug Ledford
2015-02-22 21:34 ` [PATCH 0/9] IB/ipoib: fixup multicast locking issues Or Gerlitz
[not found] ` <CAJ3xEMgj=ATKLt0MA67c3WefCrG1hZ59eSrhpD-u_dxLJe2kfg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-02-22 21:56 ` Doug Ledford
[not found] ` <1424642176.4847.2.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-02-22 21:57 ` Doug Ledford
2015-03-13 8:41 ` Or Gerlitz
[not found] ` <CAJ3xEMjHrTH_F=zPDsH9A9qRWo=AYN4sgbsdDKV62nzBkB5kXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-03-15 18:52 ` Doug Ledford
[not found] ` <F42024C5-60A5-4B92-B4AC-4D225E2C0FC3-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-31 17:04 ` ira.weiny
[not found] ` <20150331170452.GA6261-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-03-31 20:42 ` Or Gerlitz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54F58491.6020407@dev.mellanox.co.il \
--to=erezsh-ldsdmyg8hgv8yrgs2mwiifqbs+8scbdb@public.gmane.org \
--cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).