From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Erez Shitrit <erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
Amir Vadai <amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Eyal Perry <eyalpe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH FIX For-3.19 v5 00/10] Fix ipoib regressions
Date: Mon, 26 Jan 2015 13:45:55 -0500 [thread overview]
Message-ID: <1422297955.2854.32.camel@redhat.com> (raw)
In-Reply-To: <54C64A2A.5070306-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 6957 bytes --]
On Mon, 2015-01-26 at 16:07 +0200, Erez Shitrit wrote:
> On 1/26/2015 3:37 PM, Doug Ledford wrote:
> > On Mon, 2015-01-26 at 15:24 +0200, Erez Shitrit wrote:
> >>>>>> The main cause is the concept that was broken for the send-only join,
> >>>>>> when you treat the sendonly like a regular mcg and add it to the mc list
> >>>>>> and to the mc_task etc.
> >>>>> I'm looking at 3.18 right now, and 3.18 adds sendonly groups to the mcg
> >>>>> just like my current code does. The only difference, and I do mean
> >>>>> *only*, is that it calls sendonly_join directly instead of via the
> >>>>> mcast_task.
> >>>> Yes, and i already wrote that it is more than just "only", it changed
> >>>> the concept of the sendonly mc packet.
> >>> Be more specific please. What do you mean by "concept"? And just so we
> >>> are clear, this all started because the existing multicast code was
> >>> super easy to break and was racy, so if the "concept" you are referring
> >>> to is what made the original code easy to break and racy, I'm not going
> >>> to care one whit that I changed that concept.
> >>>
> >> I agree that you fixed many bugs in your patches to 3.18, where the mc
> >> flow was easy to break, no argue about that.
> >> The only issue that i disagree is about the way now sendonly is handled
> >> (and i think that this is the reason for the regression we see now).
> > It's not. The patch I sent you off list should be sufficient at this
> > point to finally put your issues to rest.
> >
> >> In general, IMHO, the sendonly join is part of the TX flow and not part
> >> of the ipoib_set_mcast_list flow.
> > I disagree. I'll get into that below.
> >
> >> The original meaning of the ipoib_set_mcast_list task that restart the
> >> mc_task is to be used for the kernel in order to add one or more new
> >> mcg's macs to the driver/HW (ndo_set_rx_mode), the sendonly mc is not
> >> such object, its mac should not be part of the "mac" list of the driver
> >> (in IB wards, no qp_attach for it)
> > This part I agree with.
> >
> >> and from the kernel point of view
> >> whenever it sends packet from sendonly mcg type no need to do the join,
> >> it's a regular send, the only reason we have the sendonly join is the IB
> >> enforcement for such mcg.
> > This is where you and I differ. There is a requirement on us from the
> > IB spec that we have to join a sendonly MC group in order to do what the
> > kernel would think of as a normal send if we were on Ethernet. We
> > aren't on Ethernet though, so acting like we are to the above layer is
> > fine, but in our own layer we should be coding to the realities of our
> > layer. And that means treating a sendonly MC group like a regular MC
> > group.
>
> Just to make it clear, the IB request to do sendonly join was done prior
> to your changes.
Yes...
> (stopped after them.)
What do you mean here? With my changes we still do sendonly joins.
> The IB spec doesn't care if you mix sendonly with full-member,
Correct.
> the
> kernel does care,
Where? What part of the kernel cares?
> and by adding them to the mc_list you abuse the kernel
> ndo (by adding unnecessary joins to it, as i wrote in my previous mail)
How? We never add a sendonly during a typical set_multicast call
invocation. And in fact, we still do the exact same thing we used to do
when the kernel ndo is called, which is to call restart_task which will
scan our currently joined groups (both regular and sendonly, like it
always has), clear out any that don't have a matching address entry set
via the kernel ndo, and rejoin those that do have a matching entry set
via the ndo.
> I think you should not mix between RX operation (like the full-member
> join, kernel rx ndo) to TX operation (sendonly join)
And I think having two distinctly different code paths when it doesn't
matter whether the join is sendonly or normal, they still must all be
accurately tracked and reaped on shutdown and various other events, is
wrong and is an invitation to failures for no good reason.
> >
> >> The reason the driver keeps the sendonly mcg in its mc_list is from
> >> others reasons, the first is to handle the case when the kernel decides
> >> to move a mcg from sendonly membership to full-member, one more other
> >> reason is to do the leave operation when needed and not for being
> >> handled as a full-member mcg.
> > It's been a long time since I first started working on this issue, so
> > some of the details are fuzzy in my memory. I think my first 8 patch
> > patchset is now about 6 months old. But, I think probably the most
> > important thing you left off this list, and the one that trumps all the
> > others, is that whether a sendonly MC group has a QP attached or not, we
> > still have to account for these groups, we have to track them, and on
> > shutdown we have to reliably leave them without oopsing. Putting the
> > join of the sendonly groups directly in mcast_send opens us up to
> > problems with synchronizing our mcast list (either due to a dev flush, a
> > mcast restart task, or something else).
>
> This is the reason the sendonly mcg was added to the mc_list. and should
> stay there.
>
> > I can't say for certain that
> > the change to how we record the return value from ib_sa_join_multicast
> > in sendonly_join is enough to keep the sendonly join code from racing
> > with the flush code and causing problems. For that reason, I am highly
> > reluctant to put the sendonly join back directly into the tx path
> > because that could have been part of the original problem in the first
> > place, it's just that my memory of back then is too fuzzy to say that
> > with 100% certainty either way.
>
> That fear from the unknown is not a reason to break the original logic
> of the sendonly.
No, the reason to break the original logic of the sendonly was that it
was a bad design that contributed to other problems. The reason not to
consider putting that bad logic back in place right now, IMO, is that I
can't remember the details of the breakage better than I do at this
moment.
> if there is such a problem, we need to find it and to fix.
I disagree. The original logic was an artificial and bad separation of
too more or less identical things in the code that helped contribute to
it being broken in the first place. We are under no obligation to put
it back to that bad design. And in the future, I'd like to bring the
sendonly and regular joins more in line with each other, quite possible
doing away with them having separate functions. I'd also like to
completely deserialize them. Both of these things will be easier to do
from the starting point we have now versus going back to the old setup.
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2015-01-26 18:45 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-22 14:31 [PATCH FIX For-3.19 v5 00/10] Fix ipoib regressions Doug Ledford
[not found] ` <cover.1421936879.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 01/10] IB/ipoib: fix IPOIB_MCAST_RUN flag usage Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 02/10] IB/ipoib: Add a helper to restart the multicast task Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 03/10] IB/ipoib: make delayed tasks not hold up everything Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 04/10] IB/ipoib: Handle -ENETRESET properly in our callback Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 05/10] IB/ipoib: don't restart our thread on ENETRESET Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 06/10] IB/ipoib: remove unneeded locks Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 07/10] IB/ipoib: fix race between mcast_dev_flush and mcast_join Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 08/10] IB/ipoib: fix ipoib_mcast_restart_task Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 09/10] IB/ipoib: flush the ipoib_workqueue on unregister Doug Ledford
2015-01-22 14:31 ` [PATCH FIX For-3.19 v5 10/10] IB/ipoib: cleanup a couple debug messages Doug Ledford
2015-01-23 7:01 ` [PATCH FIX For-3.19 v5 00/10] Fix ipoib regressions Or Gerlitz
[not found] ` <CAJ3xEMi7mowr_qFMUXtM5m8p974qF39nPf-Qh-NOYK_jUzswSg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-23 7:45 ` Doug Ledford
[not found] ` <1421999125.3352.265.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-24 4:58 ` Roland Dreier
2015-01-23 12:54 ` Estrin, Alex
2015-01-23 16:52 ` Doug Ledford
[not found] ` <1422031938.3352.286.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-25 12:54 ` Erez Shitrit
[not found] ` <54C4E793.2010103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-25 22:21 ` Doug Ledford
[not found] ` <1422224477.3352.373.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-26 10:27 ` Erez Shitrit
[not found] ` <54C616A8.3050804-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-26 12:51 ` Doug Ledford
[not found] ` <1422276712.2854.5.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-26 13:24 ` Erez Shitrit
[not found] ` <54C6400E.30607-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-26 13:37 ` Doug Ledford
[not found] ` <1422279465.2854.15.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-26 14:07 ` Erez Shitrit
[not found] ` <54C64A2A.5070306-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-26 18:45 ` Doug Ledford [this message]
2015-01-26 19:30 ` Doug Ledford
2015-01-26 19:34 ` [PATCH FIX For-3.19 11/10] IB/ipoib: don't queue a work struct up twice Doug Ledford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1422297955.2854.32.camel@redhat.com \
--to=dledford-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
--cc=erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=eyalpe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox