From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Erez Shitrit <erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Amir Vadai <amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Eyal Perry <eyalpe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH V3 FIX for-3.19] IB/ipoib: Fix sendonly traffic and multicast traffic
Date: Tue, 27 Jan 2015 12:02:10 -0500 [thread overview]
Message-ID: <1422378130.2854.119.camel@redhat.com> (raw)
In-Reply-To: <54C74D49.3080201-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 4504 bytes --]
On Tue, 2015-01-27 at 10:33 +0200, Erez Shitrit wrote:
> The patch I sent, only claims to fix the regression in multicast and
> sendonly issues, it can replace the first 3 patches and the one you sent
> me off list from your last patchset.
OK. But my patchset resolve those issues too, and as you point out,
it's not really my entire patchset that resolves just this regression.
It would be easy enough to squash just those patches in my patchset
required to solve just this regression down to just one patch if patch
count bothers people (I find that to be a silly thing to be concerned
with, but oh well).
> (probably there are more bugs, that the rest of your patchset solved,
> hence it should be tested with the rest of your patchset)
Yes, there are. But Or has been arguing that we should take your patch
to resolve the regression and leave the rest out. So, testing that
specific situation seemed to make sense in terms of finding out if that
left us with a kernel that was in reasonable shape.
If we want to look at your patch combined with my patchset in lieu of
the individual patches in mine that perform the same fix, then it would
take some work to merge them as there are differences in assumptions
made and they would not simply work together.
> And probably there are more bugs that your patcheset didn't fix yet -:)
> For example, I run your last patchset+ the last fix you sent me with:
> modprobe -r ib_ipoib; modprobe ib_ipoib;
> ping6
> some adding/deleting one child interface and got the next panic:
>
> [81209.348259] ib0: join completion for
> ff12:601b:ffff:0000:0000:0001:ff43:3bf1 (status -102)
> [81209.408787] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000020
> [81209.416750] IP: [<ffffffffa096b399>] ipoib_mcast_join+0xa9/0x1b0
> [ib_ipoib]
This looks exactly like the issue solved by 3ee9876fbdb (IB/ipoib: fix
race between mcast_dev_flush and mcast_join). But without the full logs
I wouldn't know. However, if it happened while creating/deleting
children then there might be something lurking in the child delete logic
that is not properly locked.
> Doug, there are bugs that probably will be found all the time with any
> patchset, my point was only according to the way sendonly need to be
> handled.
And I proved with my final patchset that sendonly does *not* have to be
handled that way. I proved that the regression could be solved without
basically reverting the logic to the old split way of handling multicast
joins.
So you've argued several times that "sendonly conceptually should be in
the tx path because at the Ethernet layer we are emulating there is no
join for sendonly traffic, it just goes straight out". To whit, your
patch restores that behavior.
My argument against is that:
1) We are emulating Ethernet, we are not Ethernet, and we have to have a
sendonly join that Ethernet does not. Emulating Ethernet does not mean
we should do conceptually silly things in our lower layer driver. We
must do a sendonly join, which is still a join, and splitting it off
from other joins makes no sense for us at the InfiniBand layer. And in
fact, once it leaves the ipoib layer and enters the ib_sa layer, the two
code paths completely merge and are absolutely no different. And at
that layer, they *always* get queued to a work queue and handled later.
So there really is no advantage to doing the send directly in the tx
path once you look beyond the ipoib_multicast.c file itself.
2) Having two code paths that only vary in 2 simple ways
a) one path uses 1 for the membership type, one uses 4
b) one attaches a QP at the end, the other does not
but which are otherwise almost identical (and could probably be joined
into a single code path, which is something I'm planning for 3.20) does
not make sense.
3) Fixing oopses and races and other issues with (needlessly) multiple
racing code paths is considerably harder than with fewer code paths.
For the sake of hardening the IPoIB driver against various situations, a
single code path is preferable to multiple code paths.
I haven't heard an argument from you yet that I believe beats the points
I've made above. So I believe a solution that does not revert back to
having two separate code paths to be maintained is preferable to your
patch.
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2015-01-27 17:02 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-26 13:00 [PATCH V3 FIX for-3.19] IB/ipoib: Fix sendonly traffic and multicast traffic Erez Shitrit
[not found] ` <1422277227-1086-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-01-26 13:16 ` Or Gerlitz
[not found] ` <CAJ3xEMjERaEP5d_ZT8RN5+w8Z_Hig4T7dhuq3o+1NOUuQgfJLw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-26 19:38 ` Doug Ledford
[not found] ` <1422301106.2854.41.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-26 20:57 ` Or Gerlitz
[not found] ` <CAJ3xEMg3vYGbGuT+Z-XQMv5YuPws33XHQP_Wcz8gvpBbCg3TSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-26 22:00 ` Doug Ledford
[not found] ` <1422309605.2854.62.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-27 8:33 ` Erez Shitrit
[not found] ` <54C74D49.3080201-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-27 17:02 ` Doug Ledford [this message]
[not found] ` <1422378130.2854.119.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-29 12:51 ` Or Gerlitz
[not found] ` <54CA2CE0.30107-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-01-29 15:34 ` Doug Ledford
[not found] ` <1422545677.2854.260.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-29 19:23 ` Roland Dreier
[not found] ` <CAL1RGDV30SRUv0oxZCQW0e+tziO0g+iDha8DSWeM56PiWtnRwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-29 19:27 ` Doug Ledford
2015-01-29 20:29 ` Jason Gunthorpe
2015-01-27 13:05 ` Or Gerlitz
[not found] ` <54C78D36.7050700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-01-27 17:51 ` Doug Ledford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1422378130.2854.119.camel@redhat.com \
--to=dledford-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
--cc=erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=eyalpe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox