public inbox for linux-rdma@vger.kernel.org
From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Amir Vadai <amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Eyal Perry <eyalpe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH V3 FIX for-3.19] IB/ipoib: Fix sendonly traffic and multicast traffic
Date: Tue, 27 Jan 2015 12:51:20 -0500
Message-ID: <1422381080.2854.142.camel@redhat.com>
In-Reply-To: <54C78D36.7050700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On Tue, 2015-01-27 at 15:05 +0200, Or Gerlitz wrote:
> On 1/27/2015 12:00 AM, Doug Ledford wrote:
> > However, I didn't get more than 5 minutes into testing before I was able
> > to livelock the system.  In this case, from machine A running my
> > patchset, I did
> >
> > ping6 -I mlx4_ib0 -i .25 <machine B address>
> >
> > On machine B running Erez's patch, I did:
> >
> > rmmod ib_ipoib; modprobe ib_ipoib mcast_debug_level=1; sleep 2; ping6
> > -i .25 -c 10 -I mlx4_ib0 <machine A address>
> >
> > And on the machine rdma-master, where the opensm runs, I did just a few:
> >
> > systemctl restart opensm
> >
> > The livelock is in the mcast flushing code.  On the machine that livelocked
> 
> Doug,
> 
> The tests you are running and the issues you are seeing fall well into a 
> to-be-fixed-in-some-kernel-rc1 category but by NO means as something 
> which should be an rc6 fix.
> 
> You must do the distinction between Erez's patch that fixes the 
> regressions introduced on 3.19-rc1 to your attempts to fix many more 
> instabilities in the IPoIB driver, which are seen under whatever nasty 
> test you are running (and it's good we want to reach there).
> 
> Roland, the V3 patch solves the rc1 regression and I think we should 
> pick it up, by no way we can allow to pick eleven patches @ this point.
> 
> Thoughts?

As I said in my other email to Erez, and as Erez points out, not all 11
patches of mine are needed to resolve the specific regression you are
talking about.  However, my fix resolves the regression without
reverting to splitting the multicast joins down two separate code paths,
which I think is the wrong thing to do and something that actually makes
hardening the driver harder.  If you *really* don't want my patchset
because it's 11 patches (something I couldn't care less about, and I
don't think you should either...the content of the patches is much more
important than the count), I could certainly do some squashing.  And I
could split out just the regression fix from all the rest too.

But in a situation like this, what I'm *really* concerned about is the
final result.  And here's how it breaks down under the various options:

v3.18 plain - ifconfig down/ifconfig up on ib0 can easily lock machine

v3.18 + 8 patches for above issue - initial multicast bringup works, but
additional joins attempted later (after the multicast task had decided
it was done with the initial join set) did not.  There were multiple
symptoms of the multicast join issue, one of which was failure of ipv6
or ipv4 multicast, but another was hangs in ib_sa_unregister_client on
shutdown which could just as easily be classified as a regression as the
ipv6/ipv4 multicast support

v3.18 + 8 patches + Erez patch - subsequent multicast joins now work
again, but other symptoms of the 8 patch series not addressed at all,
including other regressions, and in adding this patch in, it reverts
part of the changes made in the original 8 patch series and quite likely
reintroduces instability on ifconfig down/ifconfig up cycles (making one
wonder if this fix is better or worse than just reverting the original 8
patch set)

v3.18 + 8 patches + 11 fix patches - multicast joins now work again,
ifconfig down/ifconfig up fix continues to work, other regressions such
as hangs in ib_sa_unregister_client on shutdown fixed, overall
considerably harder to cause the kernel to behave badly than with any of
the above alternatives.  I don't claim that it's perfect and that there
isn't additional hardening to be done, but I believe it is considerably
harder/less likely to trip this kernel up than all of the rest above

If there hadn't been a flurry of testing around my patches, then I
wouldn't suggest them at all.  But they have been getting testing.  Lots
of it.  And so have the alternatives.  And out of the bunch, regardless
of patch count, my patchset has fared best under testing.  But if we
don't want to do that, then I would probably recommend reverting the
original 8 patches and then dropping the whole bunch early into 3.20.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




