public inbox for linux-rdma@vger.kernel.org
From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: [PATCH 00/22] IB/ipoib: Fixups for multicast issues
Date: Wed, 11 Feb 2015 20:43:23 -0500
Message-ID: <cover.1423703861.git.dledford@redhat.com>

This patchset revives the 8 patches that were reverted from 3.19,
the 11 patches that fixed the problems with the first 8, and the single
patch that dealt with reaping AHs (address handles) and the failure to
deallocate resources on shutdown.  It then adds two new patches that
are enhancements rather than bug fixes, and hence weren't appropriate
to post during the 3.19 tussle.

Testing of this patchset is still underway, but so far it has done
well.  IPv4 multicast, IPv6 multicast, connected mode, datagram mode,
rmmod/insmod while active, restarting opensm while active, and
ifconfig up/ifconfig down in a tight while loop have all passed.

There are two outstanding issues that I think still need to be
addressed.  I ran across them while performing all the other testing,
and I think they existed prior to my patchset, but I haven't booted
a clean kernel to verify that yet.  I'll do that tomorrow, and if
things are not as I expect, I'll report back here:

1) In connected mode, the initial ip6 ping to any host takes almost
exactly 1 second to complete.  The debug messages show this delay
very clearly:

[19059.689967] qib_ib0: joining MGID ff12:601b:ffff:0000:0000:0001:ff31:7791
[19059.689970] qib_ib0: successfully started all multicast joins
[19059.690313] qib_ib0: sendonly join completion for ff12:601b:ffff:0000:0000:0001:ff31:7791 (status 0)
[19059.690314] qib_ib0: Created ah ffff88080cc0ef60
[19059.690315] qib_ib0: MGID ff12:601b:ffff:0000:0000:0001:ff31:7791 AV ffff88080cc0ef60, LID 0xc035, SL 0
    <- final debug message from creating our AH, which is when we should have requeued our sends
[19060.694190] qib_ib0: REQ arrived
    <- almost exactly 1 second later, we finally start setting up our connection

In datagram mode, this does not happen and initial startup of ping6
is mostly immediate.

2) In connected mode, restarting opensm repeatedly can cause some of
the machines to start failing to find other machines when trying to
use ping6.  However, they don't lose connectivity to all machines,
only to specific ones.  An rmmod/insmod cycle solves the problem, as
does a full ifdown/ifup cycle, and given enough idle time the problem
goes away on its own.  I suspect that neighbor flushing in connected
mode is not reliable/sufficient when opensm events come in.  Again,
I think this exists in the upstream kernel, and I'll test more on
that tomorrow.

Doug Ledford (22):
  IB/ipoib: Consolidate rtnl_lock tasks in workqueue
  IB/ipoib: Make the carrier_on_task race aware
  IB/ipoib: fix MCAST_FLAG_BUSY usage
  IB/ipoib: fix mcast_dev_flush/mcast_restart_task race
  IB/ipoib: change init sequence ordering
  IB/ipoib: Use dedicated workqueues per interface
  IB/ipoib: Make ipoib_mcast_stop_thread flush the workqueue
  IB/ipoib: No longer use flush as a parameter
  IB/ipoib: fix IPOIB_MCAST_RUN flag usage
  IB/ipoib: Add a helper to restart the multicast task
  IB/ipoib: make delayed tasks not hold up everything
  IB/ipoib: Handle -ENETRESET properly in our callback
  IB/ipoib: don't restart our thread on ENETRESET
  IB/ipoib: remove unneeded locks
  IB/ipoib: fix race between mcast_dev_flush and mcast_join
  IB/ipoib: fix ipoib_mcast_restart_task
  IB/ipoib: flush the ipoib_workqueue on unregister
  IB/ipoib: cleanup a couple debug messages
  IB/ipoib: make sure we reap all our ah on shutdown
  IB/ipoib: don't queue a work struct up twice
  IB/ipoib: deserialize multicast joins
  IB/ipoib: drop mcast_mutex usage

 drivers/infiniband/ulp/ipoib/ipoib.h           |  20 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |  18 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  69 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |  51 ++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 479 ++++++++++++-------------
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |  24 +-
 6 files changed, 356 insertions(+), 305 deletions(-)

-- 
2.1.0

