From: Jason Gunthorpe <jgg@ziepe.ca>
To: Leon Romanovsky <leon@kernel.org>
Cc: Chuck Lever III <chuck.lever@oracle.com>,
"bugzilla-daemon@bugzilla.kernel.org"
<bugzilla-daemon@bugzilla.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [Bug 214523] New: RDMA Mellanox RoCE drivers are unresponsive to ARP updates during a reconnect
Date: Mon, 27 Sep 2021 09:24:25 -0300
Message-ID: <20210927122425.GC3544071@ziepe.ca>
In-Reply-To: <YVG0iI3dSdP/6/1J@unreal>
On Mon, Sep 27, 2021 at 03:09:44PM +0300, Leon Romanovsky wrote:
> On Sun, Sep 26, 2021 at 05:36:01PM +0000, Chuck Lever III wrote:
> > Hi Leon-
> >
> > Thanks for the suggestion! More below.
> >
> > > On Sep 26, 2021, at 4:02 AM, Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Fri, Sep 24, 2021 at 03:34:32PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=214523
> > >>
> > >> Bug ID: 214523
> > >> Summary: RDMA Mellanox RoCE drivers are unresponsive to ARP
> > >> updates during a reconnect
> > >> Product: Drivers
> > >> Version: 2.5
> > >> Kernel Version: 5.14
> > >> Hardware: All
> > >> OS: Linux
> > >> Tree: Mainline
> > >> Status: NEW
> > >> Severity: normal
> > >> Priority: P1
> > >> Component: Infiniband/RDMA
> > >> Assignee: drivers_infiniband-rdma@kernel-bugs.osdl.org
> > >> Reporter: kolga@netapp.com
> > >> Regression: No
> > >>
> > >> A RoCE RDMA connection is established using the CMA protocol. During
> > >> setup the code uses hard-coded timeout/retry values. These values are used
> > >> to retry the Connect Request when it is not answered. During
> > >> the retry attempts, ARP updates for the destination server are ignored.
> > >> Current timeout values lead to a 4+ minute long attempt at connecting to a server
> > >> that no longer owns the IP since the ARP update happened.
> > >>
> > >> The ask is to make the timeout/retry values configurable via procfs or sysfs.
> > >> This will allow environments that use RoCE to reduce the timeouts to more
> > >> reasonable values and to react to ARP updates faster. Other CMA
> > >> users (e.g. IB or others) can continue to use the existing values.
> >
> > I would rather not add a user-facing tunable. The fabric should
> > be better at detecting addressing changes within a reasonable
> > time. It would be helpful to provide a history of why the ARP
> > timeout is so lax -- do certain ULPs rely on it being long?
>
> I don't know about ULPs and ARPs, but how to calculate TimeWait is
> described in the spec.
>
> Regarding the tunable, I agree. Because it needs to be per-connection, most
> likely not many people in the world will succeed in configuring it properly.
Maybe we should be disconnecting the cm_id if a gratuitous ARP changes
the MAC address? The cm_id is surely broken after that event, right?
Jason