From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: Or Gerlitz <ogerlitz-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
Cc: Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>,
Sean Hefty <mshefty-+/W+9+QloQG75v1z/vFq2g@public.gmane.org>,
linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: sense remote hardware address change by rdma-cm applications
Date: Tue, 20 Jul 2010 13:12:17 -0500
Message-ID: <4C45E701.7030501@opengridcomputing.com>
In-Reply-To: <4C454F80.1060808-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
Or Gerlitz wrote:
> Jason Gunthorpe wrote:
>
>> It is a somewhat wider problem than just ND entries; changes in routing can
>> also alter the L2 address, so that needs to be tracked as well.
>>
>
> sure, when we did the address change work, see commit dd5bdff "RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event", the problem I wanted to solve was related to
> the local bonding. Over the review thread, remote address change related
> to bonding fail-over and routing changes were mentioned, and left to future work.
>
>
>
>> this is back to the original criticisms from netdev of this whole separated
>> stack idea - it isn't integrated, so where do you draw the line? What gets left out?
>> Today, it is pretty clear that only the CM portion integrates at all
>> with netdev and after that things are separate.
>>
>
> the address change event was an attempt to make the CM part, which integrates with netdev,
> go a step further and help the offloaded data path be more consistent with netdev;
> this email is about going another step.
>
>
>> So.. I think to tackle this you need to start looking at how the
>> dst_entry structure works in netdev and apply the same idea to RDMA-CM
>> and reflect the changes in AH back to the QP owner.
>>
>
> I can take a look (a pointer would be much appreciated...). Still, the dst entry is used
> for every netdev xmit, whereas here the xmit is offloaded, so I don't see what could really be used from the dst code, but I might be wrong. The rdma app uses the neighbour once, upon address resolution, and I was trying to see if we can take a reference on the neighbour so the neigh subsystem probes would keep going even though the neighbour is not directly used.
>
>
>> Is this an iwarp problem too? Not sure how L3->L2 translation works there.
>>
>
> I never managed to understand how address resolving really works with iwarp...
>
> Doing a bit of detective work... you can see that addr4_resolve says
>
>
>> /* If the device does ARP internally, return 'done' */
>> if (rt->idev->dev->flags & IFF_NOARP) {
>> rdma_copy_addr(addr, rt->idev->dev, NULL);
>> goto put;
>> }
>>
>
> and later cma_connect_iw places into the iwarp cm the src/dst IP addresses
>
>
>> sin = (struct sockaddr_in*) &id_priv->id.route.addr.src_addr;
>> cm_id->local_addr = *sin;
>> sin = (struct sockaddr_in*) &id_priv->id.route.addr.dst_addr;
>> cm_id->remote_addr = *sin;
>>
>
> so all the iwarp providers do ARP resolving in their TOE stack?! Steve, can you
> clarify that?
>
>

The Ammasso driver uses IFF_NOARP, and I think it is actually the
only iwarp driver that does.

The cxgb3/4 drivers do not set IFF_NOARP and rely on ND being done as
part of connection setup. The driver will initiate ND if there isn't a
neigh entry available at the time the iwarp driver tries to send a SYN
or SYN/ACK. So even though the rdma_cm does ND initially, the cxgb*
drivers don't assume that. The code that handles all this is in
cxgb3.ko. See drivers/net/cxgb3/l2t.c. The iwarp driver code that uses
the L2T services is mainly in drivers/infiniband/hw/cxgb3/iwch_cm.c.
The cxgb* drivers actually hold references on the neigh and dst structs until
the offloaded connection is gone. Also, if the offloaded connection has
problems transmitting (due to an L2 address change, for example), then
the driver will initiate ND again by calling neigh_event_send(). See
t3_l2t_send_event() in l2t.c, which is called by the iwarp driver in
peer_abort() in iwch_cm.c when the HW tells us it's retransmitting too
much.
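
A minimal userspace sketch of the behaviour described above, assuming a much-simplified L2 table entry: each offloaded connection holds a reference on it, excessive retransmission re-triggers address resolution, and an ND reply carrying a different MAC tells the driver it must reprogram the hardware. All names here (struct l2e, start_nd(), l2e_tx_trouble(), l2e_nd_reply()) are hypothetical and are not the cxgb3 l2t.c definitions; start_nd() only stands in for neigh_event_send().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of an L2 table (L2T) entry that an
 * offloaded connection references for its whole lifetime.
 * Names and states are illustrative, not the cxgb3 definitions. */
enum l2e_state { L2E_VALID, L2E_RESOLVING };

struct l2e {
	enum l2e_state state;
	unsigned char mac[6];   /* next-hop L2 address programmed into HW */
	int refcnt;             /* one reference per offloaded connection */
	int nd_requests;        /* counts start_nd() calls */
};

/* Stand-in for neigh_event_send(): ask the neighbour subsystem to
 * (re)resolve the next hop. Here it only counts the request. */
static void start_nd(struct l2e *e)
{
	e->nd_requests++;
	e->state = L2E_RESOLVING;
}

/* Called when the HW reports excessive retransmissions
 * (the peer_abort() path described above). */
static void l2e_tx_trouble(struct l2e *e)
{
	assert(e->refcnt > 0);  /* only connections holding a ref report trouble */
	if (e->state != L2E_RESOLVING)
		start_nd(e);
}

/* Called when ND completes; returns true if the L2 address changed,
 * i.e. the connection's hardware state must be reprogrammed. */
static bool l2e_nd_reply(struct l2e *e, const unsigned char new_mac[6])
{
	bool changed = memcmp(e->mac, new_mac, sizeof(e->mac)) != 0;

	memcpy(e->mac, new_mac, sizeof(e->mac));
	e->state = L2E_VALID;
	return changed;
}
```

In this model a second trouble report while resolution is pending does not issue another request; that is a design choice of the sketch, not a claim about what the driver does.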
The cxgb* drivers also handle routing redirects, but I think that path
has bugs.

What doesn't happen is active positive feedback during the connection to
avoid NUD (neighbour unreachability detection) probing. I.e., once the
connection is set up, nobody calls dst_confirm(); it is only called during
connection setup/teardown.
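
To illustrate the missing positive feedback, here is a toy model, assuming a neighbour entry that stays reachable for a fixed reachable_ms after its last confirmation and goes stale afterwards (the real state machine is richer). confirm() plays the role of dst_confirm(); the periodic confirmation on forward progress is a hypothetical driver hook, not something the cxgb* drivers do today.

```c
#include <assert.h>

/* Hypothetical, much-simplified model of neighbour reachability: an
 * entry stays REACHABLE for reachable_ms after the last positive
 * confirmation, then becomes STALE. The real state machine may then
 * move through DELAY/PROBE and end up FAILED; this model omits that. */
enum nud_model { NUD_MODEL_REACHABLE, NUD_MODEL_STALE };

struct neigh_model {
	long confirmed_ms;   /* time of last positive confirmation */
	long reachable_ms;   /* fixed reachable time (no randomization) */
};

/* Plays the role of dst_confirm(): record positive feedback. */
static void confirm(struct neigh_model *n, long now_ms)
{
	n->confirmed_ms = now_ms;
}

static enum nud_model nud_state(const struct neigh_model *n, long now_ms)
{
	assert(now_ms >= n->confirmed_ms);
	return now_ms - n->confirmed_ms < n->reachable_ms ?
	       NUD_MODEL_REACHABLE : NUD_MODEL_STALE;
}

/* Offloaded connection with healthy traffic from t=0 to end_ms.
 * confirm_every_ms > 0: a hypothetical driver confirms that often when
 * the HW reports forward progress. 0: confirm only at setup, as today.
 * Returns the modelled neighbour state at end_ms. */
static enum nud_model run_connection(long reachable_ms, long confirm_every_ms,
				     long end_ms)
{
	struct neigh_model n = { 0, reachable_ms };

	confirm(&n, 0);  /* connection setup */
	if (confirm_every_ms > 0)
		for (long t = confirm_every_ms; t <= end_ms; t += confirm_every_ms)
			confirm(&n, t);
	return nud_state(&n, end_ms);
}
```

With a 30 s reachable time, run_connection(30000, 0, 60000) ends STALE even though the connection is healthy, while run_connection(30000, 10000, 60000) stays REACHABLE.
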
Steve.
>
>
>> Not sure what you do about UD.. Maybe RDMA-CM learns to do UC where
>> the only action is to register notification monitors for L2 addressing
>> changes in the kernel?
>>
>
> The problem exists for all IB transports (even for RD, if it had been implemented...); the only difference between the U and R ones is that for the R ones, if the remote side vanishes, the IB HW will eventually let you know about it in the form of a CQ error.
>
>
>> Can this be hidden with Sean's recent work on simplified programming models?
>>
>
> not sure how Sean's work relates to this proposed change.
>
> Or.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>