On 12/23/2015 11:35 AM, Matan Barak wrote:
> On Wed, Dec 23, 2015 at 6:08 PM, Doug Ledford wrote:
>> On 12/22/2015 02:26 PM, Matan Barak wrote:
>>> On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford wrote:
>>>> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>>>>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>>>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>>>>> tried to verify that the IB device uses RoCE protocol. However,
>>>>>> if rdma_id didn't have a bounded port, it used the first port
>>>>>> of the device.
>>>>>>
>>>>>> In VPI systems, the first port might be an IB port while the second
>>>>>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>>>>>> that come from the Ethernet port fail.
>>>>>> Fixing this by passing the port of the request and checking this port
>>>>>> of the device.
>>>>>>
>>>>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>>>>> on RoCE')
>>>>>> Signed-off-by: Matan Barak
>>>>>
>>>>> seems that the patch is missing from patchworks, I can't explain that.
>>>>
>>>> I've already downloaded it and marked it accepted.
>>>>
>>>
>>> Thanks Doug. Would you like that I'll repost the patch with the commit
>>> message changed as Or suggested or is the current version good enough?
>>>
>>> Regarding the Ethernet loopback issue, I started looking into that,
>>> but as Or stated, it's broken even before the RoCE patches.
>>
>> Ping. Any progress on this?
>
> Yeah, there's some progress - the basic problem is that we don't have
> a bounded ndev and thus cma_resolve_iboe_route returns -ENODEV.

Which makes sense considering that 127.0.0.1 doesn't belong to any of
the devs.

> The root cause for this is that we have to store the ndev in
> cma_bind_loopback. Even after doing that, cma_set_loopback changes the
> sgid to be the localhost GID, which doesn't exist in the GID table and
> thus will fail later in the GID lookup.

Again, makes sense.

> I think that regarding loopback, we actually want to send the data on
> the link local default GID,

Which link local default GID? If you have more than one port or card,
then that is not a unique value.

> which is guaranteed to exist.

And in many cases, multiple times.

> That's why I
> think we should:
> 1. Change the cma_src_addr and cma_dst_addr in cma_bind_loopback to be
> the default GID.
> 2. Store the associated ndev of this default GID as the bounded device.
> 3. In cma_resolve_loopback, get the MAC of this bounded device and
> store it as the DMAC.
> 4. In cma_resolve_iboe_route, don't try to do route resolve if the
> dGID matches the default GID.
>
> It's still not working though, but this is where I'm headed. What do you think?

Let's punt this until later. It only affects the situation when you
use 127.0.0.1 as the address. If you use the local IP address of a
specific interface, you get the same loopback behavior, but no failures
(and on top of that, instead of getting a random device to handle the
loopback transfer, you get a specific device of your choosing). To me,
that qualifies as a reasonable workaround. The 127.0.0.1 behavior has
been broken for a while (and I'm not sure it should have ever been
relied upon anyway), so I don't think we have to hold things up.

-- 
Doug Ledford
GPG KeyID: 0E572FDD