public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
To: Haggai Eran <haggaie@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
	Liran Liss <liranl@mellanox.com>,
	Guy Shapiro <guysh@mellanox.com>,
	Shachar Raindel <raindel@mellanox.com>,
	Yotam Kenneth <yotamke@mellanox.com>,
	"Hefty, Sean" <sean.hefty@intel.com>
Subject: Re: [PATCH v4 for-next 04/12] IB/ipoib: Return IPoIB devices matching connection parameters
Date: Thu, 28 May 2015 09:45:01 -0600	[thread overview]
Message-ID: <20150528154501.GA2962@obsidianresearch.com> (raw)
In-Reply-To: <55670157.8090000@mellanox.com>

On Thu, May 28, 2015 at 02:51:51PM +0300, Haggai Eran wrote:

> > But RDMA CM doesn't provide the QPN. So when RDMA CM searches the
> > netdevs for an address it cannot *uniquely* map to a IPoIB interface.

> This is technically true, but if someone configures their system that
> way, they will also have ARP conflicts in addition. I don't see why we
> should support such a configuration.

Based on the dicussion in the other thread about this, the feeling is
we should not support same-GUID, same-PKey child interfaces at all. It
breaks too much stuff (DHCP,NetworkManager,IPv6,RDMA-CM)

> No, this is exactly what would happen in the Ethernet world. If you
> create a conflicting configuration between two containers on the same
> Ethernet segment, then one of them could get the traffic that was
> intended for the other.

It is not exactly the same. In Ethernet there is an ARP collision at
L3, but the traffic is properly addressed at L2 and unambiguously
directed into only one container. There are ways to deal with ARP
collisions, but those are only effective if the full LLADDR is
consistently used for routing to containers.

With RDMA-CM it is an L3 collsion, so our anti-ARP collision stuff
doesn't help.

Like I said, I don't from a security perspective what to make of this,
but it isn't exactly the same of ethernet.

> >  1) Locate the netdev associated with the ingress of the packet,
> >     in a sane world this is done by only checking the
> >     unique (Device,Port,Pkey,QPN) tuple.
> >     If we keep our brokeness, we'd do this based on
> >     (Device,Port,Pkey,IP) - if there are IP collisions then randomly
> >     select a netdev (similar to how ARP collision is handled).
> That's what ib_get_net_dev_by_port_pkey_ip intends to do.

Right, almost there.

ib_get_net_dev_by_port_pkey_ip needs to work in a very specific
way: If there is only one netdev with the (Device,Port,GUID,Pkey)
match then that is the answer.
(guid comes from the CM_REQ, if we add alias GUID support to IPoIB as
 Or suggested then it is needed)

The IP search should *only* be done if there are two children with
identical (Pkey,GUID), and as above, perhaps we should de-support that.

> >  2) Then we do the ip_route_input_noref step, this will set skb_dst to
> >     the netdev that will handle the packet, or tell us to drop it.
> >     This is not always the same as the netdev that accepts the
> >     packet!!!
> > 
> >     NOTE: This route step is missing today, it does critical things
> >     like check that the node is actually listening on the dest IP!
> 
> Isn't this a little over-engineered? If all you want is to make sure the
> net dev is up, can't we use something like netif_running()?

The routing check is not to see if the netdev is up, it is doing all
sorts of subtle userspace visible things. Like checking there is no
blackhole route configured for the packet, checking that the IP is
present in the system, netdevs are up, etc etc.

We don't get to pick and choose what netdev behaviors we implement
when doing this kind of stuff. Copy the netstack, don't make stuff up.

Understand the two layer separation, first with pick a netdev without
looking at L3 info, then we feed the netdev and L3 info into routing
to complete the process.

> Also, this sounds like a major change in behavior even for applications
> that do not use containers. I think today RDMA CM will accept
> connections even if the ipoib interface is down.

Yes, it is a change in behavior, things move closer to alignment with
how netdev works. We did a similar change to the output side years ago
as well. I guess the input side was missed.

Unless I'm missing something this isn't 'major', these are corner case
conditions/bugs that nobody sane should rely on.

> >  3) Now we can use skb_dst to iterate over the set of all RDMA CM listens:
> >      1) Bound to the skb_dst netdev
> >      2) Unbound in the same namespace as skb_dst netdev
> >     The first to match the dst IP + port is the listen that will accept the
> >     connection, now we go into the cma_new_conn_id path, and we don't
> >     need rdma_translate_ip because we already have the handling netdev.
> You shouldn't be able to bind one listener to a netdev in a namespace
> and also have a different listener listening for any netdev on that same
> namespace. (That is what cma_check_port verifies, right?) So when
> looking for a listener in a namespace there should be only one match.

You know, I don't remember off hand the exact semantics of sockets,
whatever sockets does :)

> > The backwards operation of the current code is part of why this is all
> > so strange looking, and I think is strongly connected to the private
> > data compare issues Sean is talking about. It is very much the wrong
> > flow to look for the RDMA CM listen first, and then try to work
> > backwards to the netdev.

> That's not what the code does. It first finds the netdev and decides on
> the namespace based on that. It then finds the RDMA CM listener in that
> namespace.

I was talking about the current code, but even with the patch set, the
behavior is still backwards:

 1) Find the netdev
 2) Get the namespace, throw away the netdev
 3) Find a CM_ID without a netdev
 4) Find a netdev from the CM_ID

Instead:
 1) Find the netdev
 2) Find a CM_ID compatible with that netdev

> I think I can refactor the series this way, but I don't think we need
> step 2. It seems like an overly complicated way of checking whether a
> netdev is up or not. It doesn't seem to provide any new information over
> what ib_get_net_dev_by_port_pkey_ip does.

If it is hard to do then leave a FIXME comment and go on, if it is
easy, it is the right step.

It does provide more information because
ib_get_net_dev_by_port_pkey_ip should not check the IP.

> For RoCE, you could still have multiple macvlan interfaces using the
> same VLAN, with different IP address. So the unique tuple is
> (Device,Port,VLAN,IP). With Matan adding a netdev to each GID entry in
> the RoCE GID table, I think it would be simpler to find the netdev in RoCE.

That isn't how I understand macvlan, the tuple will be
(Device,Port,MAC,VLAN)

You should never search for an IP, that is not how netdev works.

Jason

  reply	other threads:[~2015-05-28 15:45 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-17  5:50 [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM Haggai Eran
2015-05-17  5:50 ` [PATCH v4 for-next 02/12] IB/addr: Pass network namespace as a parameter Haggai Eran
2015-05-17  5:50 ` [PATCH v4 for-next 03/12] IB/core: Find the network namespace matching connection parameters Haggai Eran
     [not found]   ` <1431841868-28063-4-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:26     ` Jason Gunthorpe
     [not found]       ` <20150519182616.GF18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-20 14:48         ` Haggai Eran
2015-05-17  5:51 ` [PATCH v4 for-next 04/12] IB/ipoib: Return IPoIB devices " Haggai Eran
     [not found]   ` <1431841868-28063-5-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:28     ` Jason Gunthorpe
     [not found]       ` <20150519182810.GG18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-20 15:17         ` Haggai Eran
2015-05-19 23:55     ` Jason Gunthorpe
     [not found]       ` <20150519235502.GB26634-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-21  5:33         ` Haggai Eran
     [not found]           ` <555D6E41.10606-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-21  5:48             ` Or Gerlitz
     [not found]               ` <CAJ3xEMjN+o=vC4abAeG5EuOo3Y1gSyh1qPDseA_aaYmoLWAunw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-21  6:33                 ` Haggai Eran
     [not found]                   ` <555D7C4A.2060708-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-21 10:31                     ` Or Gerlitz
2015-05-21 17:43             ` Jason Gunthorpe
     [not found]               ` <20150521174336.GA6771-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-28 11:51                 ` Haggai Eran
2015-05-28 15:45                   ` Jason Gunthorpe [this message]
2015-05-21  5:48         ` Haggai Eran
2015-05-17  5:51 ` [PATCH v4 for-next 05/12] IB/cm: Share listening CM IDs Haggai Eran
2015-05-19 18:35   ` Jason Gunthorpe
     [not found]     ` <20150519183545.GH18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-19 22:35       ` Jason Gunthorpe
     [not found]         ` <20150519223502.GA26324-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-21  8:08           ` Haggai Eran
2015-05-21 17:54             ` Jason Gunthorpe
2015-05-21  7:07       ` Haggai Eran
     [not found] ` <1431841868-28063-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-17  5:50   ` [PATCH v4 for-next 01/12] IB/core: Add rwsem to allow reading device list or client list Haggai Eran
2015-05-17  5:51   ` [PATCH v4 for-next 06/12] IB/cm: Expose service ID in request events Haggai Eran
2015-05-17  5:51   ` [PATCH v4 for-next 07/12] IB/cma: Refactor RDMA IP CM private-data parsing code Haggai Eran
2015-05-17  5:51   ` [PATCH v4 for-next 08/12] IB/cma: Add compare_data checks to the RDMA CM module Haggai Eran
2015-05-17  5:51   ` [PATCH v4 for-next 12/12] IB/ucma: Take the network namespace from the process Haggai Eran
2015-05-17  5:51 ` [PATCH v4 for-next 09/12] IB/cma: Separate port allocation to network namespaces Haggai Eran
2015-05-17  5:51 ` [PATCH v4 for-next 10/12] IB/cma: Share CM IDs between namespaces Haggai Eran
2015-05-17  5:51 ` [PATCH v4 for-next 11/12] IB/cma: Add support for network namespaces Haggai Eran
2015-05-19 14:30 ` [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM Yann Droneaud
2015-05-19 14:54   ` Haggai Eran
     [not found]     ` <555B4EBE.7010900-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 16:39       ` Parav Pandit
2015-05-19 18:01         ` Haggai Eran
     [not found]           ` <1432058488417.98688-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:42             ` Parav Pandit
2015-05-19 18:38         ` Jason Gunthorpe
     [not found]           ` <20150519183843.GI18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-19 18:44             ` Parav Pandit
     [not found]               ` <CAGgvQNTXAWkQWzBBrQfk39GaCQ2ck63AhgURpYFFBPTbkpx4kg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-19 19:20                 ` Jason Gunthorpe
2015-05-26 13:34 ` Doug Ledford
     [not found]   ` <1432647280.28905.107.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-26 16:59     ` Jason Gunthorpe
2015-05-26 17:46       ` Doug Ledford
     [not found]         ` <1432662396.28905.157.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-26 18:47           ` Jason Gunthorpe
2015-05-28 13:22           ` Haggai Eran
2015-05-28 15:46             ` Jason Gunthorpe
     [not found]               ` <20150528154633.GB2962-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03 10:07                 ` Haggai Eran
2015-05-28 13:15       ` Haggai Eran
2015-05-26 17:55     ` Christian Benvenuti (benve)
2015-05-28 13:07     ` Haggai Eran
     [not found]       ` <55671309.6080303-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-28 14:07         ` Doug Ledford
     [not found]           ` <1432822057.114391.26.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-28 16:21             ` Or Gerlitz
     [not found]               ` <55674077.5040707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-28 17:43                 ` Jason Gunthorpe
     [not found]                   ` <20150528174337.GA10448-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-28 18:22                     ` Doug Ledford
     [not found]                       ` <1432837360.114391.35.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-28 19:05                         ` Or Gerlitz
     [not found]                           ` <CAJ3xEMh2T5-56rFxWVdct2uAZYW1ZrKivWfS45V-mvhAfwyGaA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-28 21:55                             ` Doug Ledford
     [not found]                               ` <1432850150.114391.56.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-03 10:03                                 ` Haggai Eran
2015-06-03 16:14                                   ` Jason Gunthorpe
     [not found]                                     ` <20150603161447.GC12073-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03 19:05                                       ` Or Gerlitz
2015-06-03 19:53                                         ` Jason Gunthorpe
     [not found]                                           ` <20150603195325.GC7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03 20:07                                             ` Or Gerlitz
     [not found]                                               ` <CAJ3xEMiO+hEzOJ2oJ5G-mmBeaX4ZHvUyhNSAzsrRDui6dFjvCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-03 21:45                                                 ` Jason Gunthorpe
2015-06-04  9:41                                                   ` Haggai Eran
2015-06-04 16:06                                                     ` Jason Gunthorpe
2015-06-03 23:48                                                 ` Jason Gunthorpe
     [not found]                                                   ` <20150603234811.GA15128-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-04  6:24                                                     ` Haggai Eran
     [not found]                                                       ` <556FEF25.80409-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-04 16:40                                                         ` Jason Gunthorpe
     [not found]                                                           ` <20150604164058.GB27699-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-08  7:52                                                             ` Haggai Eran
2015-06-08 16:53                                                               ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150528154501.GA2962@obsidianresearch.com \
    --to=jgunthorpe@obsidianresearch.com \
    --cc=dledford@redhat.com \
    --cc=guysh@mellanox.com \
    --cc=haggaie@mellanox.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=liranl@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=raindel@mellanox.com \
    --cc=sean.hefty@intel.com \
    --cc=yotamke@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox