From: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
To: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
"Hefty,
Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH v4 for-next 04/12] IB/ipoib: Return IPoIB devices matching connection parameters
Date: Thu, 21 May 2015 11:43:36 -0600 [thread overview]
Message-ID: <20150521174336.GA6771@obsidianresearch.com> (raw)
In-Reply-To: <555D6E41.10606-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
On Thu, May 21, 2015 at 08:33:53AM +0300, Haggai Eran wrote:
> To create a new child interface on the default P_Key, its possible to
> use iproute:
> # ip link add link ib0 name ib0.1 type ipoib
Uh..
A key invariant of the IP stack is that is it possible to uniquely
identify the ingress device.
So the above scheme is fine for IPoIB, because it uses the interface
unique QPN to uniquely ID the netdevice: (Device,Port,Pkey,QPN)
is the unique ID tuple. The world is happy.
But RDMA CM doesn't provide the QPN. So when RDMA CM searches the
netdevs for an address it cannot *uniquely* map to a IPoIB interface.
This is bad, and *completely wrong*, but today, nobody is going to
really notice or care. The cases where it does something you don't
want are not very significant.
But with containers.. Think this through for a minute: 'In some cases
the RDMA CM selecs the wrong child' - that goes from being a minor
annoyance to a violation of containment! Worse the criteria for
'selects the wrong child' can be triggered by the contained users. Eg
the contained user adds a IP to their child that duplicates another
container. Now we've lost control.
The very idea of ib_get_net_dev_by_port_pkey_ip is broken.
So, I don't know what to say here.. Ideas?
1) Forbid creating more than one pkey per ipoib interface?
2) Somehow extend the RDMA CM to send the IPoIB qpn too?
3) ??
Right now the only case that comes to mind is duplicating IPs, that is
already going to cause an ARP collision, so maybe having the RDMA CM
randomly select an IP is not the end of the world... But with
containers and security, who knows? I'm not confident I've
exhaustively thought of all possibilities here.
----------
Anyhow.. looking again through this series and the existing code, the
flow is wrong, and really needs to be changed before this starts to
make sense to anyone, and is no doubt part of how we got here..
When a REQ arrives RDMA CM needs to run down these steps (this is identical
to what ip_input.c does)
1) Locate the netdev associated with the ingress of the packet,
in a sane world this is done by only checking the
unique (Device,Port,Pkey,QPN) tuple.
If we keep our brokeness, we'd do this based on
(Device,Port,Pkey,IP) - if there are IP collisions then randomly
select a netdev (similar to how ARP collision is handled).
2) Then we do the ip_route_input_noref step, this will set skb_dst to
the netdev that will handle the packet, or tell us to drop it.
This is not always the same as the netdev that accepts the
packet!!!
NOTE: This route step is missing today, it does critical things
like check that the node is actually listening on the dest IP!
3) Now we can use skb_dst to iterate over the set of all RDMA CM listens:
1) Bound to the skb_dst netdev
2) Unbound in the same namespace as skb_dst netdev
The first to match the dst IP + port is the listen that will accept the
connection, now we go into the cma_new_conn_id path, and we don't
need rdma_translate_ip because we already have the handling netdev.
The backwards operation of the current code is part of why this is all
so strange looking, and I think is strongly connected to the private
data compare issues Sean is talking about. It is very much the wrong
flow to look for the RDMA CM listen first, and then try to work
backwards to the netdev.
The above 3 steps hard require that the ib_cm and rdma_cm
maintain different listening lists, because we need the 2nd search in
#3. So this gets the ib_cm completely out of looking at the private
data. [And now we can think carefully about the best way to refcount
the listens in RDMA CM]
Once the above is cleaned up dropping in namespaces should just
happen naturally.
So.. I'm going to suggest you make a cleanup series to fix this:
- Introduce the ib_get_net_dev_by_port_pkey_ip and document the
breakage it represents
- Rework rdma_cm to do steps 1,2,3 above, using
ib_get_net_dev_by_port_pkey_ip to drive #1, your patch set is
already doing about 50% of this change.
- Fix the collateral damage
As I said to Matan, clean up series like this should not introduce
major functional changes, so stack the few remaining net namespace
things in another series after it.
** The same comments apply to RoCE too, but for RoCE step #1 works
properly based on the (Device,Port,VLAN) unique tuple
Ditto for iWarp **
I think this also moves to address Sean's concern about generality, at
least for listen. All three protocols will run down the same common
code to locate the netdev and find the correct RDMA CM listen. The
only difference is the 'ib_get_net_dev_by_port_pkey_ip' call varies.
.. I also wouldn't mind seeing the giant cma.c split, if it makes
sense to have a cma_listen.c, for instance, wouldn't that be nice?
Sorry Haggi, this is a big change to your patchset :(
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2015-05-21 17:43 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-17 5:50 [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM Haggai Eran
[not found] ` <1431841868-28063-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-17 5:50 ` [PATCH v4 for-next 01/12] IB/core: Add rwsem to allow reading device list or client list Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 06/12] IB/cm: Expose service ID in request events Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 07/12] IB/cma: Refactor RDMA IP CM private-data parsing code Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 08/12] IB/cma: Add compare_data checks to the RDMA CM module Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 12/12] IB/ucma: Take the network namespace from the process Haggai Eran
2015-05-17 5:50 ` [PATCH v4 for-next 02/12] IB/addr: Pass network namespace as a parameter Haggai Eran
2015-05-17 5:50 ` [PATCH v4 for-next 03/12] IB/core: Find the network namespace matching connection parameters Haggai Eran
[not found] ` <1431841868-28063-4-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:26 ` Jason Gunthorpe
[not found] ` <20150519182616.GF18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-20 14:48 ` Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 04/12] IB/ipoib: Return IPoIB devices " Haggai Eran
[not found] ` <1431841868-28063-5-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:28 ` Jason Gunthorpe
[not found] ` <20150519182810.GG18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-20 15:17 ` Haggai Eran
2015-05-19 23:55 ` Jason Gunthorpe
[not found] ` <20150519235502.GB26634-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-21 5:33 ` Haggai Eran
[not found] ` <555D6E41.10606-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-21 5:48 ` Or Gerlitz
[not found] ` <CAJ3xEMjN+o=vC4abAeG5EuOo3Y1gSyh1qPDseA_aaYmoLWAunw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-21 6:33 ` Haggai Eran
[not found] ` <555D7C4A.2060708-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-21 10:31 ` Or Gerlitz
2015-05-21 17:43 ` Jason Gunthorpe [this message]
[not found] ` <20150521174336.GA6771-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-28 11:51 ` Haggai Eran
2015-05-28 15:45 ` Jason Gunthorpe
2015-05-21 5:48 ` Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 05/12] IB/cm: Share listening CM IDs Haggai Eran
2015-05-19 18:35 ` Jason Gunthorpe
[not found] ` <20150519183545.GH18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-19 22:35 ` Jason Gunthorpe
[not found] ` <20150519223502.GA26324-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-21 8:08 ` Haggai Eran
2015-05-21 17:54 ` Jason Gunthorpe
2015-05-21 7:07 ` Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 09/12] IB/cma: Separate port allocation to network namespaces Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 10/12] IB/cma: Share CM IDs between namespaces Haggai Eran
2015-05-17 5:51 ` [PATCH v4 for-next 11/12] IB/cma: Add support for network namespaces Haggai Eran
2015-05-19 14:30 ` [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM Yann Droneaud
2015-05-19 14:54 ` Haggai Eran
[not found] ` <555B4EBE.7010900-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 16:39 ` Parav Pandit
2015-05-19 18:01 ` Haggai Eran
[not found] ` <1432058488417.98688-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:42 ` Parav Pandit
2015-05-19 18:38 ` Jason Gunthorpe
[not found] ` <20150519183843.GI18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-19 18:44 ` Parav Pandit
[not found] ` <CAGgvQNTXAWkQWzBBrQfk39GaCQ2ck63AhgURpYFFBPTbkpx4kg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-19 19:20 ` Jason Gunthorpe
2015-05-26 13:34 ` Doug Ledford
[not found] ` <1432647280.28905.107.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-26 16:59 ` Jason Gunthorpe
2015-05-26 17:46 ` Doug Ledford
[not found] ` <1432662396.28905.157.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-26 18:47 ` Jason Gunthorpe
2015-05-28 13:22 ` Haggai Eran
2015-05-28 15:46 ` Jason Gunthorpe
[not found] ` <20150528154633.GB2962-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03 10:07 ` Haggai Eran
2015-05-28 13:15 ` Haggai Eran
2015-05-26 17:55 ` Christian Benvenuti (benve)
2015-05-28 13:07 ` Haggai Eran
[not found] ` <55671309.6080303-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-28 14:07 ` Doug Ledford
[not found] ` <1432822057.114391.26.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-28 16:21 ` Or Gerlitz
[not found] ` <55674077.5040707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-28 17:43 ` Jason Gunthorpe
[not found] ` <20150528174337.GA10448-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-28 18:22 ` Doug Ledford
[not found] ` <1432837360.114391.35.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-28 19:05 ` Or Gerlitz
[not found] ` <CAJ3xEMh2T5-56rFxWVdct2uAZYW1ZrKivWfS45V-mvhAfwyGaA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-28 21:55 ` Doug Ledford
[not found] ` <1432850150.114391.56.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-03 10:03 ` Haggai Eran
2015-06-03 16:14 ` Jason Gunthorpe
[not found] ` <20150603161447.GC12073-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03 19:05 ` Or Gerlitz
2015-06-03 19:53 ` Jason Gunthorpe
[not found] ` <20150603195325.GC7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03 20:07 ` Or Gerlitz
[not found] ` <CAJ3xEMiO+hEzOJ2oJ5G-mmBeaX4ZHvUyhNSAzsrRDui6dFjvCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-03 21:45 ` Jason Gunthorpe
2015-06-04 9:41 ` Haggai Eran
2015-06-04 16:06 ` Jason Gunthorpe
2015-06-03 23:48 ` Jason Gunthorpe
[not found] ` <20150603234811.GA15128-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-04 6:24 ` Haggai Eran
[not found] ` <556FEF25.80409-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-04 16:40 ` Jason Gunthorpe
[not found] ` <20150604164058.GB27699-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-08 7:52 ` Haggai Eran
2015-06-08 16:53 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150521174336.GA6771@obsidianresearch.com \
--to=jgunthorpe-epgobjl8dl3ta4ec/59zmfatqe2ktcn/@public.gmane.org \
--cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
--cc=yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox