From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5C2EB71 for ; Wed, 19 May 2021 18:06:03 +0000 (UTC) IronPort-SDR: q2MjqYxXjy1BuwSZoTeSDbbTYuOfQYNcIkjDyrMtGsGncAvcGOtuQY6cZ9E/9MeA7YUgrD8lrD syswBRINfQPQ== X-IronPort-AV: E=McAfee;i="6200,9189,9989"; a="264961023" X-IronPort-AV: E=Sophos;i="5.82,313,1613462400"; d="scan'208";a="264961023" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 May 2021 11:06:02 -0700 IronPort-SDR: 0StTh8U+dGt/hBCLBAzMmhdkcLXbMRczJWrP/Ykn5EbVsSuQZyaOY0/tC+80g5c/7Mrh5+6QiC O8fKxnJIWQwQ== X-IronPort-AV: E=Sophos;i="5.82,313,1613462400"; d="scan'208";a="474731722" Received: from rmacfarl-mobl1.amr.corp.intel.com ([10.255.228.18]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 May 2021 11:06:01 -0700 Date: Wed, 19 May 2021 11:06:01 -0700 (PDT) From: Mat Martineau To: Paolo Abeni cc: Florian Westphal , Geliang Tang , mptcp@lists.linux.dev Subject: Re: [MPTCP][PATCH v6 mptcp-next 0/6] add MP_CAPABLE 'C' flag In-Reply-To: Message-ID: <56ee8dc0-e1b1-84c1-acbd-154cc24ee098@linux.intel.com> References: <10c0783f0b81ab279dbbf202033295e7891c18d6.camel@redhat.com> <11169a25-2e4-a879-2c92-bad09a28a18@linux.intel.com> X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed On Wed, 19 May 2021, Paolo Abeni wrote: > On Tue, 2021-05-18 at 12:07 -0700, Mat Martineau wrote: >> On Tue, 18 May 2021, Paolo Abeni wrote: >>> The in-kernel path manager is want to be as simple as possible. As soon >>> as we start allocating memory on peer req it's very easy to be prone to >>> DoS. Memory accounting and limits could save us or at least mitigate >>> the problem, but they will add more complexity. >> >> My assumption was that high-connection-count servers would probably have >> their limits set in a way that ADD_ADDRs would not be accepted anyway. >> This is the use case where clients may be behind a NAT and the server is >> accepting MP_JOINs from the client (but the server is never sending >> MP_JOINs), and the server is possibly sending ADD_ADDRs to the client. The >> configurable limit is already there from the existing in-kernel PM code. >> >> I do agree with "as simple as possible", but the RFC requirements are >> constraining that. Even though the C bit is ignored in Linux >> implementations so far, I don't think it should stay that way. The current >> code always tries to connect to the initial subflow, so we need to do >> *something* different to remember an alternate peer address. There are >> other options (like remembering only the most recent ADD_ADDR), but given >> the configurable limits we have it doesn't seem that different >> memory-wise. What's important is implementing this RFC functionality well, >> so it's really helpful to explore the alternatives here - and your >> understanding of the resource issues and DoS tradeoffs is very useful. > > Uhmmm... I start feeling I misunderstand the RFC WRT to 'C' bit > handling. > > """ > The third bit, labeled "C", is set to 1 to indicate > that the sender of this option will not accept > additional MPTCP subflows to the source address and > port, and therefore the receiver MUST NOT try to open > any additional subflows toward this address and port. > """ > > To avoid creating additional subflows towards the MPC server > address/port, we don't need to store any additional information beyond > the 'C' flag: net-next uses that pair only for new subflow with > 'subflow' type (yep, the type name is not very self-explaining :( > > We just need to forbit them: > > --- > diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c > index d094588afad8..2ae9fac00623 100644 > --- a/net/mptcp/pm_netlink.c > +++ b/net/mptcp/pm_netlink.c > @@ -451,7 +451,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) > > /* check if should create a new subflow */ > if (msk->pm.local_addr_used < local_addr_max && > - msk->pm.subflows < subflows_max) { > + msk->pm.subflows < subflows_max && !msk->pm.remote_deny_join_id0) { > local = select_local_address(pernet, msk); > if (local) { > struct mptcp_addr_info remote = { 0 }; > > --- > >>> Just does not seam worthy to me, I really think we should start with >>> the bare minumum to respect the RFC itself, avoiding as much complexity >>> as possible: >>> - setting C always to 0 on server >>> - when the peer send C==1, cache the MPC addr/port, and do not attempt >>> other connection to such endpoint. >>> >>> I'm not even sure the latter step is strictly necessary. >> >> For the former, it is certainly simpler to always send C==0, but sending >> C==1 from a high-connection-count server behind a load balancer is the >> primary use case. > > I *think* it would be far better (and possibly not that complex!!! > @Florian: WDYT? ) implementing an MPTCP aware load balancer. The MPJ > syn pkt carries the remote token, so it's possible for the load > balancer to make the "correct" decision. > >> I think the "MUST NOT" wording of the RFC does require the latter step >> (server or client), and that implies a minimally functional in-kernel PM >> would need to track some other address to connect to. > > This part is unclear to me. Why should the kernel cache more > addresses? > > It's already capabale of using as many ADD_ADDRs as the peer provides > with the current options handling. Sure, there are some constraints. > The stricter one posed by the current implementation is that the kernel > fully processes (up to subflow creation) a given ADD_ADDR option before > using another one. > > That is useful to avoid transforming an MPTCP client in a DDOS tool > controlled by an (evil) MPTCP server, and sounds like a reasonable > tread-off between complexity and flexibility. > > Looks like we have some topic for tomorrow's mtg :) > Yeah, I think it will be far easier to discuss in that format! -- Mat Martineau Intel