From mboxrd@z Thu Jan  1 00:00:00 1970
From: Christoph Paasch
Subject: Re: [PATCH net-next 02/13] sk_buff: add skb extension infrastructure
Date: Thu, 13 Dec 2018 09:00:21 -0800
Message-ID: <20181213170021.GZ41383@MacBook-Pro-19.local>
References: <20181210145006.19098-1-fw@strlen.de>
 <20181210145006.19098-3-fw@strlen.de>
 <4a43ca01-d09b-71e8-18e1-9a5707787ae0@gmail.com>
 <20181212184446.gyjwbwoyzhrk7kxw@breakpoint.cc>
 <7906a8d9-c2f9-0883-3d13-5ef38e53f10c@gmail.com>
 <20181212205236.sja4fw4sf5egtkyw@breakpoint.cc>
 <8ae68a38-c981-f317-39b1-1092e7efbeeb@gmail.com>
 <20181213092706.fq4mulrp73r2wpq2@breakpoint.cc>
 <46920c3f-564d-e9ef-f714-22f96239736d@gmail.com>
 <20181213103918.rfh3battqdn7u6b6@breakpoint.cc>
Mime-Version: 1.0
Content-Type: text/plain; CHARSET=US-ASCII
Content-Transfer-Encoding: 7BIT
Cc: Eric Dumazet, netdev@vger.kernel.org, peter.krystad@intel.com, mathew.j.martineau@linux.intel.com
To: Florian Westphal
Return-path:
Received: from nwk-aaemail-lapp03.apple.com ([17.151.62.68]:52478 "EHLO nwk-aaemail-lapp03.apple.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729339AbeLMRBx (ORCPT ); Thu, 13 Dec 2018 12:01:53 -0500
Content-disposition: inline
In-reply-to: <20181213103918.rfh3battqdn7u6b6@breakpoint.cc>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 13/12/18 - 11:39:18, Florian Westphal wrote:
> Eric Dumazet wrote:
> > > If its going to be used as I expect, then the extension could be
> > > discarded after the DSS mapping has been written to the tcp option
> > > space, i.e. before cloning occurs.
> >
> > I do not see how this would work, without also discarding on the master skb
> > the needed info.
>
> Ok, so lets assume this would result in one atomic_inc/dec due to clone
> for now for skbs coming from mptcp socket.
>
> But I don't see why this would have to be.
>
> > > For TCP, thats true. But there are other places that could clone, e.g.
> > > when bridge has to flood-forward.
> > >
> >
> > So you propose a mechanism that forces a preserve on clone, base on existing needs
> > for bridging.
>
> secpath does the same thing:
>
> static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
> {
> ...
> #ifdef CONFIG_XFRM
>         new->sp = secpath_get(old->sp);
> #endif
> ...
>
> So I am not proposing anything new.
>
> > > At least in bridge case the 'preseve on clone' is needed, else required
> > > information is missing from the cloned skb.
> > >
> >
> > We need something where MPTCP info does not need to be propagated all the way to the NIC...
>
> Thats whats done in the MPTCP out-of-tree implementation, but I don't
> think its needed.

Yes, it indeed does not need to go all the way down to the NIC. The info
basically "just" needs to be propagated from the MPTCP-layer down to the
TCP-option space.

Thus, it needs to remain on the skbs that are sitting in the TCP-subflow's
send-queue and rexmit tree as we need it when retransmitting.

In tcp_transmit_skb, the clone is done at the beginning. Thus, we could for
example not inc the refcount on the clone and simply pass a pointer to the
original skb to tcp_established_options. That way the DSS option stays
within the MPTCP/TCP layer and does not make it down to the NIC.


Christoph

>
> It could just delete the extension before ->queue_xmit() AFAIU.
>
> > This skb extension is an incentive for adding more sticky things in the skbs
> > to violate layering of networking stacks :/
>
> 8-(
>
> Where do you see "layering violations"?
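
For illustration only, the two clone strategies discussed in this thread
can be modelled in plain userspace C. This is a rough sketch and not kernel
code: struct toy_skb, struct toy_ext and all toy_* functions are
hypothetical names invented here, and the real tcp_transmit_skb and
tcp_established_options are not reproduced. toy_clone_keep_ext models
"preserve on clone" (one refcount increment per clone, as with
secpath_get), while toy_clone_drop_ext plus toy_write_options models
leaving the extension off the clone and reading the mapping from the
original skb instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted skb extension holding a DSS mapping. */
struct toy_ext {
	int refcnt;
	unsigned long long dss_seq;
};

/* Hypothetical stand-in for struct sk_buff; only what the sketch needs. */
struct toy_skb {
	struct toy_ext *ext;        /* NULL when no extension is attached */
	unsigned long long opt_dss; /* models the value written to TCP option space */
};

/* Allocate an skb that owns a fresh extension (refcount 1). */
struct toy_skb *toy_alloc(unsigned long long dss_seq)
{
	struct toy_skb *skb = calloc(1, sizeof(*skb));
	struct toy_ext *ext = calloc(1, sizeof(*ext));

	if (!skb || !ext)
		abort();
	ext->refcnt = 1;
	ext->dss_seq = dss_seq;
	skb->ext = ext;
	return skb;
}

/* "Preserve on clone": the clone shares the extension and takes a reference. */
struct toy_skb *toy_clone_keep_ext(const struct toy_skb *orig)
{
	struct toy_skb *clone = calloc(1, sizeof(*clone));

	if (!clone)
		abort();
	clone->ext = orig->ext;
	if (clone->ext)
		clone->ext->refcnt++;
	return clone;
}

/* Alternative: the clone carries no extension, so the refcount is untouched. */
struct toy_skb *toy_clone_drop_ext(const struct toy_skb *orig)
{
	struct toy_skb *clone = calloc(1, sizeof(*clone));

	(void)orig; /* payload sharing is not modelled in this sketch */
	if (!clone)
		abort();
	return clone;
}

/* Write the option for the clone using the extension of the original skb. */
void toy_write_options(struct toy_skb *clone, const struct toy_skb *orig)
{
	if (orig->ext)
		clone->opt_dss = orig->ext->dss_seq;
}

/* Drop the skb and its reference on the extension, if it holds one. */
void toy_free(struct toy_skb *skb)
{
	if (skb->ext) {
		assert(skb->ext->refcnt > 0);
		if (--skb->ext->refcnt == 0)
			free(skb->ext);
	}
	free(skb);
}
```

In this model, the skb that stays in the send queue keeps the only
reference, so the mapping is still available on retransmit, while a clone
made with toy_clone_drop_ext has nothing attached by the time it would be
handed further down the stack.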