From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Nambiar, Amritha"
Subject: Re: [net-next PATCH v2 2/4] net: Enable Tx queue selection based on Rx queues
Date: Wed, 23 May 2018 12:19:26 -0700
Message-ID: <93bd35dc-59d7-7922-becd-fb77c4a1a0e6@intel.com>
References: <152643356116.4991.7215767041139726872.stgit@anamdev.jf.intel.com>
 <152643400925.4991.5029989601625953592.stgit@anamdev.jf.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Linux Kernel Network Developers, "David S. Miller", Alexander Duyck,
 Sridhar Samudrala, Eric Dumazet, Hannes Frederic Sowa
To: Willem de Bruijn, Tom Herbert
Return-path:
Received: from mga03.intel.com ([134.134.136.65]:40797 "EHLO mga03.intel.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934090AbeEWTT2
 (ORCPT ); Wed, 23 May 2018 15:19:28 -0400
In-Reply-To:
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 5/19/2018 1:13 PM, Willem de Bruijn wrote:
> On Fri, May 18, 2018 at 12:03 AM, Tom Herbert wrote:
>> On Tue, May 15, 2018 at 6:26 PM, Amritha Nambiar
>> wrote:
>>> This patch adds support to pick Tx queue based on the Rx queue map
>>> configuration set by the admin through the sysfs attribute
>>> for each Tx queue. If the user configuration for receive
>>> queue map does not apply, then the Tx queue selection falls back
>>> to CPU map based selection and finally to hashing.
>>>
>>> Signed-off-by: Amritha Nambiar
>>> Signed-off-by: Sridhar Samudrala
>>> ---
>>>  include/net/sock.h       | 18 ++++++++++++++++++
>>>  net/core/dev.c           | 36 +++++++++++++++++++++++++++++-------
>>>  net/core/sock.c          |  5 +++++
>>>  net/ipv4/tcp_input.c     |  7 +++++++
>>>  net/ipv4/tcp_ipv4.c      |  1 +
>>>  net/ipv4/tcp_minisocks.c |  1 +
>>>  6 files changed, 61 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index 4f7c584..0613f63 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -139,6 +139,8 @@ typedef __u64 __bitwise __addrpair;
>>>   * @skc_node: main hash linkage for various protocol lookup tables
>>>   * @skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
>>>   * @skc_tx_queue_mapping: tx queue number for this connection
>>> + * @skc_rx_queue_mapping: rx queue number for this connection
>>> + * @skc_rx_ifindex: rx ifindex for this connection
>>>   * @skc_flags: place holder for sk_flags
>>>   *     %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
>>>   *     %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
>>> @@ -215,6 +217,10 @@ struct sock_common {
>>>                 struct hlist_nulls_node skc_nulls_node;
>>>         };
>>>         int skc_tx_queue_mapping;
>>> +#ifdef CONFIG_XPS
>>> +       int skc_rx_queue_mapping;
>>> +       int skc_rx_ifindex;
>>
>> Isn't this increasing size of sock_common for a narrow use case functionality?
>
> You can get the device from the already recorded sk_napi_id.
> Sadly, not the queue number as far as I can see.
>
I plan to not have the ifindex cached in the sock_common, but retain the
rx_queue only. This way, it'll look similar to skb_tx_hash where rx_queue
recorded is used and if not, fall through to flow hash calculation.
Likewise, we use the rx_queue mapped and fall through to CPU map on
failures.
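
To make the plan above concrete, here is a rough userspace sketch of
caching only the Rx queue on the socket and reporting "nothing usable"
so the caller can fall through to the CPU map. It is not the kernel
code: struct sock_model, NO_RX_QUEUE and the sock_model_* helpers are
hypothetical names for illustration, and validating the queue only
against the device's Rx queue count is an assumption.

```c
#include <assert.h>

#define NO_RX_QUEUE (-1)	/* hypothetical sentinel: nothing recorded yet */

/* Hypothetical stand-in for the socket field under discussion:
 * only the Rx queue is cached, no ifindex. */
struct sock_model {
	int rx_queue_mapping;
};

static void sock_model_init(struct sock_model *sk)
{
	sk->rx_queue_mapping = NO_RX_QUEUE;
}

/* Record the queue on which a packet for this socket arrived. */
static void sock_model_mark_rx_queue(struct sock_model *sk, int rx_queue)
{
	assert(rx_queue >= 0);
	sk->rx_queue_mapping = rx_queue;
}

/* Return the recorded Rx queue if it is valid for a device with
 * num_rx_queues queues; otherwise return NO_RX_QUEUE so the caller
 * can fall through to the CPU map. */
static int sock_model_rx_queue(const struct sock_model *sk,
			       unsigned int num_rx_queues)
{
	int q = sk ? sk->rx_queue_mapping : NO_RX_QUEUE;

	if (q < 0 || (unsigned int)q >= num_rx_queues)
		return NO_RX_QUEUE;
	return q;
}
```

A caller in this model would try sock_model_rx_queue() first and only
consult the CPU map when it gets NO_RX_QUEUE back.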
>
>>> +static inline void sk_mark_rx_queue(struct sock *sk, struct sk_buff *skb)
>>> +{
>>> +#ifdef CONFIG_XPS
>>> +       sk->sk_rx_ifindex = skb->skb_iif;
>>> +       sk->sk_rx_queue_mapping = skb_get_rx_queue(skb);
>>> +#endif
>>> +}
>>> +
>
> Instead of adding this function and calls to it in many locations in
> the stack, you can expand sk_mark_napi_id.
>
> Also, it is not clear why this should be called in locations where
> sk_mark_napi_id is not.
>
Makes sense, I will add this as part of sk_mark_napi_id.
>
>>> +static int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
>>> +{
>>> +#ifdef CONFIG_XPS
>>> +       enum xps_map_type i = XPS_MAP_RXQS;
>>> +       struct xps_dev_maps *dev_maps;
>>> +       struct sock *sk = skb->sk;
>>> +       int queue_index = -1;
>>> +       unsigned int tci = 0;
>>> +
>>> +       if (sk && sk->sk_rx_queue_mapping <= dev->real_num_rx_queues &&
>>> +           dev->ifindex == sk->sk_rx_ifindex)
>>> +               tci = sk->sk_rx_queue_mapping;
>>> +
>>> +       rcu_read_lock();
>>> +       while (queue_index < 0 && i < __XPS_MAP_MAX) {
>>> +               if (i == XPS_MAP_CPUS)
>>
>> This while loop typifies exactly why I don't think the XPS maps should
>> be an array.
>
> +1
>
Okay, I will change this to two maps with separate pointers.
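
As a rough illustration of the two-pointer layout, the sketch below
replaces the loop over map types with straight-line lookups in the
order given in the patch description: Rx queue map, then CPU map, then
hashing. It is a simplified userspace model, not the kernel
implementation: struct queue_map, struct dev_maps_model, map_lookup()
and pick_tx_queue() are hypothetical names, and each map entry is
assumed to hold a single Tx queue rather than a set of queues.

```c
#include <assert.h>
#include <stddef.h>

#define NO_QUEUE (-1)	/* hypothetical sentinel: no mapping configured */

/* Hypothetical simplified map: entry i is the Tx queue for key i,
 * or NO_QUEUE when the admin has not configured one. */
struct queue_map {
	const int *tx_queue;
	size_t len;
};

/* Two named pointers instead of an array indexed by map type. */
struct dev_maps_model {
	const struct queue_map *rxqs_map;	/* keyed by Rx queue */
	const struct queue_map *cpus_map;	/* keyed by CPU */
};

static int map_lookup(const struct queue_map *map, int key)
{
	if (!map || key < 0 || (size_t)key >= map->len)
		return NO_QUEUE;
	return map->tx_queue[key];
}

/* Rx queue map first, then CPU map, then flow hash; no loop needed. */
static int pick_tx_queue(const struct dev_maps_model *maps,
			 int rx_queue, int cpu,
			 unsigned int flow_hash, unsigned int num_tx_queues)
{
	int q;

	assert(num_tx_queues > 0);

	q = map_lookup(maps->rxqs_map, rx_queue);
	if (q == NO_QUEUE)
		q = map_lookup(maps->cpus_map, cpu);
	if (q == NO_QUEUE)
		q = (int)(flow_hash % num_tx_queues);
	return q;
}
```

With named pointers, either map can be absent (NULL) independently,
and the lookup order is visible in the code instead of being implied
by enum ordering.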