From: Willem de Bruijn
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, Willem de Bruijn
Subject: [PATCH net-next] ipv6: elide flowlabel check if no exclusive leases exist
Date: Sun, 7 Jul 2019 05:34:45 -0400
Message-Id: <20190707093445.15121-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn

Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
If not set, by default an autogenerated flowlabel is selected.

Explicit flowlabels require a control operation per label plus a
datapath check on every connection (every datagram if unconnected).
This is particularly expensive on unconnected sockets multiplexing many
flows, such as QUIC.

In the common case, where no lease is exclusive, the check can be safely
elided, as both lease request and check trivially succeed. Indeed,
autoflowlabel does the same even with exclusive leases.

Elide the check if no process has requested an exclusive lease.

fl6_sock_lookup previously returned either a reference to a lease or
NULL to denote failure. Modify to return a real error and update
all callers. On return NULL, they can use the label and will elide
the atomic_dec in fl6_sock_release.

This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.

Changes RFC->v1:
  - use static_key_false_deferred to rate limit jump label operations
  - call static_key_deferred_flush to stop timers on exit
  - move decrement out of RCU context
  - defer optimization also if opt data is associated with a lease
  - updated all fl6_sock_lookup callers, not just udp

Signed-off-by: Willem de Bruijn
---
 include/net/ipv6.h       | 14 +++++++++++++-
 net/dccp/ipv6.c          |  2 +-
 net/ipv6/ip6_flowlabel.c | 27 +++++++++++++++++++++++----
 net/ipv6/raw.c           |  4 ++--
 net/ipv6/tcp_ipv6.c      |  2 +-
 net/ipv6/udp.c           |  4 ++--
 net/l2tp/l2tp_ip6.c      |  4 ++--
 net/sctp/ipv6.c          |  2 +-
 8 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 8eca5fb30376f..8dfc65639aa4c 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -389,7 +390,18 @@ static inline void txopt_put(struct ipv6_txoptions *opt)
 		kfree_rcu(opt, rcu);
 }
 
-struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label);
+struct ip6_flowlabel *__fl6_sock_lookup(struct sock *sk, __be32 label);
+
+extern struct static_key_false_deferred ipv6_flowlabel_exclusive;
+static inline struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk,
+						    __be32 label)
+{
+	if (static_branch_unlikely(&ipv6_flowlabel_exclusive.key))
+		return __fl6_sock_lookup(sk, label) ? : ERR_PTR(-ENOENT);
+
+	return NULL;
+}
+
 struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
 					 struct ip6_flowlabel *fl,
 					 struct ipv6_txoptions *fopt);
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 85c10c8f50bd1..1b7381ff787b3 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -830,7 +830,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	if (fl6.flowlabel & IPV6_FLOWLABEL_MASK) {
 		struct ip6_flowlabel *flowlabel;
 		flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-		if (flowlabel == NULL)
+		if (IS_ERR(flowlabel))
 			return -EINVAL;
 		fl6_sock_release(flowlabel);
 	}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index 545e339b8c4fb..ad284b1fd308a 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -53,6 +54,9 @@ static DEFINE_SPINLOCK(ip6_fl_lock);
 
 static DEFINE_SPINLOCK(ip6_sk_fl_lock);
 
+DEFINE_STATIC_KEY_DEFERRED_FALSE(ipv6_flowlabel_exclusive, HZ);
+EXPORT_SYMBOL(ipv6_flowlabel_exclusive);
+
 #define for_each_fl_rcu(hash, fl)				\
 	for (fl = rcu_dereference_bh(fl_ht[(hash)]);		\
 	     fl != NULL;					\
@@ -90,6 +94,13 @@ static struct ip6_flowlabel *fl_lookup(struct net *net, __be32 label)
 	return fl;
 }
 
+static bool fl_shared_exclusive(struct ip6_flowlabel *fl)
+{
+	return fl->share == IPV6_FL_S_EXCL ||
+	       fl->share == IPV6_FL_S_PROCESS ||
+	       fl->share == IPV6_FL_S_USER;
+}
+
 static void fl_free_rcu(struct rcu_head *head)
 {
 	struct ip6_flowlabel *fl = container_of(head, struct ip6_flowlabel, rcu);
@@ -103,8 +114,13 @@ static void fl_free_rcu(struct rcu_head *head)
 
 static void fl_free(struct ip6_flowlabel *fl)
 {
-	if (fl)
-		call_rcu(&fl->rcu, fl_free_rcu);
+	if (!fl)
+		return;
+
+	if (fl_shared_exclusive(fl) || fl->opt)
+		static_branch_slow_dec_deferred(&ipv6_flowlabel_exclusive);
+
+	call_rcu(&fl->rcu, fl_free_rcu);
 }
 
 static void fl_release(struct ip6_flowlabel *fl)
@@ -240,7 +256,7 @@ static struct ip6_flowlabel *fl_intern(struct net *net,
 
 /* Socket flowlabel lists */
 
-struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label)
+struct ip6_flowlabel *__fl6_sock_lookup(struct sock *sk, __be32 label)
 {
 	struct ipv6_fl_socklist *sfl;
 	struct ipv6_pinfo *np = inet6_sk(sk);
@@ -260,7 +276,7 @@ struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label)
 	rcu_read_unlock_bh();
 	return NULL;
 }
-EXPORT_SYMBOL_GPL(fl6_sock_lookup);
+EXPORT_SYMBOL_GPL(__fl6_sock_lookup);
 
 void fl6_free_socklist(struct sock *sk)
 {
@@ -419,6 +435,8 @@ fl_create(struct net *net, struct sock *sk, struct in6_flowlabel_req *freq,
 	}
 	fl->dst = freq->flr_dst;
 	atomic_set(&fl->users, 1);
+	if (fl_shared_exclusive(fl) || fl->opt)
+		static_branch_deferred_inc(&ipv6_flowlabel_exclusive);
 	switch (fl->share) {
 	case IPV6_FL_S_EXCL:
 	case IPV6_FL_S_ANY:
@@ -854,6 +872,7 @@ int ip6_flowlabel_init(void)
 
 void ip6_flowlabel_cleanup(void)
 {
+	static_key_deferred_flush(&ipv6_flowlabel_exclusive);
 	del_timer(&ip6_fl_gc_timer);
 	unregister_pernet_subsys(&ip6_flowlabel_net_ops);
 }
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 70693bc7ad9d2..8a6131991e38f 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -834,7 +834,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			fl6.flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
 			if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) {
 				flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-				if (!flowlabel)
+				if (IS_ERR(flowlabel))
 					return -EINVAL;
 			}
 		}
@@ -876,7 +876,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		}
 		if ((fl6.flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
 			flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-			if (!flowlabel)
+			if (IS_ERR(flowlabel))
 				return -EINVAL;
 		}
 		if (!(opt->opt_nflen|opt->opt_flen))
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 4f3f99b398209..d56a9019a0feb 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -171,7 +171,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 		if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) {
 			struct ip6_flowlabel *flowlabel;
 			flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-			if (!flowlabel)
+			if (IS_ERR(flowlabel))
 				return -EINVAL;
 			fl6_sock_release(flowlabel);
 		}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 4406e059da680..827fe73850788 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1319,7 +1319,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			fl6.flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
 			if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) {
 				flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-				if (!flowlabel)
+				if (IS_ERR(flowlabel))
 					return -EINVAL;
 			}
 		}
@@ -1371,7 +1371,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		}
 		if ((fl6.flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
 			flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-			if (!flowlabel)
+			if (IS_ERR(flowlabel))
 				return -EINVAL;
 		}
 		if (!(opt->opt_nflen|opt->opt_flen))
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 1a76a0a4e3abb..687e23a8b3266 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -536,7 +536,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			fl6.flowlabel = lsa->l2tp_flowinfo & IPV6_FLOWINFO_MASK;
 			if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) {
 				flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-				if (flowlabel == NULL)
+				if (IS_ERR(flowlabel))
 					return -EINVAL;
 			}
 		}
@@ -577,7 +577,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		}
 		if ((fl6.flowlabel & IPV6_FLOWLABEL_MASK) && !flowlabel) {
 			flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-			if (flowlabel == NULL)
+			if (IS_ERR(flowlabel))
 				return -EINVAL;
 		}
 		if (!(opt->opt_nflen|opt->opt_flen))
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 64e0a594a6516..e5f2fc726a983 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -253,7 +253,7 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 		struct ip6_flowlabel *flowlabel;
 
 		flowlabel = fl6_sock_lookup(sk, fl6->flowlabel);
-		if (!flowlabel)
+		if (IS_ERR(flowlabel))
 			goto out;
 		fl6_sock_release(flowlabel);
 	}
-- 
2.22.0.410.gd8fdbe21b5-goog
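
The three-way return convention described in the commit message (NULL when the check is elided, an error pointer when the lookup fails, a referenced lease otherwise) can be sketched in user space. This is an illustration only, not kernel code: ERR_PTR and IS_ERR are re-implemented locally in simplified form, a plain int stands in for the deferred static key, and struct lease, exclusive_leases, demo_lease, table_lookup, lease_lookup, lease_release and send_with_label are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lease object; users counts held references. */
struct lease {
	unsigned int label;
	int users;
};

/* Plain counter standing in for the deferred static key (assumption). */
static int exclusive_leases;

/* Hypothetical single-entry lease table for the demo. */
static struct lease demo_lease = { .label = 0x12345, .users = 0 };

/* Slow path: take a reference if the label is leased, else NULL. */
static struct lease *table_lookup(unsigned int label)
{
	if (label != demo_lease.label)
		return NULL;
	demo_lease.users++;
	return &demo_lease;
}

/*
 * Returns NULL when no exclusive lease exists (check elided),
 * ERR_PTR(-ENOENT) when the check ran and found no lease,
 * or a referenced lease otherwise.
 */
static struct lease *lease_lookup(unsigned int label)
{
	struct lease *l;

	if (!exclusive_leases)
		return NULL;

	l = table_lookup(label);
	return l ? l : ERR_PTR(-ENOENT);
}

/* Releasing NULL is a no-op, so fast-path callers skip the decrement. */
static void lease_release(struct lease *l)
{
	assert(!IS_ERR(l));
	if (l)
		l->users--;
}

/* Caller pattern: only an error pointer counts as failure. */
static int send_with_label(unsigned int label)
{
	struct lease *l = lease_lookup(label);

	if (IS_ERR(l))
		return -EINVAL;
	/* ... transmit using label ... */
	lease_release(l);
	return 0;
}
```

In this sketch, send_with_label succeeds for any label while exclusive_leases is zero and never touches the table; once it is non-zero, labels without a lease fail with -EINVAL, which mirrors why callers in the patch switch from a NULL test to IS_ERR.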