From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Florian Westphal <fw@strlen.de>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	netdev@vger.kernel.org, Pablo Neira Ayuso <pablo@netfilter.org>,
	Cong Wang <amwang@redhat.com>,
	"Patrick McHardy" <kaber@trash.net>,
	Herbert Xu <herbert@gondor.hengli.com.au>,
	Daniel Borkmann <dborkman@redhat.com>
Subject: [net-next PATCH V2 5/6] net: use lib/percpu_counter API for fragmentation mem accounting
Date: Tue, 29 Jan 2013 10:45:33 +0100
Message-ID: <20130129094517.13513.12103.stgit@dragon>
In-Reply-To: <20130129094331.13513.28377.stgit@dragon>

Replace the per network namespace shared atomic "mem" accounting
variable, in the fragmentation code, with a lib/percpu_counter.

Getting percpu_counter to scale to the fragmentation code usage
requires some tweaks.

At first glance, percpu_counter looks very fast, but it does not
scale on multi-CPU/NUMA machines, because the default batch size
is too small for the frag code's usage.  Thus, I have adjusted the
batch size by calling __percpu_counter_add() directly, instead of
percpu_counter_sub() and percpu_counter_add().
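
To illustrate why the batch size matters, here is a simplified,
single-threaded userspace model of the batching idea.  It is only a
sketch, not the kernel implementation: the model_* names and
MODEL_NR_CPUS are hypothetical, there is no locking or real per-CPU
storage, and the small batch of 32 used in the usage note is an
assumed value for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count, for illustration only */

/* Userspace stand-in for a percpu counter (no locking, no real per-CPU data) */
struct model_counter {
	int64_t count;			/* shared value, expensive to update */
	int32_t local[MODEL_NR_CPUS];	/* per-CPU deltas, cheap to update */
};

static void model_init(struct model_counter *c)
{
	memset(c, 0, sizeof(*c));
}

/*
 * Add "amount" on behalf of "cpu".  Returns 1 when the shared count had
 * to be touched (the cacheline-bouncing case), 0 when only the local
 * delta was updated.
 */
static int model_add(struct model_counter *c, int cpu, int32_t amount,
		     int32_t batch)
{
	int64_t sum;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	sum = (int64_t)c->local[cpu] + amount;

	if (sum >= batch || sum <= -batch) {
		c->count += sum;	/* fold the local delta into the shared count */
		c->local[cpu] = 0;
		return 1;
	}
	c->local[cpu] = (int32_t)sum;
	return 0;
}

/* Count how many of "n" adds of "amount" on CPU 0 touch the shared count */
static int model_count_flushes(int32_t batch, int n, int32_t amount)
{
	struct model_counter c;
	int i, flushes = 0;

	model_init(&c);
	for (i = 0; i < n; i++)
		flushes += model_add(&c, 0, amount, batch);
	return flushes;
}
```

In this model, model_count_flushes(32, 44, 2944) touches the shared
count on every one of the 44 adds, while
model_count_flushes(130000, 44, 2944) never touches it.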

The batch size is increased to 130,000, based on the memory usage of
a maximum-size 64K fragmented packet.  This does introduce some
imprecision in the memory accounting, but it does not need to be
strict for this use-case.
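
The sizing and the resulting imprecision can be worked through as
below.  This is a sketch under stated assumptions: the figures (44
fragments, 2944 bytes truesize, 200 bytes for the frag_queue struct)
are the approximations from the code comment in this patch, the two
helper names are hypothetical, and the error bound assumes that each
CPU holds back strictly less than one batch before folding its delta
into the shared count.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate memory charged for one fully fragmented packet */
static int64_t frag_queue_mem_estimate(int64_t nr_frags, int64_t truesize,
				       int64_t queue_struct_size)
{
	return nr_frags * truesize + queue_struct_size;
}

/*
 * Worst-case distance between the cheap approximate read and the true
 * sum, assuming every CPU holds a local delta just below the batch.
 */
static int64_t approx_read_max_error(int64_t batch, int64_t nr_cpus)
{
	assert(batch > 0 && nr_cpus > 0);
	return (batch - 1) * nr_cpus;
}
```

With the numbers above, frag_queue_mem_estimate(44, 2944, 200) gives
129736, just under the chosen batch of 130000.  The price is that the
approximate read can drift by up to approx_read_max_error(batch,
nr_cpus) bytes, which grows with the number of CPUs.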

It is also essential that the percpu_counter does not share a
cacheline with other writers, for this to scale.
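
The layout requirement can be checked in userspace with a small
sketch.  Everything here is an assumption for illustration: the
model_* structs are simplified stand-ins for the kernel structures, a
64-byte cacheline is assumed, and C11 alignas is used in place of the
kernel's ____cacheline_aligned_in_smp annotation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define MODEL_CACHELINE 64	/* assumed cacheline size in bytes */

struct model_list_head {
	void *next, *prev;
};

/* Simplified stand-in for the counter; only the hot field matters here */
struct model_percpu_counter {
	long long count;
	int *counters;
};

/* Simplified stand-in for struct netns_frags */
struct model_netns_frags {
	int nqueues;
	struct model_list_head lru_list;	/* written on LRU updates */

	/* Start "mem" on its own cacheline, away from lru_list */
	alignas(MODEL_CACHELINE) struct model_percpu_counter mem;

	int timeout;
};

/* Return 1 if two byte offsets fall into the same cacheline */
static int same_cacheline(size_t a, size_t b)
{
	return a / MODEL_CACHELINE == b / MODEL_CACHELINE;
}
```

Comparing offsetof(struct model_netns_frags, lru_list) with
offsetof(struct model_netns_frags, mem) through same_cacheline()
shows whether the frequently written list head and the counter would
bounce the same cacheline between CPUs.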

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
V2:
 - Remove unrelated change/comment in include/linux/percpu_counter.h

 include/net/inet_frag.h  |   26 ++++++++++++++++++--------
 net/ipv4/inet_fragment.c |    2 ++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index f2fabc2..e0eec74 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -1,14 +1,17 @@
 #ifndef __NET_FRAG_H__
 #define __NET_FRAG_H__
 
+#include <linux/percpu_counter.h>
+
 struct netns_frags {
 	int			nqueues;
 	struct list_head	lru_list;
 
-	/* Its important for performance to keep lru_list and mem on
-	 * separate cachelines
+	/* The percpu_counter "mem" needs to be cacheline aligned.
+	 *  mem.count must not share a cacheline with other writers
 	 */
-	atomic_t		mem ____cacheline_aligned_in_smp;
+	struct percpu_counter   mem ____cacheline_aligned_in_smp;
+
 	/* sysctls */
 	int			timeout;
 	int			high_thresh;
@@ -81,29 +84,36 @@ static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f
 
 /* Memory Tracking Functions. */
 
+/* The default percpu_counter batch size is not big enough to scale to
+ * fragmentation mem acct sizes.
+ * The mem size of a 64K fragment is approx:
+ *  (44 fragments * 2944 truesize) + frag_queue struct(200) = 129736 bytes
+ */
+static unsigned int frag_percpu_counter_batch = 130000;
+
 static inline int frag_mem_limit(struct netns_frags *nf)
 {
-	return atomic_read(&nf->mem);
+	return percpu_counter_read(&nf->mem);
 }
 
 static inline void sub_frag_mem_limit(struct inet_frag_queue *q, int i)
 {
-	atomic_sub(i, &q->net->mem);
+	__percpu_counter_add(&q->net->mem, -i, frag_percpu_counter_batch);
 }
 
 static inline void add_frag_mem_limit(struct inet_frag_queue *q, int i)
 {
-	atomic_add(i, &q->net->mem);
+	__percpu_counter_add(&q->net->mem, i, frag_percpu_counter_batch);
 }
 
 static inline void init_frag_mem_limit(struct netns_frags *nf)
 {
-	atomic_set(&nf->mem, 0);
+	percpu_counter_init(&nf->mem, 0);
 }
 
 static inline int sum_frag_mem_limit(struct netns_frags *nf)
 {
-	return atomic_read(&nf->mem);
+	return percpu_counter_sum_positive(&nf->mem);
 }
 
 #endif
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index e348c84..b825205 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -91,6 +91,8 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	local_bh_disable();
 	inet_frag_evictor(nf, f, true);
 	local_bh_enable();
+
+	percpu_counter_destroy(&nf->mem);
 }
 EXPORT_SYMBOL(inet_frags_exit_net);
 
