* [net-next PATCH V2 1/6] net: cacheline adjust struct netns_frags for better frag performance
2013-01-29 9:44 [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems Jesper Dangaard Brouer
@ 2013-01-29 9:44 ` Jesper Dangaard Brouer
2013-01-29 9:44 ` [net-next PATCH V2 2/6] net: cacheline adjust struct inet_frags " Jesper Dangaard Brouer
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2013-01-29 9:44 UTC (permalink / raw)
To: Eric Dumazet, David S. Miller, Florian Westphal
Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Cong Wang,
Patrick McHardy, Herbert Xu, Daniel Borkmann
This small cacheline adjustment of struct netns_frags improves
performance significantly for the fragmentation code.
Struct members 'lru_list' and 'mem' are both hot elements, and
sharing a cacheline hurts performance due to cacheline bouncing at
every call point. Also notice how 'mem' is placed together with
'high_thresh' and 'low_thresh', as they are used together in the
compare operations.
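For illustration, a minimal userspace sketch of the same layout trick
(assuming 64-byte cachelines; the struct and member names below are
stand-ins, not the kernel's struct netns_frags): aligning the
write-hot counter to a cacheline boundary guarantees it cannot share
a line with the LRU list head.

#include <stdio.h>
#include <stddef.h>

#define SMP_CACHELINE 64

struct frag_state_sketch {
	int nqueues;
	void *lru_list[2];		/* stand-in for struct list_head */
	/* write-hot counter pushed onto its own cacheline, like
	 * ____cacheline_aligned_in_smp does in the patch */
	_Alignas(SMP_CACHELINE) int mem;
	int timeout;
	int high_thresh;
	int low_thresh;
};

int main(void)
{
	printf("lru_list starts in cacheline %zu\n",
	       offsetof(struct frag_state_sketch, lru_list) / SMP_CACHELINE);
	printf("mem starts in cacheline %zu\n",
	       offsetof(struct frag_state_sketch, mem) / SMP_CACHELINE);
	return 0;
}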
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/net/inet_frag.h | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 32786a0..91e7797 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -3,9 +3,12 @@
struct netns_frags {
int nqueues;
- atomic_t mem;
struct list_head lru_list;
+ /* It's important for performance to keep lru_list and mem on
+ * separate cachelines
+ */
+ atomic_t mem ____cacheline_aligned_in_smp;
/* sysctls */
int timeout;
int high_thresh;
* [net-next PATCH V2 2/6] net: cacheline adjust struct inet_frags for better frag performance
2013-01-29 9:44 [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems Jesper Dangaard Brouer
2013-01-29 9:44 ` [net-next PATCH V2 1/6] net: cacheline adjust struct netns_frags for better frag performance Jesper Dangaard Brouer
@ 2013-01-29 9:44 ` Jesper Dangaard Brouer
2013-01-29 9:44 ` [net-next PATCH V2 3/6] net: cacheline adjust struct inet_frag_queue Jesper Dangaard Brouer
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2013-01-29 9:44 UTC (permalink / raw)
To: Eric Dumazet, David S. Miller, Florian Westphal
Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Cong Wang,
Patrick McHardy, Herbert Xu, Daniel Borkmann
The globally shared rwlock of struct inet_frags shares a cacheline
with the 'rnd' number, which is used by the hash calculations.
This is obviously a bad idea, as unnecessary cache misses occur
when accessing 'rnd'. Fix it by giving the lock its own cacheline.
Also a small note: the function pointer (*match) is moved up in
the struct to avoid it landing on the next cacheline (on 64-bit).
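For illustration only (not part of the patch), a compile-time check
of the property the move is after, written against a simplified
stand-in struct with hypothetical names and sizes:

#include <assert.h>
#include <stddef.h>

#define SMP_CACHELINE 64

struct frags_sketch {			/* simplified stand-in, not inet_frags */
	void *hash[64];
	_Alignas(SMP_CACHELINE) int lock;	/* stand-in for rwlock_t */
	int secret_interval;
	char secret_timer[80];		/* roughly sizeof(struct timer_list) */
	unsigned int rnd;		/* read-mostly hash seed */
	int qsize;
};

/* The hash seed must not sit in the contended lock's cacheline. */
static_assert(offsetof(struct frags_sketch, rnd) / SMP_CACHELINE !=
	      offsetof(struct frags_sketch, lock) / SMP_CACHELINE,
	      "rnd shares a cacheline with the lock");

int main(void)
{
	return 0;
}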
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
V2:
- Remove comment about cacheline boundary
include/net/inet_frag.h | 11 +++++++----
1 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 91e7797..54c1de7 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -40,18 +40,21 @@ struct inet_frag_queue {
struct inet_frags {
struct hlist_head hash[INETFRAGS_HASHSZ];
- rwlock_t lock;
- u32 rnd;
- int qsize;
+ /* This rwlock is a global lock (separate per IPv4, IPv6 and
+ * netfilter). Important to keep this on a separate cacheline.
+ */
+ rwlock_t lock ____cacheline_aligned_in_smp;
int secret_interval;
struct timer_list secret_timer;
+ u32 rnd;
+ int qsize;
unsigned int (*hashfn)(struct inet_frag_queue *);
+ bool (*match)(struct inet_frag_queue *q, void *arg);
void (*constructor)(struct inet_frag_queue *q,
void *arg);
void (*destructor)(struct inet_frag_queue *);
void (*skb_free)(struct sk_buff *);
- bool (*match)(struct inet_frag_queue *q, void *arg);
void (*frag_expire)(unsigned long data);
};
* [net-next PATCH V2 3/6] net: cacheline adjust struct inet_frag_queue
2013-01-29 9:44 [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems Jesper Dangaard Brouer
2013-01-29 9:44 ` [net-next PATCH V2 1/6] net: cacheline adjust struct netns_frags for better frag performance Jesper Dangaard Brouer
2013-01-29 9:44 ` [net-next PATCH V2 2/6] net: cacheline adjust struct inet_frags " Jesper Dangaard Brouer
@ 2013-01-29 9:44 ` Jesper Dangaard Brouer
2013-01-29 9:45 ` [net-next PATCH V2 4/6] net: frag helper functions for mem limit tracking Jesper Dangaard Brouer
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2013-01-29 9:44 UTC (permalink / raw)
To: Eric Dumazet, David S. Miller, Florian Westphal
Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Cong Wang,
Patrick McHardy, Herbert Xu, Daniel Borkmann
Cacheline adjust struct inet_frag_queue in the fragmentation code.
Take advantage of the size of struct timer_list, and move
everything except the spinlock_t lock below the timer struct. On
64-bit, 'lru_list', 'list' and 'refcnt' fit exactly into the next
cacheline, and a new cacheline starts at 'fragments'.
The netns_frags *net pointer is moved to the end of the struct,
because it is used in a compare together with the "next/close-by"
elements of the struct this struct is embedded into.
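For readers less familiar with the embedding mentioned above, a small
userspace sketch (all names hypothetical) of how a protocol queue
embeds the generic part and recovers it with container_of; keeping
'net' as the last member of the generic part places it right next to
the embedding struct's own fields:

#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct frag_queue_sketch {		/* stand-in for inet_frag_queue */
	int meat;
	void *net;			/* last member, as in the patch */
};

struct ipv4_queue_sketch {		/* stand-in for struct ipq */
	struct frag_queue_sketch q;	/* embedded generic part */
	unsigned int saddr, daddr;	/* fields the match code reads */
};

int main(void)
{
	struct ipv4_queue_sketch qp = { .saddr = 1, .daddr = 2 };
	struct frag_queue_sketch *q = &qp.q;
	/* recover the embedding struct from a pointer to the member */
	struct ipv4_queue_sketch *back =
		container_of(q, struct ipv4_queue_sketch, q);

	printf("saddr=%u daddr=%u\n", back->saddr, back->daddr);
	return 0;
}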
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/net/inet_frag.h | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 54c1de7..8e4c425 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -16,12 +16,11 @@ struct netns_frags {
};
struct inet_frag_queue {
- struct hlist_node list;
- struct netns_frags *net;
- struct list_head lru_list; /* lru list member */
spinlock_t lock;
- atomic_t refcnt;
struct timer_list timer; /* when will this queue expire? */
+ struct list_head lru_list; /* lru list member */
+ struct hlist_node list;
+ atomic_t refcnt;
struct sk_buff *fragments; /* list of received fragments */
struct sk_buff *fragments_tail;
ktime_t stamp;
@@ -34,6 +33,8 @@ struct inet_frag_queue {
#define INET_FRAG_LAST_IN 1
u16 max_size;
+
+ struct netns_frags *net;
};
#define INETFRAGS_HASHSZ 64
* [net-next PATCH V2 4/6] net: frag helper functions for mem limit tracking
2013-01-29 9:44 [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems Jesper Dangaard Brouer
` (2 preceding siblings ...)
2013-01-29 9:44 ` [net-next PATCH V2 3/6] net: cacheline adjust struct inet_frag_queue Jesper Dangaard Brouer
@ 2013-01-29 9:45 ` Jesper Dangaard Brouer
2013-01-29 9:45 ` [net-next PATCH V2 5/6] net: use lib/percpu_counter API for fragmentation mem accounting Jesper Dangaard Brouer
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2013-01-29 9:45 UTC (permalink / raw)
To: Eric Dumazet, David S. Miller, Florian Westphal
Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Cong Wang,
Patrick McHardy, Herbert Xu, Daniel Borkmann
This change is primarily a preparation to ease the extension of memory
limit tracking.
The change does reduce the number of atomic operations during
freeing of a frag queue. This introduces some performance
improvement, as these atomic operations are at the core of the
performance problems seen on NUMA systems.
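The batching can be seen in the hunks below; as a stand-alone
illustration (hypothetical names, not the kernel code), the pattern
is to sum the per-fragment sizes locally and issue one atomic update
for the whole list:

#include <stdatomic.h>
#include <stddef.h>

static atomic_int frag_mem;		/* hypothetical accounting counter */

struct frag_sketch {
	struct frag_sketch *next;
	int truesize;
};

static void free_frag_list(struct frag_sketch *fp)
{
	unsigned int sum_truesize = 0;

	while (fp) {
		struct frag_sketch *next = fp->next;

		sum_truesize += fp->truesize;	/* accumulate locally ... */
		/* the real code frees the skb here (frag_kfree_skb) */
		fp = next;
	}
	/* ... and does a single atomic subtraction for the whole list */
	atomic_fetch_sub(&frag_mem, sum_truesize);
}

int main(void)
{
	struct frag_sketch b = { NULL, 100 }, a = { &b, 200 };

	atomic_store(&frag_mem, 300);
	free_frag_list(&a);
	return atomic_load(&frag_mem);		/* 0 after batching */
}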
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/net/inet_frag.h | 27 +++++++++++++++++++++++++++
include/net/ipv6.h | 2 +-
net/ipv4/inet_fragment.c | 25 ++++++++++++-------------
net/ipv4/ip_fragment.c | 24 +++++++++++-------------
net/ipv6/netfilter/nf_conntrack_reasm.c | 6 +++---
net/ipv6/reassembly.c | 6 +++---
6 files changed, 57 insertions(+), 33 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 8e4c425..f2fabc2 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -79,4 +79,31 @@ static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f
inet_frag_destroy(q, f, NULL);
}
+/* Memory Tracking Functions. */
+
+static inline int frag_mem_limit(struct netns_frags *nf)
+{
+ return atomic_read(&nf->mem);
+}
+
+static inline void sub_frag_mem_limit(struct inet_frag_queue *q, int i)
+{
+ atomic_sub(i, &q->net->mem);
+}
+
+static inline void add_frag_mem_limit(struct inet_frag_queue *q, int i)
+{
+ atomic_add(i, &q->net->mem);
+}
+
+static inline void init_frag_mem_limit(struct netns_frags *nf)
+{
+ atomic_set(&nf->mem, 0);
+}
+
+static inline int sum_frag_mem_limit(struct netns_frags *nf)
+{
+ return atomic_read(&nf->mem);
+}
+
#endif
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index c1878f7..dc30b60 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -288,7 +288,7 @@ static inline int ip6_frag_nqueues(struct net *net)
static inline int ip6_frag_mem(struct net *net)
{
- return atomic_read(&net->ipv6.frags.mem);
+ return sum_frag_mem_limit(&net->ipv6.frags);
}
#endif
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 4750d2b..e348c84 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -73,7 +73,7 @@ EXPORT_SYMBOL(inet_frags_init);
void inet_frags_init_net(struct netns_frags *nf)
{
nf->nqueues = 0;
- atomic_set(&nf->mem, 0);
+ init_frag_mem_limit(nf);
INIT_LIST_HEAD(&nf->lru_list);
}
EXPORT_SYMBOL(inet_frags_init_net);
@@ -117,12 +117,8 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
EXPORT_SYMBOL(inet_frag_kill);
static inline void frag_kfree_skb(struct netns_frags *nf, struct inet_frags *f,
- struct sk_buff *skb, int *work)
+ struct sk_buff *skb)
{
- if (work)
- *work -= skb->truesize;
-
- atomic_sub(skb->truesize, &nf->mem);
if (f->skb_free)
f->skb_free(skb);
kfree_skb(skb);
@@ -133,6 +129,7 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
{
struct sk_buff *fp;
struct netns_frags *nf;
+ unsigned int sum, sum_truesize = 0;
WARN_ON(!(q->last_in & INET_FRAG_COMPLETE));
WARN_ON(del_timer(&q->timer) != 0);
@@ -143,13 +140,14 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
while (fp) {
struct sk_buff *xp = fp->next;
- frag_kfree_skb(nf, f, fp, work);
+ sum_truesize += fp->truesize;
+ frag_kfree_skb(nf, f, fp);
fp = xp;
}
-
+ sum = sum_truesize + f->qsize;
if (work)
- *work -= f->qsize;
- atomic_sub(f->qsize, &nf->mem);
+ *work -= sum;
+ sub_frag_mem_limit(q, sum);
if (f->destructor)
f->destructor(q);
@@ -164,11 +162,11 @@ int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
int work, evicted = 0;
if (!force) {
- if (atomic_read(&nf->mem) <= nf->high_thresh)
+ if (frag_mem_limit(nf) <= nf->high_thresh)
return 0;
}
- work = atomic_read(&nf->mem) - nf->low_thresh;
+ work = frag_mem_limit(nf) - nf->low_thresh;
while (work > 0) {
read_lock(&f->lock);
if (list_empty(&nf->lru_list)) {
@@ -250,7 +248,8 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
q->net = nf;
f->constructor(q, arg);
- atomic_add(f->qsize, &nf->mem);
+ add_frag_mem_limit(q, f->qsize);
+
setup_timer(&q->timer, f->frag_expire, (unsigned long)q);
spin_lock_init(&q->lock);
atomic_set(&q->refcnt, 1);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index f55a4e6..927fe58 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -122,7 +122,7 @@ int ip_frag_nqueues(struct net *net)
int ip_frag_mem(struct net *net)
{
- return atomic_read(&net->ipv4.frags.mem);
+ return sum_frag_mem_limit(&net->ipv4.frags);
}
static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
@@ -161,13 +161,6 @@ static bool ip4_frag_match(struct inet_frag_queue *q, void *a)
qp->user == arg->user;
}
-/* Memory Tracking Functions. */
-static void frag_kfree_skb(struct netns_frags *nf, struct sk_buff *skb)
-{
- atomic_sub(skb->truesize, &nf->mem);
- kfree_skb(skb);
-}
-
static void ip4_frag_init(struct inet_frag_queue *q, void *a)
{
struct ipq *qp = container_of(q, struct ipq, q);
@@ -340,6 +333,7 @@ static inline int ip_frag_too_far(struct ipq *qp)
static int ip_frag_reinit(struct ipq *qp)
{
struct sk_buff *fp;
+ unsigned int sum_truesize = 0;
if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
atomic_inc(&qp->q.refcnt);
@@ -349,9 +343,12 @@ static int ip_frag_reinit(struct ipq *qp)
fp = qp->q.fragments;
do {
struct sk_buff *xp = fp->next;
- frag_kfree_skb(qp->q.net, fp);
+
+ sum_truesize += fp->truesize;
+ kfree_skb(fp);
fp = xp;
} while (fp);
+ sub_frag_mem_limit(&qp->q, sum_truesize);
qp->q.last_in = 0;
qp->q.len = 0;
@@ -496,7 +493,8 @@ found:
qp->q.fragments = next;
qp->q.meat -= free_it->len;
- frag_kfree_skb(qp->q.net, free_it);
+ sub_frag_mem_limit(&qp->q, free_it->truesize);
+ kfree_skb(free_it);
}
}
@@ -519,7 +517,7 @@ found:
qp->q.stamp = skb->tstamp;
qp->q.meat += skb->len;
qp->ecn |= ecn;
- atomic_add(skb->truesize, &qp->q.net->mem);
+ add_frag_mem_limit(&qp->q, skb->truesize);
if (offset == 0)
qp->q.last_in |= INET_FRAG_FIRST_IN;
@@ -617,7 +615,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
head->len -= clone->len;
clone->csum = 0;
clone->ip_summed = head->ip_summed;
- atomic_add(clone->truesize, &qp->q.net->mem);
+ add_frag_mem_limit(&qp->q, clone->truesize);
}
skb_push(head, head->data - skb_network_header(head));
@@ -645,7 +643,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
}
fp = next;
}
- atomic_sub(sum_truesize, &qp->q.net->mem);
+ sub_frag_mem_limit(&qp->q, sum_truesize);
head->next = NULL;
head->dev = dev;
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 3dacecc..07ef294 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -319,7 +319,7 @@ found:
fq->q.meat += skb->len;
if (payload_len > fq->q.max_size)
fq->q.max_size = payload_len;
- atomic_add(skb->truesize, &fq->q.net->mem);
+ add_frag_mem_limit(&fq->q, skb->truesize);
/* The first fragment.
* nhoffset is obtained from the first fragment, of course.
@@ -398,7 +398,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
clone->ip_summed = head->ip_summed;
NFCT_FRAG6_CB(clone)->orig = NULL;
- atomic_add(clone->truesize, &fq->q.net->mem);
+ add_frag_mem_limit(&fq->q, clone->truesize);
}
/* We have to remove fragment header from datagram and to relocate
@@ -422,7 +422,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
head->csum = csum_add(head->csum, fp->csum);
head->truesize += fp->truesize;
}
- atomic_sub(head->truesize, &fq->q.net->mem);
+ sub_frag_mem_limit(&fq->q, head->truesize);
head->local_df = 1;
head->next = NULL;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index e5253ec..18cb8de 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -327,7 +327,7 @@ found:
}
fq->q.stamp = skb->tstamp;
fq->q.meat += skb->len;
- atomic_add(skb->truesize, &fq->q.net->mem);
+ add_frag_mem_limit(&fq->q, skb->truesize);
/* The first fragment.
* nhoffset is obtained from the first fragment, of course.
@@ -429,7 +429,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
head->len -= clone->len;
clone->csum = 0;
clone->ip_summed = head->ip_summed;
- atomic_add(clone->truesize, &fq->q.net->mem);
+ add_frag_mem_limit(&fq->q, clone->truesize);
}
/* We have to remove fragment header from datagram and to relocate
@@ -467,7 +467,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
}
fp = next;
}
- atomic_sub(sum_truesize, &fq->q.net->mem);
+ sub_frag_mem_limit(&fq->q, sum_truesize);
head->next = NULL;
head->dev = dev;
* [net-next PATCH V2 5/6] net: use lib/percpu_counter API for fragmentation mem accounting
2013-01-29 9:44 [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems Jesper Dangaard Brouer
` (3 preceding siblings ...)
2013-01-29 9:45 ` [net-next PATCH V2 4/6] net: frag helper functions for mem limit tracking Jesper Dangaard Brouer
@ 2013-01-29 9:45 ` Jesper Dangaard Brouer
2013-01-29 9:45 ` [net-next PATCH V2 6/6] net: frag, move LRU list maintenance outside of rwlock Jesper Dangaard Brouer
2013-01-29 18:38 ` [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2013-01-29 9:45 UTC (permalink / raw)
To: Eric Dumazet, David S. Miller, Florian Westphal
Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Cong Wang,
Patrick McHardy, Herbert Xu, Daniel Borkmann
Replace the per-network-namespace shared atomic "mem" accounting
variable in the fragmentation code with a lib/percpu_counter.
Getting percpu_counter to scale to the fragmentation code usage
requires some tweaks.
At first view, percpu_counter looks super fast, but it does not
scale on multi-CPU/NUMA machines, because the default batch size
is too small for frag code usage. Thus, I have adjusted the batch
size by using __percpu_counter_add() directly, instead of
percpu_counter_sub() and percpu_counter_add().
The batch size is increased to 130,000, based on the memory usage
of the largest 64K fragment. This does introduce some imprecise
memory accounting, but it does not need to be strict for this
use case.
It is also essential, for this to scale, that the percpu_counter
does not share a cacheline with other writers.
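As a userspace illustration of why the large batch matters (this is
a hypothetical sketch, not the lib/percpu_counter implementation):
each CPU/thread accumulates into a private delta and only folds it
into the shared counter once it exceeds the batch, so the shared
cacheline is rarely written:

#include <stdatomic.h>

#define FRAG_BATCH 130000		/* large batch, as chosen in the patch */

static atomic_long global_mem;		/* shared, approximate total */
static _Thread_local long local_delta;	/* private, cheap to update */

static void frag_mem_account(long amount)
{
	local_delta += amount;
	if (local_delta >= FRAG_BATCH || local_delta <= -FRAG_BATCH) {
		/* fold the batched delta into the shared counter */
		atomic_fetch_add(&global_mem, local_delta);
		local_delta = 0;
	}
}

static long frag_mem_read_fast(void)
{
	/* imprecise read: other threads' unflushed deltas are not seen */
	return atomic_load(&global_mem);
}

int main(void)
{
	for (int i = 0; i < 300000; i++)
		frag_mem_account(1);
	return frag_mem_read_fast() >= FRAG_BATCH ? 0 : 1;
}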
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
V2:
- Remove unrelated change/comment in include/linux/percpu_counter.h
include/net/inet_frag.h | 26 ++++++++++++++++++--------
net/ipv4/inet_fragment.c | 2 ++
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index f2fabc2..e0eec74 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -1,14 +1,17 @@
#ifndef __NET_FRAG_H__
#define __NET_FRAG_H__
+#include <linux/percpu_counter.h>
+
struct netns_frags {
int nqueues;
struct list_head lru_list;
- /* It's important for performance to keep lru_list and mem on
- * separate cachelines
+ /* The percpu_counter "mem" needs to be cacheline aligned.
+ * mem.count must not share a cacheline with other writers
*/
- atomic_t mem ____cacheline_aligned_in_smp;
+ struct percpu_counter mem ____cacheline_aligned_in_smp;
+
/* sysctls */
int timeout;
int high_thresh;
@@ -81,29 +84,36 @@ static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f
/* Memory Tracking Functions. */
+/* The default percpu_counter batch size is not big enough to scale to
+ * fragmentation mem acct sizes.
+ * The mem size of a 64K fragment is approx:
+ * (44 fragments * 2944 truesize) + frag_queue struct(200) = 129736 bytes
+ */
+static unsigned int frag_percpu_counter_batch = 130000;
+
static inline int frag_mem_limit(struct netns_frags *nf)
{
- return atomic_read(&nf->mem);
+ return percpu_counter_read(&nf->mem);
}
static inline void sub_frag_mem_limit(struct inet_frag_queue *q, int i)
{
- atomic_sub(i, &q->net->mem);
+ __percpu_counter_add(&q->net->mem, -i, frag_percpu_counter_batch);
}
static inline void add_frag_mem_limit(struct inet_frag_queue *q, int i)
{
- atomic_add(i, &q->net->mem);
+ __percpu_counter_add(&q->net->mem, i, frag_percpu_counter_batch);
}
static inline void init_frag_mem_limit(struct netns_frags *nf)
{
- atomic_set(&nf->mem, 0);
+ percpu_counter_init(&nf->mem, 0);
}
static inline int sum_frag_mem_limit(struct netns_frags *nf)
{
- return atomic_read(&nf->mem);
+ return percpu_counter_sum_positive(&nf->mem);
}
#endif
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index e348c84..b825205 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -91,6 +91,8 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
local_bh_disable();
inet_frag_evictor(nf, f, true);
local_bh_enable();
+
+ percpu_counter_destroy(&nf->mem);
}
EXPORT_SYMBOL(inet_frags_exit_net);
* [net-next PATCH V2 6/6] net: frag, move LRU list maintenance outside of rwlock
2013-01-29 9:44 [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems Jesper Dangaard Brouer
` (4 preceding siblings ...)
2013-01-29 9:45 ` [net-next PATCH V2 5/6] net: use lib/percpu_counter API for fragmentation mem accounting Jesper Dangaard Brouer
@ 2013-01-29 9:45 ` Jesper Dangaard Brouer
2013-01-29 18:38 ` [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2013-01-29 9:45 UTC (permalink / raw)
To: Eric Dumazet, David S. Miller, Florian Westphal
Cc: Jesper Dangaard Brouer, netdev, Pablo Neira Ayuso, Cong Wang,
Patrick McHardy, Herbert Xu, Daniel Borkmann
Updating the fragmentation queues' LRU (Least-Recently-Used) list
required taking the hash writer lock. However, the LRU list isn't
tied to the hash at all, so we can use a separate lock for it.
Original-idea-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/net/inet_frag.h | 22 ++++++++++++++++++++++
net/ipv4/inet_fragment.c | 12 +++++++-----
net/ipv4/ip_fragment.c | 4 +---
net/ipv6/netfilter/nf_conntrack_reasm.c | 5 ++---
net/ipv6/reassembly.c | 4 +---
5 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index e0eec74..3f237db 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -6,6 +6,7 @@
struct netns_frags {
int nqueues;
struct list_head lru_list;
+ spinlock_t lru_lock;
/* The percpu_counter "mem" need to be cacheline aligned.
* mem.count must not share cacheline with other writers
@@ -116,4 +117,25 @@ static inline int sum_frag_mem_limit(struct netns_frags *nf)
return percpu_counter_sum_positive(&nf->mem);
}
+static inline void inet_frag_lru_move(struct inet_frag_queue *q)
+{
+ spin_lock(&q->net->lru_lock);
+ list_move_tail(&q->lru_list, &q->net->lru_list);
+ spin_unlock(&q->net->lru_lock);
+}
+
+static inline void inet_frag_lru_del(struct inet_frag_queue *q)
+{
+ spin_lock(&q->net->lru_lock);
+ list_del(&q->lru_list);
+ spin_unlock(&q->net->lru_lock);
+}
+
+static inline void inet_frag_lru_add(struct netns_frags *nf,
+ struct inet_frag_queue *q)
+{
+ spin_lock(&nf->lru_lock);
+ list_add_tail(&q->lru_list, &nf->lru_list);
+ spin_unlock(&nf->lru_lock);
+}
#endif
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index b825205..2e453bd 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -75,6 +75,7 @@ void inet_frags_init_net(struct netns_frags *nf)
nf->nqueues = 0;
init_frag_mem_limit(nf);
INIT_LIST_HEAD(&nf->lru_list);
+ spin_lock_init(&nf->lru_lock);
}
EXPORT_SYMBOL(inet_frags_init_net);
@@ -100,9 +101,9 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
{
write_lock(&f->lock);
hlist_del(&fq->list);
- list_del(&fq->lru_list);
fq->net->nqueues--;
write_unlock(&f->lock);
+ inet_frag_lru_del(fq);
}
void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
@@ -170,16 +171,17 @@ int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
work = frag_mem_limit(nf) - nf->low_thresh;
while (work > 0) {
- read_lock(&f->lock);
+ spin_lock(&nf->lru_lock);
+
if (list_empty(&nf->lru_list)) {
- read_unlock(&f->lock);
+ spin_unlock(&nf->lru_lock);
break;
}
q = list_first_entry(&nf->lru_list,
struct inet_frag_queue, lru_list);
atomic_inc(&q->refcnt);
- read_unlock(&f->lock);
+ spin_unlock(&nf->lru_lock);
spin_lock(&q->lock);
if (!(q->last_in & INET_FRAG_COMPLETE))
@@ -233,9 +235,9 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
atomic_inc(&qp->refcnt);
hlist_add_head(&qp->list, &f->hash[hash]);
- list_add_tail(&qp->lru_list, &nf->lru_list);
nf->nqueues++;
write_unlock(&f->lock);
+ inet_frag_lru_add(nf, qp);
return qp;
}
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 927fe58..1211613 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -529,9 +529,7 @@ found:
qp->q.meat == qp->q.len)
return ip_frag_reasm(qp, prev, dev);
- write_lock(&ip4_frags.lock);
- list_move_tail(&qp->q.lru_list, &qp->q.net->lru_list);
- write_unlock(&ip4_frags.lock);
+ inet_frag_lru_move(&qp->q);
return -EINPROGRESS;
err:
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 07ef294..c674f15 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -328,9 +328,8 @@ found:
fq->nhoffset = nhoff;
fq->q.last_in |= INET_FRAG_FIRST_IN;
}
- write_lock(&nf_frags.lock);
- list_move_tail(&fq->q.lru_list, &fq->q.net->lru_list);
- write_unlock(&nf_frags.lock);
+
+ inet_frag_lru_move(&fq->q);
return 0;
discard_fq:
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 18cb8de..bab2c27 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -341,9 +341,7 @@ found:
fq->q.meat == fq->q.len)
return ip6_frag_reasm(fq, prev, dev);
- write_lock(&ip6_frags.lock);
- list_move_tail(&fq->q.lru_list, &fq->q.net->lru_list);
- write_unlock(&ip6_frags.lock);
+ inet_frag_lru_move(&fq->q);
return -1;
discard_fq:
* Re: [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems
2013-01-29 9:44 [net-next PATCH V2 0/6] net: frag performance tuning cachelines for NUMA/SMP systems Jesper Dangaard Brouer
` (5 preceding siblings ...)
2013-01-29 9:45 ` [net-next PATCH V2 6/6] net: frag, move LRU list maintenance outside of rwlock Jesper Dangaard Brouer
@ 2013-01-29 18:38 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2013-01-29 18:38 UTC (permalink / raw)
To: brouer; +Cc: eric.dumazet, fw, netdev, pablo, amwang, kaber, herbert, dborkman
From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Tue, 29 Jan 2013 10:44:01 +0100
> This patchset is V2, with some trivial code fixes, which were noticed
> by DaveM. It is still a partly respin of my fragmentation optimization
> patches: http://thread.gmane.org/gmane.linux.network/250914
Series applied, thanks Jesper.