* [RFC PATCH 07/17] net: sched: drop qdisc_reset from dev_graft_qdisc
2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
@ 2017-05-02 15:27 ` John Fastabend
0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-05-02 15:27 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, john.fastabend
In qdisc_graft_qdisc a "new" qdisc is attached and the 'qdisc_destroy'
operation is called on the old qdisc. The destroy operation will wait
an RCU grace period and then call qdisc_rcu_free(), at which point
gso_cpu_skb is freed along with all stats, so there is no need to zero
stats and gso_cpu_skb from the graft operation itself.
Further, after dropping the qdisc locks we can no longer call
qdisc_reset until an RCU grace period has elapsed and the qdisc is
detached from all cpus. By removing the qdisc_reset() here we get
the correct behavior of waiting an RCU grace period and letting the
qdisc_destroy operation clean up the qdisc correctly.
Note, a refcnt greater than 1 would cause the destroy operation to
be aborted; however, if this ever happened the reference to the qdisc
would be lost and we would have a memory leak.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
net/sched/sch_generic.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f179fc9..37cd54e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -813,10 +813,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
root_lock = qdisc_lock(oqdisc);
spin_lock_bh(root_lock);
- /* Prune old scheduler */
- if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
- qdisc_reset(oqdisc);
-
/* ... and graft new one */
if (qdisc == NULL)
qdisc = &noop_qdisc;
@@ -971,6 +967,16 @@ static bool some_qdisc_is_busy(struct net_device *dev)
return false;
}
+static void dev_qdisc_reset(struct net_device *dev,
+ struct netdev_queue *dev_queue,
+ void *none)
+{
+ struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+
+ if (qdisc)
+ qdisc_reset(qdisc);
+}
+
/**
* dev_deactivate_many - deactivate transmissions on several devices
* @head: list of devices to deactivate
@@ -981,7 +987,6 @@ static bool some_qdisc_is_busy(struct net_device *dev)
void dev_deactivate_many(struct list_head *head)
{
struct net_device *dev;
- bool sync_needed = false;
list_for_each_entry(dev, head, close_list) {
netdev_for_each_tx_queue(dev, dev_deactivate_queue,
@@ -991,20 +996,25 @@ void dev_deactivate_many(struct list_head *head)
&noop_qdisc);
dev_watchdog_down(dev);
- sync_needed |= !dev->dismantle;
}
/* Wait for outstanding qdisc-less dev_queue_xmit calls.
* This is avoided if all devices are in dismantle phase :
* Caller will call synchronize_net() for us
*/
- if (sync_needed)
- synchronize_net();
+ synchronize_net();
/* Wait for outstanding qdisc_run calls. */
- list_for_each_entry(dev, head, close_list)
+ list_for_each_entry(dev, head, close_list) {
while (some_qdisc_is_busy(dev))
yield();
+ /* The new qdisc is assigned at this point so we can safely
+ * unwind stale skb lists and qdisc statistics
+ */
+ netdev_for_each_tx_queue(dev, dev_qdisc_reset, NULL);
+ if (dev_ingress_queue(dev))
+ dev_qdisc_reset(dev, dev_ingress_queue(dev), NULL);
+ }
}
void dev_deactivate(struct net_device *dev)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 00/17] lockless qdisc
@ 2017-11-13 20:07 John Fastabend
2017-11-13 20:07 ` [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics John Fastabend
` (16 more replies)
0 siblings, 17 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:07 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
Multiple folks asked me about this series at net(dev)conf, so after
a 10+ hour flight and a bit of testing once back home I think these
are ready to be submitted. Net-next is closed at the moment,
http://vger.kernel.org/~davem/net-next.html
but once it opens up we can get these in first thing and have
plenty of time to resolve any fallout, although I haven't seen any
issues with my latest testing.
My first test case uses multiple containers (via cilium) where
multiple client containers use 'wrk' to benchmark connections with
a server container running lighttpd, which is configured to use
multiple threads, one per core. Additionally this test has
a proxy agent running so all traffic takes an extra hop through a
proxy container. In this setup each TCP packet traverses the egress
qdisc layer at least four times and the ingress qdisc layer an
additional four times. This makes for a good stress test IMO; perf
details below.
The other micro-benchmark I run injects packets directly into the
qdisc layer using pktgen, via the benchmark script
./pktgen_bench_xmit_mode_queue_xmit.sh
Benchmarks were taken in two cases: "base", running latest net-next
with no changes to the qdisc layer, and "qdisc", run with the lockless
qdisc updates. The wrk numbers are reported in req/sec. All virtual
'veth' devices run with pfifo_fast in the qdisc test case.
`wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"`
conns 16 32 64 1024
-----------------------------------------------
base: 18831 20201 21393 29151
qdisc: 19309 21063 23899 29265
Notice that in all cases we see a performance improvement when running
with the lockless qdisc case.
Microbenchmarks using pktgen are as follows,
`pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000
base(mq): 2.1Mpps
base(pfifo_fast): 2.1Mpps
qdisc(mq): 2.6Mpps
qdisc(pfifo_fast): 2.6Mpps
Notice the numbers are the same for mq and pfifo_fast because we are
only testing a single thread here.
Comments and feedback welcome. Anyone willing to do additional testing
would be greatly appreciated. The patches can be pulled here,
https://github.com/cilium/linux/tree/qdisc
Thanks,
John
---
John Fastabend (17):
net: sched: cleanup qdisc_run and __qdisc_run semantics
net: sched: allow qdiscs to handle locking
net: sched: remove remaining uses for qdisc_qlen in xmit path
net: sched: provide per cpu qstat helpers
net: sched: a dflt qdisc may be used with per cpu stats
net: sched: explicit locking in gso_cpu fallback
net: sched: drop qdisc_reset from dev_graft_qdisc
net: sched: use skb list for skb_bad_tx
net: sched: check for frozen queue before skb_bad_txq check
net: sched: qdisc_qlen for per cpu logic
net: sched: helper to sum qlen
net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
net: skb_array: expose peek API
net: sched: pfifo_fast use skb_array
net: skb_array additions for unlocked consumer
net: sched: lock once per bulk dequeue
0 files changed
^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
@ 2017-11-13 20:07 ` John Fastabend
2017-11-13 20:08 ` [RFC PATCH 02/17] net: sched: allow qdiscs to handle locking John Fastabend
` (15 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:07 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
Currently __qdisc_run calls qdisc_run_end() but does not call
qdisc_run_begin(). This makes it hard to track pairs of
qdisc_run_{begin,end} across function calls.
To simplify reading these code paths this patch moves begin/end calls
into qdisc_run().
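As a rough illustration of the pairing this gives us, here is a
standalone userspace sketch (not kernel code; the atomic flag and the
sketch_* names are stand-ins for the qdisc running state and helpers):

#include <stdatomic.h>
#include <stdbool.h>

/* stand-in for struct Qdisc: only the "running" state matters here */
struct sketch_qdisc {
	atomic_flag running;
};

static bool sketch_run_begin(struct sketch_qdisc *q)
{
	/* false means another context already owns the dequeue loop */
	return !atomic_flag_test_and_set(&q->running);
}

static void sketch_run_end(struct sketch_qdisc *q)
{
	atomic_flag_clear(&q->running);
}

/* mirrors __qdisc_run: dequeue/transmit loop, no run_end() inside */
static void sketch_run_body(struct sketch_qdisc *q)
{
	(void)q;
}

/* after this patch the begin/end pair lives in one function */
static void sketch_run(struct sketch_qdisc *q)
{
	if (sketch_run_begin(q)) {
		sketch_run_body(q);
		sketch_run_end(q);
	}
}

int main(void)
{
	struct sketch_qdisc q = { ATOMIC_FLAG_INIT };

	sketch_run(&q);
	return 0;
}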
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index d1f413f..4eea719 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -113,8 +113,10 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
static inline void qdisc_run(struct Qdisc *q)
{
- if (qdisc_run_begin(q))
+ if (qdisc_run_begin(q)) {
__qdisc_run(q);
+ qdisc_run_end(q);
+ }
}
static inline __be16 tc_skb_protocol(const struct sk_buff *skb)
diff --git a/net/core/dev.c b/net/core/dev.c
index 30b5fe3..9f5fa78 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3209,9 +3209,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
contended = false;
}
__qdisc_run(q);
- } else
- qdisc_run_end(q);
+ }
+ qdisc_run_end(q);
rc = NET_XMIT_SUCCESS;
} else {
rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
@@ -3221,6 +3221,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
contended = false;
}
__qdisc_run(q);
+ qdisc_run_end(q);
}
}
spin_unlock(root_lock);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 3839cbb..f6803e1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -266,8 +266,6 @@ void __qdisc_run(struct Qdisc *q)
break;
}
}
-
- qdisc_run_end(q);
}
unsigned long dev_trans_start(struct net_device *dev)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 02/17] net: sched: allow qdiscs to handle locking
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
2017-11-13 20:07 ` [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics John Fastabend
@ 2017-11-13 20:08 ` John Fastabend
2017-11-13 20:08 ` [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path John Fastabend
` (14 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:08 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
This patch adds a flag for queueing disciplines to indicate that the
stack does not need to use the qdisc lock to protect operations. This
can be used to build lockless scheduling algorithms and improve
performance.
The flag is checked in the tx path and the qdisc lock is only taken
if it is not set. For now use a conditional if statement. Later we
could be more aggressive if it proves worthwhile and use a static key
or wrap this in a likely().
Also, the lockless case drops the TCQ_F_CAN_BYPASS logic. The reason
for this is that synchronizing a qlen counter across threads proves to
cost more than doing the enqueue/dequeue operations when tested with
pktgen.
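A minimal userspace sketch of the conditional locking described above,
with a pthread mutex standing in for the qdisc root lock and an
illustrative SKETCH_F_NOLOCK bit in place of TCQ_F_NOLOCK (not the
kernel implementation):

#include <pthread.h>

#define SKETCH_F_NOLOCK 0x100	/* stand-in for TCQ_F_NOLOCK */

struct sketch_qdisc {
	unsigned int flags;
	pthread_mutex_t lock;	/* stand-in for the qdisc root lock */
};

static void sketch_dev_xmit(struct sketch_qdisc *q)
{
	if (q->flags & SKETCH_F_NOLOCK) {
		/* lockless qdisc: enqueue()/dequeue() synchronize internally */
		/* q->enqueue(skb, q); __run(q); */
		return;
	}

	/* legacy path: serialize on the root lock as before */
	pthread_mutex_lock(&q->lock);
	/* q->enqueue(skb, q); __run(q); */
	pthread_mutex_unlock(&q->lock);
}

int main(void)
{
	struct sketch_qdisc q = { 0, PTHREAD_MUTEX_INITIALIZER };

	sketch_dev_xmit(&q);		/* takes the lock */
	q.flags |= SKETCH_F_NOLOCK;
	sketch_dev_xmit(&q);		/* skips the lock */
	return 0;
}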
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 65d0d25..bb806a0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -71,6 +71,7 @@ struct Qdisc {
* qdisc_tree_decrease_qlen() should stop.
*/
#define TCQ_F_INVISIBLE 0x80 /* invisible by default in dump */
+#define TCQ_F_NOLOCK 0x100 /* qdisc does not require locking */
u32 limit;
const struct Qdisc_ops *ops;
struct qdisc_size_table __rcu *stab;
diff --git a/net/core/dev.c b/net/core/dev.c
index 9f5fa78..e03bee6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3179,6 +3179,21 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
int rc;
qdisc_calculate_pkt_len(skb, q);
+
+ if (q->flags & TCQ_F_NOLOCK) {
+ if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
+ __qdisc_drop(skb, &to_free);
+ rc = NET_XMIT_DROP;
+ } else {
+ rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+ __qdisc_run(q);
+ }
+
+ if (unlikely(to_free))
+ kfree_skb_list(to_free);
+ return rc;
+ }
+
/*
* Heuristic to force contended enqueues to serialize on a
* separate lock before trying to get qdisc main lock.
@@ -4161,19 +4176,22 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
while (head) {
struct Qdisc *q = head;
- spinlock_t *root_lock;
+ spinlock_t *root_lock = NULL;
head = head->next_sched;
- root_lock = qdisc_lock(q);
- spin_lock(root_lock);
+ if (!(q->flags & TCQ_F_NOLOCK)) {
+ root_lock = qdisc_lock(q);
+ spin_lock(root_lock);
+ }
/* We need to make sure head->next_sched is read
* before clearing __QDISC_STATE_SCHED
*/
smp_mb__before_atomic();
clear_bit(__QDISC_STATE_SCHED, &q->state);
qdisc_run(q);
- spin_unlock(root_lock);
+ if (root_lock)
+ spin_unlock(root_lock);
}
}
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f6803e1..ec757f6 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -174,7 +174,8 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
int ret = NETDEV_TX_BUSY;
/* And release qdisc */
- spin_unlock(root_lock);
+ if (root_lock)
+ spin_unlock(root_lock);
/* Note that we validate skb (GSO, checksum, ...) outside of locks */
if (validate)
@@ -187,10 +188,13 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
HARD_TX_UNLOCK(dev, txq);
} else {
- spin_lock(root_lock);
+ if (root_lock)
+ spin_lock(root_lock);
return qdisc_qlen(q);
}
- spin_lock(root_lock);
+
+ if (root_lock)
+ spin_lock(root_lock);
if (dev_xmit_complete(ret)) {
/* Driver sent out skb successfully or skb was consumed */
@@ -231,9 +235,9 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
*/
static inline int qdisc_restart(struct Qdisc *q, int *packets)
{
+ spinlock_t *root_lock = NULL;
struct netdev_queue *txq;
struct net_device *dev;
- spinlock_t *root_lock;
struct sk_buff *skb;
bool validate;
@@ -242,7 +246,9 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
if (unlikely(!skb))
return 0;
- root_lock = qdisc_lock(q);
+ if (!(q->flags & TCQ_F_NOLOCK))
+ root_lock = qdisc_lock(q);
+
dev = qdisc_dev(q);
txq = skb_get_tx_queue(dev, skb);
@@ -880,14 +886,18 @@ static bool some_qdisc_is_busy(struct net_device *dev)
dev_queue = netdev_get_tx_queue(dev, i);
q = dev_queue->qdisc_sleeping;
- root_lock = qdisc_lock(q);
- spin_lock_bh(root_lock);
+ if (q->flags & TCQ_F_NOLOCK) {
+ val = test_bit(__QDISC_STATE_SCHED, &q->state);
+ } else {
+ root_lock = qdisc_lock(q);
+ spin_lock_bh(root_lock);
- val = (qdisc_is_running(q) ||
- test_bit(__QDISC_STATE_SCHED, &q->state));
+ val = (qdisc_is_running(q) ||
+ test_bit(__QDISC_STATE_SCHED, &q->state));
- spin_unlock_bh(root_lock);
+ spin_unlock_bh(root_lock);
+ }
if (val)
return true;
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
2017-11-13 20:07 ` [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics John Fastabend
2017-11-13 20:08 ` [RFC PATCH 02/17] net: sched: allow qdiscs to handle locking John Fastabend
@ 2017-11-13 20:08 ` John Fastabend
2017-11-15 0:11 ` Willem de Bruijn
2017-11-13 20:08 ` [RFC PATCH 04/17] net: sched: provide per cpu qstat helpers John Fastabend
` (13 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:08 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
sch_direct_xmit() uses qdisc_qlen as a return value but all call sites
of the routine only check if it is zero or not. Simplify the logic so
that we don't need to return an actual queue length value.
This introduces a case where sch_direct_xmit would previously have
returned a qlen of zero but now returns true. However, in this case
all call sites of sch_direct_xmit will issue a dequeue(), get a NULL
skb, and abort. This trades tracking qlen in the hotpath for an extra
dequeue operation. Overall this seems to be good for performance.
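The caller pattern this enables can be sketched roughly as follows;
the sketch_* names are illustrative stand-ins, and the point is that
an empty queue shows up as a NULL dequeue rather than a tracked qlen:

#include <stdbool.h>
#include <stddef.h>

struct pkt { struct pkt *next; };

/* stand-ins for q->dequeue() and sch_direct_xmit() */
static struct pkt *sketch_dequeue(void) { return NULL; }

static bool sketch_direct_xmit(struct pkt *p)
{
	(void)p;
	return true;	/* "feel free to send more" */
}

static void sketch_restart_loop(void)
{
	struct pkt *p;

	/* keep going while the driver accepts packets; an empty queue
	 * terminates the loop via a NULL dequeue, not a qlen count
	 */
	while ((p = sketch_dequeue()) != NULL) {
		if (!sketch_direct_xmit(p))
			break;	/* hardware queue frozen, back off */
	}
}

int main(void)
{
	sketch_restart_loop();
	return 0;
}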
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 4eea719..2404692 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -105,9 +105,9 @@ struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
void qdisc_put_rtab(struct qdisc_rate_table *tab);
void qdisc_put_stab(struct qdisc_size_table *tab);
void qdisc_warn_nonwc(const char *txt, struct Qdisc *qdisc);
-int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
- struct net_device *dev, struct netdev_queue *txq,
- spinlock_t *root_lock, bool validate);
+bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
+ struct net_device *dev, struct netdev_queue *txq,
+ spinlock_t *root_lock, bool validate);
void __qdisc_run(struct Qdisc *q);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ec757f6..d063f6b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -164,12 +164,12 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
* only one CPU can execute this function.
*
* Returns to the caller:
- * 0 - queue is empty or throttled.
- * >0 - queue is not empty.
+ * false - hardware queue frozen backoff
+ * true - feel free to send more pkts
*/
-int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
- struct net_device *dev, struct netdev_queue *txq,
- spinlock_t *root_lock, bool validate)
+bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
+ struct net_device *dev, struct netdev_queue *txq,
+ spinlock_t *root_lock, bool validate)
{
int ret = NETDEV_TX_BUSY;
@@ -190,7 +190,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
} else {
if (root_lock)
spin_lock(root_lock);
- return qdisc_qlen(q);
+ return true;
}
if (root_lock)
@@ -198,18 +198,19 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
if (dev_xmit_complete(ret)) {
/* Driver sent out skb successfully or skb was consumed */
- ret = qdisc_qlen(q);
+ ret = true;
} else {
/* Driver returned NETDEV_TX_BUSY - requeue skb */
if (unlikely(ret != NETDEV_TX_BUSY))
net_warn_ratelimited("BUG %s code %d qlen %d\n",
dev->name, ret, q->q.qlen);
- ret = dev_requeue_skb(skb, q);
+ dev_requeue_skb(skb, q);
+ ret = false;
}
if (ret && netif_xmit_frozen_or_stopped(txq))
- ret = 0;
+ ret = false;
return ret;
}
@@ -233,7 +234,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
* >0 - queue is not empty.
*
*/
-static inline int qdisc_restart(struct Qdisc *q, int *packets)
+static inline bool qdisc_restart(struct Qdisc *q, int *packets)
{
spinlock_t *root_lock = NULL;
struct netdev_queue *txq;
@@ -244,7 +245,7 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
/* Dequeue packet */
skb = dequeue_skb(q, &validate, packets);
if (unlikely(!skb))
- return 0;
+ return false;
if (!(q->flags & TCQ_F_NOLOCK))
root_lock = qdisc_lock(q);
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 04/17] net: sched: provide per cpu qstat helpers
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (2 preceding siblings ...)
2017-11-13 20:08 ` [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path John Fastabend
@ 2017-11-13 20:08 ` John Fastabend
2017-11-13 20:09 ` [RFC PATCH 05/17] net: sched: a dflt qdisc may be used with per cpu stats John Fastabend
` (12 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:08 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
The per cpu qstats support was added along with per cpu bstat support,
which is currently used by the ingress qdisc. This patch adds the set
of helpers needed to let other qdiscs use per cpu qstats as well.
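Conceptually the helpers just bump a per cpu copy of the queue stats
instead of the shared copy. A rough userspace analogue, with an array
indexed by a fake cpu id standing in for real per cpu storage (all
names hypothetical):

#include <stdio.h>

#define NR_SKETCH_CPUS 4

struct sketch_qstats {
	unsigned int qlen;
	unsigned int backlog;
	unsigned int requeues;
};

/* stand-in for the per cpu qstats block hanging off the qdisc */
static struct sketch_qstats cpu_qstats[NR_SKETCH_CPUS];

static void qstats_cpu_qlen_inc(int cpu)
{
	cpu_qstats[cpu].qlen++;
}

static void qstats_cpu_backlog_inc(int cpu, unsigned int pkt_len)
{
	cpu_qstats[cpu].backlog += pkt_len;
}

int main(void)
{
	qstats_cpu_qlen_inc(1);
	qstats_cpu_backlog_inc(1, 1500);
	printf("cpu1 qlen=%u backlog=%u\n",
	       cpu_qstats[1].qlen, cpu_qstats[1].backlog);
	return 0;
}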
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index bb806a0..7c4b96b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -631,12 +631,39 @@ static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
sch->qstats.backlog -= qdisc_pkt_len(skb);
}
+static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
+ const struct sk_buff *skb)
+{
+ this_cpu_sub(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
+}
+
static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch,
const struct sk_buff *skb)
{
sch->qstats.backlog += qdisc_pkt_len(skb);
}
+static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
+ const struct sk_buff *skb)
+{
+ this_cpu_add(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
+}
+
+static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch)
+{
+ this_cpu_inc(sch->cpu_qstats->qlen);
+}
+
+static inline void qdisc_qstats_cpu_qlen_dec(struct Qdisc *sch)
+{
+ this_cpu_dec(sch->cpu_qstats->qlen);
+}
+
+static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
+{
+ this_cpu_inc(sch->cpu_qstats->requeues);
+}
+
static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
{
sch->qstats.drops += count;
@@ -844,6 +871,14 @@ static inline void rtnl_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch)
qdisc_qstats_drop(sch);
}
+static inline int qdisc_drop_cpu(struct sk_buff *skb, struct Qdisc *sch,
+ struct sk_buff **to_free)
+{
+ __qdisc_drop(skb, to_free);
+ qdisc_qstats_cpu_drop(sch);
+
+ return NET_XMIT_DROP;
+}
static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 05/17] net: sched: a dflt qdisc may be used with per cpu stats
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (3 preceding siblings ...)
2017-11-13 20:08 ` [RFC PATCH 04/17] net: sched: provide per cpu qstat helpers John Fastabend
@ 2017-11-13 20:09 ` John Fastabend
2017-11-13 20:09 ` [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback John Fastabend
` (11 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:09 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
Enable dflt qdisc support for per cpu stats. Before this patch a dflt
qdisc was required to use the global statistics qstats and bstats.
This adds a static_flags field to qdisc_ops that is propagated
into qdisc->flags in the qdisc allocate call. This allows the
allocation block to completely allocate the qdisc object so we don't
have dangling allocations after qdisc init.
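The flow is roughly: the ops structure carries a constant flags word,
and the allocator copies it into the instance and conditionally
allocates the per cpu stats. A compressed userspace sketch of that
allocation-time decision (the SKETCH_* names and sizes are stand-ins,
not the kernel API):

#include <stdlib.h>

#define SKETCH_F_CPUSTATS 0x8	/* stand-in for TCQ_F_CPUSTATS */

struct sketch_ops {
	unsigned int static_flags;	/* set once by the qdisc author */
};

struct sketch_qdisc {
	unsigned int flags;
	void *cpu_bstats;
	void *cpu_qstats;
};

static struct sketch_qdisc *sketch_alloc(const struct sketch_ops *ops)
{
	struct sketch_qdisc *sch = calloc(1, sizeof(*sch));

	if (!sch)
		return NULL;

	if (ops->static_flags & SKETCH_F_CPUSTATS) {
		/* allocate per cpu stats up front so init never has to */
		sch->cpu_bstats = calloc(1, 64);
		sch->cpu_qstats = calloc(1, 64);
		if (!sch->cpu_bstats || !sch->cpu_qstats) {
			free(sch->cpu_bstats);
			free(sch->cpu_qstats);
			free(sch);
			return NULL;
		}
	}

	sch->flags = ops->static_flags;	/* propagate into the instance */
	return sch;
}

int main(void)
{
	struct sketch_ops ops = { SKETCH_F_CPUSTATS };
	struct sketch_qdisc *sch = sketch_alloc(&ops);

	if (sch) {
		free(sch->cpu_bstats);
		free(sch->cpu_qstats);
		free(sch);
	}
	return 0;
}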
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7c4b96b..7bc2826 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -179,6 +179,7 @@ struct Qdisc_ops {
const struct Qdisc_class_ops *cl_ops;
char id[IFNAMSIZ];
int priv_size;
+ unsigned int static_flags;
int (*enqueue)(struct sk_buff *skb,
struct Qdisc *sch,
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index d063f6b..131eab8 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -635,6 +635,19 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
qdisc_skb_head_init(&sch->q);
spin_lock_init(&sch->q.lock);
+ if (ops->static_flags & TCQ_F_CPUSTATS) {
+ sch->cpu_bstats =
+ netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+ if (!sch->cpu_bstats)
+ goto errout1;
+
+ sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+ if (!sch->cpu_qstats) {
+ free_percpu(sch->cpu_bstats);
+ goto errout1;
+ }
+ }
+
spin_lock_init(&sch->busylock);
lockdep_set_class(&sch->busylock,
dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
@@ -644,6 +657,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
dev->qdisc_running_key ?: &qdisc_running_key);
sch->ops = ops;
+ sch->flags = ops->static_flags;
sch->enqueue = ops->enqueue;
sch->dequeue = ops->dequeue;
sch->dev_queue = dev_queue;
@@ -651,6 +665,8 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
refcount_set(&sch->refcnt, 1);
return sch;
+errout1:
+ kfree(p);
errout:
return ERR_PTR(err);
}
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (4 preceding siblings ...)
2017-11-13 20:09 ` [RFC PATCH 05/17] net: sched: a dflt qdisc may be used with per cpu stats John Fastabend
@ 2017-11-13 20:09 ` John Fastabend
2017-11-15 0:41 ` Willem de Bruijn
2017-11-13 20:09 ` [RFC PATCH 07/17] net: sched: drop qdisc_reset from dev_graft_qdisc John Fastabend
` (10 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:09 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
This work is preparing the qdisc layer to support egress lockless
qdiscs. If we are running the egress qdisc lockless and we overrun
the netdev, for whatever reason, the netdev returns a busy error
code and the skb is parked on the gso_skb pointer. With many cores
all hitting this case at once it is possible to have multiple
sk_buffs here, so we turn gso_skb into a queue.
This should be an edge case; if we see it frequently then the
netdev/qdisc layer needs to back off.
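The requeue path then becomes "append to a list under a short lock"
rather than "store in the single gso_skb slot". A userspace sketch
with a pthread mutex in place of the qdisc lock (all names
illustrative):

#include <pthread.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
};

struct sketch_qdisc {
	pthread_mutex_t lock;		/* stand-in for qdisc_lock(q)    */
	struct pkt *requeue_head;	/* stand-in for the gso_skb list */
	struct pkt **requeue_tail;
};

/* several cores can hit the busy path at once; each appends safely */
static void sketch_requeue(struct sketch_qdisc *q, struct pkt *p)
{
	p->next = NULL;
	pthread_mutex_lock(&q->lock);
	*q->requeue_tail = p;
	q->requeue_tail = &p->next;
	pthread_mutex_unlock(&q->lock);
}

int main(void)
{
	struct sketch_qdisc q = { PTHREAD_MUTEX_INITIALIZER, NULL, NULL };
	struct pkt a, b;

	q.requeue_tail = &q.requeue_head;
	sketch_requeue(&q, &a);
	sketch_requeue(&q, &b);	/* a second "core" parking its skb */
	return 0;
}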
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7bc2826..6e329f0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -88,7 +88,7 @@ struct Qdisc {
/*
* For performance sake on SMP, we put highly modified fields at the end
*/
- struct sk_buff *gso_skb ____cacheline_aligned_in_smp;
+ struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
struct qdisc_skb_head q;
struct gnet_stats_basic_packed bstats;
seqcount_t running;
@@ -795,26 +795,30 @@ static inline struct sk_buff *qdisc_peek_head(struct Qdisc *sch)
/* generic pseudo peek method for non-work-conserving qdisc */
static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
{
+ struct sk_buff *skb = skb_peek(&sch->gso_skb);
+
/* we can reuse ->gso_skb because peek isn't called for root qdiscs */
- if (!sch->gso_skb) {
- sch->gso_skb = sch->dequeue(sch);
- if (sch->gso_skb) {
+ if (!skb) {
+ skb = sch->dequeue(sch);
+
+ if (skb) {
+ __skb_queue_head(&sch->gso_skb, skb);
/* it's still part of the queue */
- qdisc_qstats_backlog_inc(sch, sch->gso_skb);
+ qdisc_qstats_backlog_inc(sch, skb);
sch->q.qlen++;
}
}
- return sch->gso_skb;
+ return skb;
}
/* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
{
- struct sk_buff *skb = sch->gso_skb;
+ struct sk_buff *skb = skb_peek(&sch->gso_skb);
if (skb) {
- sch->gso_skb = NULL;
+ skb = __skb_dequeue(&sch->gso_skb);
qdisc_qstats_backlog_dec(sch, skb);
sch->q.qlen--;
} else {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 131eab8..1055bec 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -45,10 +45,9 @@
* - ingress filtering is also serialized via qdisc root lock
* - updates to tree and tree walking are only done under the rtnl mutex.
*/
-
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
{
- q->gso_skb = skb;
+ __skb_queue_head(&q->gso_skb, skb);
q->qstats.requeues++;
qdisc_qstats_backlog_inc(q, skb);
q->q.qlen++; /* it's still part of the queue */
@@ -57,6 +56,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
return 0;
}
+static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
+{
+ spinlock_t *lock = qdisc_lock(q);
+
+ spin_lock(lock);
+ __skb_queue_tail(&q->gso_skb, skb);
+ spin_unlock(lock);
+
+ qdisc_qstats_cpu_requeues_inc(q);
+ qdisc_qstats_cpu_backlog_inc(q, skb);
+ qdisc_qstats_cpu_qlen_inc(q);
+ __netif_schedule(q);
+
+ return 0;
+}
+
+static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+ if (q->flags & TCQ_F_NOLOCK)
+ return dev_requeue_skb_locked(skb, q);
+ else
+ return __dev_requeue_skb(skb, q);
+}
+
static void try_bulk_dequeue_skb(struct Qdisc *q,
struct sk_buff *skb,
const struct netdev_queue *txq,
@@ -112,23 +135,50 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
int *packets)
{
- struct sk_buff *skb = q->gso_skb;
const struct netdev_queue *txq = q->dev_queue;
+ struct sk_buff *skb;
*packets = 1;
- if (unlikely(skb)) {
+ if (unlikely(!skb_queue_empty(&q->gso_skb))) {
+ spinlock_t *lock = NULL;
+
+ if (q->flags & TCQ_F_NOLOCK) {
+ lock = qdisc_lock(q);
+ spin_lock(lock);
+ }
+
+ skb = skb_peek(&q->gso_skb);
+
+ /* skb may be null if another cpu pulls gso_skb off in between
+ * empty check and lock.
+ */
+ if (!skb) {
+ if (lock)
+ spin_unlock(lock);
+ goto validate;
+ }
+
/* skb in gso_skb were already validated */
*validate = false;
/* check the reason of requeuing without tx lock first */
txq = skb_get_tx_queue(txq->dev, skb);
if (!netif_xmit_frozen_or_stopped(txq)) {
- q->gso_skb = NULL;
- qdisc_qstats_backlog_dec(q, skb);
- q->q.qlen--;
- } else
+ skb = __skb_dequeue(&q->gso_skb);
+ if (qdisc_is_percpu_stats(q)) {
+ qdisc_qstats_cpu_backlog_dec(q, skb);
+ qdisc_qstats_cpu_qlen_dec(q);
+ } else {
+ qdisc_qstats_backlog_dec(q, skb);
+ q->q.qlen--;
+ }
+ } else {
skb = NULL;
+ }
+ if (lock)
+ spin_unlock(lock);
goto trace;
}
+validate:
*validate = true;
skb = q->skb_bad_txq;
if (unlikely(skb)) {
@@ -632,6 +682,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
sch->padded = (char *) sch - (char *) p;
}
+ __skb_queue_head_init(&sch->gso_skb);
qdisc_skb_head_init(&sch->q);
spin_lock_init(&sch->q.lock);
@@ -700,6 +751,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
void qdisc_reset(struct Qdisc *qdisc)
{
const struct Qdisc_ops *ops = qdisc->ops;
+ struct sk_buff *skb, *tmp;
if (ops->reset)
ops->reset(qdisc);
@@ -707,10 +759,9 @@ void qdisc_reset(struct Qdisc *qdisc)
kfree_skb(qdisc->skb_bad_txq);
qdisc->skb_bad_txq = NULL;
- if (qdisc->gso_skb) {
- kfree_skb_list(qdisc->gso_skb);
- qdisc->gso_skb = NULL;
- }
+ skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp)
+ kfree_skb_list(skb);
+
qdisc->q.qlen = 0;
qdisc->qstats.backlog = 0;
}
@@ -729,6 +780,7 @@ static void qdisc_free(struct Qdisc *qdisc)
void qdisc_destroy(struct Qdisc *qdisc)
{
const struct Qdisc_ops *ops = qdisc->ops;
+ struct sk_buff *skb, *tmp;
if (qdisc->flags & TCQ_F_BUILTIN ||
!refcount_dec_and_test(&qdisc->refcnt))
@@ -748,7 +800,9 @@ void qdisc_destroy(struct Qdisc *qdisc)
module_put(ops->owner);
dev_put(qdisc_dev(qdisc));
- kfree_skb_list(qdisc->gso_skb);
+ skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp)
+ kfree_skb_list(skb);
+
kfree_skb(qdisc->skb_bad_txq);
qdisc_free(qdisc);
}
@@ -976,6 +1030,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
rcu_assign_pointer(dev_queue->qdisc, qdisc);
dev_queue->qdisc_sleeping = qdisc;
+ __skb_queue_head_init(&qdisc->gso_skb);
}
void dev_init_scheduler(struct net_device *dev)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 07/17] net: sched: drop qdisc_reset from dev_graft_qdisc
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (5 preceding siblings ...)
2017-11-13 20:09 ` [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback John Fastabend
@ 2017-11-13 20:09 ` John Fastabend
2017-11-13 20:10 ` [RFC PATCH 08/17] net: sched: use skb list for skb_bad_tx John Fastabend
` (9 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:09 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
In qdisc_graft_qdisc a "new" qdisc is attached and the 'qdisc_destroy'
operation is called on the old qdisc. The destroy operation will wait
an RCU grace period and then call qdisc_rcu_free(), at which point
gso_cpu_skb is freed along with all stats, so there is no need to zero
stats and gso_cpu_skb from the graft operation itself.
Further, after dropping the qdisc locks we can no longer call
qdisc_reset until an RCU grace period has elapsed and the qdisc is
detached from all cpus. By removing the qdisc_reset() here we get
the correct behavior of waiting an RCU grace period and letting the
qdisc_destroy operation clean up the qdisc correctly.
Note, a refcnt greater than 1 would cause the destroy operation to
be aborted; however, if this ever happened the reference to the qdisc
would be lost and we would have a memory leak.
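The resulting ordering can be sketched as below; wait_for_grace_period()
is a stand-in for the RCU machinery, not a real kernel call, and the
sketch only shows the sequencing:

#include <stdio.h>

struct sketch_qdisc { const char *name; };

static void wait_for_grace_period(void)
{
	/* stand-in for the RCU grace period: after this returns, no CPU
	 * can still be dequeuing from the old qdisc
	 */
}

static void sketch_reset(struct sketch_qdisc *q)
{
	printf("reset %s: purge stale skb lists and stats\n", q->name);
}

static void sketch_destroy(struct sketch_qdisc *q)
{
	printf("destroy %s: free per cpu stats, gso list, qdisc\n", q->name);
}

int main(void)
{
	struct sketch_qdisc old = { "old" };

	/* 1. graft: publish the new qdisc pointer (not shown)          */
	/* 2. only after a grace period is the old qdisc truly detached */
	wait_for_grace_period();
	/* 3. now reset/destroy can safely unwind what is left          */
	sketch_reset(&old);
	sketch_destroy(&old);
	return 0;
}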
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 1055bec..ea017ce 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -818,10 +818,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
root_lock = qdisc_lock(oqdisc);
spin_lock_bh(root_lock);
- /* Prune old scheduler */
- if (oqdisc && refcount_read(&oqdisc->refcnt) <= 1)
- qdisc_reset(oqdisc);
-
/* ... and graft new one */
if (qdisc == NULL)
qdisc = &noop_qdisc;
@@ -976,6 +972,16 @@ static bool some_qdisc_is_busy(struct net_device *dev)
return false;
}
+static void dev_qdisc_reset(struct net_device *dev,
+ struct netdev_queue *dev_queue,
+ void *none)
+{
+ struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+
+ if (qdisc)
+ qdisc_reset(qdisc);
+}
+
/**
* dev_deactivate_many - deactivate transmissions on several devices
* @head: list of devices to deactivate
@@ -986,7 +992,6 @@ static bool some_qdisc_is_busy(struct net_device *dev)
void dev_deactivate_many(struct list_head *head)
{
struct net_device *dev;
- bool sync_needed = false;
list_for_each_entry(dev, head, close_list) {
netdev_for_each_tx_queue(dev, dev_deactivate_queue,
@@ -996,20 +1001,25 @@ void dev_deactivate_many(struct list_head *head)
&noop_qdisc);
dev_watchdog_down(dev);
- sync_needed |= !dev->dismantle;
}
/* Wait for outstanding qdisc-less dev_queue_xmit calls.
* This is avoided if all devices are in dismantle phase :
* Caller will call synchronize_net() for us
*/
- if (sync_needed)
- synchronize_net();
+ synchronize_net();
/* Wait for outstanding qdisc_run calls. */
- list_for_each_entry(dev, head, close_list)
+ list_for_each_entry(dev, head, close_list) {
while (some_qdisc_is_busy(dev))
yield();
+ /* The new qdisc is assigned at this point so we can safely
+ * unwind stale skb lists and qdisc statistics
+ */
+ netdev_for_each_tx_queue(dev, dev_qdisc_reset, NULL);
+ if (dev_ingress_queue(dev))
+ dev_qdisc_reset(dev, dev_ingress_queue(dev), NULL);
+ }
}
void dev_deactivate(struct net_device *dev)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 08/17] net: sched: use skb list for skb_bad_tx
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (6 preceding siblings ...)
2017-11-13 20:09 ` [RFC PATCH 07/17] net: sched: drop qdisc_reset from dev_graft_qdisc John Fastabend
@ 2017-11-13 20:10 ` John Fastabend
2017-11-13 20:10 ` [RFC PATCH 09/17] net: sched: check for frozen queue before skb_bad_txq check John Fastabend
` (8 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:10 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
Similar to how gso is handled, use an skb list for skb_bad_tx. This is
required with lockless qdiscs because we may have multiple cores
attempting to push skbs into skb_bad_tx concurrently.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 6e329f0..4717c4b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -95,7 +95,7 @@ struct Qdisc {
struct gnet_stats_queue qstats;
unsigned long state;
struct Qdisc *next_sched;
- struct sk_buff *skb_bad_txq;
+ struct sk_buff_head skb_bad_txq;
int padded;
refcount_t refcnt;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ea017ce..92570d4 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -45,6 +45,68 @@
* - ingress filtering is also serialized via qdisc root lock
* - updates to tree and tree walking are only done under the rtnl mutex.
*/
+
+static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
+{
+ const struct netdev_queue *txq = q->dev_queue;
+ spinlock_t *lock = NULL;
+ struct sk_buff *skb;
+
+ if (q->flags & TCQ_F_NOLOCK) {
+ lock = qdisc_lock(q);
+ spin_lock(lock);
+ }
+
+ skb = skb_peek(&q->skb_bad_txq);
+ if (skb) {
+ /* check the reason of requeuing without tx lock first */
+ txq = skb_get_tx_queue(txq->dev, skb);
+ if (!netif_xmit_frozen_or_stopped(txq)) {
+ skb = __skb_dequeue(&q->skb_bad_txq);
+ if (qdisc_is_percpu_stats(q)) {
+ qdisc_qstats_cpu_backlog_dec(q, skb);
+ qdisc_qstats_cpu_qlen_dec(q);
+ } else {
+ qdisc_qstats_backlog_dec(q, skb);
+ q->q.qlen--;
+ }
+ } else {
+ skb = NULL;
+ }
+ }
+
+ if (lock)
+ spin_unlock(lock);
+
+ return skb;
+}
+
+static inline struct sk_buff *qdisc_dequeue_skb_bad_txq(struct Qdisc *q)
+{
+ struct sk_buff *skb = skb_peek(&q->skb_bad_txq);
+
+ if (unlikely(skb))
+ skb = __skb_dequeue_bad_txq(q);
+
+ return skb;
+}
+
+static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
+ struct sk_buff *skb)
+{
+ spinlock_t *lock = NULL;
+
+ if (q->flags & TCQ_F_NOLOCK) {
+ lock = qdisc_lock(q);
+ spin_lock(lock);
+ }
+
+ __skb_queue_tail(&q->skb_bad_txq, skb);
+
+ if (lock)
+ spin_unlock(lock);
+}
+
static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
{
__skb_queue_head(&q->gso_skb, skb);
@@ -117,9 +179,15 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
if (!nskb)
break;
if (unlikely(skb_get_queue_mapping(nskb) != mapping)) {
- q->skb_bad_txq = nskb;
- qdisc_qstats_backlog_inc(q, nskb);
- q->q.qlen++;
+ qdisc_enqueue_skb_bad_txq(q, nskb);
+
+ if (qdisc_is_percpu_stats(q)) {
+ qdisc_qstats_cpu_backlog_inc(q, nskb);
+ qdisc_qstats_cpu_qlen_inc(q);
+ } else {
+ qdisc_qstats_backlog_inc(q, nskb);
+ q->q.qlen++;
+ }
break;
}
skb->next = nskb;
@@ -180,19 +248,9 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
}
validate:
*validate = true;
- skb = q->skb_bad_txq;
- if (unlikely(skb)) {
- /* check the reason of requeuing without tx lock first */
- txq = skb_get_tx_queue(txq->dev, skb);
- if (!netif_xmit_frozen_or_stopped(txq)) {
- q->skb_bad_txq = NULL;
- qdisc_qstats_backlog_dec(q, skb);
- q->q.qlen--;
- goto bulk;
- }
- skb = NULL;
- goto trace;
- }
+ skb = qdisc_dequeue_skb_bad_txq(q);
+ if (unlikely(skb))
+ goto bulk;
if (!(q->flags & TCQ_F_ONETXQUEUE) ||
!netif_xmit_frozen_or_stopped(txq))
skb = q->dequeue(q);
@@ -683,6 +741,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
sch->padded = (char *) sch - (char *) p;
}
__skb_queue_head_init(&sch->gso_skb);
+ __skb_queue_head_init(&sch->skb_bad_txq);
qdisc_skb_head_init(&sch->q);
spin_lock_init(&sch->q.lock);
@@ -756,12 +815,12 @@ void qdisc_reset(struct Qdisc *qdisc)
if (ops->reset)
ops->reset(qdisc);
- kfree_skb(qdisc->skb_bad_txq);
- qdisc->skb_bad_txq = NULL;
-
skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp)
kfree_skb_list(skb);
+ skb_queue_walk_safe(&qdisc->skb_bad_txq, skb, tmp)
+ kfree_skb_list(skb);
+
qdisc->q.qlen = 0;
qdisc->qstats.backlog = 0;
}
@@ -802,8 +861,8 @@ void qdisc_destroy(struct Qdisc *qdisc)
skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp)
kfree_skb_list(skb);
-
- kfree_skb(qdisc->skb_bad_txq);
+ skb_queue_walk_safe(&qdisc->skb_bad_txq, skb, tmp)
+ kfree_skb_list(skb);
qdisc_free(qdisc);
}
EXPORT_SYMBOL(qdisc_destroy);
@@ -1041,6 +1100,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
rcu_assign_pointer(dev_queue->qdisc, qdisc);
dev_queue->qdisc_sleeping = qdisc;
__skb_queue_head_init(&qdisc->gso_skb);
+ __skb_queue_head_init(&qdisc->skb_bad_txq);
}
void dev_init_scheduler(struct net_device *dev)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 09/17] net: sched: check for frozen queue before skb_bad_txq check
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (7 preceding siblings ...)
2017-11-13 20:10 ` [RFC PATCH 08/17] net: sched: use skb list for skb_bad_tx John Fastabend
@ 2017-11-13 20:10 ` John Fastabend
2017-11-13 20:10 ` [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic John Fastabend
` (7 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:10 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
I cannot think of any reason to pull the bad txq skb off the qdisc if
the txq we plan to send it on is still frozen. So check for a frozen
queue first and abort before dequeuing either the skb_bad_txq skb or
a normal qdisc dequeue() skb.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 92570d4..84c4ea1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -204,7 +204,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
int *packets)
{
const struct netdev_queue *txq = q->dev_queue;
- struct sk_buff *skb;
+ struct sk_buff *skb = NULL;
*packets = 1;
if (unlikely(!skb_queue_empty(&q->gso_skb))) {
@@ -248,12 +248,15 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
}
validate:
*validate = true;
+
+ if ((q->flags & TCQ_F_ONETXQUEUE) &&
+ netif_xmit_frozen_or_stopped(txq))
+ return skb;
+
skb = qdisc_dequeue_skb_bad_txq(q);
if (unlikely(skb))
goto bulk;
- if (!(q->flags & TCQ_F_ONETXQUEUE) ||
- !netif_xmit_frozen_or_stopped(txq))
- skb = q->dequeue(q);
+ skb = q->dequeue(q);
if (skb) {
bulk:
if (qdisc_may_bulk(q))
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (8 preceding siblings ...)
2017-11-13 20:10 ` [RFC PATCH 09/17] net: sched: check for frozen queue before skb_bad_txq check John Fastabend
@ 2017-11-13 20:10 ` John Fastabend
2017-11-15 1:16 ` Willem de Bruijn
2017-11-13 20:11 ` [RFC PATCH 11/17] net: sched: helper to sum qlen John Fastabend
` (6 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:10 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
Add qdisc qlen helper routines for lockless qdiscs to use.
The qdisc qlen is no longer used in the hotpath but it is reported
via stats query on the qdisc so it still needs to be tracked. This
adds the per cpu operations needed.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4717c4b..bad24a9 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -291,8 +291,16 @@ static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
BUILD_BUG_ON(sizeof(qcb->data) < sz);
}
+static inline int qdisc_qlen_cpu(const struct Qdisc *q)
+{
+ return this_cpu_ptr(q->cpu_qstats)->qlen;
+}
+
static inline int qdisc_qlen(const struct Qdisc *q)
{
+ if (q->flags & TCQ_F_NOLOCK)
+ return qdisc_qlen_cpu(q);
+
return q->q.qlen;
}
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 11/17] net: sched: helper to sum qlen
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (9 preceding siblings ...)
2017-11-13 20:10 ` [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic John Fastabend
@ 2017-11-13 20:11 ` John Fastabend
2017-11-13 20:11 ` [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
` (5 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:11 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
Reporting qlen when qlen is per cpu requires aggregating the per
cpu counters. This adds a helper routine for this.
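The aggregation itself is just a walk over the possible CPUs. A tiny
sketch of the idea, with a plain array standing in for real per cpu
storage:

#include <stdio.h>

#define NR_SKETCH_CPUS 4

/* stand-in for per_cpu_ptr(q->cpu_qstats, i)->qlen */
static unsigned int cpu_qlen[NR_SKETCH_CPUS] = { 3, 0, 5, 1 };

static unsigned int sketch_qlen_sum(void)
{
	unsigned int qlen = 0;
	int i;

	for (i = 0; i < NR_SKETCH_CPUS; i++)
		qlen += cpu_qlen[i];

	return qlen;
}

int main(void)
{
	printf("aggregate qlen = %u\n", sketch_qlen_sum());	/* 9 */
	return 0;
}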
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index bad24a9..5824509 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -304,6 +304,21 @@ static inline int qdisc_qlen(const struct Qdisc *q)
return q->q.qlen;
}
+static inline int qdisc_qlen_sum(const struct Qdisc *q)
+{
+ __u32 qlen = 0;
+ int i;
+
+ if (q->flags & TCQ_F_NOLOCK) {
+ for_each_possible_cpu(i)
+ qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+ } else {
+ qlen = q->q.qlen;
+ }
+
+ return qlen;
+}
+
static inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
{
return (struct qdisc_skb_cb *)skb->cb;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index b6c4f53..5cb64d2 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -797,7 +797,8 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
goto nla_put_failure;
if (q->ops->dump && q->ops->dump(q, skb) < 0)
goto nla_put_failure;
- qlen = q->q.qlen;
+
+ qlen = qdisc_qlen_sum(q);
stab = rtnl_dereference(q->stab);
if (stab && qdisc_dump_stab(skb, stab) < 0)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (10 preceding siblings ...)
2017-11-13 20:11 ` [RFC PATCH 11/17] net: sched: helper to sum qlen John Fastabend
@ 2017-11-13 20:11 ` John Fastabend
2017-11-15 1:22 ` Willem de Bruijn
2017-11-13 20:11 ` [RFC PATCH 13/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio John Fastabend
` (4 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:11 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
The sch_mq qdisc creates a sub-qdisc per tx queue; these are then
called independently for enqueue and dequeue operations. However,
statistics are aggregated and pushed up to the "master" qdisc.
This patch adds support for any of the sub-qdiscs to be per cpu
statistics qdiscs. To handle this case, add a check when calculating
stats and aggregate the per cpu stats if needed.
Also export __gnet_stats_copy_queue() for use as a helper function.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 304f7aa..0304ba2 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -49,6 +49,9 @@ int gnet_stats_copy_rate_est(struct gnet_dump *d,
int gnet_stats_copy_queue(struct gnet_dump *d,
struct gnet_stats_queue __percpu *cpu_q,
struct gnet_stats_queue *q, __u32 qlen);
+void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
+ const struct gnet_stats_queue __percpu *cpu_q,
+ const struct gnet_stats_queue *q, __u32 qlen);
int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
int gnet_stats_finish_copy(struct gnet_dump *d);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 87f2855..b2b2323b 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -252,10 +252,10 @@
}
}
-static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
- const struct gnet_stats_queue __percpu *cpu,
- const struct gnet_stats_queue *q,
- __u32 qlen)
+void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
+ const struct gnet_stats_queue __percpu *cpu,
+ const struct gnet_stats_queue *q,
+ __u32 qlen)
{
if (cpu) {
__gnet_stats_copy_queue_cpu(qstats, cpu);
@@ -269,6 +269,7 @@ static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
qstats->qlen = qlen;
}
+EXPORT_SYMBOL(__gnet_stats_copy_queue);
/**
* gnet_stats_copy_queue - copy queue statistics into statistics TLV
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 213b586..bc59f05 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -17,6 +17,7 @@
#include <linux/skbuff.h>
#include <net/netlink.h>
#include <net/pkt_sched.h>
+#include <net/sch_generic.h>
struct mq_sched {
struct Qdisc **qdiscs;
@@ -103,15 +104,25 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
memset(&sch->qstats, 0, sizeof(sch->qstats));
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+ struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+ struct gnet_stats_queue __percpu *cpu_qstats = NULL;
+ __u32 qlen = 0;
+
qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
spin_lock_bh(qdisc_lock(qdisc));
- sch->q.qlen += qdisc->q.qlen;
- sch->bstats.bytes += qdisc->bstats.bytes;
- sch->bstats.packets += qdisc->bstats.packets;
- sch->qstats.backlog += qdisc->qstats.backlog;
- sch->qstats.drops += qdisc->qstats.drops;
- sch->qstats.requeues += qdisc->qstats.requeues;
- sch->qstats.overlimits += qdisc->qstats.overlimits;
+
+ if (qdisc_is_percpu_stats(qdisc)) {
+ cpu_bstats = qdisc->cpu_bstats;
+ cpu_qstats = qdisc->cpu_qstats;
+ }
+
+ qlen = qdisc_qlen_sum(qdisc);
+
+ __gnet_stats_copy_basic(NULL, &sch->bstats,
+ cpu_bstats, &qdisc->bstats);
+ __gnet_stats_copy_queue(&sch->qstats,
+ cpu_qstats, &qdisc->qstats, qlen);
+
spin_unlock_bh(qdisc_lock(qdisc));
}
return 0;
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 13/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (11 preceding siblings ...)
2017-11-13 20:11 ` [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
@ 2017-11-13 20:11 ` John Fastabend
2017-11-13 20:12 ` [RFC PATCH 14/17] net: skb_array: expose peek API John Fastabend
` (3 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:11 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
The sch_mqprio qdisc creates a sub-qdisc per tx queue; these are then
called independently for enqueue and dequeue operations. However,
statistics are aggregated and pushed up to the "master" qdisc.
This patch adds support for any of the sub-qdiscs to be per cpu
statistics qdiscs. To handle this case, add a check when calculating
stats and aggregate the per cpu stats if needed.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b85885a9..24474d0 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -388,22 +388,32 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
struct tc_mqprio_qopt opt = { 0 };
struct Qdisc *qdisc;
- unsigned int i;
+ unsigned int ntx, tc;
sch->q.qlen = 0;
memset(&sch->bstats, 0, sizeof(sch->bstats));
memset(&sch->qstats, 0, sizeof(sch->qstats));
- for (i = 0; i < dev->num_tx_queues; i++) {
- qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
+ for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+ struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+ struct gnet_stats_queue __percpu *cpu_qstats = NULL;
+ __u32 qlen = 0;
+
+ qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
spin_lock_bh(qdisc_lock(qdisc));
- sch->q.qlen += qdisc->q.qlen;
- sch->bstats.bytes += qdisc->bstats.bytes;
- sch->bstats.packets += qdisc->bstats.packets;
- sch->qstats.backlog += qdisc->qstats.backlog;
- sch->qstats.drops += qdisc->qstats.drops;
- sch->qstats.requeues += qdisc->qstats.requeues;
- sch->qstats.overlimits += qdisc->qstats.overlimits;
+
+ if (qdisc_is_percpu_stats(qdisc)) {
+ cpu_bstats = qdisc->cpu_bstats;
+ cpu_qstats = qdisc->cpu_qstats;
+ }
+
+ qlen = qdisc_qlen_sum(qdisc);
+
+ __gnet_stats_copy_basic(NULL, &sch->bstats,
+ cpu_bstats, &qdisc->bstats);
+ __gnet_stats_copy_queue(&sch->qstats,
+ cpu_qstats, &qdisc->qstats, qlen);
+
spin_unlock_bh(qdisc_lock(qdisc));
}
@@ -411,9 +421,9 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map));
opt.hw = priv->hw_offload;
- for (i = 0; i < netdev_get_num_tc(dev); i++) {
- opt.count[i] = dev->tc_to_txq[i].count;
- opt.offset[i] = dev->tc_to_txq[i].offset;
+ for (tc = 0; tc < netdev_get_num_tc(dev); tc++) {
+ opt.count[tc] = dev->tc_to_txq[tc].count;
+ opt.offset[tc] = dev->tc_to_txq[tc].offset;
}
if (nla_put(skb, TCA_OPTIONS, NLA_ALIGN(sizeof(opt)), &opt))
@@ -495,7 +505,6 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
if (cl >= TC_H_MIN_PRIORITY) {
int i;
__u32 qlen = 0;
- struct Qdisc *qdisc;
struct gnet_stats_queue qstats = {0};
struct gnet_stats_basic_packed bstats = {0};
struct net_device *dev = qdisc_dev(sch);
@@ -511,18 +520,26 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
for (i = tc.offset; i < tc.offset + tc.count; i++) {
struct netdev_queue *q = netdev_get_tx_queue(dev, i);
+ struct Qdisc *qdisc = rtnl_dereference(q->qdisc);
+ struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+ struct gnet_stats_queue __percpu *cpu_qstats = NULL;
- qdisc = rtnl_dereference(q->qdisc);
spin_lock_bh(qdisc_lock(qdisc));
- qlen += qdisc->q.qlen;
- bstats.bytes += qdisc->bstats.bytes;
- bstats.packets += qdisc->bstats.packets;
- qstats.backlog += qdisc->qstats.backlog;
- qstats.drops += qdisc->qstats.drops;
- qstats.requeues += qdisc->qstats.requeues;
- qstats.overlimits += qdisc->qstats.overlimits;
+ if (qdisc_is_percpu_stats(qdisc)) {
+ cpu_bstats = qdisc->cpu_bstats;
+ cpu_qstats = qdisc->cpu_qstats;
+ }
+
+ qlen = qdisc_qlen_sum(qdisc);
+ __gnet_stats_copy_basic(NULL, &sch->bstats,
+ cpu_bstats, &qdisc->bstats);
+ __gnet_stats_copy_queue(&sch->qstats,
+ cpu_qstats,
+ &qdisc->qstats,
+ qlen);
spin_unlock_bh(qdisc_lock(qdisc));
}
+
/* Reclaim root sleeping lock before completing stats */
if (d->lock)
spin_lock_bh(d->lock);
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 14/17] net: skb_array: expose peek API
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (12 preceding siblings ...)
2017-11-13 20:11 ` [RFC PATCH 13/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio John Fastabend
@ 2017-11-13 20:12 ` John Fastabend
2017-11-13 20:12 ` [RFC PATCH 15/17] net: sched: pfifo_fast use skb_array John Fastabend
` (2 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:12 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
This adds a peek routine to skb_array.h for use with qdisc.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 8621ffd..c7addf3 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -72,6 +72,11 @@ static inline bool __skb_array_empty(struct skb_array *a)
return !__ptr_ring_peek(&a->ring);
}
+static inline struct sk_buff *__skb_array_peek(struct skb_array *a)
+{
+ return __ptr_ring_peek(&a->ring);
+}
+
static inline bool skb_array_empty(struct skb_array *a)
{
return ptr_ring_empty(&a->ring);
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 15/17] net: sched: pfifo_fast use skb_array
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (13 preceding siblings ...)
2017-11-13 20:12 ` [RFC PATCH 14/17] net: skb_array: expose peek API John Fastabend
@ 2017-11-13 20:12 ` John Fastabend
2017-11-14 23:34 ` Willem de Bruijn
2017-11-13 20:12 ` [RFC PATCH 16/17] net: skb_array additions for unlocked consumer John Fastabend
2017-11-13 20:13 ` [RFC PATCH 17/17] net: sched: lock once per bulk dequeue John Fastabend
16 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:12 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
This converts the pfifo_fast qdisc to use the skb_array data structure
and sets the lockless qdisc bit.
This also removes the logic used to pick the next band to dequeue from
and instead just checks a per priority array for packets from top priority
to lowest. This might need to be a bit more clever but seems to work
for now.
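Dequeue then reduces to scanning the per band rings from highest
priority to lowest and taking the first packet found. A rough sketch
with a trivial single-consumer ring standing in for skb_array (names
and sizes are illustrative):

#include <stdio.h>
#include <stddef.h>

#define SKETCH_BANDS 3
#define RING_SZ      8

/* extremely simplified ring, standing in for skb_array */
struct sketch_ring {
	void *slot[RING_SZ];
	unsigned int head, tail;
};

static void *ring_consume(struct sketch_ring *r)
{
	if (r->head == r->tail)
		return NULL;	/* empty band */
	return r->slot[r->head++ % RING_SZ];
}

/* band 0 is serviced first, mirroring the band ordering in the patch */
static void *sketch_dequeue(struct sketch_ring bands[SKETCH_BANDS])
{
	void *p = NULL;
	int band;

	for (band = 0; band < SKETCH_BANDS && !p; band++)
		p = ring_consume(&bands[band]);

	return p;
}

int main(void)
{
	struct sketch_ring bands[SKETCH_BANDS] = { 0 };
	int pkt = 42;

	bands[1].slot[0] = &pkt;	/* one packet queued on band 1 */
	bands[1].tail = 1;
	printf("dequeued %p\n", sketch_dequeue(bands));
	return 0;
}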
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 84c4ea1..683f6ec 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -26,6 +26,7 @@
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/if_vlan.h>
+#include <linux/skb_array.h>
#include <net/sch_generic.h>
#include <net/pkt_sched.h>
#include <net/dst.h>
@@ -581,93 +582,95 @@ struct Qdisc_ops noqueue_qdisc_ops __read_mostly = {
/*
* Private data for a pfifo_fast scheduler containing:
- * - queues for the three band
- * - bitmap indicating which of the bands contain skbs
+ * - rings for priority bands
*/
struct pfifo_fast_priv {
- u32 bitmap;
- struct qdisc_skb_head q[PFIFO_FAST_BANDS];
+ struct skb_array q[PFIFO_FAST_BANDS];
};
-/*
- * Convert a bitmap to the first band number where an skb is queued, where:
- * bitmap=0 means there are no skbs on any band.
- * bitmap=1 means there is an skb on band 0.
- * bitmap=7 means there are skbs on all 3 bands, etc.
- */
-static const int bitmap2band[] = {-1, 0, 1, 0, 2, 0, 1, 0};
-
-static inline struct qdisc_skb_head *band2list(struct pfifo_fast_priv *priv,
- int band)
+static inline struct skb_array *band2list(struct pfifo_fast_priv *priv,
+ int band)
{
- return priv->q + band;
+ return &priv->q[band];
}
static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
struct sk_buff **to_free)
{
- if (qdisc->q.qlen < qdisc_dev(qdisc)->tx_queue_len) {
- int band = prio2band[skb->priority & TC_PRIO_MAX];
- struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
- struct qdisc_skb_head *list = band2list(priv, band);
-
- priv->bitmap |= (1 << band);
- qdisc->q.qlen++;
- return __qdisc_enqueue_tail(skb, qdisc, list);
- }
+ int band = prio2band[skb->priority & TC_PRIO_MAX];
+ struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
+ struct skb_array *q = band2list(priv, band);
+ int err;
- return qdisc_drop(skb, qdisc, to_free);
+ err = skb_array_produce_bh(q, skb);
+
+ if (unlikely(err))
+ return qdisc_drop_cpu(skb, qdisc, to_free);
+
+ qdisc_qstats_cpu_qlen_inc(qdisc);
+ qdisc_qstats_cpu_backlog_inc(qdisc, skb);
+ return NET_XMIT_SUCCESS;
}
static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
{
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
- int band = bitmap2band[priv->bitmap];
-
- if (likely(band >= 0)) {
- struct qdisc_skb_head *qh = band2list(priv, band);
- struct sk_buff *skb = __qdisc_dequeue_head(qh);
+ struct sk_buff *skb = NULL;
+ int band;
- if (likely(skb != NULL)) {
- qdisc_qstats_backlog_dec(qdisc, skb);
- qdisc_bstats_update(qdisc, skb);
- }
+ for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
+ struct skb_array *q = band2list(priv, band);
- qdisc->q.qlen--;
- if (qh->qlen == 0)
- priv->bitmap &= ~(1 << band);
+ if (__skb_array_empty(q))
+ continue;
- return skb;
+ skb = skb_array_consume_bh(q);
+ }
+ if (likely(skb)) {
+ qdisc_qstats_cpu_backlog_dec(qdisc, skb);
+ qdisc_bstats_cpu_update(qdisc, skb);
+ qdisc_qstats_cpu_qlen_dec(qdisc);
}
- return NULL;
+ return skb;
}
static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
{
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
- int band = bitmap2band[priv->bitmap];
+ struct sk_buff *skb = NULL;
+ int band;
- if (band >= 0) {
- struct qdisc_skb_head *qh = band2list(priv, band);
+ for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
+ struct skb_array *q = band2list(priv, band);
- return qh->head;
+ skb = __skb_array_peek(q);
+ if (!skb)
+ continue;
}
- return NULL;
+ return skb;
}
static void pfifo_fast_reset(struct Qdisc *qdisc)
{
- int prio;
+ int i, band;
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
- for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
- __qdisc_reset_queue(band2list(priv, prio));
+ for (band = 0; band < PFIFO_FAST_BANDS; band++) {
+ struct skb_array *q = band2list(priv, band);
+ struct sk_buff *skb;
- priv->bitmap = 0;
- qdisc->qstats.backlog = 0;
- qdisc->q.qlen = 0;
+ while ((skb = skb_array_consume_bh(q)) != NULL)
+ __skb_array_destroy_skb(skb);
+ }
+
+ for_each_possible_cpu(i) {
+ struct gnet_stats_queue *q = per_cpu_ptr(qdisc->cpu_qstats, i);
+
+ q->backlog = 0;
+ q->qlen = 0;
+ }
}
static int pfifo_fast_dump(struct Qdisc *qdisc, struct sk_buff *skb)
@@ -685,17 +688,40 @@ static int pfifo_fast_dump(struct Qdisc *qdisc, struct sk_buff *skb)
static int pfifo_fast_init(struct Qdisc *qdisc, struct nlattr *opt)
{
- int prio;
+ unsigned int qlen = qdisc_dev(qdisc)->tx_queue_len;
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
+ int prio;
+
+ /* guard against zero length rings */
+ if (!qlen)
+ return -EINVAL;
- for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
- qdisc_skb_head_init(band2list(priv, prio));
+ for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
+ struct skb_array *q = band2list(priv, prio);
+ int err;
+
+ err = skb_array_init(q, qlen, GFP_KERNEL);
+ if (err)
+ return -ENOMEM;
+ }
/* Can by-pass the queue discipline */
qdisc->flags |= TCQ_F_CAN_BYPASS;
return 0;
}
+static void pfifo_fast_destroy(struct Qdisc *sch)
+{
+ struct pfifo_fast_priv *priv = qdisc_priv(sch);
+ int prio;
+
+ for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
+ struct skb_array *q = band2list(priv, prio);
+
+ skb_array_cleanup(q);
+ }
+}
+
struct Qdisc_ops pfifo_fast_ops __read_mostly = {
.id = "pfifo_fast",
.priv_size = sizeof(struct pfifo_fast_priv),
@@ -703,9 +729,11 @@ struct Qdisc_ops pfifo_fast_ops __read_mostly = {
.dequeue = pfifo_fast_dequeue,
.peek = pfifo_fast_peek,
.init = pfifo_fast_init,
+ .destroy = pfifo_fast_destroy,
.reset = pfifo_fast_reset,
.dump = pfifo_fast_dump,
.owner = THIS_MODULE,
+ .static_flags = TCQ_F_NOLOCK | TCQ_F_CPUSTATS,
};
EXPORT_SYMBOL(pfifo_fast_ops);
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 16/17] net: skb_array additions for unlocked consumer
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (14 preceding siblings ...)
2017-11-13 20:12 ` [RFC PATCH 15/17] net: sched: pfifo_fast use skb_array John Fastabend
@ 2017-11-13 20:12 ` John Fastabend
2017-11-13 20:13 ` [RFC PATCH 17/17] net: sched: lock once per bulk dequeue John Fastabend
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:12 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index c7addf3..d0a240e 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -142,6 +142,11 @@ static inline int skb_array_consume_batched_bh(struct skb_array *a,
return ptr_ring_consume_batched_bh(&a->ring, (void **)array, n);
}
+static inline struct sk_buff *__skb_array_consume(struct skb_array *a)
+{
+ return __ptr_ring_consume(&a->ring);
+}
+
static inline int __skb_array_len_with_tag(struct sk_buff *skb)
{
if (likely(skb)) {
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 17/17] net: sched: lock once per bulk dequeue
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
` (15 preceding siblings ...)
2017-11-13 20:12 ` [RFC PATCH 16/17] net: skb_array additions for unlocked consumer John Fastabend
@ 2017-11-13 20:13 ` John Fastabend
16 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-13 20:13 UTC (permalink / raw)
To: willemdebruijn.kernel, daniel, eric.dumazet
Cc: make0818, netdev, jiri, xiyou.wangcong
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
0 files changed
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 683f6ec..8ab7933 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -206,33 +206,22 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
{
const struct netdev_queue *txq = q->dev_queue;
struct sk_buff *skb = NULL;
+ spinlock_t *lock = NULL;
- *packets = 1;
- if (unlikely(!skb_queue_empty(&q->gso_skb))) {
- spinlock_t *lock = NULL;
-
- if (q->flags & TCQ_F_NOLOCK) {
- lock = qdisc_lock(q);
- spin_lock(lock);
- }
-
- skb = skb_peek(&q->gso_skb);
-
- /* skb may be null if another cpu pulls gso_skb off in between
- * empty check and lock.
- */
- if (!skb) {
- if (lock)
- spin_unlock(lock);
- goto validate;
- }
+ if (q->flags & TCQ_F_NOLOCK) {
+ lock = qdisc_lock(q);
+ spin_lock(lock);
+ }
+ *packets = 1;
+ skb = skb_peek(&q->gso_skb);
+ if (unlikely(skb)) {
/* skb in gso_skb were already validated */
*validate = false;
/* check the reason of requeuing without tx lock first */
txq = skb_get_tx_queue(txq->dev, skb);
if (!netif_xmit_frozen_or_stopped(txq)) {
- skb = __skb_dequeue(&q->gso_skb);
+ __skb_unlink(skb, &q->gso_skb);
if (qdisc_is_percpu_stats(q)) {
qdisc_qstats_cpu_backlog_dec(q, skb);
qdisc_qstats_cpu_qlen_dec(q);
@@ -247,17 +236,17 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
spin_unlock(lock);
goto trace;
}
-validate:
- *validate = true;
- if ((q->flags & TCQ_F_ONETXQUEUE) &&
- netif_xmit_frozen_or_stopped(txq))
- return skb;
+ *validate = true;
skb = qdisc_dequeue_skb_bad_txq(q);
if (unlikely(skb))
goto bulk;
- skb = q->dequeue(q);
+
+ if (!(q->flags & TCQ_F_ONETXQUEUE) ||
+ !netif_xmit_frozen_or_stopped(txq))
+ skb = q->dequeue(q);
+
if (skb) {
bulk:
if (qdisc_may_bulk(q))
@@ -265,6 +254,8 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
else
try_bulk_dequeue_skb_slow(q, skb, packets);
}
+ if (lock)
+ spin_unlock(lock);
trace:
trace_qdisc_dequeue(q, txq, *packets, skb);
return skb;
@@ -624,7 +615,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
if (__skb_array_empty(q))
continue;
- skb = skb_array_consume_bh(q);
+ skb = __skb_array_consume(q);
}
if (likely(skb)) {
qdisc_qstats_cpu_backlog_dec(qdisc, skb);
@@ -661,7 +652,7 @@ static void pfifo_fast_reset(struct Qdisc *qdisc)
struct skb_array *q = band2list(priv, band);
struct sk_buff *skb;
- while ((skb = skb_array_consume_bh(q)) != NULL)
+ while ((skb = __skb_array_consume(q)) != NULL)
__skb_array_destroy_skb(skb);
}
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 15/17] net: sched: pfifo_fast use skb_array
2017-11-13 20:12 ` [RFC PATCH 15/17] net: sched: pfifo_fast use skb_array John Fastabend
@ 2017-11-14 23:34 ` Willem de Bruijn
2017-11-15 14:57 ` John Fastabend
0 siblings, 1 reply; 32+ messages in thread
From: Willem de Bruijn @ 2017-11-14 23:34 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
> static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
> {
> struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
> - int band = bitmap2band[priv->bitmap];
> + struct sk_buff *skb = NULL;
> + int band;
>
> - if (band >= 0) {
> - struct qdisc_skb_head *qh = band2list(priv, band);
> + for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
> + struct skb_array *q = band2list(priv, band);
>
> - return qh->head;
> + skb = __skb_array_peek(q);
> + if (!skb)
> + continue;
> }
This explicit continue is not needed.
> static void pfifo_fast_reset(struct Qdisc *qdisc)
> {
> - int prio;
> + int i, band;
> struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>
> - for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
> - __qdisc_reset_queue(band2list(priv, prio));
> + for (band = 0; band < PFIFO_FAST_BANDS; band++) {
> + struct skb_array *q = band2list(priv, band);
> + struct sk_buff *skb;
>
> - priv->bitmap = 0;
> - qdisc->qstats.backlog = 0;
> - qdisc->q.qlen = 0;
> + while ((skb = skb_array_consume_bh(q)) != NULL)
> + __skb_array_destroy_skb(skb);
Can use regular kfree_skb after dequeue. This skb_array specific
callback probably only exists to match the ptr_ring callback typedef.
> static int pfifo_fast_dump(struct Qdisc *qdisc, struct sk_buff *skb)
> @@ -685,17 +688,40 @@ static int pfifo_fast_dump(struct Qdisc *qdisc, struct sk_buff *skb)
>
> static int pfifo_fast_init(struct Qdisc *qdisc, struct nlattr *opt)
> {
> - int prio;
> + unsigned int qlen = qdisc_dev(qdisc)->tx_queue_len;
> struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
> + int prio;
> +
> + /* guard against zero length rings */
> + if (!qlen)
> + return -EINVAL;
>
> - for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
> - qdisc_skb_head_init(band2list(priv, prio));
> + for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
> + struct skb_array *q = band2list(priv, prio);
> + int err;
> +
> + err = skb_array_init(q, qlen, GFP_KERNEL);
> + if (err)
> + return -ENOMEM;
This relies on the caller calling ops->destroy on error to free partially
allocated state. For uninitialized bands, that calls spin_lock on an
uninitialized spinlock from skb_array_cleanup -> ptr_ring_cleanup ->
ptr_ring_consume.
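One possible shape for that fix, sketched only to illustrate the point
(pfifo_fast_init_sketch is not the submitted code, just a version that
unwinds only the bands it actually initialized):

static int pfifo_fast_init_sketch(struct Qdisc *qdisc, struct nlattr *opt)
{
        unsigned int qlen = qdisc_dev(qdisc)->tx_queue_len;
        struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
        int prio, err;

        /* guard against zero length rings */
        if (!qlen)
                return -EINVAL;

        for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
                err = skb_array_init(band2list(priv, prio), qlen, GFP_KERNEL);
                if (err)
                        goto err_unwind;
        }

        /* Can by-pass the queue discipline */
        qdisc->flags |= TCQ_F_CAN_BYPASS;
        return 0;

err_unwind:
        /* clean up only the rings that were successfully initialized */
        while (--prio >= 0)
                skb_array_cleanup(band2list(priv, prio));
        return -ENOMEM;
}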
> + }
>
> /* Can by-pass the queue discipline */
> qdisc->flags |= TCQ_F_CAN_BYPASS;
> return 0;
> }
>
> +static void pfifo_fast_destroy(struct Qdisc *sch)
> +{
> + struct pfifo_fast_priv *priv = qdisc_priv(sch);
> + int prio;
> +
> + for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
> + struct skb_array *q = band2list(priv, prio);
> +
> + skb_array_cleanup(q);
> + }
> +}
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path
2017-11-13 20:08 ` [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path John Fastabend
@ 2017-11-15 0:11 ` Willem de Bruijn
2017-11-15 1:56 ` Willem de Bruijn
0 siblings, 1 reply; 32+ messages in thread
From: Willem de Bruijn @ 2017-11-15 0:11 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
On Mon, Nov 13, 2017 at 3:08 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> sch_direct_xmit() uses qdisc_qlen as a return value but all call sites
> of the routine only check if it is zero or not. Simplify the logic so
> that we don't need to return an actual queue length value.
>
> This introduces a case now where sch_direct_xmit would have returned
> a qlen of zero but now it returns true.
You mean the first case, when the likely(skb) branch failed?
Can that return false, then?
> However in this case all
> call sites of sch_direct_xmit will implement a dequeue() and get
> a null skb and abort. This trades tracking qlen in the hotpath for
> an extra dequeue operation. Overall this seems to be good for
> performance.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> @@ -198,18 +198,19 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>
> if (dev_xmit_complete(ret)) {
> /* Driver sent out skb successfully or skb was consumed */
> - ret = qdisc_qlen(q);
> + ret = true;
> } else {
> /* Driver returned NETDEV_TX_BUSY - requeue skb */
> if (unlikely(ret != NETDEV_TX_BUSY))
> net_warn_ratelimited("BUG %s code %d qlen %d\n",
> dev->name, ret, q->q.qlen);
>
> - ret = dev_requeue_skb(skb, q);
> + dev_requeue_skb(skb, q);
> + ret = false;
> }
>
> if (ret && netif_xmit_frozen_or_stopped(txq))
> - ret = 0;
> + ret = false;
>
> return ret;
> }
This can now become
        if (!dev_xmit_complete(ret)) {
                if (unlikely(ret != NETDEV_TX_BUSY))
                        ...
                dev_requeue_skb(skb, q);
                return false;
        }

        if (netif_xmit_frozen_or_stopped(txq))
                return false;

        return true;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback
2017-11-13 20:09 ` [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback John Fastabend
@ 2017-11-15 0:41 ` Willem de Bruijn
2017-11-15 2:04 ` Willem de Bruijn
2017-11-15 15:11 ` John Fastabend
0 siblings, 2 replies; 32+ messages in thread
From: Willem de Bruijn @ 2017-11-15 0:41 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
> /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
> static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
> {
> - struct sk_buff *skb = sch->gso_skb;
> + struct sk_buff *skb = skb_peek(&sch->gso_skb);
>
> if (skb) {
> - sch->gso_skb = NULL;
> + skb = __skb_dequeue(&sch->gso_skb);
> qdisc_qstats_backlog_dec(sch, skb);
> sch->q.qlen--;
In lockless qdiscs, can this race, so that __skb_dequeue returns NULL?
Same for its use in qdisc_peek_dequeued.
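A sketch of a race-tolerant variant, assuming the helper were ever called
without the qdisc lock: it checks the dequeue result rather than trusting
the earlier peek (the qlen/backlog updates themselves would still need
per-cpu treatment):

static inline struct sk_buff *qdisc_dequeue_peeked_sketch(struct Qdisc *sch)
{
        struct sk_buff *skb = __skb_dequeue(&sch->gso_skb);

        if (skb) {
                qdisc_qstats_backlog_dec(sch, skb);
                sch->q.qlen--;
        } else {
                /* nothing was stashed (or it raced away), fall back */
                skb = sch->dequeue(sch);
        }

        return skb;
}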
> -static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> +static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> {
Perhaps dev_requeue_skb_qdisc_locked is more descriptive. Or
adding a lockdep_is_held(..) also documents that the __locked variant
below is not just a lock/unlock wrapper around this inner function.
> - q->gso_skb = skb;
> + __skb_queue_head(&q->gso_skb, skb);
> q->qstats.requeues++;
> qdisc_qstats_backlog_inc(q, skb);
> q->q.qlen++; /* it's still part of the queue */
> @@ -57,6 +56,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> return 0;
> }
>
> +static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
> +{
> + spinlock_t *lock = qdisc_lock(q);
> +
> + spin_lock(lock);
> + __skb_queue_tail(&q->gso_skb, skb);
why does this requeue at the tail, unlike __dev_requeue_skb?
> + spin_unlock(lock);
> +
> + qdisc_qstats_cpu_requeues_inc(q);
> + qdisc_qstats_cpu_backlog_inc(q, skb);
> + qdisc_qstats_cpu_qlen_inc(q);
> + __netif_schedule(q);
> +
> + return 0;
> +}
> +
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic
2017-11-13 20:10 ` [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic John Fastabend
@ 2017-11-15 1:16 ` Willem de Bruijn
2017-11-15 15:18 ` John Fastabend
0 siblings, 1 reply; 32+ messages in thread
From: Willem de Bruijn @ 2017-11-15 1:16 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
On Mon, Nov 13, 2017 at 3:10 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Add qdisc qlen helper routines for lockless qdiscs to use.
>
> The qdisc qlen is no longer used in the hotpath but it is reported
> via stats query on the qdisc so it still needs to be tracked. This
> adds the per cpu operations needed.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
> 0 files changed
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 4717c4b..bad24a9 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -291,8 +291,16 @@ static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
> BUILD_BUG_ON(sizeof(qcb->data) < sz);
> }
>
> +static inline int qdisc_qlen_cpu(const struct Qdisc *q)
> +{
> + return this_cpu_ptr(q->cpu_qstats)->qlen;
> +}
> +
> static inline int qdisc_qlen(const struct Qdisc *q)
> {
> + if (q->flags & TCQ_F_NOLOCK)
> + return qdisc_qlen_cpu(q);
Shouldn't this return qdisc_qlen_sum from the follow-on patch?
The two patches can also be squashed.
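Presumably that helper just sums the per-cpu counters when the qdisc is
lockless; a sketch of the expected shape (not the actual follow-on patch):

static inline int qdisc_qlen_sum(const struct Qdisc *q)
{
        int qlen = 0;
        int i;

        if (q->flags & TCQ_F_NOLOCK) {
                for_each_possible_cpu(i)
                        qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
        } else {
                qlen = q->q.qlen;
        }

        return qlen;
}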
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
2017-11-13 20:11 ` [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
@ 2017-11-15 1:22 ` Willem de Bruijn
0 siblings, 0 replies; 32+ messages in thread
From: Willem de Bruijn @ 2017-11-15 1:22 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
> /**
> * gnet_stats_copy_queue - copy queue statistics into statistics TLV
> diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
> index 213b586..bc59f05 100644
> --- a/net/sched/sch_mq.c
> +++ b/net/sched/sch_mq.c
> @@ -17,6 +17,7 @@
> #include <linux/skbuff.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> +#include <net/sch_generic.h>
>
> struct mq_sched {
> struct Qdisc **qdiscs;
> @@ -103,15 +104,25 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
> memset(&sch->qstats, 0, sizeof(sch->qstats));
>
> for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
> + struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
> + struct gnet_stats_queue __percpu *cpu_qstats = NULL;
> + __u32 qlen = 0;
> +
> qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
> spin_lock_bh(qdisc_lock(qdisc));
> - sch->q.qlen += qdisc->q.qlen;
> - sch->bstats.bytes += qdisc->bstats.bytes;
> - sch->bstats.packets += qdisc->bstats.packets;
> - sch->qstats.backlog += qdisc->qstats.backlog;
> - sch->qstats.drops += qdisc->qstats.drops;
> - sch->qstats.requeues += qdisc->qstats.requeues;
> - sch->qstats.overlimits += qdisc->qstats.overlimits;
> +
> + if (qdisc_is_percpu_stats(qdisc)) {
> + cpu_bstats = qdisc->cpu_bstats;
> + cpu_qstats = qdisc->cpu_qstats;
> + }
Else, these fields are NULL. So no need for the intermediate
variables.
> +
> + qlen = qdisc_qlen_sum(qdisc);
> +
> + __gnet_stats_copy_basic(NULL, &sch->bstats,
> + cpu_bstats, &qdisc->bstats);
> + __gnet_stats_copy_queue(&sch->qstats,
> + cpu_qstats, &qdisc->qstats, qlen);
These functions accumulate when the per-cpu variant is taken, but assign
otherwise, so without per-cpu stats they overwrite sch's totals on each
iteration of the loop.
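Roughly this behaviour, modelled here as a toy helper (not the real
gen_stats code, and ignoring the u64_stats synchronization):

static void copy_basic_model(struct gnet_stats_basic_packed *dst,
                             struct gnet_stats_basic_cpu __percpu *cpu,
                             struct gnet_stats_basic_packed *b)
{
        if (cpu) {
                int i;

                /* per-cpu path: accumulates into the destination */
                for_each_possible_cpu(i) {
                        struct gnet_stats_basic_cpu *pc = per_cpu_ptr(cpu, i);

                        dst->bytes   += pc->bstats.bytes;
                        dst->packets += pc->bstats.packets;
                }
        } else {
                /* non-per-cpu path: overwrites the destination */
                dst->bytes   = b->bytes;
                dst->packets = b->packets;
        }
}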
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path
2017-11-15 0:11 ` Willem de Bruijn
@ 2017-11-15 1:56 ` Willem de Bruijn
2017-11-15 15:00 ` John Fastabend
0 siblings, 1 reply; 32+ messages in thread
From: Willem de Bruijn @ 2017-11-15 1:56 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
On Tue, Nov 14, 2017 at 7:11 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Mon, Nov 13, 2017 at 3:08 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> sch_direct_xmit() uses qdisc_qlen as a return value but all call sites
>> of the routine only check if it is zero or not. Simplify the logic so
>> that we don't need to return an actual queue length value.
>>
>> This introduces a case now where sch_direct_xmit would have returned
>> a qlen of zero but now it returns true.
>
> You mean the first case, when the likely(skb) branch failed?
> Can that return false, then?
I misunderstood. __qdisc_run will just take one extra loop, since it
will no longer break out of the loop on reading the last packet on
the queue with qdisc_restart.
That does have a subtle (but benign if rare) side effect of possibly
calling __netif_schedule while there is nothing queued, if either
the quota is exactly exhausted or need_resched is true.
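For context, the loop in question, roughly as it looks upstream before
this series (dev_tx_weight is the dequeue quota):

void __qdisc_run(struct Qdisc *q)
{
        int quota = dev_tx_weight;
        int packets;

        /* qdisc_restart() returning 0 (nothing dequeued) ends the loop */
        while (qdisc_restart(q, &packets)) {
                quota -= packets;
                /* postpone if quota is spent or another task needs the cpu */
                if (quota <= 0 || need_resched()) {
                        __netif_schedule(q);
                        break;
                }
        }

        qdisc_run_end(q);
}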
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback
2017-11-15 0:41 ` Willem de Bruijn
@ 2017-11-15 2:04 ` Willem de Bruijn
2017-11-15 15:11 ` John Fastabend
1 sibling, 0 replies; 32+ messages in thread
From: Willem de Bruijn @ 2017-11-15 2:04 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
>> -static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>> +static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>> {
>
> Perhaps dev_requeue_skb_qdisc_locked is more descriptive. Or
> adding a lockdep_is_held(..) also documents that the __locked variant
> below is not just a lock/unlock wrapper around this inner function.
>
>> - q->gso_skb = skb;
>> + __skb_queue_head(&q->gso_skb, skb);
>> q->qstats.requeues++;
>> qdisc_qstats_backlog_inc(q, skb);
>> q->q.qlen++; /* it's still part of the queue */
>> @@ -57,6 +56,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>> return 0;
>> }
>>
>> +static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
>> +{
>> + spinlock_t *lock = qdisc_lock(q);
>> +
>> + spin_lock(lock);
>> + __skb_queue_tail(&q->gso_skb, skb);
>
> why does this requeue at the tail, unlike __dev_requeue_skb?
I guess that requeue has to queue at the tail in the lockless case,
and it does not matter in the qdisc_locked case, as then there can
only ever be at most one outstanding gso_skb.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 15/17] net: sched: pfifo_fast use skb_array
2017-11-14 23:34 ` Willem de Bruijn
@ 2017-11-15 14:57 ` John Fastabend
0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-15 14:57 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
[...]
>> static int pfifo_fast_dump(struct Qdisc *qdisc, struct sk_buff *skb)
>> @@ -685,17 +688,40 @@ static int pfifo_fast_dump(struct Qdisc *qdisc, struct sk_buff *skb)
>>
>> static int pfifo_fast_init(struct Qdisc *qdisc, struct nlattr *opt)
>> {
>> - int prio;
>> + unsigned int qlen = qdisc_dev(qdisc)->tx_queue_len;
>> struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>> + int prio;
>> +
>> + /* guard against zero length rings */
>> + if (!qlen)
>> + return -EINVAL;
>>
>> - for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
>> - qdisc_skb_head_init(band2list(priv, prio));
>> + for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
>> + struct skb_array *q = band2list(priv, prio);
>> + int err;
>> +
>> + err = skb_array_init(q, qlen, GFP_KERNEL);
>> + if (err)
>> + return -ENOMEM;
>
> This relies on the caller calling ops->destroy on error to free partially
> allocated state. For uninitialized bands, that calls spin_lock on an
> uninitialized spinlock from skb_array_cleanup -> ptr_ring_cleanup ->
> ptr_ring_consume.
Nice catch, will fix in the next version, and also make the above
suggested changes.
Thanks,
John.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path
2017-11-15 1:56 ` Willem de Bruijn
@ 2017-11-15 15:00 ` John Fastabend
0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-15 15:00 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
On 11/14/2017 05:56 PM, Willem de Bruijn wrote:
> On Tue, Nov 14, 2017 at 7:11 PM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Mon, Nov 13, 2017 at 3:08 PM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>> sch_direct_xmit() uses qdisc_qlen as a return value but all call sites
>>> of the routine only check if it is zero or not. Simplify the logic so
>>> that we don't need to return an actual queue length value.
>>>
>>> This introduces a case now where sch_direct_xmit would have returned
>>> a qlen of zero but now it returns true.
>>
>> You mean the first case, when the likely(skb) branch failed?
>> Can that return false, then?
>
> I misunderstood. __qdisc_run will just take one extra loop, since it
> will no longer break out of the loop on reading the last packet on
> the queue with qdisc_restart.
>
> That does have a subtle (but benign if rare) side effect of possibly
> calling __netif_schedule while there is nothing queued, if either
> the quota is exactly exhausted or need_resched is true.
>
Right. Seems OK to me.
Will also address the code cleanup comment from the initial email.
.John
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback
2017-11-15 0:41 ` Willem de Bruijn
2017-11-15 2:04 ` Willem de Bruijn
@ 2017-11-15 15:11 ` John Fastabend
2017-11-15 17:51 ` Willem de Bruijn
1 sibling, 1 reply; 32+ messages in thread
From: John Fastabend @ 2017-11-15 15:11 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
On 11/14/2017 04:41 PM, Willem de Bruijn wrote:
>> /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
>> static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>> {
>> - struct sk_buff *skb = sch->gso_skb;
>> + struct sk_buff *skb = skb_peek(&sch->gso_skb);
>>
>> if (skb) {
>> - sch->gso_skb = NULL;
>> + skb = __skb_dequeue(&sch->gso_skb);
>> qdisc_qstats_backlog_dec(sch, skb);
>> sch->q.qlen--;
>
> In lockless qdiscs, can this race, so that __skb_dequeue returns NULL?
> Same for its use in qdisc_peek_dequeued.
>
Yes, agreed, if this were used in a lockless qdisc it could race. However,
I don't think it is actually used in the lockless cases yet. For
pfifo_fast, __skb_array_peek is used.
>> -static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>> +static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>> {
>
> Perhaps dev_requeue_skb_qdisc_locked is more descriptive. Or
> adding a lockdep_is_held(..) also documents that the __locked variant
> below is not just a lock/unlock wrapper around this inner function.
>
Adding lockdep annotation here makes sense to me.
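Something along these lines for the inner helper, with the assert carrying
the documentation (a sketch mirroring the body from the patch;
lockdep_assert_held() is used here as the concrete form of the suggestion):

static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
{
        lockdep_assert_held(qdisc_lock(q));

        __skb_queue_head(&q->gso_skb, skb);
        q->qstats.requeues++;
        qdisc_qstats_backlog_inc(q, skb);
        q->q.qlen++;    /* it's still part of the queue */
        __netif_schedule(q);

        return 0;
}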
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic
2017-11-15 1:16 ` Willem de Bruijn
@ 2017-11-15 15:18 ` John Fastabend
0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-15 15:18 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Eric Dumazet, make0818, Network Development,
Jiří Pírko, Cong Wang
On 11/14/2017 05:16 PM, Willem de Bruijn wrote:
> On Mon, Nov 13, 2017 at 3:10 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> Add qdisc qlen helper routines for lockless qdiscs to use.
>>
>> The qdisc qlen is no longer used in the hotpath but it is reported
>> via stats query on the qdisc so it still needs to be tracked. This
>> adds the per cpu operations needed.
>>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>> 0 files changed
>>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 4717c4b..bad24a9 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -291,8 +291,16 @@ static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
>> BUILD_BUG_ON(sizeof(qcb->data) < sz);
>> }
>>
>> +static inline int qdisc_qlen_cpu(const struct Qdisc *q)
>> +{
>> + return this_cpu_ptr(q->cpu_qstats)->qlen;
>> +}
>> +
>> static inline int qdisc_qlen(const struct Qdisc *q)
>> {
>> + if (q->flags & TCQ_F_NOLOCK)
>> + return qdisc_qlen_cpu(q);
>
> Shouldn't this return qdisc_qlen_sum from the follow-on patch?
> The two patches can also be squashed.
>
Actually I think this change can just be dropped; it looks like
I removed all cases of calling it from NOLOCK paths.
Squashing this patch with 11/17 also seems reasonable, so I will go ahead
and do that.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback
2017-11-15 15:11 ` John Fastabend
@ 2017-11-15 17:51 ` Willem de Bruijn
2017-11-16 13:31 ` John Fastabend
0 siblings, 1 reply; 32+ messages in thread
From: Willem de Bruijn @ 2017-11-15 17:51 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, Eric Dumazet, Michael Ma, Network Development,
Jiří Pírko, Cong Wang
On Wed, Nov 15, 2017 at 10:11 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 11/14/2017 04:41 PM, Willem de Bruijn wrote:
>>> /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
>>> static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>>> {
>>> - struct sk_buff *skb = sch->gso_skb;
>>> + struct sk_buff *skb = skb_peek(&sch->gso_skb);
>>>
>>> if (skb) {
>>> - sch->gso_skb = NULL;
>>> + skb = __skb_dequeue(&sch->gso_skb);
>>> qdisc_qstats_backlog_dec(sch, skb);
>>> sch->q.qlen--;
>>
>> In lockless qdiscs, can this race, so that __skb_dequeue returns NULL?
>> Same for its use in qdisc_peek_dequeued.
>>
>
> Yes, agree if this was used in lockless qdisc it could race. However,
> I don't think it is actually used in the lockless cases yet. For pfifo
> fast __skb_array_peek is used.
Oh right. That will be easy to miss when other qdiscs are converted
to lockless. Perhaps another location to add lockdep annotations.
Related: what happens when pfifo_fast is used as a leaf in a non-lockless
qdisc hierarchy, say htb? The individual leaves will still have
TCQ_F_NOLOCK, so will try to take the qdisc lock in dequeue_skb
and other locations, but that is already held?
Thanks for revising and resubmitting the patchset, btw!
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback
2017-11-15 17:51 ` Willem de Bruijn
@ 2017-11-16 13:31 ` John Fastabend
0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2017-11-16 13:31 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Eric Dumazet, Michael Ma, Network Development,
Jiří Pírko, Cong Wang
On 11/15/2017 09:51 AM, Willem de Bruijn wrote:
> On Wed, Nov 15, 2017 at 10:11 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> On 11/14/2017 04:41 PM, Willem de Bruijn wrote:
>>>> /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
>>>> static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>>>> {
>>>> - struct sk_buff *skb = sch->gso_skb;
>>>> + struct sk_buff *skb = skb_peek(&sch->gso_skb);
>>>>
>>>> if (skb) {
>>>> - sch->gso_skb = NULL;
>>>> + skb = __skb_dequeue(&sch->gso_skb);
>>>> qdisc_qstats_backlog_dec(sch, skb);
>>>> sch->q.qlen--;
>>>
>>> In lockless qdiscs, can this race, so that __skb_dequeue returns NULL?
>>> Same for its use in qdisc_peek_dequeued.
>>>
>>
>> Yes, agree if this was used in lockless qdisc it could race. However,
>> I don't think it is actually used in the lockless cases yet. For pfifo
>> fast __skb_array_peek is used.
>
> Oh right. That will be easy to miss when other qdiscs are converted
> to lockless. Perhaps another location to add lockdep annotations.
>
Yep. Will add lockdep here.
> Related: what happens when pfifo_fast is used as a leaf in a non-lockless
> qdisc hierarchy, say htb? The individual leaves will still have
> TCQ_F_NOLOCK, so will try to take the qdisc lock in dequeue_skb
> and other locations, but that is already held?
>
Right. So I guess TCQ_F_NOLOCK needs to be propagated through the chain
of qdiscs, and at attach time we can only set TCQ_F_NOLOCK if the entire
chain of qdiscs is NOLOCK. Will spin an update with this as well.
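Very roughly, something along these lines at attach/graft time (the helper
name and its call site are guesses, just to illustrate the idea):

/* a child only keeps TCQ_F_NOLOCK if its parent is lockless too, so a
 * pfifo_fast leaf under e.g. htb falls back to normal qdisc locking
 */
static void qdisc_clear_nolock_if_needed(struct Qdisc *parent,
                                         struct Qdisc *child)
{
        if (parent && child &&
            (child->flags & TCQ_F_NOLOCK) &&
            !(parent->flags & TCQ_F_NOLOCK))
                child->flags &= ~TCQ_F_NOLOCK;
}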
> Thanks for revising and resubmitting the patchset, btw!
>
No problem, it will be good to finally get this out of my queue.
^ permalink raw reply [flat|nested] 32+ messages in thread
Thread overview: 32+ messages
2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
2017-11-13 20:07 ` [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics John Fastabend
2017-11-13 20:08 ` [RFC PATCH 02/17] net: sched: allow qdiscs to handle locking John Fastabend
2017-11-13 20:08 ` [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path John Fastabend
2017-11-15 0:11 ` Willem de Bruijn
2017-11-15 1:56 ` Willem de Bruijn
2017-11-15 15:00 ` John Fastabend
2017-11-13 20:08 ` [RFC PATCH 04/17] net: sched: provide per cpu qstat helpers John Fastabend
2017-11-13 20:09 ` [RFC PATCH 05/17] net: sched: a dflt qdisc may be used with per cpu stats John Fastabend
2017-11-13 20:09 ` [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback John Fastabend
2017-11-15 0:41 ` Willem de Bruijn
2017-11-15 2:04 ` Willem de Bruijn
2017-11-15 15:11 ` John Fastabend
2017-11-15 17:51 ` Willem de Bruijn
2017-11-16 13:31 ` John Fastabend
2017-11-13 20:09 ` [RFC PATCH 07/17] net: sched: drop qdisc_reset from dev_graft_qdisc John Fastabend
2017-11-13 20:10 ` [RFC PATCH 08/17] net: sched: use skb list for skb_bad_tx John Fastabend
2017-11-13 20:10 ` [RFC PATCH 09/17] net: sched: check for frozen queue before skb_bad_txq check John Fastabend
2017-11-13 20:10 ` [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic John Fastabend
2017-11-15 1:16 ` Willem de Bruijn
2017-11-15 15:18 ` John Fastabend
2017-11-13 20:11 ` [RFC PATCH 11/17] net: sched: helper to sum qlen John Fastabend
2017-11-13 20:11 ` [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
2017-11-15 1:22 ` Willem de Bruijn
2017-11-13 20:11 ` [RFC PATCH 13/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio John Fastabend
2017-11-13 20:12 ` [RFC PATCH 14/17] net: skb_array: expose peek API John Fastabend
2017-11-13 20:12 ` [RFC PATCH 15/17] net: sched: pfifo_fast use skb_array John Fastabend
2017-11-14 23:34 ` Willem de Bruijn
2017-11-15 14:57 ` John Fastabend
2017-11-13 20:12 ` [RFC PATCH 16/17] net: skb_array additions for unlocked consumer John Fastabend
2017-11-13 20:13 ` [RFC PATCH 17/17] net: sched: lock once per bulk dequeue John Fastabend