* [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows
@ 2015-02-02 18:59 Kenneth Klette Jonassen
2015-02-02 18:59 ` [PATCH net-next 2/2] pkt_sched: fq: remove redundant flow credit refill Kenneth Klette Jonassen
2015-02-02 19:24 ` [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows Eric Dumazet
0 siblings, 2 replies; 5+ messages in thread
From: Kenneth Klette Jonassen @ 2015-02-02 18:59 UTC
To: netdev; +Cc: Kenneth Klette Jonassen
Current pacing behavior always throttles a flow for a time equal to one
full quantum, starting at the instant the flow depletes its credit.
This is optimal for burst sizes that are a multiple of the chosen quantum.
For flows with many small and evenly clocked packets, the depletion and
refilling of credits cause packets to queue and transmit in bursts, even
when their clocked rate is below the pacing rate. With TCP ACKs, this
artificial queueing induces significant noise in RTTs, e.g. up to 2.07 ms
for rtt 20 ms, cwnd 10 and quantum 3028.
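A sketch of where the burst comes from, paraphrasing the pre-patch rate
step in fq_dequeue() rather than quoting it verbatim:

    /* Paraphrase of the pre-patch pacing step (not a verbatim excerpt).
     * rate is min(sk->sk_pacing_rate, q->flow_max_rate).  While
     * f->credit > 0, packets leave unthrottled; the packet that spends
     * the last credit stalls the flow for the time one full quantum
     * takes at the pacing rate, counted from "now".
     */
    if (f->credit <= 0 && q->rate_enable) {
            u32 plen = max(qdisc_pkt_len(skb), q->quantum);
            u64 len = (u64)plen * NSEC_PER_SEC;

            do_div(len, rate);
            f->time_next_packet = now + len; /* full quantum from now */
    }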
Packetdrill script to illustrate bursts:
0.000 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 connect(3, ..., ...) = 0
// SO_MAX_PACING_RATE: 2500 Bps, 100 ms per quantum, 20 ms per 50B packet.
0.000 setsockopt(3, SOL_SOCKET, 47, [2500], 4) = 0
0.000 `tc qdisc add dev tun0 root fq initial_quantum 250 quantum 250`
// Use 200 credits: send four perfectly spaced 50 byte packets.
0.000 write(3, ..., 22) = 22
0.000 > udp (22)
0.020 write(3, ..., 22) = 22
0.020 > udp (22)
0.040 write(3, ..., 22) = 22
0.040 > udp (22)
0.060 write(3, ..., 22) = 22
0.060 > udp (22)
// Send five perfectly spaced packets. The first credits are depleted at
// 1.000, and the remaining four packets are sent in a burst at 1.100.
// Packets are sent at their intended times when this patch is applied.
1.000 write(3, ..., 22) = 22
1.000 > udp (22)
1.020 write(3, ..., 22) = 22
1.040 write(3, ..., 22) = 22
1.060 write(3, ..., 22) = 22
1.080 write(3, ..., 22) = 22
1.100 > udp (22)
1.100 > udp (22)
1.100 > udp (22)
1.100 > udp (22)
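To spell out the arithmetic: one 250 byte quantum at 2500 Bps takes
100 ms, and each 50 byte packet takes 20 ms. The write at 1.000 spends
the flow's last credit, so the flow is throttled until
1.000 + 0.100 = 1.100. The writes at 1.020 through 1.080 land in that
window, queue, and leave together at 1.100, between 20 and 80 ms late,
even though the offered rate (50 B per 20 ms = 2500 Bps) never exceeds
the pacing rate.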
Keep track of when a flow's credit was last filled, and use this to
approximate a credit refill for each quantum of time that passes.
This increases the memory footprint from 104 to 112 bytes per flow.
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
---
net/sched/sch_fq.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 2a50f5c..6f0c45e 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -71,6 +71,7 @@ struct fq_flow {
struct rb_node rate_node; /* anchor in q->delayed tree */
u64 time_next_packet;
+ u64 time_credit_filled;
};
struct fq_flow_head {
@@ -250,6 +251,7 @@ static struct fq_flow *fq_classify(struct sk_buff *skb, struct fq_sched_data *q)
if (unlikely(skb->sk &&
f->socket_hash != sk->sk_hash)) {
f->credit = q->initial_quantum;
+ f->time_credit_filled = ktime_get_ns();
f->socket_hash = sk->sk_hash;
f->time_next_packet = 0ULL;
}
@@ -271,6 +273,7 @@ static struct fq_flow *fq_classify(struct sk_buff *skb, struct fq_sched_data *q)
if (skb->sk)
f->socket_hash = sk->sk_hash;
f->credit = q->initial_quantum;
+ f->time_credit_filled = ktime_get_ns();
rb_link_node(&f->fq_node, parent, p);
rb_insert_color(&f->fq_node, root);
@@ -374,8 +377,10 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
qdisc_qstats_backlog_inc(sch, skb);
if (fq_flow_is_detached(f)) {
fq_flow_add_tail(&q->new_flows, f);
- if (time_after(jiffies, f->age + q->flow_refill_delay))
+ if (time_after(jiffies, f->age + q->flow_refill_delay)) {
f->credit = max_t(u32, f->credit, q->quantum);
+ f->time_credit_filled = ktime_get_ns();
+ }
q->inactive_flows--;
}
@@ -440,6 +445,7 @@ begin:
if (f->credit <= 0) {
f->credit += q->quantum;
+ f->time_credit_filled = max(now, f->time_next_packet);
head->first = f->next;
fq_flow_add_tail(&q->old_flows, f);
goto begin;
@@ -489,7 +495,10 @@ begin:
q->stat_pkts_too_long++;
}
- f->time_next_packet = now + len;
+ /* If now < time_next_packet, throttle the flow for a time equal
+ * to one quantum (len) after its credit was last filled.
+ */
+ f->time_next_packet = f->time_credit_filled + len;
}
out:
qdisc_bstats_update(sch, skb);
--
1.9.1
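Restated as a sketch (the assumed post-patch semantics, condensed from
the hunks above):

    /* Post-patch: the throttle horizon is anchored at the last credit
     * fill rather than at "now", so time already spent sending the
     * quantum counts toward the stall:
     */
    f->time_next_packet = f->time_credit_filled + len;

    /* A flow idle for longer than one quantum's worth of time has
     * time_next_packet in the past, so it may send immediately.
     */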
* [PATCH net-next 2/2] pkt_sched: fq: remove redundant flow credit refill
2015-02-02 18:59 [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows Kenneth Klette Jonassen
@ 2015-02-02 18:59 ` Kenneth Klette Jonassen
2015-02-02 19:24 ` [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows Eric Dumazet
1 sibling, 0 replies; 5+ messages in thread
From: Kenneth Klette Jonassen @ 2015-02-02 18:59 UTC
To: netdev; +Cc: Kenneth Klette Jonassen
Current behavior explicitly refills flow credit after idle. But following
the first patch in this series, the regular refill no longer throttles a
flow if idle_time >= quantum_time: time_next_packet is anchored at the
last credit fill, so a sufficiently idle flow is immediately eligible
to send.
Remove the redundant refill, and warn possible users of the refill delay knob.
Updates f52ed89971ad ("pkt_sched: fq: fix pacing for small frames").
Inspired by 65c5189a2b57 ("pkt_sched: fq: warn users using defrate").
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
---
include/uapi/linux/pkt_sched.h | 2 +-
net/sched/sch_fq.c | 20 ++++++--------------
2 files changed, 7 insertions(+), 15 deletions(-)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index d62316b..5a9afb4 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -772,7 +772,7 @@ enum {
TCA_FQ_BUCKETS_LOG, /* log2(number of buckets) */
- TCA_FQ_FLOW_REFILL_DELAY, /* flow credit refill delay in usec */
+ TCA_FQ_FLOW_REFILL_DELAY, /* obsolete, do not use */
__TCA_FQ_MAX
};
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 6f0c45e..81695ac 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -90,7 +90,6 @@ struct fq_sched_data {
struct fq_flow internal; /* for non classified or high prio packets */
u32 quantum;
u32 initial_quantum;
- u32 flow_refill_delay;
u32 flow_max_rate; /* optional max rate per flow */
u32 flow_plimit; /* max packets per flow */
struct rb_root *fq_root;
@@ -377,10 +376,6 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
qdisc_qstats_backlog_inc(sch, skb);
if (fq_flow_is_detached(f)) {
fq_flow_add_tail(&q->new_flows, f);
- if (time_after(jiffies, f->age + q->flow_refill_delay)) {
- f->credit = max_t(u32, f->credit, q->quantum);
- f->time_credit_filled = ktime_get_ns();
- }
q->inactive_flows--;
}
@@ -701,11 +696,9 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt)
err = -EINVAL;
}
- if (tb[TCA_FQ_FLOW_REFILL_DELAY]) {
- u32 usecs_delay = nla_get_u32(tb[TCA_FQ_FLOW_REFILL_DELAY]) ;
-
- q->flow_refill_delay = usecs_to_jiffies(usecs_delay);
- }
+ if (tb[TCA_FQ_FLOW_REFILL_DELAY])
+ pr_warn_ratelimited("sch_fq: refill delay %u ignored.\n",
+ nla_get_u32(tb[TCA_FQ_FLOW_REFILL_DELAY]));
if (!err) {
sch_tree_unlock(sch);
@@ -744,7 +737,6 @@ static int fq_init(struct Qdisc *sch, struct nlattr *opt)
q->flow_plimit = 100;
q->quantum = 2 * psched_mtu(qdisc_dev(sch));
q->initial_quantum = 10 * psched_mtu(qdisc_dev(sch));
- q->flow_refill_delay = msecs_to_jiffies(40);
q->flow_max_rate = ~0U;
q->rate_enable = 1;
q->new_flows.first = NULL;
@@ -771,7 +763,9 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
if (opts == NULL)
goto nla_put_failure;
- /* TCA_FQ_FLOW_DEFAULT_RATE is not used anymore */
+ /* TCA_FQ_FLOW_DEFAULT_RATE and TCA_FQ_FLOW_REFILL_DELAY
+ * are not used anymore.
+ */
if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
@@ -779,8 +773,6 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
nla_put_u32(skb, TCA_FQ_INITIAL_QUANTUM, q->initial_quantum) ||
nla_put_u32(skb, TCA_FQ_RATE_ENABLE, q->rate_enable) ||
nla_put_u32(skb, TCA_FQ_FLOW_MAX_RATE, q->flow_max_rate) ||
- nla_put_u32(skb, TCA_FQ_FLOW_REFILL_DELAY,
- jiffies_to_usecs(q->flow_refill_delay)) ||
nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log))
goto nla_put_failure;
--
1.9.1
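As an illustration (assuming iproute2's fq still accepts the knob), a
command such as "tc qdisc change dev eth0 root fq refill_delay 100ms"
would, after this patch, leave the qdisc unchanged and only emit the
rate-limited kernel log line "sch_fq: refill delay 100000 ignored."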
* Re: [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows
2015-02-02 18:59 [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows Kenneth Klette Jonassen
2015-02-02 18:59 ` [PATCH net-next 2/2] pkt_sched: fq: remove redundant flow credit refill Kenneth Klette Jonassen
@ 2015-02-02 19:24 ` Eric Dumazet
2015-02-03 19:51 ` Kenneth Klette Jonassen
1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2015-02-02 19:24 UTC
To: Kenneth Klette Jonassen; +Cc: netdev
On Mon, 2015-02-02 at 19:59 +0100, Kenneth Klette Jonassen wrote:
> Current pacing behavior always throttles a flow for a time equal to one
> full quantum, starting at the instant the flow depletes its credit.
> This is optimal for burst sizes that are a multiple of the chosen quantum.
>
> For flows with many small and evenly clocked packets, the depletion and
> refilling of credits cause packets to queue and transmit in bursts, even
> when their clocked rate is below the pacing rate. With TCP ACKs, this
> artificial queueing induces significant noise in RTTs, e.g. up to 2.07 ms
> for rtt 20 ms, cwnd 10 and quantum 3028.
>
> Packetdrill script to illustrate bursts:
> 0.000 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 3
> 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 0.000 bind(3, ..., ...) = 0
> 0.000 connect(3, ..., ...) = 0
>
> // SO_MAX_PACING_RATE: 2500 Bps, 100 ms per quantum, 20 ms per 50B packet.
> 0.000 setsockopt(3, SOL_SOCKET, 47, [2500], 4) = 0
> 0.000 `tc qdisc add dev tun0 root fq initial_quantum 250 quantum 250`
>
> // Use 200 credits: send four perfectly spaced 50 byte packets.
> 0.000 write(3, ..., 22) = 22
> 0.000 > udp (22)
> 0.020 write(3, ..., 22) = 22
> 0.020 > udp (22)
> 0.040 write(3, ..., 22) = 22
> 0.040 > udp (22)
> 0.060 write(3, ..., 22) = 22
> 0.060 > udp (22)
We do not want to perfectly space packets, but to have an efficient packet
scheduler allowing TCP pacing.
I chose to not use ktime_get() in enqueue() when I wrote sch_fq.
A Token Bucket Filter has the notion of a quantum, meaning you configure
the granularity by choosing this quantum.
At Google, we have a special handling for TCP ACK packets, so that they
do not interfere with FQ/pacing.
ACK packets are not paced, ever.
This patch also allows skb->ooo_okay to be set even if the DATA packet
immediately follows a train of ACK packets (as in typical RPC patterns).
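The thread does not show how that exemption is implemented. Purely as a
hypothetical sketch, with is_pure_ack() an invented helper rather than an
existing kernel function, the idea could sit in fq's rate step like so:

    /* Hypothetical only: let pure ACKs bypass the pacing step so they
     * are never parked in the delayed tree behind DATA packets.
     */
    if (is_pure_ack(skb))
            goto out; /* dequeue now; leave time_next_packet alone */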
* Re: [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows
2015-02-02 19:24 ` [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows Eric Dumazet
@ 2015-02-03 19:51 ` Kenneth Klette Jonassen
2015-02-03 22:05 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: Kenneth Klette Jonassen @ 2015-02-03 19:51 UTC
To: Eric Dumazet; +Cc: netdev@vger.kernel.org
> On 02 Feb 2015, at 20:24, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On Mon, 2015-02-02 at 19:59 +0100, Kenneth Klette Jonassen wrote:
>> Current pacing behavior always throttles a flow for a time equal to one
>> full quantum, starting at the instant the flow depletes its credit.
>> …
>
>
> We do not want to perfectly space packets, but to have an efficient packet
> scheduler allowing TCP pacing.
>
> I chose to not use ktime_get() in enqueue() when I wrote sch_fq.
Posted V2 to address this.
>
> A Token Bucket Filter has the notion of a quantum, meaning you configure
> the granularity by choosing this quantum.
>
> At Google, we have a special handling for TCP ACK packets, so that they
> do not interfere with FQ/pacing.
Interesting. How does this work when ACKs are piggybacked on data?
>
> ACK packets are not paced, ever.
>
> This patch also allows skb->ooo_okay to be set even if the DATA packet
> immediately follows a train of ACK packets (as in typical RPC patterns).
>
Could you please expand on this?
Thanks for the feedback so far.
Sincerely,
Kenneth
* Re: [PATCH net-next 1/2] pkt_sched: fq: avoid artificial bursts for clocked flows
2015-02-03 19:51 ` Kenneth Klette Jonassen
@ 2015-02-03 22:05 ` Eric Dumazet
0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2015-02-03 22:05 UTC
To: Kenneth Klette Jonassen; +Cc: netdev@vger.kernel.org
On Tue, 2015-02-03 at 19:51 +0000, Kenneth Klette Jonassen wrote:
> >
> > ACK packets are not paced, ever.
> >
> > This patch also allows skb->ooo_okay to be set even if the DATA packet
> > immediately follows a train of ACK packets (as in typical RPC patterns).
> >
> Could you please expand on this?
>
If you have followed netdev recently, you might have noticed I am currently
upstreaming patches that we tested for more than 6 months at Google.
I am doing this in a non-bursty way to ease David Miller's work ;)
A hint is given in commit b2532eb9abd88384a
("tcp: fix ooo_okay setting vs Small Queues"):

    /* TODO: Ideally, in-flight pure ACK packets should not matter here.
     * One way to get this would be to set skb->truesize = 2 on them.
     */
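For context, a simplified paraphrase of the ooo_okay logic that commit
touches (not a verbatim kernel excerpt): ooo_okay is derived from the
bytes still charged to the socket, which in-flight pure ACKs inflate.

    /* Simplified from tcp_transmit_skb() after that commit: allow a
     * new TX queue choice when nothing but this skb is charged to the
     * socket.  Pure ACKs still in flight keep sk_wmem_alloc elevated,
     * hence the idea of giving them a nominal truesize such as 2.
     */
    skb->ooo_okay = sk_wmem_alloc_get(sk) < SKB_TRUESIZE(1);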
Thanks