netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/3] sch_fq: segment too big GSO packets
@ 2014-11-25 13:24 Yang Yingliang
  2014-11-25 13:24 ` [PATCH net-next 1/3] sch_fq: add skb_is_too_big() helper Yang Yingliang
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Yang Yingliang @ 2014-11-25 13:24 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, davem

As the TODO in fq_dequeue() says: "maybe segment the too big skb, as in
commit e43ac79a4bc ("sch_tbf: segment too big GSO packets")", this
patchset segments GSO packets that are too big.

At a low rate, a GSO packet can be too big to pace. This patchset checks
each packet before it is enqueued; if sending the GSO packet would cost
more than 125ms, it is segmented and the segments are enqueued one by
one. Because of the segmentation, the qlen may exceed the limit in some
conditions. My approach is to admit the packet if the qlen is below the
limit before it is segmented.

Test procedure:
 Step 1.
 # tc qdisc add dev eth4 root handle 1: htb default 1
 # tc class add dev eth4 parent 1:0 classid 1:1 htb rate 100mbit ceil 100mbit
 # tc qdisc add dev eth4 parent 1:1 handle 11: fq maxrate 180kbit quantum 1000

 Step 2. use iperf to send packets

 Step 3. # tc -s -d qdisc show dev eth4
 There are no "too long" packets, and some packets are queued waiting to be sent:

qdisc htb 1: root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17 direct_qlen 1000
 Sent 1273136 bytes 1158 pkt (dropped 0, overlimits 1544 requeues 0) 
 backlog 0b 54p requeues 0 
qdisc fq 11: parent 1:1 limit 10000p flow_limit 100p buckets 1024 quantum 1000 initial_quantum 15140 maxrate 180Kbit 
 Sent 1273136 bytes 1158 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 81756b 54p requeues 0 
  4 flows (3 inactive, 1 throttled), next packet delay 2015043 ns
  0 gc, 2 highprio, 792 throttled

 A few seconds after stopping iperf, show the qdisc again; all packets have been sent:

qdisc htb 1: root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17 direct_qlen 1000
 Sent 2800098 bytes 2359 pkt (dropped 0, overlimits 3166 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc fq 11: parent 1:1 limit 10000p flow_limit 100p buckets 1024 quantum 1000 initial_quantum 15140 maxrate 180Kbit 
 Sent 2800098 bytes 2359 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  5 flows (3 inactive, 0 throttled)
  0 gc, 6 highprio, 1782 throttled

 Step 4. Without this patchset, repeat steps 1-2, then show the qdisc;
 there are some "too long" packets:

qdisc htb 1: root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17 direct_qlen 1000
 Sent 1897859 bytes 2674 pkt (dropped 0, overlimits 1362 requeues 0) 
 backlog 0b 36p requeues 0 
qdisc fq 11: parent 1:1 limit 10000p flow_limit 100p buckets 1024 quantum 1000 initial_quantum 15140 maxrate 180Kbit 
 Sent 1897859 bytes 2674 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 109008b 36p requeues 0 
  9 flows (8 inactive, 1 throttled), next packet delay 21671844 ns
  0 gc, 82 highprio, 568 throttled
  513 _too long pkts_, 0 alloc errors

Yang Yingliang (3):
  sch_fq: add skb_is_too_big() helper
  sch_fq: add __fq_enqueue() helper
  sch_fq: segment too big GSO packets

 net/sched/sch_fq.c | 112 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 77 insertions(+), 35 deletions(-)

-- 
1.8.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next 1/3] sch_fq: add skb_is_too_big() helper
  2014-11-25 13:24 [PATCH net-next 0/3] sch_fq: segment too big GSO packets Yang Yingliang
@ 2014-11-25 13:24 ` Yang Yingliang
  2014-11-25 13:24 ` [PATCH net-next 2/3] sch_fq: add __fq_enqueue() helper Yang Yingliang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Yang Yingliang @ 2014-11-25 13:24 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, davem

Add a skb_is_too_big() helper that computes the time needed to send a
packet and checks whether it exceeds 125ms.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 net/sched/sch_fq.c | 57 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index cbd7e1f..36a22e0 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -304,6 +304,37 @@ static bool skb_is_retransmit(struct sk_buff *skb)
 	return false;
 }
 
+static bool skb_is_too_big(struct sk_buff *skb, struct fq_sched_data *q, u64 *cost_time)
+{
+	u32 rate = q->flow_max_rate;
+
+	if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT)
+		rate = min(skb->sk->sk_pacing_rate, q->flow_max_rate);
+
+	if (cost_time)
+		*cost_time = 0;
+
+	if (rate != ~0U) {
+		u32 plen = max(qdisc_pkt_len(skb), q->quantum);
+		u64 len = (u64)plen * NSEC_PER_SEC;
+
+		if (likely(rate))
+			do_div(len, rate);
+		if (cost_time)
+			*cost_time = len;
+		/* Since socket rate can change later,
+		 * clamp the delay to 125 ms.
+		 */
+		if (unlikely(len > 125 * NSEC_PER_MSEC)) {
+			if (cost_time)
+				*cost_time = 125 * NSEC_PER_MSEC;
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /* add skb to flow queue
  * flow queue is a linked list, kind of FIFO, except for TCP retransmits
  * We special case tcp retransmits to be transmitted before other packets.
@@ -418,7 +449,7 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
 	struct fq_flow_head *head;
 	struct sk_buff *skb;
 	struct fq_flow *f;
-	u32 rate;
+	u64 cost_time;
 
 	skb = fq_dequeue_head(sch, &q->internal);
 	if (skb)
@@ -470,28 +501,10 @@ begin:
 	if (f->credit > 0 || !q->rate_enable)
 		goto out;
 
-	rate = q->flow_max_rate;
-	if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT)
-		rate = min(skb->sk->sk_pacing_rate, rate);
+	if (skb_is_too_big(skb, q, &cost_time))
+		q->stat_pkts_too_long++;
+	f->time_next_packet = now + cost_time;
 
-	if (rate != ~0U) {
-		u32 plen = max(qdisc_pkt_len(skb), q->quantum);
-		u64 len = (u64)plen * NSEC_PER_SEC;
-
-		if (likely(rate))
-			do_div(len, rate);
-		/* Since socket rate can change later,
-		 * clamp the delay to 125 ms.
-		 * TODO: maybe segment the too big skb, as in commit
-		 * e43ac79a4bc ("sch_tbf: segment too big GSO packets")
-		 */
-		if (unlikely(len > 125 * NSEC_PER_MSEC)) {
-			len = 125 * NSEC_PER_MSEC;
-			q->stat_pkts_too_long++;
-		}
-
-		f->time_next_packet = now + len;
-	}
 out:
 	qdisc_bstats_update(sch, skb);
 	return skb;
-- 
1.8.0


* [PATCH net-next 2/3] sch_fq: add __fq_enqueue() helper
  2014-11-25 13:24 [PATCH net-next 0/3] sch_fq: segment too big GSO packets Yang Yingliang
  2014-11-25 13:24 ` [PATCH net-next 1/3] sch_fq: add skb_is_too_big() helper Yang Yingliang
@ 2014-11-25 13:24 ` Yang Yingliang
  2014-11-25 13:24 ` [PATCH net-next 3/3] sch_fq: segment too big GSO packets Yang Yingliang
  2014-11-25 16:31 ` [PATCH net-next 0/3] " Eric Dumazet
  3 siblings, 0 replies; 8+ messages in thread
From: Yang Yingliang @ 2014-11-25 13:24 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, davem

Move the original enqueue code into a new __fq_enqueue() helper.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 net/sched/sch_fq.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 36a22e0..ec6eff8 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -384,21 +384,9 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
 		prev->next = skb;
 	}
 }
-
-static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+static int __fq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct fq_flow *f)
 {
 	struct fq_sched_data *q = qdisc_priv(sch);
-	struct fq_flow *f;
-
-	if (unlikely(sch->q.qlen >= sch->limit))
-		return qdisc_drop(skb, sch);
-
-	f = fq_classify(skb, q);
-	if (unlikely(f->qlen >= q->flow_plimit && f != &q->internal)) {
-		q->stat_flows_plimit++;
-		return qdisc_drop(skb, sch);
-	}
-
 	f->qlen++;
 	if (skb_is_retransmit(skb))
 		q->stat_tcp_retrans++;
@@ -421,6 +409,23 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_SUCCESS;
 }
 
+static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct fq_sched_data *q = qdisc_priv(sch);
+	struct fq_flow *f;
+
+	if (unlikely(sch->q.qlen >= sch->limit))
+		return qdisc_drop(skb, sch);
+
+	f = fq_classify(skb, q);
+	if (unlikely(f->qlen >= q->flow_plimit && f != &q->internal)) {
+		q->stat_flows_plimit++;
+		return qdisc_drop(skb, sch);
+	}
+
+	return __fq_enqueue(skb, sch, f);
+}
+
 static void fq_check_throttled(struct fq_sched_data *q, u64 now)
 {
 	struct rb_node *p;
-- 
1.8.0


* [PATCH net-next 3/3] sch_fq: segment too big GSO packets
  2014-11-25 13:24 [PATCH net-next 0/3] sch_fq: segment too big GSO packets Yang Yingliang
  2014-11-25 13:24 ` [PATCH net-next 1/3] sch_fq: add skb_is_too_big() helper Yang Yingliang
  2014-11-25 13:24 ` [PATCH net-next 2/3] sch_fq: add __fq_enqueue() helper Yang Yingliang
@ 2014-11-25 13:24 ` Yang Yingliang
  2014-11-25 14:53   ` Eric Dumazet
  2014-11-25 16:31 ` [PATCH net-next 0/3] " Eric Dumazet
  3 siblings, 1 reply; 8+ messages in thread
From: Yang Yingliang @ 2014-11-25 13:24 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, davem

If a GSO packet would cost more than 125ms to send, segment it and
send the individual segments.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 net/sched/sch_fq.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index ec6eff8..0119340 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -423,6 +423,30 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return qdisc_drop(skb, sch);
 	}
 
+	if (skb_is_gso(skb) && skb_is_too_big(skb, q, NULL)) {
+		struct sk_buff *segs, *nskb;
+		netdev_features_t features = netif_skb_features(skb);
+		int nb = 0;
+
+		segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+		if (IS_ERR_OR_NULL(segs))
+			goto enqueue_out;
+
+		while (segs) {
+			nskb = segs->next;
+			qdisc_skb_cb(segs)->pkt_len = segs->len;
+			__fq_enqueue(segs, sch, f);
+			nb++;
+			segs = nskb;
+		}
+		if (nb > 1)
+			qdisc_tree_decrease_qlen(sch, 1 - nb);
+		consume_skb(skb);
+		return NET_XMIT_SUCCESS;
+	}
+
+enqueue_out:
 	return __fq_enqueue(skb, sch, f);
 }
 
-- 
1.8.0


* Re: [PATCH net-next 3/3] sch_fq: segment too big GSO packets
  2014-11-25 13:24 ` [PATCH net-next 3/3] sch_fq: segment too big GSO packets Yang Yingliang
@ 2014-11-25 14:53   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2014-11-25 14:53 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem

On Tue, 2014-11-25 at 21:24 +0800, Yang Yingliang wrote:
> If a GSO packet would cost more than 125ms to send, segment it and
> send the individual segments.
> 
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  net/sched/sch_fq.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index ec6eff8..0119340 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -423,6 +423,30 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  		return qdisc_drop(skb, sch);
>  	}
>  

I am opposed to this patch series.

FQ is used at 40Gb speeds at Google; we cannot afford doing such copies
in FQ at all.

If you believe packets are too big at this point, then you should fix
the provider of such packets.

Sorry.


* Re: [PATCH net-next 0/3] sch_fq: segment too big GSO packets
  2014-11-25 13:24 [PATCH net-next 0/3] sch_fq: segment too big GSO packets Yang Yingliang
                   ` (2 preceding siblings ...)
  2014-11-25 13:24 ` [PATCH net-next 3/3] sch_fq: segment too big GSO packets Yang Yingliang
@ 2014-11-25 16:31 ` Eric Dumazet
  2014-11-25 16:57   ` [PATCH net-next] pkt_sched: fq: increase max delay from 125 ms to one second Eric Dumazet
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2014-11-25 16:31 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem

On Tue, 2014-11-25 at 21:24 +0800, Yang Yingliang wrote:
> As the TODO in fq_dequeue() says: "maybe segment the too big skb, as in
> commit e43ac79a4bc ("sch_tbf: segment too big GSO packets")", this
> patchset segments GSO packets that are too big.
> 
> At a low rate, a GSO packet can be too big to pace. This patchset checks
> each packet before it is enqueued; if sending the GSO packet would cost
> more than 125ms, it is segmented and the segments are enqueued one by
> one. Because of the segmentation, the qlen may exceed the limit in some
> conditions. My approach is to admit the packet if the qlen is below the
> limit before it is segmented.

We extended the 125ms value to 750ms back in July.

I will upstream this change now that it has given us good experimental
results.

There is no point segmenting packets of 2 MSS, as the fq quantum is
2 MSS by default.

Your patches are the wrong way to handle this.

We have SO_MAX_PACING_RATE, and we try to cook GSO packets of the right
size up front instead of segmenting them later.


* [PATCH net-next] pkt_sched: fq: increase max delay from 125 ms to one second
  2014-11-25 16:31 ` [PATCH net-next 0/3] " Eric Dumazet
@ 2014-11-25 16:57   ` Eric Dumazet
  2014-11-26 17:08     ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2014-11-25 16:57 UTC (permalink / raw)
  To: Yang Yingliang, David Miller; +Cc: netdev, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

FQ/pacing clamps the packet delay to 125 ms, to avoid possible harm.

It turns out this delay is too small to allow pacing at low rates:
some ISPs set up very aggressive policers, as low as 16kbit.

Now that the TCP stack has spurious-retransmit prevention, it seems safe
to increase this fixed parameter without adding a qdisc attribute.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
---
This is something we have run since July 2014 with good results; I
forgot to upstream this change. Thanks!

Notes for Googlers :

 Google-Bug-Id: 9297267
 Google-Bug-Id: 11789651
 Change-Id: I081d7d842bfb860a4cd620bf1c71a8eb239150c8

 net/sched/sch_fq.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index cbd7e1fd23b41bb7ba0bb348c3a1a287652cca93..9b05924cc386ecc2cdb9816be27e439637fb37b3 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -481,12 +481,11 @@ begin:
 		if (likely(rate))
 			do_div(len, rate);
 		/* Since socket rate can change later,
-		 * clamp the delay to 125 ms.
-		 * TODO: maybe segment the too big skb, as in commit
-		 * e43ac79a4bc ("sch_tbf: segment too big GSO packets")
+		 * clamp the delay to 1 second.
+		 * Really, providers of too big packets should be fixed !
 		 */
-		if (unlikely(len > 125 * NSEC_PER_MSEC)) {
-			len = 125 * NSEC_PER_MSEC;
+		if (unlikely(len > NSEC_PER_SEC)) {
+			len = NSEC_PER_SEC;
 			q->stat_pkts_too_long++;
 		}
 


* Re: [PATCH net-next] pkt_sched: fq: increase max delay from 125 ms to one second
  2014-11-25 16:57   ` [PATCH net-next] pkt_sched: fq: increase max delay from 125 ms to one second Eric Dumazet
@ 2014-11-26 17:08     ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2014-11-26 17:08 UTC (permalink / raw)
  To: eric.dumazet; +Cc: yangyingliang, netdev, ncardwell

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 25 Nov 2014 08:57:29 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> FQ/pacing clamps the packet delay to 125 ms, to avoid possible harm.
> 
> It turns out this delay is too small to allow pacing at low rates:
> some ISPs set up very aggressive policers, as low as 16kbit.
> 
> Now that the TCP stack has spurious-retransmit prevention, it seems safe
> to increase this fixed parameter without adding a qdisc attribute.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Yang Yingliang <yangyingliang@huawei.com>

Applied, thanks Eric.

