From: Daniel Borkmann <dborkman@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>,
	David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Yuchung Cheng <ycheng@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v2 net-next] pkt_sched: fq: Fair Queue packet scheduler
Date: Wed, 04 Sep 2013 13:59:27 +0200
Message-ID: <5227209F.4060708@redhat.com>
In-Reply-To: <1378294029.7360.92.camel@edumazet-glaptop>

On 09/04/2013 01:27 PM, Eric Dumazet wrote:
> On Wed, 2013-09-04 at 03:30 -0700, Eric Dumazet wrote:
>> On Wed, 2013-09-04 at 14:30 +0800, Jason Wang wrote:
>>
>>>> And tcpdump would certainly help ;)
>>>
>>> See attachment.
>>>
>>
>> Nothing obvious on tcpdump (only that lot of frames are missing)
>>
>> 1) Are you capturing part of the payload only (like tcpdump -s 128)
>>
>> 2) What is the setup.
>>
>> 3) tc -s -d qdisc
>
> If you use FQ in the guest, then it could be that high resolution timers
> have high latency ?

Probably the hrtimers internally fall back to a lower resolution clock event
source if there's no hardware support available:

   The [clock event source] management layer provides interfaces for hrtimers to
   implement high resolution timers [...] [and it] supports these more advanced
   functions only when appropriate clock event sources have been registered,
   otherwise the traditional periodic tick based behaviour is retained. [1]

[1] https://www.kernel.org/doc/ols/2006/ols2006v1-pages-333-346.pdf
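
A quick way to check this from userspace inside the guest might look like the
sketch below. It is not part of the thread or of the kernel; the function
names monotonic_res_ns() and sleep_overshoot_ns() are made up for
illustration. As far as I know, clock_getres(CLOCK_MONOTONIC) reports 1 ns
when hrtimers run in high resolution mode and the tick period otherwise, and
the overshoot of a short clock_nanosleep() gives a rough idea of how late
short timers actually fire.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Resolution reported for CLOCK_MONOTONIC in ns, or -1 on error.
 * Assumption: 1 ns means high resolution mode, a value around the
 * tick period (e.g. 4000000 for HZ=250) means periodic tick mode. */
static long monotonic_res_ns(void)
{
	struct timespec ts;

	if (clock_getres(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	return (long)ts.tv_sec * 1000000000L + ts.tv_nsec;
}

/* Sleep for req_ns (must be below one second) and return how much
 * longer than requested the sleep took, in ns, or -1 on error or if
 * the sleep was interrupted. */
static long long sleep_overshoot_ns(long req_ns)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = req_ns };
	struct timespec t0, t1;

	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return -1;
	/* clock_nanosleep() returns an error number directly, not -1 */
	if (clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL) != 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return -1;

	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec) - req_ns;
}
```

Printing monotonic_res_ns() and sleep_overshoot_ns(100000) from a small
main() on both host and guest should show whether the guest is stuck with
tick granularity. The overshoot also includes scheduling and syscall
overhead, so it is only an upper bound on the timer latency itself.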

> So FQ arms short timers, but effective duration could be much longer.
>
> Here I get a smooth latency of up to ~3 us
>
> lpq83:~# ./netperf -H lpq84 ; ./tc -s -d qd ; dmesg | tail -n1
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>   87380  16384  16384    10.00    9410.82
> qdisc fq 8005: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140
>   Sent 50545633991 bytes 33385894 pkt (dropped 0, overlimits 0 requeues 19)
>   rate 9258Mbit 764335pps backlog 0b 0p requeues 19
>    117 flow, 115 inactive, 0 throttled
>    0 gc, 0 highprio, 0 retrans, 96861 throttled, 0 flows_plimit
> [  572.551664] latency = 3035 ns
>
>
> What do you get with this debugging patch ?
>
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index 32ad015..c1312a0 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -103,6 +103,7 @@ struct fq_sched_data {
>   	u64		stat_internal_packets;
>   	u64		stat_tcp_retrans;
>   	u64		stat_throttled;
> +	s64		slatency;
>   	u64		stat_flows_plimit;
>   	u64		stat_pkts_too_long;
>   	u64		stat_allocation_errors;
> @@ -393,6 +394,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>   static void fq_check_throttled(struct fq_sched_data *q, u64 now)
>   {
>   	struct rb_node *p;
> +	bool first = true;
>
>   	if (q->time_next_delayed_flow > now)
>   		return;
> @@ -405,6 +407,13 @@ static void fq_check_throttled(struct fq_sched_data *q, u64 now)
>   			q->time_next_delayed_flow = f->time_next_packet;
>   			break;
>   		}
> +		if (first) {
> +			s64 delay = now - f->time_next_packet;
> +
> +			first = false;
> +			delay -= q->slatency >> 3;
> +			q->slatency += delay;
> +		}
>   		rb_erase(p, &q->delayed);
>   		q->throttled_flows--;
>   		fq_flow_add_tail(&q->old_flows, f);
> @@ -711,6 +720,7 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
>   	if (opts == NULL)
>   		goto nla_put_failure;
>
> +	pr_err("latency = %lld ns\n", q->slatency >> 3);
>   	if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
>   	    nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
>   	    nla_put_u32(skb, TCA_FQ_QUANTUM, q->quantum) ||
>
>
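
For anyone reading the numbers from the debugging patch above: as I read it,
q->slatency holds eight times a running average of the delay, each new sample
moves the average by 1/8 of the difference, and fq_dump() prints
slatency >> 3. Below is a minimal userspace sketch of that same arithmetic.
The struct and function names (lat_ewma, lat_ewma_add, lat_ewma_read) are
hypothetical and do not exist in sch_fq.c.

```c
#include <stdint.h>

/* Shift-based moving average, same arithmetic as the quoted patch:
 * 'scaled' stores 8 * average so that no division is needed. */
struct lat_ewma {
	int64_t scaled;
};

/* Fold one delay sample (in ns) into the average:
 * scaled += sample - scaled / 8, i.e. avg += (sample - avg) / 8.
 * Like the kernel code, this relies on >> being an arithmetic shift
 * for negative values. */
static void lat_ewma_add(struct lat_ewma *e, int64_t sample_ns)
{
	sample_ns -= e->scaled >> 3;
	e->scaled += sample_ns;
}

/* Current smoothed latency in ns, as printed by the patch. */
static int64_t lat_ewma_read(const struct lat_ewma *e)
{
	return e->scaled >> 3;
}
```

Starting from zero, a single 3000 ns sample only yields a reading of 375 ns,
so the printed value needs a few dozen throttled flows before it is
representative.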
