All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Borkmann <dborkman@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>,
	David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Yuchung Cheng <ycheng@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v2 net-next] pkt_sched: fq: Fair Queue packet scheduler
Date: Wed, 04 Sep 2013 13:59:27 +0200	[thread overview]
Message-ID: <5227209F.4060708@redhat.com> (raw)
In-Reply-To: <1378294029.7360.92.camel@edumazet-glaptop>

On 09/04/2013 01:27 PM, Eric Dumazet wrote:
> On Wed, 2013-09-04 at 03:30 -0700, Eric Dumazet wrote:
>> On Wed, 2013-09-04 at 14:30 +0800, Jason Wang wrote:
>>
>>>> And tcpdump would certainly help ;)
>>>
>>> See attachment.
>>>
>>
>> Nothing obvious on tcpdump (only that lot of frames are missing)
>>
>> 1) Are you capturing part of the payload only (like tcpdump -s 128)
>>
>> 2) What is the setup.
>>
>> 3) tc -s -d qdisc
>
> If you use FQ in the guest, then it could be that high resolution timers
> have high latency ?

Probably they internally switch to a lower resolution clock event source if
there's no hardware support available:

   The [source event] management layer provides interfaces for hrtimers to
   implement high resolution timers [...] [and it] supports these more advanced
   functions only when appropriate clock event sources have been registered,
   otherwise the traditional periodic tick based behaviour is retained. [1]

[1] https://www.kernel.org/doc/ols/2006/ols2006v1-pages-333-346.pdf

> So FQ arms short timers, but effective duration could be much longer.
>
> Here I get a smooth latency of up to ~3 us
>
> lpq83:~# ./netperf -H lpq84 ; ./tc -s -d qd ; dmesg | tail -n1
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>   87380  16384  16384    10.00    9410.82
> qdisc fq 8005: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140
>   Sent 50545633991 bytes 33385894 pkt (dropped 0, overlimits 0 requeues 19)
>   rate 9258Mbit 764335pps backlog 0b 0p requeues 19
>    117 flow, 115 inactive, 0 throttled
>    0 gc, 0 highprio, 0 retrans, 96861 throttled, 0 flows_plimit
> [  572.551664] latency = 3035 ns
>
>
> What do you get with this debugging patch ?
>
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index 32ad015..c1312a0 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -103,6 +103,7 @@ struct fq_sched_data {
>   	u64		stat_internal_packets;
>   	u64		stat_tcp_retrans;
>   	u64		stat_throttled;
> +	s64		slatency;
>   	u64		stat_flows_plimit;
>   	u64		stat_pkts_too_long;
>   	u64		stat_allocation_errors;
> @@ -393,6 +394,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>   static void fq_check_throttled(struct fq_sched_data *q, u64 now)
>   {
>   	struct rb_node *p;
> +	bool first = true;
>
>   	if (q->time_next_delayed_flow > now)
>   		return;
> @@ -405,6 +407,13 @@ static void fq_check_throttled(struct fq_sched_data *q, u64 now)
>   			q->time_next_delayed_flow = f->time_next_packet;
>   			break;
>   		}
> +		if (first) {
> +			s64 delay = now - f->time_next_packet;
> +
> +			first = false;
> +			delay -= q->slatency >> 3;
> +			q->slatency += delay;
> +		}
>   		rb_erase(p, &q->delayed);
>   		q->throttled_flows--;
>   		fq_flow_add_tail(&q->old_flows, f);
> @@ -711,6 +720,7 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
>   	if (opts == NULL)
>   		goto nla_put_failure;
>
> +	pr_err("latency = %lld ns\n", q->slatency >> 3);
>   	if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
>   	    nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
>   	    nla_put_u32(skb, TCA_FQ_QUANTUM, q->quantum) ||
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

  reply	other threads:[~2013-09-04 11:59 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-29 22:49 [PATCH v2 net-next] pkt_sched: fq: Fair Queue packet scheduler Eric Dumazet
2013-08-30  1:47 ` David Miller
2013-08-30  2:30   ` [PATCH iproute2] " Eric Dumazet
2013-09-03 15:49     ` Stephen Hemminger
2013-09-04  5:26 ` [PATCH v2 net-next] " Jason Wang
2013-09-04  5:59   ` Eric Dumazet
2013-09-04  6:30     ` Jason Wang
2013-09-04 10:30       ` Eric Dumazet
2013-09-04 11:27         ` Eric Dumazet
2013-09-04 11:59           ` Daniel Borkmann [this message]
2013-09-05  3:39             ` Jason Wang
2013-09-05  0:50           ` Eric Dumazet
2013-09-05  1:23             ` Eric Dumazet
2013-09-05  3:43             ` Jason Wang
2013-09-05  3:34           ` Jason Wang
2013-09-05  3:07         ` Jason Wang
2013-09-05  3:41           ` Eric Dumazet
2013-09-05  5:16             ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5227209F.4060708@redhat.com \
    --to=dborkman@redhat.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=jasowang@redhat.com \
    --cc=mst@redhat.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.