netdev.vger.kernel.org archive mirror
* qdisc_watchdog_schedule_range_ns granularity
@ 2022-10-12 21:26 Thorsten Glaser
  2022-10-12 22:36 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Thorsten Glaser @ 2022-10-12 21:26 UTC (permalink / raw)
  To: netdev

Hi again,

next thing ☺

For my “faked extra latency” I sometimes need to reschedule into the
future, when all queued-up packets have receive timestamps in the
future. For this, I have been using:

	qdisc_watchdog_schedule_range_ns(&q->watchdog, rs, 0);

Where rs is the smallest in-the-future enqueue timestamp.

However, it was observed that this can add quite a lot more extra
delay than planned: I saw single-digit millisecond figures, which
IMHO is already a lot, but a coworker saw around 17 ms, which is
definitely too much.

What is the granularity of qdisc watchdogs, and how can I aim at
being called again for dequeueing in a more precise fashion? I would
prefer to be called within 1 ms of the timestamp passed, or 2 ms if
it absolutely must be.

Thanks in advance,
//mirabilos
-- 
Infrastructure expert • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Telephone +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Managing directors: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

                        ****************************************************
/⁀\ The UTF-8 Ribbon
╲ ╱ Campaign against      Never miss anything with the tarent newsletter:
 ╳  HTML eMail! Also,     https://www.tarent.de/newsletter
╱ ╲ header encryption!
                        ****************************************************


* Re: qdisc_watchdog_schedule_range_ns granularity
  2022-10-12 21:26 qdisc_watchdog_schedule_range_ns granularity Thorsten Glaser
@ 2022-10-12 22:36 ` Eric Dumazet
  2022-10-12 23:15   ` Thorsten Glaser
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2022-10-12 22:36 UTC (permalink / raw)
  To: Thorsten Glaser, netdev


On 10/12/22 14:26, Thorsten Glaser wrote:
> Hi again,
>
> next thing ☺
>
> For my “faked extra latency” I sometimes need to reschedule into the
> future, when all queued-up packets have receive timestamps in the
> future. For this, I have been using:
>
> 	qdisc_watchdog_schedule_range_ns(&q->watchdog, rs, 0);
>
> Where rs is the smallest in-the-future enqueue timestamp.
>
> However, it was observed that this can add quite a lot more extra
> delay than planned: I saw single-digit millisecond figures, which
> IMHO is already a lot, but a coworker saw around 17 ms, which is
> definitely too much.

Make sure your .config has

CONFIG_HIGH_RES_TIMERS=y

I don't know how you measure this latency, but net/sched/sch_fq.c has
instrumentation, and the following command on a random host in my lab
shows an average (EWMA) latency smaller than 29 usec (32 TX queues on
the NIC):

tc -s -d qd sh dev eth1 | grep latency
   gc 194315 highprio 0 throttled 902 latency 10.7us
   gc 196277 highprio 0 throttled 156 latency 11.8us
   gc 84107 highprio 0 throttled 286 latency 13.7us
   gc 19408 highprio 0 throttled 324 latency 10.9us
   gc 309405 highprio 0 throttled 370 latency 11.1us
   gc 147821 highprio 0 throttled 154 latency 12.2us
   gc 84768 highprio 0 throttled 2859 latency 10.7us
   gc 181833 highprio 0 throttled 4311 latency 12.9us
   gc 117038 highprio 0 throttled 1127 latency 11.1us
   gc 168430 highprio 0 throttled 1784 latency 22.1us
   gc 71086 highprio 0 throttled 2339 latency 14.3us
   gc 127584 highprio 0 throttled 1396 latency 11.5us
   gc 96239 highprio 0 throttled 297 latency 16.9us
   gc 96490 highprio 0 throttled 6374 latency 11.3us
   gc 117284 highprio 0 throttled 2011 latency 11.5us
   gc 122355 highprio 0 throttled 303 latency 12.8us
   gc 221196 highprio 0 throttled 330 latency 11.3us
   gc 204193 highprio 0 throttled 121 latency 12us
   gc 177423 highprio 0 throttled 1012 latency 11.9us
   gc 70236 highprio 0 throttled 1015 latency 15us
   gc 166721 highprio 0 throttled 488 latency 11.9us
   gc 92794 highprio 0 throttled 963 latency 17.1us
   gc 229031 highprio 0 throttled 274 latency 12.2us
   gc 109511 highprio 0 throttled 234 latency 10.5us
   gc 89160 highprio 0 throttled 729 latency 10.7us
   gc 182940 highprio 0 throttled 234 latency 11.7us
   gc 172111 highprio 0 throttled 2439 latency 11.4us
   gc 101261 highprio 0 throttled 2614 latency 11.6us
   gc 95759 highprio 0 throttled 336 latency 11.3us
   gc 103392 highprio 0 throttled 2990 latency 11.2us
   gc 173068 highprio 0 throttled 955 latency 16.5us
   gc 97893 highprio 0 throttled 748 latency 11.7us

Note that after the timer fires, a TX softirq is scheduled (to send
more packets from qdisc -> NIC). Under high cpu pressure, it is
possible the softirq is delayed, because ksoftirqd might compete with
user threads.


>
> What is the granularity of qdisc watchdogs, and how can I aim at
> being called again for dequeueing in a more precise fashion? I would
> prefer to be called within 1 ms of the timestamp passed, or 2 ms if
> it absolutely must be.
>
> Thanks in advance,
> //mirabilos


* Re: qdisc_watchdog_schedule_range_ns granularity
  2022-10-12 22:36 ` Eric Dumazet
@ 2022-10-12 23:15   ` Thorsten Glaser
  2022-10-12 23:32     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Thorsten Glaser @ 2022-10-12 23:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Wed, 12 Oct 2022, Eric Dumazet wrote:

> CONFIG_HIGH_RES_TIMERS=y

It does.

> I don't know how you measure this latency, but net/sched/sch_fq.c has
> instrumentation,

On enqueue I add now+extradelay and save that as the enqueue timestamp.
On dequeue I check that now>=timestamp, then process the packet,
measuring now-timestamp as the queue delay. This is surprisingly high.

I’ll add some printks as well, to see when I’m called next after
such a watchdog scheduling.

> Under high cpu pressure, it is possible the softirq is delayed,
> because ksoftirqd might compete with user threads.

Is it a good idea to renice these?

Thanks,
//mirabilos


* Re: qdisc_watchdog_schedule_range_ns granularity
  2022-10-12 23:15   ` Thorsten Glaser
@ 2022-10-12 23:32     ` Eric Dumazet
  2022-10-22 22:22       ` Thorsten Glaser
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2022-10-12 23:32 UTC (permalink / raw)
  To: Thorsten Glaser; +Cc: netdev


On 10/12/22 16:15, Thorsten Glaser wrote:
> On Wed, 12 Oct 2022, Eric Dumazet wrote:
>
>> CONFIG_HIGH_RES_TIMERS=y
> It does.
>
>> I don't know how you measure this latency, but net/sched/sch_fq.c has
>> instrumentation,
> On enqueue I add now+extradelay and save that as the enqueue timestamp.
> On dequeue I check that now>=timestamp, then process the packet,
> measuring now-timestamp as the queue delay. This is surprisingly high.

When packets are eligible, the qdisc itself can be stopped if the NIC
is full (or BQL triggers).

net/sched/sch_fq.c is not using the skb tstamp, which could very well
be in the past, but an internal variable (q->time_next_delayed_flow).


>
> I’ll add some printks as well, to see when I’m called next after
> such a watchdog scheduling.
>
>> Under high cpu pressure, it is possible the softirq is delayed,
>> because ksoftirqd might compete with user threads.
> Is it a good idea to renice these?


Depends on whether you want to starve user thread(s) under a
flood/attack; I guess you can try.


>
> Thanks,
> //mirabilos


* Re: qdisc_watchdog_schedule_range_ns granularity
  2022-10-12 23:32     ` Eric Dumazet
@ 2022-10-22 22:22       ` Thorsten Glaser
  0 siblings, 0 replies; 5+ messages in thread
From: Thorsten Glaser @ 2022-10-22 22:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Wed, 12 Oct 2022, Eric Dumazet wrote:

> net/sched/sch_fq.c is not using the skb tstamp which could very well be in the
> past,

For the enqueue timestamp I now save u64 ktime_get_ns() in the per-skb
extra data, so I have a reliable one.

> > > I don't know how you measure this latency, but net/sched/sch_fq.c has

I’ve now added some measurements. I use the qdisc watchdog in two
scenarios:

• extralatency, and the enqueue timestamp is > now (ktime_get_ns())
• independent of extralatency, if the pacing time since the previous
  packet (size * ns_per_byte) has not yet elapsed

The latter was not so problematic because, even if I’m called “too
late” by the qdisc watchdog, I measure the time for the _next_ packet
from the time of the previous packet, not from now, unless there was
no previous packet. These are also enqueued with a 1 ms tolerance.

The former is problematic as it relies on being called somewhat
precisely once now has reached the enqueue timestamp of the head
of the FIFO.

In my measurement, I report detailed information whenever my dequeue
function is called (which of the two watchdog causes triggered it, if
any, and the difference between the time I was supposed to be called
and now), but I also count the calls into five buckets:

• called too early (perhaps by the regular softirq instead of
  the watchdog)
• called within 50 us of the passed time
• called within 1 ms of the passed time
• called within 4 ms of the passed time (< HZ)
• called at 4 ms or more (>= HZ)

This categorisation is “quick”, i.e. first-match.

In an experimental run of about 66 seconds of transmission time,
running iperf at 400kbps on a 512kbps queue, I get:

• 2257 calls too early
•  444 calls within 50 us
• 1702 calls within 1 ms
• 2460 calls within 4 ms
•   62 calls at/above 4 ms

Note that every “too early” call would also result in another
qdisc_watchdog_schedule_range_ns() call.

I used three separate machines on two separate test networks for this
(sender, qdisc, recipient); the test network is otherwise idle save
for background traffic (ARP, neighbour discovery, maybe Avahi).

> > > Under high cpu pressure, it is possible the softirq is delayed,
> > > because ksoftirqd might compete with user threads.

The utilisation of all three CPUs assigned to the VM is minimal,
around 1%, throughout the run, so I didn’t try that.

────────────────────────────────────────────────────────────────────────

See commit 2a61f1ea843dc767d291074eee9b2f1b8d3992a7 in
git@github.com:tarent/sch_jens.git branch master for the
added code and the schedule calls.

────────────────────────────────────────────────────────────────────────

Another run: 60s 15Mbit/s traffic on a 20Mbit/s queue.

Before:
$ cat /proc/softirqs
                    CPU0       CPU1       CPU2
          HI:          3          1          0
       TIMER:    1880381    1871169    2737200
      NET_TX:     498587          4         13
      NET_RX:   16918041  110157910      90553
       BLOCK:      90302      94916     141117
    IRQ_POLL:          0          0          0
     TASKLET:      23122       5419        252
       SCHED:    2707165    2799111    3584005
     HRTIMER:          2         45          2
         RCU:    2225711    2200451    2082194

After:
$ cat /proc/softirqs
                    CPU0       CPU1       CPU2
          HI:          3          1          0
       TIMER:    1880690    1871863    2737354
      NET_TX:     516224          4         15
      NET_RX:   17109410  110178118      90563
       BLOCK:      90314      94918     141118
    IRQ_POLL:          0          0          0
     TASKLET:      31992       9778       1270
       SCHED:    2708137    2800826    3585279
     HRTIMER:          2         45          2
         RCU:    2225787    2200548    2082297

So 17639 more NET_TX (virtually all on CPU0, interestingly enough,
even though irqbalance is running).

The numbers:

• 17264 (or 82800 or 148336… this is a u16) early
• 11101 within 50 us
• 68129 within 1 ms
• 1291 within 4 ms
• 103 at/above 4 ms

I’m “only” seeing delays of 5+ ms though, not the 15–20 ms my
coworker saw.

That being said, they reconfigure the qdisc very frequently, many
times per second, to simulate changing environment conditions (i.e. a
changing rate). Perhaps the sheer number of tc change commands has
some impact? Let’s retry, changing the rate 50 times a second…

CPU usage is about 60% for each core, though my shell script likely
uses more CPU than the real solution would. top alone uses 20% of one
CPU…

I see 40 ms latency in one packet.

• 871 (possibly plus n*65536) early
• 2366 within 50 us
• 16041 within 1 ms
• 5186 within 4 ms
• 1462 at/above 4 ms

This definitely shifts things.

With reniced ksoftirqd/*:

• 60798 (possibly plus n*65536) early
• 2902 within 50 us
• 21532 within 1 ms
• 1735 within 4 ms
• 1495 at/above 4 ms

So, it… helps, but not too much.

Does the qdisc locking during reconfiguration impact things?

One more thing I noticed is that [rcu_sched] uses a lot of CPU in the
latter runs. Is that also related to the number of tc(8) processes
created?

With tc -b - and writing to a ksh coprocess, both rcu_sched usage and
the overall system load go down slightly.

• 43564 (possibly plus n*65536) early
• 4908 within 50 us
• 41450 within 1 ms
• 5406 within 4 ms
• 899 at/above 4 ms

This also helps, but I still saw a packet with 42 ms of queue delay…

Should I look into writing my own reconfiguration channel that
does not use tc, to adjust the runtime parameters frequently?
(Probably only rate; handover is infrequent enough one could
just use tc, and extralatency should be stable across one run.)

What technology would I use for that? Something like relayfs
but the other way round. For the rate, I basically would only
have to atomically swap one 64-bit quantity, or would I need
locking? (The harder part would be to find the address of that
quantity…)

bye,
//mirabilos
