netdev.vger.kernel.org archive mirror
* qdisc spin lock
@ 2016-03-30  7:20 Michael Ma
  2016-03-31 19:18 ` Jesper Dangaard Brouer
  2016-03-31 22:16 ` Cong Wang
  0 siblings, 2 replies; 21+ messages in thread
From: Michael Ma @ 2016-03-30  7:20 UTC (permalink / raw)
  To: netdev

Hi -

I know this might be an old topic so bear with me – what we are facing
is that applications send small packets from hundreds of threads, so
contention on the spin lock in __dev_xmit_skb significantly increases
the latency of dev_queue_xmit. We’re building a network QoS solution
based on HTB to avoid interference between applications. But when some
applications send massive numbers of small packets in parallel, the
application to be protected sees its throughput drop (it does
synchronous network communication from multiple threads, so its
throughput is sensitive to the added latency).

Here is the profiling from perf:

-  67.57%  iperf  [kernel.kallsyms]  [k] _spin_lock
   - 99.94% dev_queue_xmit
        96.91% _spin_lock
      - 2.62% __qdisc_run
         - 98.98% sch_direct_xmit
              99.98% _spin_lock
           1.01% _spin_lock
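
For reference, a profile like the one above is typically captured with
perf's call-graph mode while the workload (iperf here) is running; the
30-second duration is just an example, not what was actually used:

```shell
# Sample kernel stacks system-wide for 30 seconds with call graphs.
perf record -a -g sleep 30

# Render the call tree; _spin_lock dominating under dev_queue_xmit
# points at contention on the qdisc root lock.
perf report --stdio
```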

As far as I understand, the TC design simplifies the locking scheme
and minimizes the work done in __qdisc_run so that throughput isn’t
affected, especially with large packets. However, if the classes in
the queueing discipline only have shaping limits, there isn’t really
any necessary coupling between different classes. The only
synchronization point should be when a packet is dequeued from the
qdisc queue and enqueued to the transmit queue of the device. My
question is – is it worth investing in avoiding the lock contention by
partitioning the queue/lock so that this scenario is handled with
lower latency?

I must have oversimplified a lot of details since I’m not familiar
with the TC implementation at this point – I just want your input on
whether this is a worthwhile effort or whether there is something
fundamental I’m not aware of. If it’s just a matter of a fair amount
of additional work, I’d also appreciate help outlining the required
work.

I’d also appreciate any information about the latest status of this
work: http://www.ijcset.com/docs/IJCSET13-04-04-113.pdf

Thanks,
Ke Ma


end of thread, other threads:[~2016-04-25 17:29 UTC | newest]

Thread overview: 21+ messages
2016-03-30  7:20 qdisc spin lock Michael Ma
2016-03-31 19:18 ` Jesper Dangaard Brouer
2016-03-31 23:41   ` Michael Ma
2016-04-16  8:52     ` Andrew
2016-03-31 22:16 ` Cong Wang
2016-03-31 23:48   ` Michael Ma
2016-04-01  2:19     ` David Miller
2016-04-01 17:17       ` Michael Ma
2016-04-01  3:44     ` John Fastabend
2016-04-13 18:23       ` Michael Ma
2016-04-08 14:19     ` Eric Dumazet
2016-04-15 22:46       ` Michael Ma
2016-04-15 22:54         ` Eric Dumazet
2016-04-15 23:05           ` Michael Ma
2016-04-15 23:56             ` Eric Dumazet
2016-04-20 21:24       ` Michael Ma
2016-04-20 22:34         ` Eric Dumazet
2016-04-21  5:51           ` Michael Ma
2016-04-21 12:41             ` Eric Dumazet
2016-04-21 22:12               ` Michael Ma
2016-04-25 17:29                 ` Michael Ma
