From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Michael Ma <make0818@gmail.com>
Cc: brouer@redhat.com, netdev@vger.kernel.org
Subject: Re: qdisc spin lock
Date: Thu, 31 Mar 2016 21:18:52 +0200
Message-ID: <20160331211852.2d228976@redhat.com>
In-Reply-To: <CAAmHdhw9bQkCm7uehRZ9mTetMzafdXxWhYj16f8W-YvSz8V4=g@mail.gmail.com>
On Wed, 30 Mar 2016 00:20:03 -0700 Michael Ma <make0818@gmail.com> wrote:
> I know this might be an old topic so bear with me – what we are facing
> is that applications are sending small packets using hundreds of
> threads so the contention on the spin lock in __dev_xmit_skb increases
> the latency of dev_queue_xmit significantly. We’re building a network
> QoS solution to avoid interference between different applications using
> HTB.
Yes, as you have noticed, with HTB there is a single qdisc lock, and
contention obviously happens :-)

It is possible with different tricks to make it scale. I believe
Google is using a variant of HTB, and it scales for them. They have
not open sourced their modifications to HTB (which likely also involve
a great deal of setup tricks).

If your purpose is to limit traffic/bandwidth per "cloud" instance,
then you can just use another TC setup structure, like using MQ and
assigning an HTB per MQ queue (where the MQ queues are bound to each
CPU/HW queue)... But you have to figure out this setup yourself...
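
A minimal sketch of what such an MQ + HTB layout could look like,
written as a small Python helper that only prints the tc commands
rather than running them. The helper name, the device name, the queue
count, the handle numbering and the rate are all assumptions made for
illustration; they are not taken from a tested setup. Note that the
rate then applies per queue, not to the device as a whole, and that
steering each instance's traffic to a particular queue is left out.

```python
def mq_htb_commands(dev, num_queues, rate_per_queue):
    """Return tc command lines that attach one HTB qdisc under each MQ class.

    Hypothetical helper for illustration: it builds strings and never
    executes anything.
    """
    # MQ as root qdisc exposes one class (1:1, 1:2, ...) per TX queue.
    cmds = [f"tc qdisc replace dev {dev} root handle 1: mq"]
    for q in range(1, num_queues + 1):
        # tc parses handle and class numbers as hexadecimal.
        child = f"{0x10 + q:x}"
        # One HTB instance per TX queue, so each has its own qdisc lock.
        cmds.append(
            f"tc qdisc add dev {dev} parent 1:{q:x} handle {child}: htb default 1"
        )
        # A single class that shapes everything sent through this queue.
        cmds.append(
            f"tc class add dev {dev} parent {child}: classid {child}:1 "
            f"htb rate {rate_per_queue}"
        )
    return cmds


if __name__ == "__main__":
    # Assumed values: device eth0, 4 TX queues, 250mbit per queue.
    for line in mq_htb_commands("eth0", 4, "250mbit"):
        print(line)
```

Review the printed commands before applying them by hand; the number
of queues has to match what the NIC actually exposes.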
> But in this case when some applications send massive small packets in
> parallel, the application to be protected will get its throughput
> affected (because it’s doing synchronous network communication using
> multiple threads and throughput is sensitive to the increased latency)
>
> Here is the profiling from perf:
>
> - 67.57% iperf [kernel.kallsyms] [k] _spin_lock
> - 99.94% dev_queue_xmit
> - 96.91% _spin_lock
> - 2.62% __qdisc_run
> - 98.98% sch_direct_xmit
> - 99.98% _spin_lock
>
> As far as I understand the design of TC is to simplify the locking
> scheme and minimize the work in __qdisc_run so that throughput won’t be
> affected, especially with large packets. However if the scenario is
> that multiple classes in the queueing discipline only have the shaping
> limit, there isn’t really a necessary correlation between different
> classes. The only synchronization point should be when the packet is
> dequeued from the qdisc queue and enqueued to the transmit queue of
> the device. My question is – is it worth investing in avoiding the
> locking contention by partitioning the queue/lock so that this
> scenario is addressed with relatively smaller latency?
Yes, there is a lot to gain, but it is not easy ;-)
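
To make the partitioning idea concrete, here is a user-space toy in
Python, not kernel code: each shard has its own lock and queue, so
senders for unrelated classes do not serialize on one lock. The class
name `ShardedQueue` and the modulo mapping from class id to shard are
assumptions for illustration only. The sketch also shows where the
hard part hides: it only works while the shards share no state, and
a shared rate limit between classes would break that independence.

```python
import threading
from collections import deque


class ShardedQueue:
    """Toy model of a queue partitioned into independently locked shards."""

    def __init__(self, num_shards):
        # One (lock, queue) pair per shard instead of one global lock.
        self.shards = [(threading.Lock(), deque()) for _ in range(num_shards)]

    def enqueue(self, class_id, pkt):
        # Assumed mapping: packets of one class always hit the same shard.
        lock, queue = self.shards[class_id % len(self.shards)]
        with lock:
            queue.append(pkt)

    def dequeue(self, shard):
        # Only the lock of the shard being drained is taken.
        lock, queue = self.shards[shard]
        with lock:
            return queue.popleft() if queue else None


if __name__ == "__main__":
    sq = ShardedQueue(4)
    sq.enqueue(5, "pkt-a")   # class 5 maps to shard 1
    sq.enqueue(2, "pkt-b")   # class 2 maps to shard 2
    print(sq.dequeue(1), sq.dequeue(2))
```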
> I must have oversimplified a lot of details since I’m not familiar
> with the TC implementation at this point – just want to get your input
> in terms of whether this is a worthwhile effort or there is something
> fundamental that I’m not aware of. If this is just a matter of quite
> some additional work, I would also appreciate help outlining the
> required work here.
>
> I would also appreciate any information about the latest status of
> this work: http://www.ijcset.com/docs/IJCSET13-04-04-113.pdf
This article seems to be very low quality... spelling errors, only 5
pages, no real code, etc.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
Thread overview: 21+ messages
2016-03-30 7:20 qdisc spin lock Michael Ma
2016-03-31 19:18 ` Jesper Dangaard Brouer [this message]
2016-03-31 23:41 ` Michael Ma
2016-04-16 8:52 ` Andrew
2016-03-31 22:16 ` Cong Wang
2016-03-31 23:48 ` Michael Ma
2016-04-01 2:19 ` David Miller
2016-04-01 17:17 ` Michael Ma
2016-04-01 3:44 ` John Fastabend
2016-04-13 18:23 ` Michael Ma
2016-04-08 14:19 ` Eric Dumazet
2016-04-15 22:46 ` Michael Ma
2016-04-15 22:54 ` Eric Dumazet
2016-04-15 23:05 ` Michael Ma
2016-04-15 23:56 ` Eric Dumazet
2016-04-20 21:24 ` Michael Ma
2016-04-20 22:34 ` Eric Dumazet
2016-04-21 5:51 ` Michael Ma
2016-04-21 12:41 ` Eric Dumazet
2016-04-21 22:12 ` Michael Ma
2016-04-25 17:29 ` Michael Ma