From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Michael Ma <make0818@gmail.com>
Cc: brouer@redhat.com, netdev@vger.kernel.org
Subject: Re: qdisc spin lock
Date: Thu, 31 Mar 2016 21:18:52 +0200
Message-ID: <20160331211852.2d228976@redhat.com>
In-Reply-To: <CAAmHdhw9bQkCm7uehRZ9mTetMzafdXxWhYj16f8W-YvSz8V4=g@mail.gmail.com>


On Wed, 30 Mar 2016 00:20:03 -0700 Michael Ma <make0818@gmail.com> wrote:

> I know this might be an old topic, so bear with me: what we are facing
> is that applications are sending small packets from hundreds of
> threads, so contention on the spin lock in __dev_xmit_skb increases the
> latency of dev_queue_xmit significantly. We're building a network QoS
> solution based on HTB to avoid interference between applications.

Yes, as you have noticed, HTB has a single root qdisc lock, and
contention obviously happens :-)
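
To see why, look at the enqueue path in net/core/dev.c. Roughly, it
does the following (an abridged sketch from memory, not verbatim
kernel source):

  static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
                                   struct net_device *dev,
                                   struct netdev_queue *txq)
  {
          /* One root lock guards the entire HTB hierarchy. */
          spinlock_t *root_lock = qdisc_lock(q);
          int rc;

          spin_lock(root_lock);    /* hundreds of threads contend here */
          rc = q->enqueue(skb, q) & NET_XMIT_MASK;
          if (qdisc_run_begin(q))
                  __qdisc_run(q);  /* dequeue + sch_direct_xmit() */
          spin_unlock(root_lock);
          return rc;
  }

Every CPU transmitting through the same HTB root serializes on that one
root_lock, which is exactly the _spin_lock showing up in your perf
profile below.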

It is possible to make it scale with various tricks.  I believe
Google is using a variant of HTB, and it scales for them.  They have
not open-sourced their modifications to HTB (which likely also involve
a great deal of setup tricks).

If your purpose is to limit traffic/bandwidth per "cloud" instance,
then you can just use another TC setup structure, like using MQ and
assigning an HTB instance per MQ queue (where the MQ queues are bound
to the CPU/HW queues)... But you have to figure out this setup
yourself; a rough sketch follows below...
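
A minimal sketch of that kind of setup (assuming an 8-queue NIC named
eth0; the handles and rates are placeholders you must adapt):

  # MQ as root qdisc: it exposes one class per HW TX queue.
  tc qdisc add dev eth0 root handle 100: mq

  # Attach a separate HTB instance, each with its own root lock,
  # under every MQ class.  (Note: tc parses handles as hex, so
  # 101..108 below are still unique IDs.)
  for i in $(seq 1 8); do
      tc qdisc add dev eth0 parent 100:$i handle $((100 + i)): htb default 1
      tc class add dev eth0 parent $((100 + i)): classid $((100 + i)):1 \
          htb rate 1gbit ceil 1gbit
  done

Caveat: each HTB instance only sees the traffic on its own TX queue,
so a global rate limit must be split across the per-queue instances
(or flows must be steered so they hash to a single queue).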


> But in this case, when some applications send massive numbers of
> small packets in parallel, the application to be protected sees its
> throughput suffer (because it does synchronous network communication
> from multiple threads, and its throughput is sensitive to the
> increased latency).
> 
> Here is the profiling from perf:
> 
> -  67.57%  iperf  [kernel.kallsyms]  [k] _spin_lock
>    - 99.94% dev_queue_xmit
>       - 96.91% _spin_lock
>       - 2.62%  __qdisc_run
>          - 98.98% sch_direct_xmit
>             - 99.98% _spin_lock
> 
> As far as I understand, the design of TC is to simplify the locking
> scheme and minimize the work done in __qdisc_run so that throughput
> won't be affected, especially with large packets. However, if the
> scenario is that multiple classes in the queueing discipline only
> have shaping limits, there isn't really any necessary correlation
> between the different classes. The only synchronization point should
> be when a packet is dequeued from the qdisc queue and enqueued to the
> device's transmit queue. My question is: is it worth investing in
> avoiding the lock contention by partitioning the queue/lock so that
> this scenario is handled with lower latency?

Yes, there is a lot to gain, but it is not easy ;-)

> I must have oversimplified a lot of details, since I'm not familiar
> with the TC implementation at this point; I just want your input on
> whether this is a worthwhile effort, or whether there is something
> fundamental I'm not aware of. If it is just a matter of substantial
> additional work, I would also appreciate help outlining the required
> work.
> 
> I would also appreciate any information about the latest status of
> this work: http://www.ijcset.com/docs/IJCSET13-04-04-113.pdf

This article seems to be of very low quality: spelling errors, only 5
pages, no real code, etc.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
