From: Andrew <nitr0@seti.kr.ua>
To: Michael Ma <make0818@gmail.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>
Cc: netdev@vger.kernel.org
Subject: Re: qdisc spin lock
Date: Sat, 16 Apr 2016 11:52:05 +0300
Message-ID: <5711FD35.90108@seti.kr.ua>
In-Reply-To: <CAAmHdhwpVOCv=4Y+pb9PfGKWV0ooqnr7eC58ZYfRTtYjC35EFw@mail.gmail.com>

I don't think that's a good solution unless you can bind a specific
host (src/dst) to a specific txq. Traffic is usually spread across txqs
by a hash of src+dst IP (or even IP:port), so a single host's traffic
ends up spread among all MQ queues on the device, and the bandwidth limit
is wrong (up to N*bandwidth with N queues on multi-session loads like
p2p/server traffic)...

People have said that the HFSC shaper performs better, but I haven't
tested it.

01.04.2016 02:41, Michael Ma wrote:
> Thanks for the suggestion - I'll try the MQ solution out. It seems to
> be able to solve the problem well with the assumption that bandwidth
> can be statically partitioned.
>
> 2016-03-31 12:18 GMT-07:00 Jesper Dangaard Brouer <brouer@redhat.com>:
>> On Wed, 30 Mar 2016 00:20:03 -0700 Michael Ma <make0818@gmail.com> wrote:
>>
>>> I know this might be an old topic, so bear with me: what we are facing
>>> is that applications are sending small packets using hundreds of
>>> threads, so contention on the spin lock in __dev_xmit_skb increases the
>>> latency of dev_queue_xmit significantly. We’re building a network QoS
>>> solution using HTB to avoid interference between different applications.
>> Yes, as you have noticed, with HTB there is a single qdisc lock, and
>> contention obviously happens :-)
>>
>> It is possible with different tricks to make it scale.  I believe
>> Google is using a variant of HTB, and it scales for them.  They have
>> not open-sourced their modifications to HTB (which likely also involve
>> a great deal of setup tricks).
>>
>> If your purpose is to limit traffic/bandwidth per "cloud" instance,
>> then you can just use a different TC setup structure, like using MQ and
>> assigning an HTB per MQ queue (where the MQ queues are bound to each
>> CPU/HW queue)... But you have to figure out this setup yourself...
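A minimal C sketch of this statically partitioned setup, assuming one token bucket per hardware queue with the total rate divided evenly. The struct and function names are hypothetical and this is not the kernel's HTB code. Each queue has its own lock (modelled with a C11 atomic_flag spin lock), so CPUs mapped to different queues never contend; time is passed in explicitly to keep the sketch deterministic.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Hypothetical per-txq shaper: one lock and one bucket per queue. */
struct txq_shaper {
    atomic_flag lock;     /* stands in for a per-queue qdisc lock */
    uint64_t rate_bps;    /* this queue's share, in bytes per second */
    uint64_t burst_nb;    /* bucket depth in nano-bytes (bytes * 1e9) */
    uint64_t tokens_nb;   /* current credit in nano-bytes */
    uint64_t last_ns;     /* time of the last refill */
};

/* Static partitioning: each queue gets total_bps / nqueues (nqueues > 0). */
static void txq_shaper_init(struct txq_shaper *s, uint64_t total_bps,
                            unsigned nqueues, uint64_t burst_bytes,
                            uint64_t now_ns)
{
    atomic_flag_clear(&s->lock);
    s->rate_bps = total_bps / nqueues;
    s->burst_nb = burst_bytes * NSEC_PER_SEC;
    s->tokens_nb = s->burst_nb;
    s->last_ns = now_ns;
}

/* Returns true if a len-byte packet may be sent at now_ns. */
static bool txq_shaper_try_send(struct txq_shaper *s, uint64_t now_ns,
                                uint32_t len)
{
    uint64_t cost = (uint64_t)len * NSEC_PER_SEC;
    bool ok = false;

    /* Spin only against CPUs that share this queue. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;

    uint64_t elapsed = now_ns > s->last_ns ? now_ns - s->last_ns : 0;
    if (elapsed > NSEC_PER_SEC)   /* cap to avoid overflow; assumes burst */
        elapsed = NSEC_PER_SEC;   /* is at most one second of rate        */
    s->tokens_nb += elapsed * s->rate_bps;   /* ns * B/s = nano-bytes */
    if (s->tokens_nb > s->burst_nb)
        s->tokens_nb = s->burst_nb;
    s->last_ns = now_ns;

    if (s->tokens_nb >= cost) {
        s->tokens_nb -= cost;
        ok = true;
    }

    atomic_flag_clear_explicit(&s->lock, memory_order_release);
    return ok;
}
```

The trade-off is the assumption Michael accepts in his reply: a queue cannot borrow rate its neighbours leave unused, so the split only fits when bandwidth can be partitioned statically.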
>>
>>
>>> But in this case, when some applications send massive numbers of small
>>> packets in parallel, the application to be protected has its throughput
>>> affected (because it’s doing synchronous network communication using
>>> multiple threads, and its throughput is sensitive to the increased latency).
>>>
>>> Here is the profiling from perf:
>>>
>>> -  67.57%   iperf  [kernel.kallsyms]     [k] _spin_lock
>>>    - 99.94% dev_queue_xmit
>>>    -  96.91% _spin_lock
>>>   - 2.62%  __qdisc_run
>>>    - 98.98% sch_direct_xmit
>>> - 99.98% _spin_lock
>>>
>>> As far as I understand, the design of TC is meant to keep the locking
>>> scheme simple and minimize the work in __qdisc_run so that throughput
>>> isn’t affected, especially with large packets. However, if the
>>> classes in the queueing discipline only have shaping limits, there is
>>> no inherent dependency between different classes. The only
>>> synchronization point should be when a packet is dequeued from the
>>> qdisc queue and enqueued to the device's transmit queue. My question
>>> is: is it worth investing in avoiding the lock contention by
>>> partitioning the queue/lock so that this scenario gets noticeably
>>> lower latency?
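The partitioning idea can be sketched in C, assuming per-class FIFO rings that each have their own spin lock: enqueuers contend only with others in the same class, and a single dequeuer visits the classes round-robin as the one synchronization point toward the device. All names are hypothetical, shaping checks are omitted, and this is not how the kernel's qdisc layer is structured.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NCLASSES 4   /* hypothetical number of shaping classes */
#define QLEN     64  /* per-class ring capacity */

struct pkt { int id; };   /* placeholder for an skb */

/* Hypothetical per-class queue with its own lock. */
struct class_q {
    atomic_flag lock;
    struct pkt *ring[QLEN];
    size_t head, tail;    /* head: next to dequeue, tail: next free slot */
};

struct part_qdisc {
    struct class_q cls[NCLASSES];
    size_t next;          /* round-robin cursor, used by the dequeuer only */
};

static void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

static void part_init(struct part_qdisc *q)
{
    for (size_t i = 0; i < NCLASSES; i++) {
        atomic_flag_clear(&q->cls[i].lock);
        q->cls[i].head = q->cls[i].tail = 0;
    }
    q->next = 0;
}

/* Enqueue takes only the lock of the packet's class. */
static bool part_enqueue(struct part_qdisc *q, unsigned cls, struct pkt *p)
{
    struct class_q *c = &q->cls[cls % NCLASSES];
    bool ok = false;

    spin_lock(&c->lock);
    if (c->tail - c->head < QLEN) {
        c->ring[c->tail % QLEN] = p;
        c->tail++;
        ok = true;
    }
    spin_unlock(&c->lock);
    return ok;
}

/* Single dequeuer: visits classes round-robin, holding each class lock
 * only briefly. A real shaper would also check per-class tokens here. */
static struct pkt *part_dequeue(struct part_qdisc *q)
{
    for (size_t n = 0; n < NCLASSES; n++) {
        struct class_q *c = &q->cls[q->next];
        struct pkt *p = NULL;

        q->next = (q->next + 1) % NCLASSES;
        spin_lock(&c->lock);
        if (c->head != c->tail) {
            p = c->ring[c->head % QLEN];
            c->head++;
        }
        spin_unlock(&c->lock);
        if (p)
            return p;
    }
    return NULL;
}
```

One caveat under this sketch's assumptions: HTB classes can borrow unused bandwidth from their parent, which requires state shared across classes, so per-class locks alone do not remove all cross-class synchronization.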
>> Yes, there is a lot to gain, but it is not easy ;-)
>>
>>> I must have oversimplified a lot of details since I’m not familiar
>>> with the TC implementation at this point; I just want your input on
>>> whether this is a worthwhile effort or whether there is something
>>> fundamental that I’m not aware of. If this is just a matter of
>>> substantial additional work, I would also appreciate help outlining
>>> the required work.
>>>
>>> I would also appreciate any information about the latest status of
>>> this work: http://www.ijcset.com/docs/IJCSET13-04-04-113.pdf
>> This article seems to be very low quality... spelling errors, only 5
>> pages, no real code, etc.
>>
>> --
>> Best regards,
>>    Jesper Dangaard Brouer
>>    MSc.CS, Principal Kernel Engineer at Red Hat
>>    Author of http://www.iptv-analyzer.org
>>    LinkedIn: http://www.linkedin.com/in/brouer

Thread overview: 21+ messages
2016-03-30  7:20 qdisc spin lock Michael Ma
2016-03-31 19:18 ` Jesper Dangaard Brouer
2016-03-31 23:41   ` Michael Ma
2016-04-16  8:52     ` Andrew [this message]
2016-03-31 22:16 ` Cong Wang
2016-03-31 23:48   ` Michael Ma
2016-04-01  2:19     ` David Miller
2016-04-01 17:17       ` Michael Ma
2016-04-01  3:44     ` John Fastabend
2016-04-13 18:23       ` Michael Ma
2016-04-08 14:19     ` Eric Dumazet
2016-04-15 22:46       ` Michael Ma
2016-04-15 22:54         ` Eric Dumazet
2016-04-15 23:05           ` Michael Ma
2016-04-15 23:56             ` Eric Dumazet
2016-04-20 21:24       ` Michael Ma
2016-04-20 22:34         ` Eric Dumazet
2016-04-21  5:51           ` Michael Ma
2016-04-21 12:41             ` Eric Dumazet
2016-04-21 22:12               ` Michael Ma
2016-04-25 17:29                 ` Michael Ma
