* Re: No idea about shaping through many PCs
From: Denys Fedoryshchenko @ 2008-01-10 11:00 UTC
To: Badalian Vyacheslav, netdev
For proper link bandwidth sharing, I guess something like network
counters shared between the PCs (with proper locking) would be needed.
I haven't heard of anything like that.
IMHO, one way to do this:
Split the destination network into multiple parts and route them on the
Cisco. Let's say you have 192.168.0.0/16, four balancing PCs, and a
total bandwidth of 1 Gbit/s (per IEC, 1 Gbit/s = 1000 Mbit/s). Then on
the Cisco you route:
192.168.0.0/18   via PC1 (shared speed 250 Mbit/s)
192.168.64.0/18  via PC2 (shared speed 250 Mbit/s)
192.168.128.0/18 via PC3 (shared speed 250 Mbit/s)
192.168.192.0/18 via PC4 (shared speed 250 Mbit/s)
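To make the mechanics concrete, a little sketch (invented names, purely
an illustration): the two prefix bits the /18 adds beyond the /16 pick
the box, so every destination lands deterministically on one PC:

#include <stdint.h>
#include <stdio.h>

/* Which shaper PC owns a destination under the /18 split of
 * 192.168.0.0/16?  The two bits the /18 adds beyond the /16 are
 * bits 15-14 of the address, i.e. the top two bits of the third
 * octet. */
static int owner_pc(uint32_t dst_host_order)
{
        return ((dst_host_order >> 14) & 0x3) + 1;  /* PC1..PC4 */
}

int main(void)
{
        /* 192.168.200.10: third octet 200 is in 192..255 -> PC4 */
        uint32_t ip = (192u << 24) | (168u << 16) | (200u << 8) | 10u;

        printf("192.168.200.10 -> PC%d\n", owner_pc(ip));
        return 0;
}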
You could probably script a check: if some PC has too much spare
bandwidth (5-minute average), give that spare capacity to another PC
that needs more. For example, the 5-minute average counters show:
PC1 - occupies 100 Mbit/s
PC2 - occupies 50 Mbit/s
PC3 - occupies 150 Mbit/s
PC4 - occupies 230 Mbit/s
Then you change the link speeds:
PC1 max 200
PC2 max 150
PC3 max 250
PC4 max 400 (100 taken from PC2 and 50 from PC1)
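A minimal sketch of the reallocation arithmetic (my code, using a
simple demand-proportional rule; the caps above are hand-tuned, so the
numbers come out differently, but the idea is the same). Applying the
result would then just be a matter of changing each box's root shaper
rate:

#include <stdio.h>

#define NPC   4
#define TOTAL 1000      /* Mbit/s shared by all boxes */

int main(void)
{
        /* 5-minute average usage per PC, Mbit/s (figures from above) */
        const int used[NPC] = { 100, 50, 150, 230 };
        int sum = 0;

        for (int i = 0; i < NPC; i++)
                sum += used[i];

        /* Redistribute the total in proportion to measured demand,
         * so spare capacity flows from idle boxes to busy ones. */
        for (int i = 0; i < NPC; i++)
                printf("PC%d: used %3d Mbit/s -> new cap %3d Mbit/s\n",
                       i + 1, used[i], TOTAL * used[i] / sum);
        return 0;
}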
Of course, the PCs must be able to pass that much traffic. And IMHO it
is not normal that your PCs can't handle more than 200 Mbit/s. I have a
complicated setup with four RTL8139 LAN cards that passes 200 Mbit/s in
total. I'm sure it could handle up to 300 Mbit/s, but I am already
moving to a PC with PCI-E e1000/Broadcom NetXtreme cards, which have
offloading capabilities, large buffers, and proper drivers with NAPI.
That hardware is currently handling, for example, 160 Mbit/s, and the
counters are:
12:50:41  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle  intr/s
12:50:42  all   0.00   0.00  0.00     0.00  0.25   1.24    0.00  98.51  4009.90
12:50:43  all   0.00   0.00  0.00     0.00  0.00   1.25    0.00  98.75  4024.75
12:50:44  all   0.00   0.00  0.00     0.00  0.00   1.50    0.00  98.50  4181.82
12:50:45  all   0.25   0.00  0.00     0.00  0.00   1.50    0.00  98.25  4626.73
12:50:46  all   0.00   0.00  0.00     0.00  0.00   1.50    0.00  98.50  4351.52
12:50:47  all   0.25   0.00  0.00     0.00  0.00   1.75    0.00  98.00  4805.88
This is 2.6.23.8 with some configuration mistakes; I am going to try
2.6.24-rc7 and some optimizations.
Right now the profile looks like:
10957 17.0675 mwait_idle_with_hints
7454 11.6110 read_hpet
3883 6.0485 _raw_spin_lock
1605 2.5001 timer_interrupt
1363 2.1231 irq_entries_start
So maybe I will try switching the clocksource to TSC, disabling
nmi_watchdog, and tuning the network driver (bnx2).
You should probably check such things too.
On Thu, 10 Jan 2008 12:06:35 +0300, Badalian Vyacheslav wrote:
> Hello all.
> I have been trying for more than two months to solve a problem with
> my shaping. Maybe you can help me?
>
> Scheme:
>            +---------------+
>     +------| Shaping PC 1  |------+
>     |      +---------------+      |
> +-------+  +---------------+  +-------+
> | Cisco |--| Shaping PC N  |--| Cisco |
> +-------+  +---------------+  +-------+
>     |      +---------------+      |
>     +------| Shaping PC 20 |------+
>            +---------------+
>
> Network: over 10k users, total Internet bandwidth more than 1 Gbit/s.
> All computers run BGP with multipath turned on. The Cisco can't do
> per-packet load sharing (that would solve all my problems =((( ),
> only by DST IP, SRC IP, or + Layer 4.
> OK, a user must have a speed of 1 Mbit/s. Let's look at the variants:
> 1. Create rules per user = (1 Mbit/s / N computers). If the user opens
> N connections all is great, but if he opens 1 connection his speed is
> 1 Mbit/s / N - that doesn't look good. Everything would be fine if the
> Cisco could do PER PACKET load sharing =(
> 2. Create rules per user = 1 Mbit/s. If the user opens 1 connection
> all is great, but if he opens N connections his speed is much more
> than the intended limit =(
>
> Why do I use 20 PCs? Because one PC normally forwards
> 100-150 Mbit/s... at which point it has 100% CPU usage in software
> interrupts...
>
> Any idea how to resolve this problem?
>
> In my dreams (feature request to netdev ;) ):
> Take one PC, call it MASTER TC. All 20 PCs synchronize statistics
> with the MASTER and share common rules and statistics. Then I use
> variant 2 and will be happy... but that's not realistic? =(
> Maybe there are other variants?
>
> Thanks for the help!
> Slavon.
> P.S. Sorry for my english =(
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
* Re: No idea about shaping through many PCs
From: Lennart Sorensen @ 2008-01-10 15:38 UTC
To: Badalian Vyacheslav; +Cc: netdev
On Thu, Jan 10, 2008 at 12:06:35PM +0300, Badalian Vyacheslav wrote:
> Hello all.
> I have been trying for more than two months to solve a problem with
> my shaping. Maybe you can help me?
>
> Scheme:
>            +---------------+
>     +------| Shaping PC 1  |------+
>     |      +---------------+      |
> +-------+  +---------------+  +-------+
> | Cisco |--| Shaping PC N  |--| Cisco |
> +-------+  +---------------+  +-------+
>     |      +---------------+      |
>     +------| Shaping PC 20 |------+
>            +---------------+
>
> Network: over 10k users, total Internet bandwidth more than 1 Gbit/s.
> All computers run BGP with multipath turned on. The Cisco can't do
> per-packet load sharing (that would solve all my problems =((( ),
> only by DST IP, SRC IP, or + Layer 4.
> OK, a user must have a speed of 1 Mbit/s. Let's look at the variants:
> 1. Create rules per user = (1 Mbit/s / N computers). If the user opens
> N connections all is great, but if he opens 1 connection his speed is
> 1 Mbit/s / N - that doesn't look good. Everything would be fine if the
> Cisco could do PER PACKET load sharing =(
> 2. Create rules per user = 1 Mbit/s. If the user opens 1 connection
> all is great, but if he opens N connections his speed is much more
> than the intended limit =(
>
> Why do I use 20 PCs? Because one PC normally forwards
> 100-150 Mbit/s... at which point it has 100% CPU usage in software
> interrupts...
I have managed to forward 600 Mbit/s at about 15% CPU load on a
500 MHz Geode LX, using four 100 Mbit pcnet32 interfaces and a small
tweak to how NAPI is implemented in that driver. Adding traffic
shaping and such to the processing would certainly increase the CPU
load, but hopefully not by much. The reason I didn't get more than
600 Mbit/s is that at that point the PCI bus is full.
> Any idea how to resolve this problem?
>
> In my dreams (feature request to netdev ;) ):
> Take one PC, call it MASTER TC. All 20 PCs synchronize statistics
> with the MASTER and share common rules and statistics. Then I use
> variant 2 and will be happy... but that's not realistic? =(
> Maybe there are other variants?
Well, I'm not sure about synchronizing and all that. I still think
that if I can manage a 600 Mbit/s forwarding rate on a slowpoke Geode,
then a modern CPU like a Q6600 with a number of PCIe gigabit ports
should be able to do quite a lot.
The tweak I did was to add a timer to the driver, which I activate
whenever I finish emptying the receive queue. When the timer expires
it adds the port back to the NAPI queue, and when the poll is called
again it either processes whatever packets arrived during the delay,
or it actually unmasks the IRQ and goes back to IRQ mode. The delay I
use is 1 jiffy; I run with HZ=1000 and set the queues to 256 packets,
since 1 ms at 100 Mbit/s can carry at most about 200 packets (64-byte
worst case). Whenever I empty the queue, I simply check how many
packets I just processed. If it is greater than 0, I enable the timer
to expire on the next jiffy and leave the IRQ masked after removing
the port from NAPI polling; if it was 0, I must have been called again
after the timer expired and still had no packets to process, in which
case I unmask the IRQ and don't re-enable the timer. I had to change
HZ to 1000, since at 250 or 100 I wouldn't be able to handle the
worst-case number of packets (the pcnet32 has a maximum of 512 packets
in a queue).
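In rough code, the poll ends up shaped like this (just a sketch, not
the actual patch: my_priv, my_process_rx() and hw_unmask_rx_irq() are
made-up stand-ins for the driver's real helpers, and it is spelled
with the current NAPI/timer API rather than the 2.6.24-era calls):

#include <linux/netdevice.h>
#include <linux/timer.h>

struct my_priv {
        struct napi_struct napi;
        struct timer_list poll_timer;
        /* ... rest of the driver state ... */
};

static int my_process_rx(struct my_priv *priv, int budget); /* drain RX ring */
static void hw_unmask_rx_irq(struct my_priv *priv);         /* device specific */

/* Set up in probe(): timer_setup(&priv->poll_timer, my_poll_timer_fn, 0);
 * Fires one jiffy after a poll that still saw packets: re-enter
 * polling instead of having unmasked the IRQ right away. */
static void my_poll_timer_fn(struct timer_list *t)
{
        struct my_priv *priv = from_timer(priv, t, poll_timer);

        napi_schedule(&priv->napi);
}

static int my_napi_poll(struct napi_struct *napi, int budget)
{
        struct my_priv *priv = container_of(napi, struct my_priv, napi);
        int work = my_process_rx(priv, budget);

        if (work == budget)
                return work;    /* ring not empty yet, NAPI repolls us */

        napi_complete_done(napi, work);

        if (work > 0) {
                /* Traffic seen on this pass: keep the IRQ masked and
                 * look again in one jiffy.  At HZ=1000 that is 1 ms,
                 * at most ~200 minimum-size packets at 100 Mbit/s,
                 * well inside a 256-entry ring. */
                mod_timer(&priv->poll_timer, jiffies + 1);
        } else {
                /* A whole jiffy with nothing to do: the burst is
                 * over, go back to interrupt mode. */
                hw_unmask_rx_irq(priv);
        }
        return work;
}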
With NAPI the normal behaviour is that whenever you empty the receive
queue you re-enable IRQs. But it doesn't take a very fast CPU to empty
the queue every time, and then you pay the overhead of masking IRQs
every time you receive and process packets, plus the overhead of
unmasking the IRQ only to get an IRQ for the next packet within a
fraction of a millisecond. With the delay until the next jiffy before
unmasking the IRQ, you add a potential lag in packet processing of up
to 1 ms (on average less than that), but the IRQ load drops
dramatically and the overhead of managing the IRQ masking and the IRQ
handler goes away. On this system the CPU load dropped from 90% at
500 Mbit/s to 15% at 600 Mbit/s, and the interrupt rate dropped from
one IRQ every couple of packets to one IRQ at the start of each burst
of packets.
I believe some gigabit Ethernet ports and most 10-gigabit ports can
generate delayed IRQs, waiting for a certain number of packets before
interrupting (interrupt coalescing). That is pretty much what I tried
to emulate with my tweak, and it works amazingly well.
--
Len Sorensen