* Re: No idea about shaping through many PCs
From: Denys Fedoryshchenko @ 2008-01-10 11:00 UTC
To: Badalian Vyacheslav, netdev
For proper link bandwidth sharing, I guess something like network
counters shared between the PCs (with proper locking) would be needed.
I haven't heard of anything like that.
IMHO, one way to do this:
Split the destination network into multiple parts and route them on the
Cisco. Let's say you have 192.168.0.0/16, four balancing PCs, and a
total bandwidth of 1 Gbit/s (per IEC, 1 Gbit/s = 1000 Mbit/s). Then on
the Cisco you route:
192.168.0.0/18   via PC1 (shared speed 250 Mbit/s)
192.168.64.0/18  via PC2 (shared speed 250 Mbit/s)
192.168.128.0/18 via PC3 (shared speed 250 Mbit/s)
192.168.192.0/18 via PC4 (shared speed 250 Mbit/s)
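To make the mechanics concrete, a little sketch (invented names, purely
an illustration): the two prefix bits the /18 adds beyond the /16 pick
the box, so every destination lands deterministically on one PC:

#include <stdint.h>
#include <stdio.h>

/* Which shaper PC owns a destination under the /18 split of
 * 192.168.0.0/16?  The two bits the /18 adds beyond the /16 are
 * bits 15-14 of the address, i.e. the top two bits of the third
 * octet. */
static int owner_pc(uint32_t dst_host_order)
{
        return ((dst_host_order >> 14) & 0x3) + 1;  /* PC1..PC4 */
}

int main(void)
{
        /* 192.168.200.10: third octet 200 is in 192..255 -> PC4 */
        uint32_t ip = (192u << 24) | (168u << 16) | (200u << 8) | 10u;

        printf("192.168.200.10 -> PC%d\n", owner_pc(ip));
        return 0;
}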
You could probably script a check: if some PC has too much spare
bandwidth (5-minute average), give that spare capacity to another PC
that needs more. For example, the 5-minute average counters show:
PC1 - occupies 100 Mbit/s
PC2 - occupies 50 Mbit/s
PC3 - occupies 150 Mbit/s
PC4 - occupies 230 Mbit/s
Then you change the link speeds:
PC1 max 200
PC2 max 150
PC3 max 250
PC4 max 400 (100 taken from PC2 and 50 from PC1)
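A minimal sketch of the reallocation arithmetic (my code, using a
simple demand-proportional rule; the caps above are hand-tuned, so the
numbers come out differently, but the idea is the same). Applying the
result would then just be a matter of changing each box's root shaper
rate:

#include <stdio.h>

#define NPC   4
#define TOTAL 1000      /* Mbit/s shared by all boxes */

int main(void)
{
        /* 5-minute average usage per PC, Mbit/s (figures from above) */
        const int used[NPC] = { 100, 50, 150, 230 };
        int sum = 0;

        for (int i = 0; i < NPC; i++)
                sum += used[i];

        /* Redistribute the total in proportion to measured demand,
         * so spare capacity flows from idle boxes to busy ones. */
        for (int i = 0; i < NPC; i++)
                printf("PC%d: used %3d Mbit/s -> new cap %3d Mbit/s\n",
                       i + 1, used[i], TOTAL * used[i] / sum);
        return 0;
}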
Of course, the PCs must be able to pass that much traffic. And IMHO it
is not normal that your PCs can't handle more than 200 Mbit/s. I have a
complicated setup with four RTL8139 LAN cards that passes 200 Mbit/s in
total. I'm sure it could handle up to 300 Mbit/s, but I am already
moving to a PC with PCI-E e1000/Broadcom NetXtreme cards, which have
offloading capabilities, large buffers, and proper drivers with NAPI.
That hardware is currently handling, for example, 160 Mbit/s, and the
counters are:
12:50:41  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle  intr/s
12:50:42  all   0.00   0.00  0.00     0.00  0.25   1.24    0.00  98.51  4009.90
12:50:43  all   0.00   0.00  0.00     0.00  0.00   1.25    0.00  98.75  4024.75
12:50:44  all   0.00   0.00  0.00     0.00  0.00   1.50    0.00  98.50  4181.82
12:50:45  all   0.25   0.00  0.00     0.00  0.00   1.50    0.00  98.25  4626.73
12:50:46  all   0.00   0.00  0.00     0.00  0.00   1.50    0.00  98.50  4351.52
12:50:47  all   0.25   0.00  0.00     0.00  0.00   1.75    0.00  98.00  4805.88
This is 2.6.23.8 with some configuration mistakes; I am going to try
2.6.24-rc7 and some optimizations.
Right now the profile looks like:
10957 17.0675 mwait_idle_with_hints
7454 11.6110 read_hpet
3883 6.0485 _raw_spin_lock
1605 2.5001 timer_interrupt
1363 2.1231 irq_entries_start
So maybe I will try switching the clocksource to TSC, disabling
nmi_watchdog, and tuning the network driver (bnx2).
You should probably check such things too.
On Thu, 10 Jan 2008 12:06:35 +0300, Badalian Vyacheslav wrote:
> Hello all.
> I have been trying for more than two months to solve a problem with
> my shaping. Maybe you can help me?
>
> Scheme:
>            +---------------+
>     +------| Shaping PC 1  |------+
>     |      +---------------+      |
> +-------+  +---------------+  +-------+
> | Cisco |--| Shaping PC N  |--| Cisco |
> +-------+  +---------------+  +-------+
>     |      +---------------+      |
>     +------| Shaping PC 20 |------+
>            +---------------+
>
> Network: over 10k users, total Internet bandwidth more than 1 Gbit/s.
> All computers run BGP with multipath turned on. The Cisco can't do
> per-packet load sharing (that would solve all my problems =((( ),
> only by DST IP, SRC IP, or + Layer 4.
> OK, a user must have a speed of 1 Mbit/s. Let's look at the variants:
> 1. Create rules per user = (1 Mbit/s / N computers). If the user opens
> N connections all is great, but if he opens 1 connection his speed is
> 1 Mbit/s / N - that doesn't look good. Everything would be fine if the
> Cisco could do PER PACKET load sharing =(
> 2. Create rules per user = 1 Mbit/s. If the user opens 1 connection
> all is great, but if he opens N connections his speed is much more
> than the intended limit =(
>
> Why do I use 20 PCs? Because one PC normally forwards
> 100-150 Mbit/s... at which point it has 100% CPU usage in software
> interrupts...
>
> Any idea how to resolve this problem?
>
> In my dreams (feature request to netdev ;) ):
> Take one PC, call it MASTER TC. All 20 PCs synchronize statistics
> with the MASTER and share common rules and statistics. Then I use
> variant 2 and will be happy... but that's not realistic? =(
> Maybe there are other variants?
>
> Thanks for the help!
> Slavon.
> P.S. Sorry for my english =(
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
* Re: No idea about shaping through many PCs
From: Lennart Sorensen @ 2008-01-10 15:38 UTC
To: Badalian Vyacheslav; +Cc: netdev
On Thu, Jan 10, 2008 at 12:06:35PM +0300, Badalian Vyacheslav wrote:
> Hello all.
> I have been trying for more than two months to solve a problem with
> my shaping. Maybe you can help me?
>
> Scheme:
>            +---------------+
>     +------| Shaping PC 1  |------+
>     |      +---------------+      |
> +-------+  +---------------+  +-------+
> | Cisco |--| Shaping PC N  |--| Cisco |
> +-------+  +---------------+  +-------+
>     |      +---------------+      |
>     +------| Shaping PC 20 |------+
>            +---------------+
>
> Network: over 10k users, total Internet bandwidth more than 1 Gbit/s.
> All computers run BGP with multipath turned on. The Cisco can't do
> per-packet load sharing (that would solve all my problems =((( ),
> only by DST IP, SRC IP, or + Layer 4.
> OK, a user must have a speed of 1 Mbit/s. Let's look at the variants:
> 1. Create rules per user = (1 Mbit/s / N computers). If the user opens
> N connections all is great, but if he opens 1 connection his speed is
> 1 Mbit/s / N - that doesn't look good. Everything would be fine if the
> Cisco could do PER PACKET load sharing =(
> 2. Create rules per user = 1 Mbit/s. If the user opens 1 connection
> all is great, but if he opens N connections his speed is much more
> than the intended limit =(
>
> Why do I use 20 PCs? Because one PC normally forwards
> 100-150 Mbit/s... at which point it has 100% CPU usage in software
> interrupts...
I have managed to forward 600 Mbit/s at about 15% CPU load on a
500 MHz Geode LX, using four 100 Mbit pcnet32 interfaces and a small
tweak to how NAPI is implemented in that driver. Adding traffic
shaping and such to the processing would certainly increase the CPU
load, but hopefully not by much. The reason I didn't get more than
600 Mbit/s is that at that point the PCI bus is full.
> Any idea how to resolve this problem?
>
> In my dreams (feature request to netdev ;) ):
> Take one PC, call it MASTER TC. All 20 PCs synchronize statistics
> with the MASTER and share common rules and statistics. Then I use
> variant 2 and will be happy... but that's not realistic? =(
> Maybe there are other variants?
Well, I'm not sure about synchronizing and all that. I still think
that if I can manage a 600 Mbit/s forwarding rate on a slowpoke Geode,
then a modern CPU like a Q6600 with a number of PCIe gigabit ports
should be able to do quite a lot.
The tweak I did was to add a timer to the driver, which I activate
whenever I finish emptying the receive queue. When the timer expires
it adds the port back to the NAPI queue, and when the poll is called
again it either processes whatever packets arrived during the delay,
or it actually unmasks the IRQ and goes back to IRQ mode. The delay I
use is 1 jiffy; I run with HZ=1000 and set the queues to 256 packets,
since 1 ms at 100 Mbit/s can carry at most about 200 packets (64-byte
worst case). Whenever I empty the queue, I simply check how many
packets I just processed. If it is greater than 0, I enable the timer
to expire on the next jiffy and leave the IRQ masked after removing
the port from NAPI polling; if it was 0, I must have been called again
after the timer expired and still had no packets to process, in which
case I unmask the IRQ and don't re-enable the timer. I had to change
HZ to 1000, since at 250 or 100 I wouldn't be able to handle the
worst-case number of packets (the pcnet32 has a maximum of 512 packets
in a queue).
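In rough code, the poll ends up shaped like this (just a sketch, not
the actual patch: my_priv, my_process_rx() and hw_unmask_rx_irq() are
made-up stand-ins for the driver's real helpers, and it is spelled
with the current NAPI/timer API rather than the 2.6.24-era calls):

#include <linux/netdevice.h>
#include <linux/timer.h>

struct my_priv {
        struct napi_struct napi;
        struct timer_list poll_timer;
        /* ... rest of the driver state ... */
};

static int my_process_rx(struct my_priv *priv, int budget); /* drain RX ring */
static void hw_unmask_rx_irq(struct my_priv *priv);         /* device specific */

/* Set up in probe(): timer_setup(&priv->poll_timer, my_poll_timer_fn, 0);
 * Fires one jiffy after a poll that still saw packets: re-enter
 * polling instead of having unmasked the IRQ right away. */
static void my_poll_timer_fn(struct timer_list *t)
{
        struct my_priv *priv = from_timer(priv, t, poll_timer);

        napi_schedule(&priv->napi);
}

static int my_napi_poll(struct napi_struct *napi, int budget)
{
        struct my_priv *priv = container_of(napi, struct my_priv, napi);
        int work = my_process_rx(priv, budget);

        if (work == budget)
                return work;    /* ring not empty yet, NAPI repolls us */

        napi_complete_done(napi, work);

        if (work > 0) {
                /* Traffic seen on this pass: keep the IRQ masked and
                 * look again in one jiffy.  At HZ=1000 that is 1 ms,
                 * at most ~200 minimum-size packets at 100 Mbit/s,
                 * well inside a 256-entry ring. */
                mod_timer(&priv->poll_timer, jiffies + 1);
        } else {
                /* A whole jiffy with nothing to do: the burst is
                 * over, go back to interrupt mode. */
                hw_unmask_rx_irq(priv);
        }
        return work;
}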
With NAPI the normal behaviour is that whenever you empty the receive
queue you re-enable IRQs. But it doesn't take a very fast CPU to empty
the queue every time, and then you pay the overhead of masking IRQs
every time you receive and process packets, plus the overhead of
unmasking the IRQ only to get an IRQ for the next packet within a
fraction of a millisecond. With the delay until the next jiffy before
unmasking the IRQ, you add a potential lag in packet processing of up
to 1 ms (on average less than that), but the IRQ load drops
dramatically and the overhead of managing the IRQ masking and the IRQ
handler goes away. On this system the CPU load dropped from 90% at
500 Mbit/s to 15% at 600 Mbit/s, and the interrupt rate dropped from
one IRQ every couple of packets to one IRQ at the start of each burst
of packets.
I believe some gigabit Ethernet ports and most 10-gigabit ports can
generate delayed IRQs, waiting for a certain number of packets before
interrupting (interrupt coalescing). That is pretty much what I tried
to emulate with my tweak, and it works amazingly well.
--
Len Sorensen