netdev.vger.kernel.org archive mirror
* [RFC] bnx2x: Insane RX rings
@ 2010-09-09 20:45 Eric Dumazet
  2010-09-09 21:21 ` Krzysztof Olędzki
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-09-09 20:45 UTC (permalink / raw)
  To: netdev; +Cc: Eilon Greenstein

So I have a small dev machine, 4GB of ram,
a dual E5540 cpu (quad core, 2 threads per core), 
so a total of 16 threads.

Two ethernet ports, eth0 and eth1, 

02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10Gigabit PCIe
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10Gigabit PCIe

bnx2x 0000:02:00.0: eth0: using MSI-X  IRQs: sp 68  fp[0] 69 ... fp[15] 84
bnx2x 0000:02:00.1: eth1: using MSI-X  IRQs: sp 85  fp[0] 86 ... fp[15] 101


Default configuration :

ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:		4078
RX Mini:	0
RX Jumbo:	0
TX:		4078
Current hardware settings:
RX:		4078
RX Mini:	0
RX Jumbo:	0
TX:		4078

Problem is: with 16 RX queues per device, that's 4078*16*2 Kbytes per
Ethernet port.

Total : 

skbuff_head_cache 130747 131025    256   15    1 : tunables  120   60    8 : slabdata   8735   8735     40
size-2048         130866 130888   2048    2    1 : tunables   24   12    8 : slabdata  65444  65444     28

That's about 300 Mbytes of memory, just in case some network traffic occurs.
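
For reference, a rough breakdown of that figure, using the 2048-byte data
buffers and the 256-byte skb heads visible in the slabinfo lines above
(numbers rounded):

  2048 bytes * 4078 descriptors * 16 queues ~= 127 MB of data buffers per port
  127 MB * 2 ports                          ~= 255 MB
   256 bytes * 4078 * 16 * 2                ~=  32 MB of skbuff_head_cache
                                               ------
  total                                     ~= 287 MB, i.e. roughly 300 MB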

Let's do something about that?

Thanks




* Re: [RFC] bnx2x: Insane RX rings
  2010-09-09 20:45 [RFC] bnx2x: Insane RX rings Eric Dumazet
@ 2010-09-09 21:21 ` Krzysztof Olędzki
  2010-09-09 21:30   ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Krzysztof Olędzki @ 2010-09-09 21:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Eilon Greenstein

On 2010-09-09 22:45, Eric Dumazet wrote:
> So I have a small dev machine, 4GB of ram,
> a dual E5540 cpu (quad core, 2 threads per core),
> so a total of 16 threads.
>
> Two ethernet ports, eth0 and eth1,
>
> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10Gigabit PCIe
> 02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10Gigabit PCIe
>
> bnx2x 0000:02:00.0: eth0: using MSI-X  IRQs: sp 68  fp[0] 69 ... fp[15] 84
> bnx2x 0000:02:00.1: eth1: using MSI-X  IRQs: sp 85  fp[0] 86 ... fp[15] 101
>
>
> Default configuration :
>
> ethtool -g eth0
> Ring parameters for eth0:
> Pre-set maximums:
> RX:		4078
> RX Mini:	0
> RX Jumbo:	0
> TX:		4078
> Current hardware settings:
> RX:		4078
> RX Mini:	0
> RX Jumbo:	0
> TX:		4078
>
> Problem is: with 16 RX queues per device, that's 4078*16*2 Kbytes per
> Ethernet port.
>
> Total :
>
> skbuff_head_cache 130747 131025    256   15    1 : tunables  120   60    8 : slabdata   8735   8735     40
> size-2048         130866 130888   2048    2    1 : tunables   24   12    8 : slabdata  65444  65444     28
>
> That's about 300 Mbytes of memory, just in case some network traffic occurs.
>
> Let's do something about that?

Yep, it is ~8MB per queue, not so much alone, but a lot together. For 
this reason I use something like bnx2.num_queues=2 on servers where I 
don't need much CPU power for network workload.
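
(As an illustration of that kind of tuning - parameter names vary by driver,
so check modinfo for the one actually in use; bnx2x, for example, exposes a
num_queues module parameter:

  # on the kernel command line, for a built-in driver:
  bnx2x.num_queues=2

  # or via modprobe configuration, e.g. in /etc/modprobe.d/bnx2x.conf:
  options bnx2x num_queues=2
)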

Best regards,

			Krzysztof Olędzki


* Re: [RFC] bnx2x: Insane RX rings
  2010-09-09 21:21 ` Krzysztof Olędzki
@ 2010-09-09 21:30   ` David Miller
  2010-09-09 21:38     ` Rick Jones
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2010-09-09 21:30 UTC (permalink / raw)
  To: ole; +Cc: eric.dumazet, netdev, eilong

From: Krzysztof Olędzki <ole@ans.pl>
Date: Thu, 09 Sep 2010 23:21:01 +0200

> On 2010-09-09 22:45, Eric Dumazet wrote:
>> Problem is: with 16 RX queues per device, that's 4078*16*2 Kbytes per
>> Ethernet port.
>>
>> Total :
>>
>> skbuff_head_cache 130747 131025 256 15 1 : tunables 120 60 8 :
>> slabdata 8735 8735 40
>> size-2048 130866 130888 2048 2 1 : tunables 24 12 8 : slabdata 65444
>> 65444 28
>>
>> That's about 300 Mbytes of memory, just in case some network
>> traffic occurs.
>>
>> Let's do something about that?
> 
> Yep, it is ~8MB per queue, not so much alone, but a lot together. For
> this reason I use something like bnx2.num_queues=2 on servers where I
> don't need much CPU power for network workload.

I think simply that the RX queue size should be scaled by the number
of queues we have.

If people want enormous RX ring sizes even when there are many queues,
they can use ethtool to get that.

Taking up 130MB of memory per-card, just for RX packet buffers, is
certainly over the top.
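
(For completeness: anyone who does want a huge ring back could set it by hand
with ethtool's -G/--set-ring option, along the lines of:

  ethtool -G eth0 rx 4078
)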


* Re: [RFC] bnx2x: Insane RX rings
  2010-09-09 21:30   ` David Miller
@ 2010-09-09 21:38     ` Rick Jones
  2010-09-10 11:16       ` Eilon Greenstein
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Jones @ 2010-09-09 21:38 UTC (permalink / raw)
  To: David Miller; +Cc: ole, eric.dumazet, netdev, eilong

David Miller wrote:
> From: Krzysztof Olędzki <ole@ans.pl>
> Date: Thu, 09 Sep 2010 23:21:01 +0200
> 
> 
>>On 2010-09-09 22:45, Eric Dumazet wrote:
>>
>>>Problem is: with 16 RX queues per device, that's 4078*16*2 Kbytes per
>>>Ethernet port.
>>>
>>>Total :
>>>
>>>skbuff_head_cache 130747 131025 256 15 1 : tunables 120 60 8 :
>>>slabdata 8735 8735 40
>>>size-2048 130866 130888 2048 2 1 : tunables 24 12 8 : slabdata 65444
>>>65444 28
>>>
>>>That's about 300 Mbytes of memory, just in case some network
>>>traffic occurs.
>>>
>>>Let's do something about that?
>>
>>Yep, it is ~8MB per queue, not so much alone, but a lot together. For
>>this reason I use something like bnx2.num_queues=2 on servers where I
>>don't need much CPU power for network workload.
> 
> 
> I think simply that the RX queue size should be scaled by the number
> of queues we have.
> 
> If people want enormous RX ring sizes even when there are many queues,
> they can use ethtool to get that.
> 
> Taking up 130MB of memory per-card, just for RX packet buffers, is
> certainly over the top.

It gets even better if one considers jumbo frames...  that said, I've had 
customer contacts (indirect) where they were quite keen to have a ring size of 
at least 2048 packets - I never could get it confirmed, but I suspect they had 
applications/systems that might "go out to lunch" for long enough periods of 
time that they wanted that degree of FIFO buffering.

Doesn't necessarily change "what should be the defaults" much, but there it is.

rick jones


* Re: [RFC] bnx2x: Insane RX rings
  2010-09-09 21:38     ` Rick Jones
@ 2010-09-10 11:16       ` Eilon Greenstein
  2010-09-10 15:46         ` Rick Jones
  2010-09-10 16:42         ` David Miller
  0 siblings, 2 replies; 8+ messages in thread
From: Eilon Greenstein @ 2010-09-10 11:16 UTC (permalink / raw)
  To: David Miller, Rick Jones
  Cc: ole@ans.pl, eric.dumazet@gmail.com, netdev@vger.kernel.org

On Thu, 2010-09-09 at 14:38 -0700, Rick Jones wrote:
> David Miller wrote:
> > From: Krzysztof Olędzki <ole@ans.pl>
> > Date: Thu, 09 Sep 2010 23:21:01 +0200
> > 
> > 
> >>On 2010-09-09 22:45, Eric Dumazet wrote:
> >>
> >>>Problem is: with 16 RX queues per device, that's 4078*16*2 Kbytes per
> >>>Ethernet port.
> >>>
> >>>Total :
> >>>
> >>>skbuff_head_cache 130747 131025 256 15 1 : tunables 120 60 8 :
> >>>slabdata 8735 8735 40
> >>>size-2048 130866 130888 2048 2 1 : tunables 24 12 8 : slabdata 65444
> >>>65444 28
> >>>
> >>>That's about 300 Mbytes of memory, just in case some network
> >>>traffic occurs.
> >>>
> >>>Let's do something about that?
> >>
> >>Yep, it is ~8MB per queue, not so much alone, but a lot together. For
> >>this reason I use something like bnx2.num_queues=2 on servers where I
> >>don't need much CPU power for network workload.
> > 
> > 
> > I think simply that the RX queue size should be scaled by the number
> > of queues we have.

There are a few factors that can be considered when scaling the ring
sizes:
- Number of queues per device
- Number of devices
- Available amount of memory
- Others...

I'm thinking about adding a factor only according to the number of
queues - this will still cause issues for systems with many ports. Does
that sound reasonable or not enough? Do you think the number of devices
or even the amount of free memory should be considered?

Thanks,
Eilon

> > If people want enormous RX ring sizes even when there are many queues,
> > they can use ethtool to get that.
> > 
> > Taking up 130MB of memory per-card, just for RX packet buffers, is
> > certainly over the top.
> 
> It gets even better if one considers jumbo frames...  that said, I've had 
> customer contacts (indirect) where they were quite keen to have a ring size of 
> at least 2048 packets - I never could get it confirmed, but I suspect they had 
> applications/systems that might "go out to lunch" for long enough periods of 
> time that they wanted that degree of FIFO buffering.
> 
> Doesn't necessarily change "what should be the defaults" much, but there it is.
> 
> rick jones
> 





* Re: [RFC] bnx2x: Insane RX rings
  2010-09-10 11:16       ` Eilon Greenstein
@ 2010-09-10 15:46         ` Rick Jones
  2010-09-10 15:54           ` Rick Jones
  2010-09-10 16:42         ` David Miller
  1 sibling, 1 reply; 8+ messages in thread
From: Rick Jones @ 2010-09-10 15:46 UTC (permalink / raw)
  To: eilong
  Cc: David Miller, ole@ans.pl, eric.dumazet@gmail.com,
	netdev@vger.kernel.org

>>>I think simply that the RX queue size should be scaled by the number
>>>of queues we have.
> 
> 
> There are a few factors that can be considered when scaling the ring
> sizes:
> - Number of queues per device
> - Number of devices
> - Available amount of memory
> - Others...
>
> I'm thinking about adding a factor only according to the number of
> queues - this will still cause issues for systems with many ports. Does
> that sound reasonable or not enough? Do you think the number of devices
> or even the amount of free memory should be considered?

At one level we are talking about horses and barn doors - for example, the 
minimum memory requirements for ProLiants have already been set (and 
communicated for some time) taking the memory usage of their LOMs (LAN on 
Motherboard) into account.

rick jones


* Re: [RFC] bnx2x: Insane RX rings
  2010-09-10 15:46         ` Rick Jones
@ 2010-09-10 15:54           ` Rick Jones
  0 siblings, 0 replies; 8+ messages in thread
From: Rick Jones @ 2010-09-10 15:54 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: eilong, David Miller, ole@ans.pl, eric.dumazet@gmail.com

> At one level we are talking about horses and barn doors - for example, 
> the minimum memory requirements for ProLiants have already been set (and 
> communicated for some time) taking the memory usage of their LOMs (LAN on 
> Motherboard) into account.

Their 10GbE LOMs, that is... just to state the implicitly obvious.

rick


* Re: [RFC] bnx2x: Insane RX rings
  2010-09-10 11:16       ` Eilon Greenstein
  2010-09-10 15:46         ` Rick Jones
@ 2010-09-10 16:42         ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2010-09-10 16:42 UTC (permalink / raw)
  To: eilong; +Cc: rick.jones2, ole, eric.dumazet, netdev

From: "Eilon Greenstein" <eilong@broadcom.com>
Date: Fri, 10 Sep 2010 14:16:14 +0300

> There are a few factors that can be considered when scaling the ring
> sizes:
> - Number of queues per device
> - Number of devices
> - Available amount of memory
> - Others...
> 
> I'm thinking about adding a factor only according to the number of
> queues - this will still cause issues for systems with many ports. Does
> that sound reasonable or not enough? Do you think the number of devices
> or even the amount of free memory should be considered?

I think scaling based upon the number of queues is a good place
to start.

Multi-port is less of an issue.  The problem we really care about
stems from the fact that the exact same port will require more
memory than another one simply because it has more queues active.

I would even argue that this is a zero-sum thing to do: since the
traffic ought to be distributed across the queues, you still have
enough buffers to handle the load.

Of course I understand that a certain level of buffering is necessary
even on a per-queue level with many queues active, so if you scale
based upon the number of queues but then enforce a minimum (something
like 128 entries) that would be a reasonable thing to do.
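
A rough sketch of that idea, purely illustrative and not the actual bnx2x
code; the 4078 maximum matches the ethtool output at the top of the thread
and the 128 floor comes from the previous paragraph:

	/* Illustrative only: scale the default RX ring size down by the
	 * number of RX queues (assumed >= 1), but never drop below a
	 * small per-queue floor.
	 */
	#define DFLT_MAX_RX_DESC	4078
	#define DFLT_MIN_RX_DESC	128

	static int default_rx_ring_size(int num_rx_queues)
	{
		int size = DFLT_MAX_RX_DESC / num_rx_queues;

		return size < DFLT_MIN_RX_DESC ? DFLT_MIN_RX_DESC : size;
	}

With 16 queues that would give 254 descriptors per queue, i.e. roughly 8 MB
of 2 KB buffers per port instead of the ~130 MB seen today.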

Thanks for looking into this Eilon.

