netdev.vger.kernel.org archive mirror
* RE: Using ethernet device as efficient small packet generator
@ 2010-12-30  1:11 Loke, Chetan
  2011-01-21 11:44 ` juice
  0 siblings, 1 reply; 28+ messages in thread
From: Loke, Chetan @ 2010-12-30  1:11 UTC (permalink / raw)
  To: Jon Zhou, juice, Eric Dumazet, Stephen Hemminger, netdev

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Jon Zhou
> Sent: December 23, 2010 3:58 AM
> To: juice@swagman.org; Eric Dumazet; Stephen Hemminger;
> netdev@vger.kernel.org
> Subject: RE: Using ethernet device as efficient small packet generator
> 
> 
> On another old kernel (2.6.16) with tg3 and bnx2 1G NICs and a Xeon
> E5450, I only got 490K pps (about 300 Mbps, 30% of GE). I think the
> reason is that multiqueue is unsupported in this kernel.
> 
> I will do a test with 1Gb nic on the new kernel later.
> 


I can hit close to 1M pps (first time, every time) with a 64-byte payload
on my virtual machine (running 2.6.33) via a vmxnet3 vNIC:


[root@localhost ~]# cat /proc/net/pktgen/eth2
Params: count 0  min_pkt_size: 60  max_pkt_size: 60
     frags: 0  delay: 0  clone_skb: 0  ifname: eth2
     flows: 0 flowlen: 0
     queue_map_min: 0  queue_map_max: 0
     dst_min: 192.168.222.2  dst_max:
        src_min:   src_max:
     src_mac: 00:50:56:b1:00:19 dst_mac: 00:50:56:c0:00:3e
     udp_src_min: 9  udp_src_max: 9  udp_dst_min: 9  udp_dst_max: 9
     src_mac_count: 0  dst_mac_count: 0
     Flags:
Current:
     pkts-sofar: 59241012  errors: 0
     started: 1898437021us  stopped: 1957709510us idle: 9168us
     seq_num: 59241013  cur_dst_mac_offset: 0  cur_src_mac_offset: 0
     cur_saddr: 0x0  cur_daddr: 0x2dea8c0
     cur_udp_dst: 9  cur_udp_src: 9
     cur_queue_map: 0
     flows: 0
Result: OK: 59272488(c59263320+d9168) nsec, 59241012 (60byte,0frags)
  999468pps 479Mb/sec (479744640bps) errors: 0
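For anyone wanting to reproduce a run like the one above, here is a minimal sketch of driving pktgen through its /proc interface. The device name eth2, worker thread kpktgend_0, and destination addresses are assumptions taken from the dump above; adjust them for your setup, and run as root with the pktgen module loaded.

```shell
#!/bin/sh
# Sketch: configure and start a pktgen run via /proc/net/pktgen.
# Requires root and 'modprobe pktgen'. Device/thread names are assumptions.

pgset() {
    # Write one pktgen command to the given control file.
    echo "$1" > "$2"
}

if [ ! -d /proc/net/pktgen ]; then
    echo "pktgen module not loaded; skipping"
    exit 0
fi

THREAD=/proc/net/pktgen/kpktgend_0
DEV=/proc/net/pktgen/eth2

pgset "rem_device_all" "$THREAD"        # detach any previously bound device
pgset "add_device eth2" "$THREAD"       # bind eth2 to this worker thread

pgset "count 1000000" "$DEV"            # stop after 1M packets (0 = run forever)
pgset "clone_skb 0" "$DEV"              # allocate a fresh skb per packet
pgset "pkt_size 60" "$DEV"              # 60B + 4B CRC = 64-byte frame on the wire
pgset "delay 0" "$DEV"                  # no inter-packet gap
pgset "dst 192.168.222.2" "$DEV"
pgset "dst_mac 00:50:56:c0:00:3e" "$DEV"

echo "start" > /proc/net/pktgen/pgctrl  # blocks until the run completes
cat "$DEV"                              # show the Result: line with pps
```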



Chetan

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Using ethernet device as efficient small packet generator
@ 2010-12-23  5:15 juice
  2010-12-23  8:57 ` Jon Zhou
  0 siblings, 1 reply; 28+ messages in thread
From: juice @ 2010-12-23  5:15 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger, netdev

> Reaching 1Gbps should not be a problem (I was speaking about 10Gbps)
> I reach link speed with my tg3 card and one single cpu :)
> (Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3))
>
> Please provide : ethtool -S eth0
>

This is from the e1000 interface:
03:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet
Controller (Copper) (rev 01)

root@a2labralinux:/home/juice# ethtool -S eth1
NIC statistics:
     rx_packets: 192069
     tx_packets: 60000313
     rx_bytes: 33850492
     tx_bytes: 3840026215
     rx_broadcast: 192069
     tx_broadcast: 3
     rx_multicast: 0
     tx_multicast: 310
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 1806437
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 33850492
     rx_csum_offload_good: 8978
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0


This is from the tg3 interface:
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5761
Gigabit Ethernet PCIe (rev 10)

root@d8labralinux:/home/juice# ethtool -S eth2
NIC statistics:
     rx_octets: 10814
     rx_fragments: 0
     rx_ucast_packets: 20
     rx_mcast_packets: 0
     rx_bcast_packets: 26
     rx_fcs_errors: 0
     rx_align_errors: 0
     rx_xon_pause_rcvd: 0
     rx_xoff_pause_rcvd: 0
     rx_mac_ctrl_rcvd: 0
     rx_xoff_entered: 0
     rx_frame_too_long_errors: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_in_length_errors: 0
     rx_out_length_errors: 0
     rx_64_or_less_octet_packets: 0
     rx_65_to_127_octet_packets: 0
     rx_128_to_255_octet_packets: 0
     rx_256_to_511_octet_packets: 0
     rx_512_to_1023_octet_packets: 0
     rx_1024_to_1522_octet_packets: 0
     rx_1523_to_2047_octet_packets: 0
     rx_2048_to_4095_octet_packets: 0
     rx_4096_to_8191_octet_packets: 0
     rx_8192_to_9022_octet_packets: 0
     tx_octets: 5120013863
     tx_collisions: 0
     tx_xon_sent: 0
     tx_xoff_sent: 0
     tx_flow_control: 0
     tx_mac_errors: 0
     tx_single_collisions: 0
     tx_mult_collisions: 0
     tx_deferred: 0
     tx_excessive_collisions: 0
     tx_late_collisions: 0
     tx_collide_2times: 0
     tx_collide_3times: 0
     tx_collide_4times: 0
     tx_collide_5times: 0
     tx_collide_6times: 0
     tx_collide_7times: 0
     tx_collide_8times: 0
     tx_collide_9times: 0
     tx_collide_10times: 0
     tx_collide_11times: 0
     tx_collide_12times: 0
     tx_collide_13times: 0
     tx_collide_14times: 0
     tx_collide_15times: 0
     tx_ucast_packets: 80000034
     tx_mcast_packets: 42
     tx_bcast_packets: 40
     tx_carrier_sense_errors: 0
     tx_discards: 0
     tx_errors: 0
     dma_writeq_full: 0
     dma_write_prioq_full: 0
     rxbds_empty: 0
     rx_discards: 0
     rx_errors: 0
     rx_threshold_hit: 0
     dma_readq_full: 0
     dma_read_prioq_full: 0
     tx_comp_queue_full: 0
     ring_set_send_prod_index: 0
     ring_status_update: 0
     nic_irqs: 0
     nic_avoided_irqs: 0
     nic_tx_threshold_hit: 0




* Re: Using ethernet device as efficient small packet generator
@ 2010-12-22  7:30 juice
  2010-12-22  8:08 ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: juice @ 2010-12-22  7:30 UTC (permalink / raw)
  To: Stephen Hemminger, netdev

> On Tue, 21 Dec 2010 11:56:42 +0200 shemminger wrote:
>
> I regularly get full 1G line rate of 64-byte packets using an old
> Opteron box and pktgen. It does require some tuning of IRQs and
> interrupt mitigation, but no patches. Did you remember to do the basic
> stuff like setting IRQ affinity and not enabling debugging or tracing
> in the kernel? This is on sky2, but also using e1000 and tg3. Others
> have reported 7M packets per second over 10G cards.
>
> The r8169 hardware is low-end consumer hardware and doesn't work as
> well.
>
> It is possible to get close to 1G line rate forwarding with a single
> core with current generation processors. Actual rate depends on
> hardware and configuration (size of route table, firewalling, etc.).
> Much better performance with multi-queue hardware to spread the load
> over multiple cores.

I did my testing on two kinds of boxes we use in our lab: an older Pomi
Supermicro with e1000 and a newer Dell T3500 with tg3 and r8169.
Both computers have dual-core 2.4 GHz Xeon CPUs, but with somewhat
different model and stepping.
Both boxes are running the same OS, Ubuntu with kernel 2.6.32-26-generic #48.

Could you share some information on the required interrupt tuning? It
would certainly be easiest if full line rate could be achieved without
any patching of drivers or hindering normal eth/IP interface operation.
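For reference, the IRQ-affinity part of that tuning usually amounts to pinning the NIC's interrupt to a single core via /proc/irq. A sketch follows; the IRQ number 42 and the CPU mask are assumptions, so check /proc/interrupts for your NIC's actual line:

```shell
#!/bin/sh
# Sketch: pin a NIC's interrupt to CPU0. The IRQ number (42) is an
# assumption; find the real one in /proc/interrupts. Requires root.
IRQ=42
MASK=1              # CPU bitmask: 1 = CPU0, 2 = CPU1, 4 = CPU2, ...

grep eth /proc/interrupts || true     # identify the NIC's IRQ line(s)

if [ -w "/proc/irq/$IRQ/smp_affinity" ]; then
    # Stop irqbalance first, or it will rewrite the mask behind you.
    echo $MASK > "/proc/irq/$IRQ/smp_affinity"
    cat "/proc/irq/$IRQ/smp_affinity"
else
    echo "IRQ $IRQ not present or not root; skipping"
fi
```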

Yours, Jussi Ohenoja





* Using ethernet device as efficient small packet generator
@ 2010-12-21  9:56 juice
  2010-12-21 18:22 ` Stephen Hemminger
  0 siblings, 1 reply; 28+ messages in thread
From: juice @ 2010-12-21  9:56 UTC (permalink / raw)
  To: netdev


Hi net-devvers.

I am involved in telecom equipment R&D, and I need to do some network
performance benchmarking. We need to generate streams of Ethernet/IP/UDP
traffic that consists of different sized payloads ranging from smallest
AMR payload to ethernet MTU.

We have various tools, including for example Spirent traffic generators
as well as in-house software generating 3GPP-specified protocol
streams. The problem with the off-the-shelf generators is their
inflexibility for our needs, and the fact that R&D personnel cannot
count on having a generator available at any given time.

For larger packet sizes our Linux-based generator is quite sufficient,
as I can use it to fully saturate a GE link with packet sizes around 1 kB.
However, as packet sizes get smaller, Ethernet performance suffers.

I did some benchmarking using pktgen with 64B packets against an AX4000
and confirmed that the maximum throughput is only around 25% of GE
capacity. I managed to get to about the same speeds using my own custom
module that writes skbuffs directly to the *xmit hook of the netdev.

Now, it is evident that something is not optimized to the maximum here,
as the PCI bus allows for much higher transfer speeds. If large packets
can fully saturate the Ethernet link, the same should apply to
minimum-sized packets too, unless there is some per-packet overhead I am
unaware of.
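There is in fact a fixed per-frame overhead on the wire: each frame carries an 8-byte preamble and a 12-byte inter-frame gap on top of its 64 bytes, which caps minimum-size frames on GE at about 1.488 Mpps. The arithmetic:

```shell
#!/bin/sh
# Line-rate packet budget for minimum-size frames on gigabit Ethernet:
# each 64-byte frame occupies 64 + 8 (preamble) + 12 (inter-frame gap)
# = 84 byte-times on the wire.
FRAME=64; PREAMBLE=8; IFG=12
BITS_PER_FRAME=$(( (FRAME + PREAMBLE + IFG) * 8 ))   # 672 bits
PPS=$(( 1000000000 / BITS_PER_FRAME ))
echo "$PPS"     # 1488095 -> ~1.488 Mpps theoretical maximum
```

25% of that budget is roughly 372 kpps, which matches the observed throughput; the shortfall is per-packet software cost, not link capacity.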

I have a couple of questions here:

1.) Is it possible to enhance a normally behaving network driver so
    that the device would still work as an ethernet device (ethXX)?

    Currently the test stream is generated in a userland process that
    writes to a RAW_SOCK, but it is OK for me if I need to write the
    packet-generating part as a kernel module that is configured
    from the userland part to send the prepared stream out.

2.) If it is not possible to get the needed performance from the normal
    network architecture, is it possible to make a "generate only"
    ethernet device that I can use to replace the network card driver?

    For example, RX is not really needed at all by my application, so
    just optimizing the driver to send out packets from memory as fast
    as possible is enough.

    Are there notable differences between ethernet chipsets/cards
    with regard to the raw output speed they are capable of?
    I have benchmarked e1000, r8169 and tg3 based cards, and with all
    of those I get about the same throughput of 64-byte ethernet frames.

    For my purpose it would be OK, for example, to remove the normal
    r8169 driver and replace it with a custom TX-only driver, and use
    some other normal driver tied to another card to access the box.

I would appreciate your comments and any pointers to existing projects
that implement something similar to what I require.

Yours, Jussi Ohenoja





end of thread, other threads:[~2011-02-02  8:13 UTC | newest]

Thread overview: 28+ messages
2010-12-30  1:11 Using ethernet device as efficient small packet generator Loke, Chetan
2011-01-21 11:44 ` juice
2011-01-21 11:51   ` Eric Dumazet
2011-01-21 12:12     ` juice
2011-01-21 13:38       ` Ben Greear
2011-01-21 22:09   ` Brandeburg, Jesse
2011-01-23 21:48     ` juice
2011-01-24  8:10       ` juice
2011-01-24  9:18         ` Eric Dumazet
2011-01-24 16:34         ` Eric Dumazet
2011-01-24 20:51           ` juice
2011-02-02  8:13       ` juice
  -- strict thread matches above, loose matches on Subject: below --
2010-12-23  5:15 juice
2010-12-23  8:57 ` Jon Zhou
2010-12-23 10:50   ` juice
2010-12-22  7:30 juice
2010-12-22  8:08 ` Eric Dumazet
2010-12-22 11:11   ` juice
2010-12-22 11:28     ` Eric Dumazet
2010-12-22 15:48   ` Jon Zhou
2010-12-22 15:59     ` Eric Dumazet
2010-12-22 16:52       ` Jon Zhou
2010-12-22 17:18         ` Eric Dumazet
2010-12-22 17:40           ` Jon Zhou
2010-12-22 17:51             ` Eric Dumazet
2010-12-22 17:15       ` Jon Zhou
2010-12-21  9:56 juice
2010-12-21 18:22 ` Stephen Hemminger
