netdev.vger.kernel.org archive mirror
* Re: [RFC] New driver API to speed up small packets xmits
@ 2007-05-11  7:14 Krishna Kumar2
  0 siblings, 0 replies; 96+ messages in thread
From: Krishna Kumar2 @ 2007-05-11  7:14 UTC (permalink / raw)
  To: David Stevens
  Cc: Evgeniy Polyakov, Krishna Kumar2, netdev, netdev-owner,
	Rick Jones


(Mistakenly didn't reply-all the previous time)

Hi Dave,

David Stevens <dlstevens@us.ibm.com> wrote on 05/11/2007 02:57:56 AM:

> The word "small" is coming up a lot in this discussion, and
> I think packet size really has nothing to do with it. Multiple
> streams generating packets of any size would benefit; the
> key ingredient is a queue length greater than 1.

Correct. Thinking about this again: for larger packets the bandwidth
may not improve, since it already reaches line speed, but I guess CPU
utilization could drop (as it could in the small-packet case too).
The word "small" did not cover this case.

Thanks,

- KK

> I think the intent is to remove queue lock cycles by taking
> the whole list (at least up to the count of free ring buffers)
> when the queue is greater than one packet, thus effectively
> removing the lock expense for n-1 packets.
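
The saving described in the quoted paragraph can be put in numbers with
a small user-space model. This is only a sketch: struct model_queue and
both drain functions are hypothetical names made up for illustration,
not kernel code, and a "lock cycle" here is just a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a transmit queue; not kernel code. */
struct model_queue {
	size_t len;		/* packets waiting in the queue */
	size_t lock_cycles;	/* lock/unlock pairs counted so far */
};

/* Per-packet dequeue: one lock cycle for every packet sent. */
static size_t drain_one_at_a_time(struct model_queue *q, size_t free_slots)
{
	size_t sent = 0;

	assert(q != NULL);
	while (q->len > 0 && sent < free_slots) {
		q->lock_cycles++;	/* lock, dequeue one packet, unlock */
		q->len--;
		sent++;
	}
	return sent;
}

/* Batched dequeue: one lock cycle covers up to free_slots packets. */
static size_t drain_batch(struct model_queue *q, size_t free_slots)
{
	size_t batch;

	assert(q != NULL);
	if (q->len == 0 || free_slots == 0)
		return 0;

	q->lock_cycles++;	/* lock once, take the list, unlock */
	batch = q->len < free_slots ? q->len : free_slots;
	q->len -= batch;
	return batch;
}
```

With 8 queued packets and 16 free ring slots, the per-packet path
counts 8 lock cycles and the batched path counts 1, so the lock expense
is removed for n-1 = 7 packets. The packet size never enters the model,
only the queue length.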



* [RFC] New driver API to speed up small packets xmits
@ 2007-05-10 14:53 Krishna Kumar
  2007-05-10 15:08 ` Evgeniy Polyakov
                   ` (2 more replies)
  0 siblings, 3 replies; 96+ messages in thread
From: Krishna Kumar @ 2007-05-10 14:53 UTC (permalink / raw)
  To: netdev; +Cc: krkumar2, Krishna Kumar

Hi all,

While looking at common packet sizes on transmit, I found that most
packets are small. On my personal system, the packet statistics after
about 6 hours of use (browsing, mail, ftp'ing two Linux kernels from
www.kernel.org) are:

-----------------------------------------------------------
	Packet Size	#packets (Total:60720)	Percentage
-----------------------------------------------------------
	32 		0 			0
	64 		7716 			12.70
	80 		40193 			66.19
sub-total:		47909			78.90 %

	96 		2007 			3.30
	128 		1917 			3.15
sub-total:		3924			6.46 %

	256 		1822 			3.00
	384 		863 			1.42
	512 		459 			.75
sub-total:		3144			5.18 %

	640 		763 			1.25
	768 		2329 			3.83
	1024 		1700 			2.79
	1500 		461 			.75
sub-total:		5253			8.65 %

	2048 		312 			.51
	4096 		53 			.08
	8192 		84 			.13
	16384 		41 			.06
	32768+ 		0 			0
sub-total:		490			0.81 %
-----------------------------------------------------------
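
The sub-totals in the table can be re-derived from the raw counts. The
following is a minimal sketch of that arithmetic; the array copies the
#packets column, and sum_counts() and share_hundredths() are
hypothetical helper names used only for this check.

```c
#include <assert.h>
#include <stddef.h>

/* Packet counts per size bucket, copied from the #packets column. */
static const unsigned long bucket_counts[] = {
	0, 7716, 40193,		/* buckets 32, 64, 80 */
	2007, 1917,		/* buckets 96, 128 */
	1822, 863, 459,		/* buckets 256, 384, 512 */
	763, 2329, 1700, 461,	/* buckets 640, 768, 1024, 1500 */
	312, 53, 84, 41, 0	/* buckets 2048, 4096, 8192, 16384, 32768+ */
};

#define NUM_BUCKETS (sizeof(bucket_counts) / sizeof(bucket_counts[0]))

/* Sum n consecutive buckets starting at index first. */
static unsigned long sum_counts(size_t first, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	assert(first + n <= NUM_BUCKETS);
	for (i = first; i < first + n; i++)
		sum += bucket_counts[i];
	return sum;
}

/* Share of part in total, in hundredths of a percent, truncated. */
static unsigned long share_hundredths(unsigned long part, unsigned long total)
{
	assert(total != 0);
	return (unsigned long)((unsigned long long)part * 10000ULL / total);
}
```

sum_counts(0, NUM_BUCKETS) gives the stated total of 60720, and
share_hundredths(sum_counts(0, 3), 60720) gives 7890, i.e. the 78.90 %
of the first sub-total. Taking the first five buckets (32 through 128)
together gives 8536, so about 85 % of the packets fall in the buckets
up to 128.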

Doing some measurements, I found that for small packets, eg 128 bytes,
the bandwidth is approximately 60% of the line speed. To possibly speed
up small packet xmits, a method of "linking" skbs was thought of, where
two pointers (skb_flink/blink) are added to the skb. It is felt (no
data yet) that drivers will get better results when a larger number of
"linked" skbs are sent to them in one shot, rather than sending each
skb independently (where each skb costs an extra call to the driver,
and the driver also needs to get/drop its lock, etc). The method is to
send as many packets as possible from the qdisc (eg multiple packets
can accumulate if the driver is stopped or a trylock failed) if the
device supports the new API. The steps for enabling the API for a
driver are:

	- driver needs to set NETIF_F_LINKED_SKB before register_netdev
	- register_netdev sets a new tx_link_skbs tunable parameter in
	  dev to 1, indicating that the driver supports linked skbs.
	- driver implements the new API, hard_start_xmit_link, to
	  handle linked skbs, which is mostly a simple task. Eg,
	  support for the e1000 driver can be added without
	  duplicating existing code, as:

	e1000_xmit_frame_link()
	{
	top:
		next_skb = skb->linked;	/* read the link before skb is used */
		(original driver code here)
		skb = next_skb;
		if (skb)
			goto top;
		...
	}

	e1000_xmit_frame()
	{
		return e1000_xmit_frame_link(skb, NULL, dev);
	}

	Drivers can take other approaches, eg, get the lock at the top
	and handle all the packets in one shot, or get/drop locks for
	each skb; but those are internal to the driver. In any case, the
	driver changes needed to support this (optional) API are minimal.
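
The "get the lock at the top" variant can be sketched in user space as
follows. Everything here is a stand-in: struct mock_skb, struct
mock_dev and both functions are hypothetical names, the lock is
modelled by a counter, and the real e1000 transmit code is replaced by
two counter updates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only the link matters. */
struct mock_skb {
	struct mock_skb *linked;	/* next skb in the chain, or NULL */
	unsigned int len;
};

/* Hypothetical stand-in for the device: counts what was "sent". */
struct mock_dev {
	unsigned int tx_packets;
	unsigned int tx_bytes;
	unsigned int lock_cycles;
};

/* Send a whole chain of skbs under one (modelled) lock acquisition. */
static int mock_xmit_frame_link(struct mock_skb *skb, struct mock_dev *dev)
{
	assert(dev != NULL);
	dev->lock_cycles++;		/* take the tx lock once per chain */
	while (skb) {
		/* Read the link first, then detach this skb. */
		struct mock_skb *next = skb->linked;

		skb->linked = NULL;
		dev->tx_packets++;	/* "original driver code" goes here */
		dev->tx_bytes += skb->len;
		skb = next;
	}
	return 0;
}

/* The single-skb entry point is the linked path with a chain of one. */
static int mock_xmit_frame(struct mock_skb *skb, struct mock_dev *dev)
{
	assert(skb != NULL);
	skb->linked = NULL;
	return mock_xmit_frame_link(skb, dev);
}
```

Sending a chain of three skbs through mock_xmit_frame_link() counts
three packets but only one lock cycle, while three separate calls to
mock_xmit_frame() would count three of each.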

The main change is in the core/sched code. The qdisc links packets if
the device supports it and multiple skbs are present, and calls
dev_hard_start_xmit, which calls one of the two APIs depending on
whether the passed skb is linked or not. A sys interface can set or
reset the tx_link_skbs parameter for the device to select the old or
the new driver API.
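
A rough user-space model of that dispatch is given here. It is only a
sketch under assumptions: the sim_* names are hypothetical, the two
driver entry points just count calls, and the real
dev_hard_start_xmit and qdisc code do considerably more than this.

```c
#include <assert.h>
#include <stddef.h>

struct sim_skb {
	struct sim_skb *linked;	/* next skb in the chain, or NULL */
};

struct sim_dev {
	int tx_link_skbs;	/* models the tunable: 1 = use linked API */
	int (*hard_start_xmit)(struct sim_skb *skb, struct sim_dev *dev);
	int (*hard_start_xmit_link)(struct sim_skb *skb, struct sim_dev *dev);
	unsigned int single_calls;
	unsigned int link_calls;
};

/* Stand-in driver entry points: they only count how often they run. */
static int sim_xmit(struct sim_skb *skb, struct sim_dev *dev)
{
	(void)skb;
	dev->single_calls++;
	return 0;
}

static int sim_xmit_link(struct sim_skb *skb, struct sim_dev *dev)
{
	(void)skb;
	dev->link_calls++;
	return 0;
}

/* Pick the driver API based on whether the passed skb is linked. */
static int sim_dev_hard_start_xmit(struct sim_skb *skb, struct sim_dev *dev)
{
	if (skb->linked)
		return dev->hard_start_xmit_link(skb, dev);
	return dev->hard_start_xmit(skb, dev);
}

/* Qdisc side: link the skbs only if the device allows it and n > 1. */
static void sim_qdisc_run(struct sim_skb *skbs, size_t n, struct sim_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	if (n == 0)
		return;

	if (dev->tx_link_skbs && n > 1) {
		for (i = 0; i + 1 < n; i++)
			skbs[i].linked = &skbs[i + 1];
		skbs[n - 1].linked = NULL;
		sim_dev_hard_start_xmit(&skbs[0], dev);	/* one call */
		return;
	}

	for (i = 0; i < n; i++) {
		skbs[i].linked = NULL;
		sim_dev_hard_start_xmit(&skbs[i], dev);	/* n calls */
	}
}
```

With tx_link_skbs set to 1 and three queued skbs, the model makes one
call into sim_xmit_link(); with it reset to 0 the same three skbs cost
three calls into sim_xmit().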

The reason for implementing this was to speed up the IPoIB driver. But
before doing that, a proof of concept for the E1000/AMSO drivers was
considered (as most of the code is generic). I do not have test
results at this time, but I am working on it.

Please let me know whether this approach is acceptable, or if you have
any suggestions.

Thanks,

- KK


