* [PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
@ 2007-08-22 8:28 ` Krishna Kumar
2007-08-22 15:50 ` [ofa-general] " Randy Dunlap
2007-08-22 8:29 ` [PATCH 2/10 Rev4] [core] Add skb_blist & support " Krishna Kumar
` (8 subsequent siblings)
9 siblings, 1 reply; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:28 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, peter.p.waskiewicz.jr, kumarkr, xma,
gaagaan, netdev, rdreier, rick.jones2, mcarlson, jeff, general,
mchan, tgraf, Krishna Kumar, sri
Add Documentation describing batching skb xmit capability.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
batching_skb_xmit.txt | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 78 insertions(+)
diff -ruNp org/Documentation/networking/batching_skb_xmit.txt new/Documentation/networking/batching_skb_xmit.txt
--- org/Documentation/networking/batching_skb_xmit.txt 1970-01-01 05:30:00.000000000 +0530
+++ new/Documentation/networking/batching_skb_xmit.txt 2007-08-22 10:21:19.000000000 +0530
@@ -0,0 +1,78 @@
+ HOWTO for batching skb xmit support
+ -----------------------------------
+
+Section 1: What is batching skb xmit
+Section 2: How batching xmit works vs the regular xmit
+Section 3: How drivers can support batching
+Section 4: How users can work with batching
+
+
+Introduction: Kernel support for batching skb
+----------------------------------------------
+
+A new capability to support xmit of multiple skbs is provided in the netdevice
+layer. Drivers which enable this capability should be able to process multiple
+skbs in a single call to their xmit handler.
+
+
+Section 1: What is batching skb xmit
+-------------------------------------
+
+ This capability is optionally enabled by a driver by setting the
+ NETIF_F_BATCH_SKBS bit in dev->features. The pre-requisite for a
+ driver to use this capability is that it should have a reasonably
+ sized hardware queue that can process multiple skbs.
+
+
+Section 2: How batching xmit works vs the regular xmit
+-------------------------------------------------------
+
+ The network stack gets called from upper layer protocols with a single
+ skb to transmit. This skb is first enqueue'd and an attempt is made to
+ transmit it immediately (via qdisc_run). However, events like tx lock
+ contention, tx queue stopped, etc, can result in the skb not getting
+ sent out and it remains in the queue. When the next xmit is called or
+ when the queue is re-enabled, qdisc_run could potentially find
+ multiple packets in the queue, and iteratively send them all out
+ one-by-one.
+
+ Batching skb xmit is a mechanism to exploit this situation where all
+ skbs can be passed in one shot to the device. This reduces driver
+ processing, locking at the driver (or in stack for ~LLTX drivers)
+ gets amortized over multiple skbs, and in case of specific drivers
+ where every xmit results in a completion processing (like IPoIB) -
+ optimizations can be made in the driver to request a completion for
+ only the last skb that was sent which results in saving interrupts
+ for every (but the last) skb that was sent in the same batch.
+
+ Batching can result in significant performance gains for systems that
+ have multiple data stream paths over the same network interface card.
+
+
+Section 3: How drivers can support batching
+---------------------------------------------
+
+ Batching requires the driver to set the NETIF_F_BATCH_SKBS bit in
+ dev->features.
+
+ The driver's xmit handler should be modified to process multiple skbs
+ instead of one skb. The driver's xmit handler is called either with a
+ skb to transmit or NULL skb, where the latter case should be handled
+ as a call to xmit multiple skbs. This is done by sending out all skbs
+ in the dev->skb_blist list (where it was added by the core stack).
+
+
+Section 4: How users can work with batching
+---------------------------------------------
+
+ Batching can be disabled for a particular device, e.g. on desktop
+ systems if only one stream of network activity for that device is
+ taking place, since performance could be slightly affected due to
+ extra processing that batching adds (unless packets are getting
+ sent fast resulting in stopped queue's). Batching can be enabled if
+ more than one stream of network activity per device is being done,
+ e.g. on servers; or even desktop usage with multiple browser, chat,
+ file transfer sessions, etc.
+
+ Per device batching can be enabled/disabled by passing 'on' or 'off'
+ respectively to ethtool.
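A minimal sketch of the driver side that Section 3 describes (illustrative
only, not part of this patch; drv_queue_one() and drv_ring_full() are
made-up helpers, while NETIF_F_BATCH_SKBS and dev->skb_blist come from the
later patches in this series):

static int drv_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	if (skb)			/* regular single-skb xmit */
		return drv_queue_one(skb, dev);

	/* skb == NULL: batching xmit, drain the core's dev->skb_blist */
	while ((skb = __skb_dequeue(dev->skb_blist)) != NULL) {
		if (drv_ring_full(dev)) {
			/* no descriptors left: keep the rest for a retry */
			__skb_queue_head(dev->skb_blist, skb);
			netif_stop_queue(dev);
			return NETDEV_TX_BUSY;
		}
		drv_queue_one(skb, dev);
	}
	return NETDEV_TX_OK;
}

The driver advertises the capability by setting NETIF_F_BATCH_SKBS in
dev->features at probe time; plain __skb_dequeue() is enough here because
only the xmit path consumes skb_blist while the qdisc-running bit is held
(see the comment above dev_add_skb_to_blist() later in the series).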
* [ofa-general] Re: [PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching
2007-08-22 8:28 ` [PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching Krishna Kumar
@ 2007-08-22 15:50 ` Randy Dunlap
2007-08-23 2:48 ` Krishna Kumar2
0 siblings, 1 reply; 15+ messages in thread
From: Randy Dunlap @ 2007-08-22 15:50 UTC (permalink / raw)
To: Krishna Kumar
Cc: johnpol, jagana, herbert, gaagaan, Robert.Olsson, kumarkr,
rdreier, peter.p.waskiewicz.jr, hadi, kaber, jeff, general,
netdev, tgraf, mcarlson, sri, shemminger, davem, mchan
On Wed, 22 Aug 2007 13:58:58 +0530 Krishna Kumar wrote:
> Add Documentation describing batching skb xmit capability.
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
> batching_skb_xmit.txt | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 78 insertions(+)
>
> diff -ruNp org/Documentation/networking/batching_skb_xmit.txt new/Documentation/networking/batching_skb_xmit.txt
> --- org/Documentation/networking/batching_skb_xmit.txt 1970-01-01 05:30:00.000000000 +0530
> +++ new/Documentation/networking/batching_skb_xmit.txt 2007-08-22 10:21:19.000000000 +0530
> @@ -0,0 +1,78 @@
> + HOWTO for batching skb xmit support
> + -----------------------------------
> +
> +Section 1: What is batching skb xmit
> +Section 2: How batching xmit works vs the regular xmit
> +Section 3: How drivers can support batching
> +Section 4: How users can work with batching
> +
> +
> +Introduction: Kernel support for batching skb
> +----------------------------------------------
> +
> +A new capability to support xmit of multiple skbs is provided in the netdevice
> +layer. Drivers which enable this capability should be able to process multiple
> +skbs in a single call to their xmit handler.
> +
> +
> +Section 1: What is batching skb xmit
> +-------------------------------------
> +
> + This capability is optionally enabled by a driver by setting the
> + NETIF_F_BATCH_SKBS bit in dev->features. The pre-requisite for a
prerequisite
> + driver to use this capability is that it should have a reasonably
I would say "reasonably-sized".
> + sized hardware queue that can process multiple skbs.
> +
> +
> +Section 2: How batching xmit works vs the regular xmit
> +-------------------------------------------------------
> +
> + The network stack gets called from upper layer protocols with a single
> + skb to transmit. This skb is first enqueue'd and an attempt is made to
enqueued
> + transmit it immediately (via qdisc_run). However, events like tx lock
> + contention, tx queue stopped, etc, can result in the skb not getting
etc.,
> + sent out and it remains in the queue. When the next xmit is called or
> + when the queue is re-enabled, qdisc_run could potentially find
> + multiple packets in the queue, and iteratively send them all out
> + one-by-one.
> +
> + Batching skb xmit is a mechanism to exploit this situation where all
> + skbs can be passed in one shot to the device. This reduces driver
> + processing, locking at the driver (or in stack for ~LLTX drivers)
> + gets amortized over multiple skbs, and in case of specific drivers
> + where every xmit results in a completion processing (like IPoIB) -
> + optimizations can be made in the driver to request a completion for
> + only the last skb that was sent which results in saving interrupts
> + for every (but the last) skb that was sent in the same batch.
> +
> + Batching can result in significant performance gains for systems that
> + have multiple data stream paths over the same network interface card.
> +
> +
> +Section 3: How drivers can support batching
> +---------------------------------------------
> +
> + Batching requires the driver to set the NETIF_F_BATCH_SKBS bit in
> + dev->features.
> +
> + The driver's xmit handler should be modified to process multiple skbs
> + instead of one skb. The driver's xmit handler is called either with a
an
> + skb to transmit or NULL skb, where the latter case should be handled
> + as a call to xmit multiple skbs. This is done by sending out all skbs
> + in the dev->skb_blist list (where it was added by the core stack).
> +
> +
> +Section 4: How users can work with batching
> +---------------------------------------------
> +
> + Batching can be disabled for a particular device, e.g. on desktop
> + systems if only one stream of network activity for that device is
> + taking place, since performance could be slightly affected due to
> + extra processing that batching adds (unless packets are getting
> + sent fast resulting in stopped queue's). Batching can be enabled if
queues).
> + more than one stream of network activity per device is being done,
> + e.g. on servers; or even desktop usage with multiple browser, chat,
> + file transfer sessions, etc.
> +
> + Per device batching can be enabled/disabled by passing 'on' or 'off'
> + respectively to ethtool.
with what other parameter(s), e.g.,
ethtool <dev> batching on/off ?
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
* Re: [PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching
2007-08-22 15:50 ` [ofa-general] " Randy Dunlap
@ 2007-08-23 2:48 ` Krishna Kumar2
0 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar2 @ 2007-08-23 2:48 UTC (permalink / raw)
To: Randy Dunlap
Cc: davem, gaagaan, general, hadi, herbert, jagana, jeff, johnpol,
kaber, kumarkr, mcarlson, mchan, netdev, peter.p.waskiewicz.jr,
rdreier, rick.jones2, Robert.Olsson, shemminger, sri, tgraf, xma
Hi Randy,
Thanks for your suggestions. Will clean up those changes.
- KK
* [PATCH 2/10 Rev4] [core] Add skb_blist & support for batching
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
2007-08-22 8:28 ` [PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching Krishna Kumar
@ 2007-08-22 8:29 ` Krishna Kumar
2007-08-22 8:29 ` [PATCH 3/10 Rev4] [sched] Modify qdisc_run to support batching Krishna Kumar
` (7 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:29 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, rick.jones2, xma, gaagaan, kumarkr,
rdreier, peter.p.waskiewicz.jr, mcarlson, jeff, Krishna Kumar,
general, netdev, tgraf, mchan, sri
Introduce skb_blist, NETIF_F_BATCH_SKBS, use single API for
batching/no-batching, etc.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
include/linux/netdevice.h | 4 ++++
net/core/dev.c | 21 ++++++++++++++++++---
2 files changed, 22 insertions(+), 3 deletions(-)
diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h 2007-08-20 14:26:36.000000000 +0530
+++ new/include/linux/netdevice.h 2007-08-22 08:42:10.000000000 +0530
@@ -399,6 +399,7 @@ struct net_device
#define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */
#define NETIF_F_GSO 2048 /* Enable software GSO. */
#define NETIF_F_LLTX 4096 /* LockLess TX */
+#define NETIF_F_BATCH_SKBS 8192 /* Driver supports multiple skbs/xmit */
#define NETIF_F_MULTI_QUEUE 16384 /* Has multiple TX/RX queues */
#define NETIF_F_LRO 32768 /* large receive offload */
@@ -510,6 +511,9 @@ struct net_device
/* Partially transmitted GSO packet. */
struct sk_buff *gso_skb;
+ /* List of batch skbs (optional, used if driver supports skb batching) */
+ struct sk_buff_head *skb_blist;
+
/* ingress path synchronizer */
spinlock_t ingress_lock;
struct Qdisc *qdisc_ingress;
diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c 2007-08-20 14:26:37.000000000 +0530
+++ new/net/core/dev.c 2007-08-22 10:49:22.000000000 +0530
@@ -898,6 +898,16 @@ void netdev_state_change(struct net_devi
}
}
+static void free_batching(struct net_device *dev)
+{
+ if (dev->skb_blist) {
+ if (!skb_queue_empty(dev->skb_blist))
+ skb_queue_purge(dev->skb_blist);
+ kfree(dev->skb_blist);
+ dev->skb_blist = NULL;
+ }
+}
+
/**
* dev_load - load a network module
* @name: name of interface
@@ -1458,7 +1468,9 @@ static int dev_gso_segment(struct sk_buf
int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
- if (likely(!skb->next)) {
+ if (likely(skb)) {
+ if (unlikely(skb->next))
+ goto gso;
if (!list_empty(&ptype_all))
dev_queue_xmit_nit(skb, dev);
@@ -1468,10 +1480,10 @@ int dev_hard_start_xmit(struct sk_buff *
if (skb->next)
goto gso;
}
-
- return dev->hard_start_xmit(skb, dev);
}
+ return dev->hard_start_xmit(skb, dev);
+
gso:
do {
struct sk_buff *nskb = skb->next;
@@ -3791,6 +3803,9 @@ void unregister_netdevice(struct net_dev
synchronize_net();
+ /* Deallocate batching structure */
+ free_batching(dev);
+
/* Shutdown queueing discipline. */
dev_shutdown(dev);
* [PATCH 3/10 Rev4] [sched] Modify qdisc_run to support batching
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
2007-08-22 8:28 ` [PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching Krishna Kumar
2007-08-22 8:29 ` [PATCH 2/10 Rev4] [core] Add skb_blist & support " Krishna Kumar
@ 2007-08-22 8:29 ` Krishna Kumar
2007-08-22 8:29 ` [PATCH 4/10 Rev4] [ethtool] Add ethtool support Krishna Kumar
` (6 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:29 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, peter.p.waskiewicz.jr, xma, gaagaan,
kumarkr, rdreier, rick.jones2, mcarlson, jeff, mchan, general,
netdev, tgraf, Krishna Kumar, sri
Modify qdisc_run() to support batching. Modify callers of qdisc_run to
use batching, modify qdisc_restart to implement batching.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
include/linux/netdevice.h | 2 +
include/net/pkt_sched.h | 6 +--
net/core/dev.c | 44 +++++++++++++++++++++++++++-
net/sched/sch_generic.c | 70 ++++++++++++++++++++++++++++++++++++++--------
4 files changed, 105 insertions(+), 17 deletions(-)
diff -ruNp org/include/net/pkt_sched.h new/include/net/pkt_sched.h
--- org/include/net/pkt_sched.h 2007-08-20 14:26:36.000000000 +0530
+++ new/include/net/pkt_sched.h 2007-08-22 09:23:57.000000000 +0530
@@ -80,13 +80,13 @@ extern struct qdisc_rate_table *qdisc_ge
struct rtattr *tab);
extern void qdisc_put_rtab(struct qdisc_rate_table *tab);
-extern void __qdisc_run(struct net_device *dev);
+extern void __qdisc_run(struct net_device *dev, struct sk_buff_head *blist);
-static inline void qdisc_run(struct net_device *dev)
+static inline void qdisc_run(struct net_device *dev, struct sk_buff_head *blist)
{
if (!netif_queue_stopped(dev) &&
!test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
- __qdisc_run(dev);
+ __qdisc_run(dev, blist);
}
extern int tc_classify_compat(struct sk_buff *skb, struct tcf_proto *tp,
diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h 2007-08-20 14:26:36.000000000 +0530
+++ new/include/linux/netdevice.h 2007-08-22 08:42:10.000000000 +0530
@@ -892,6 +896,8 @@ extern int dev_set_mac_address(struct n
struct sockaddr *);
extern int dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev);
+extern int dev_add_skb_to_blist(struct sk_buff *skb,
+ struct net_device *dev);
extern void dev_init(void);
diff -ruNp org/net/sched/sch_generic.c new/net/sched/sch_generic.c
--- org/net/sched/sch_generic.c 2007-08-20 14:26:37.000000000 +0530
+++ new/net/sched/sch_generic.c 2007-08-22 08:49:55.000000000 +0530
@@ -59,10 +59,12 @@ static inline int qdisc_qlen(struct Qdis
static inline int dev_requeue_skb(struct sk_buff *skb, struct net_device *dev,
struct Qdisc *q)
{
- if (unlikely(skb->next))
- dev->gso_skb = skb;
- else
- q->ops->requeue(skb, q);
+ if (likely(skb)) {
+ if (unlikely(skb->next))
+ dev->gso_skb = skb;
+ else
+ q->ops->requeue(skb, q);
+ }
netif_schedule(dev);
return 0;
@@ -91,10 +93,15 @@ static inline int handle_dev_cpu_collisi
/*
* Same CPU holding the lock. It may be a transient
* configuration error, when hard_start_xmit() recurses. We
- * detect it by checking xmit owner and drop the packet when
- * deadloop is detected. Return OK to try the next skb.
+ * detect it by checking xmit owner and drop the packet (or
+ * all packets in batching case) when deadloop is detected.
+ * Return OK to try the next skb.
*/
- kfree_skb(skb);
+ if (likely(skb))
+ kfree_skb(skb);
+ else if (!skb_queue_empty(dev->skb_blist))
+ skb_queue_purge(dev->skb_blist);
+
if (net_ratelimit())
printk(KERN_WARNING "Dead loop on netdevice %s, "
"fix it urgently!\n", dev->name);
@@ -112,6 +119,38 @@ static inline int handle_dev_cpu_collisi
}
/*
+ * Algorithm to get skb(s) is:
+ * - Non batching drivers, or if the batch list is empty and there is
+ * 1 skb in the queue - dequeue skb and put it in *skbp to tell the
+ * caller to use the single xmit API.
+ * - Batching drivers where the batch list already contains at least one
+ * skb, or if there are multiple skbs in the queue: keep dequeueing
+ * skbs up to a limit and set *skbp to NULL to tell the caller to use
+ * the multiple xmit API.
+ *
+ * Returns:
+ * 1 - at least one skb is to be sent out, *skbp contains skb or NULL
+ * (in case >1 skbs present in blist for batching)
+ * 0 - no skbs to be sent.
+ */
+static inline int get_skb(struct net_device *dev, struct Qdisc *q,
+ struct sk_buff_head *blist, struct sk_buff **skbp)
+{
+ if (likely(!blist || (!skb_queue_len(blist) && qdisc_qlen(q) <= 1))) {
+ return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL);
+ } else {
+ struct sk_buff *skb;
+ int max = dev->tx_queue_len - skb_queue_len(blist);
+
+ while (max > 0 && (skb = dev_dequeue_skb(dev, q)) != NULL)
+ max -= dev_add_skb_to_blist(skb, dev);
+
+ *skbp = NULL;
+ return 1; /* there is at least one skb in skb_blist */
+ }
+}
+
+/*
* NOTE: Called under dev->queue_lock with locally disabled BH.
*
* __LINK_STATE_QDISC_RUNNING guarantees only one CPU can process this
@@ -130,7 +169,8 @@ static inline int handle_dev_cpu_collisi
* >0 - queue is not empty.
*
*/
-static inline int qdisc_restart(struct net_device *dev)
+static inline int qdisc_restart(struct net_device *dev,
+ struct sk_buff_head *blist)
{
struct Qdisc *q = dev->qdisc;
struct sk_buff *skb;
@@ -138,7 +178,7 @@ static inline int qdisc_restart(struct n
int ret;
/* Dequeue packet */
- if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
+ if (unlikely(!get_skb(dev, q, blist, &skb)))
return 0;
/*
@@ -168,7 +208,7 @@ static inline int qdisc_restart(struct n
switch (ret) {
case NETDEV_TX_OK:
- /* Driver sent out skb successfully */
+ /* Driver sent out skb (or entire skb_blist) successfully */
ret = qdisc_qlen(q);
break;
@@ -190,10 +230,10 @@ static inline int qdisc_restart(struct n
return ret;
}
-void __qdisc_run(struct net_device *dev)
+void __qdisc_run(struct net_device *dev, struct sk_buff_head *blist)
{
do {
- if (!qdisc_restart(dev))
+ if (!qdisc_restart(dev, blist))
break;
} while (!netif_queue_stopped(dev));
@@ -563,6 +603,12 @@ void dev_deactivate(struct net_device *d
qdisc = dev->qdisc;
dev->qdisc = &noop_qdisc;
+ if (dev->skb_blist) {
+ /* Release skbs on batch list */
+ if (!skb_queue_empty(dev->skb_blist))
+ skb_queue_purge(dev->skb_blist);
+ }
+
qdisc_reset(qdisc);
skb = dev->gso_skb;
diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c 2007-08-20 14:26:37.000000000 +0530
+++ new/net/core/dev.c 2007-08-22 10:49:22.000000000 +0530
@@ -1466,6 +1466,45 @@ static int dev_gso_segment(struct sk_buf
return 0;
}
+/*
+ * Add skb (skbs in case segmentation is required) to dev->skb_blist. No one
+ * can add to this list simultaneously since we are holding QDISC RUNNING
+ * bit. The list is also safe from simultaneous deletes since skbs are
+ * dequeued only when the driver is invoked.
+ *
+ * Returns count of successful skb(s) added to skb_blist.
+ */
+int dev_add_skb_to_blist(struct sk_buff *skb, struct net_device *dev)
+{
+ if (!list_empty(&ptype_all))
+ dev_queue_xmit_nit(skb, dev);
+
+ if (netif_needs_gso(dev, skb)) {
+ if (unlikely(dev_gso_segment(skb))) {
+ kfree_skb(skb);
+ return 0;
+ }
+
+ if (skb->next) {
+ int count = 0;
+
+ do {
+ struct sk_buff *nskb = skb->next;
+
+ skb->next = nskb->next;
+ __skb_queue_tail(dev->skb_blist, nskb);
+ count++;
+ } while (skb->next);
+
+ skb->destructor = DEV_GSO_CB(skb)->destructor;
+ kfree_skb(skb);
+ return count;
+ }
+ }
+ __skb_queue_tail(dev->skb_blist, skb);
+ return 1;
+}
+
int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
if (likely(skb)) {
@@ -1620,7 +1659,7 @@ gso:
/* reset queue_mapping to zero */
skb->queue_mapping = 0;
rc = q->enqueue(skb, q);
- qdisc_run(dev);
+ qdisc_run(dev, NULL);
spin_unlock(&dev->queue_lock);
rc = rc == NET_XMIT_BYPASS ? NET_XMIT_SUCCESS : rc;
@@ -1818,7 +1857,8 @@ static void net_tx_action(struct softirq
clear_bit(__LINK_STATE_SCHED, &dev->state);
if (spin_trylock(&dev->queue_lock)) {
- qdisc_run(dev);
+ /* Send all skbs if driver supports batching */
+ qdisc_run(dev, dev->skb_blist);
spin_unlock(&dev->queue_lock);
} else {
netif_schedule(dev);
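A quick worked reading of get_skb() above; the numbers are illustrative
only and assume a batching driver (blist != NULL) with dev->tx_queue_len
set to 1000:

	/*
	 * backlog = 1 skb, blist empty:  *skbp = that skb, the driver is
	 *                                called exactly as before.
	 * backlog = 40, blist empty:     all 40 skbs are moved onto
	 *                                dev->skb_blist, *skbp = NULL, and
	 *                                one dev_hard_start_xmit(NULL, dev)
	 *                                sends the whole batch.
	 * blist already holds 5 skbs:    batch path again, but at most
	 *                                1000 - 5 further skbs are pulled
	 *                                from the qdisc before the driver
	 *                                is asked to drain the list.
	 */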
* [PATCH 4/10 Rev4] [ethtool] Add ethtool support
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
` (2 preceding siblings ...)
2007-08-22 8:29 ` [PATCH 3/10 Rev4] [sched] Modify qdisc_run to support batching Krishna Kumar
@ 2007-08-22 8:29 ` Krishna Kumar
2007-08-22 8:30 ` [PATCH 5/10 Rev4] [IPoIB] Header file changes Krishna Kumar
` (5 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:29 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, rick.jones2, xma, gaagaan, kumarkr,
rdreier, peter.p.waskiewicz.jr, mcarlson, jeff, general, mchan,
tgraf, netdev, Krishna Kumar, sri
Add ethtool support to enable/disable batching.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
include/linux/ethtool.h | 2 ++
include/linux/netdevice.h | 2 ++
net/core/dev.c | 36 ++++++++++++++++++++++++++++++++++++
net/core/ethtool.c | 27 +++++++++++++++++++++++++++
4 files changed, 67 insertions(+)
diff -ruNp org/include/linux/ethtool.h new/include/linux/ethtool.h
--- org/include/linux/ethtool.h 2007-08-20 14:26:35.000000000 +0530
+++ new/include/linux/ethtool.h 2007-08-22 08:37:35.000000000 +0530
@@ -440,6 +440,8 @@ struct ethtool_ops {
#define ETHTOOL_SFLAGS 0x00000026 /* Set flags bitmap(ethtool_value) */
#define ETHTOOL_GPFLAGS 0x00000027 /* Get driver-private flags bitmap */
#define ETHTOOL_SPFLAGS 0x00000028 /* Set driver-private flags bitmap */
+#define ETHTOOL_GBATCH 0x00000029 /* Get Batching (ethtool_value) */
+#define ETHTOOL_SBATCH 0x00000030 /* Set Batching (ethtool_value) */
/* compatibility with older code */
#define SPARC_ETH_GSET ETHTOOL_GSET
diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h 2007-08-20 14:26:36.000000000 +0530
+++ new/include/linux/netdevice.h 2007-08-22 08:42:10.000000000 +0530
@@ -1152,6 +1152,8 @@ extern void dev_set_promiscuity(struct
extern void dev_set_allmulti(struct net_device *dev, int inc);
extern void netdev_state_change(struct net_device *dev);
extern void netdev_features_change(struct net_device *dev);
+extern int dev_change_tx_batch_skb(struct net_device *dev,
+ unsigned long new_batch_skb);
/* Load a device via the kmod */
extern void dev_load(const char *name);
extern void dev_mcast_init(void);
diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c 2007-08-20 14:26:37.000000000 +0530
+++ new/net/core/dev.c 2007-08-22 10:49:22.000000000 +0530
@@ -908,6 +908,42 @@ static void free_batching(struct net_dev
}
}
+int dev_change_tx_batch_skb(struct net_device *dev, unsigned long new_batch_skb)
+{
+ int ret = 0;
+ struct sk_buff_head *blist;
+
+ if (!(dev->features & NETIF_F_BATCH_SKBS)) {
+ /* Driver doesn't support batching skb API */
+ ret = -ENOTSUPP;
+ goto out;
+ }
+
+ /*
+ * Check if the new value is the same as the current one (the '!!' on
+ * new_batch_skb is paranoia, as it should already be boolean).
+ */
+ if (!!dev->skb_blist == !!new_batch_skb)
+ goto out;
+
+ if (new_batch_skb &&
+ (blist = kmalloc(sizeof *blist, GFP_KERNEL)) == NULL) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ spin_lock_bh(&dev->queue_lock);
+ if (new_batch_skb) {
+ skb_queue_head_init(blist);
+ dev->skb_blist = blist;
+ } else
+ free_batching(dev);
+ spin_unlock_bh(&dev->queue_lock);
+
+out:
+ return ret;
+}
+
/**
* dev_load - load a network module
* @name: name of interface
diff -ruNp org/net/core/ethtool.c new/net/core/ethtool.c
--- org/net/core/ethtool.c 2007-08-20 14:26:37.000000000 +0530
+++ new/net/core/ethtool.c 2007-08-22 08:36:07.000000000 +0530
@@ -556,6 +556,26 @@ static int ethtool_set_gso(struct net_de
return 0;
}
+static int ethtool_get_batch(struct net_device *dev, char __user *useraddr)
+{
+ struct ethtool_value edata = { ETHTOOL_GBATCH };
+
+ edata.data = dev->skb_blist != NULL;
+ if (copy_to_user(useraddr, &edata, sizeof(edata)))
+ return -EFAULT;
+ return 0;
+}
+
+static int ethtool_set_batch(struct net_device *dev, char __user *useraddr)
+{
+ struct ethtool_value edata;
+
+ if (copy_from_user(&edata, useraddr, sizeof(edata)))
+ return -EFAULT;
+
+ return dev_change_tx_batch_skb(dev, edata.data);
+}
+
static int ethtool_self_test(struct net_device *dev, char __user *useraddr)
{
struct ethtool_test test;
@@ -813,6 +833,7 @@ int dev_ethtool(struct ifreq *ifr)
case ETHTOOL_GGSO:
case ETHTOOL_GFLAGS:
case ETHTOOL_GPFLAGS:
+ case ETHTOOL_GBATCH:
break;
default:
if (!capable(CAP_NET_ADMIN))
@@ -956,6 +977,12 @@ int dev_ethtool(struct ifreq *ifr)
rc = ethtool_set_value(dev, useraddr,
dev->ethtool_ops->set_priv_flags);
break;
+ case ETHTOOL_GBATCH:
+ rc = ethtool_get_batch(dev, useraddr);
+ break;
+ case ETHTOOL_SBATCH:
+ rc = ethtool_set_batch(dev, useraddr);
+ break;
default:
rc = -EOPNOTSUPP;
}
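Until the ethtool utility grows a matching option (Randy's question about
"ethtool <dev> batching on/off" in the documentation thread), the new ioctl
can be exercised directly through SIOCETHTOOL. A rough userspace sketch,
assuming only the ETHTOOL_SBATCH value defined by this patch:

#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

#ifndef ETHTOOL_SBATCH
#define ETHTOOL_SBATCH	0x00000030	/* value used by this patch */
#endif

/* Enable (on != 0) or disable batching on one interface. */
static int set_batching(const char *ifname, __u32 on)
{
	struct ethtool_value eval = { .cmd = ETHTOOL_SBATCH, .data = on };
	struct ifreq ifr;
	int fd, rc;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&eval;

	/* Kernel side returns -ENOTSUPP if the driver lacks NETIF_F_BATCH_SKBS */
	rc = ioctl(fd, SIOCETHTOOL, &ifr);
	close(fd);
	return rc;
}

int main(int argc, char **argv)
{
	/* usage: ./setbatch <ifname> <0|1> */
	if (argc < 3)
		return 1;
	return set_batching(argv[1], argv[2][0] == '1') ? 1 : 0;
}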
* [PATCH 5/10 Rev4] [IPoIB] Header file changes
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
` (3 preceding siblings ...)
2007-08-22 8:29 ` [PATCH 4/10 Rev4] [ethtool] Add ethtool support Krishna Kumar
@ 2007-08-22 8:30 ` Krishna Kumar
2007-08-22 8:30 ` [PATCH 6/10 Rev4] [IPoIB] CM & Multicast changes Krishna Kumar
` (4 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:30 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, peter.p.waskiewicz.jr, xma, gaagaan,
kumarkr, rdreier, rick.jones2, mcarlson, jeff, general, mchan,
tgraf, netdev, Krishna Kumar, sri
IPoIB header file changes to use batching.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
ipoib.h | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib.h new/drivers/infiniband/ulp/ipoib/ipoib.h
--- org/drivers/infiniband/ulp/ipoib/ipoib.h 2007-08-20 14:26:26.000000000 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib.h 2007-08-22 08:33:51.000000000 +0530
@@ -271,8 +271,8 @@ struct ipoib_dev_priv {
struct ipoib_tx_buf *tx_ring;
unsigned tx_head;
unsigned tx_tail;
- struct ib_sge tx_sge;
- struct ib_send_wr tx_wr;
+ struct ib_sge *tx_sge;
+ struct ib_send_wr *tx_wr;
struct ib_wc ibwc[IPOIB_NUM_WC];
@@ -367,8 +367,11 @@ static inline void ipoib_put_ah(struct i
int ipoib_open(struct net_device *dev);
int ipoib_add_pkey_attr(struct net_device *dev);
+int ipoib_process_skb(struct net_device *dev, struct sk_buff *skb,
+ struct ipoib_dev_priv *priv, struct ipoib_ah *address,
+ u32 qpn, int wr_num);
void ipoib_send(struct net_device *dev, struct sk_buff *skb,
- struct ipoib_ah *address, u32 qpn);
+ struct ipoib_ah *address, u32 qpn, int num_skbs);
void ipoib_reap_ah(struct work_struct *work);
void ipoib_flush_paths(struct net_device *dev);
* [PATCH 6/10 Rev4] [IPoIB] CM & Multicast changes
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
` (4 preceding siblings ...)
2007-08-22 8:30 ` [PATCH 5/10 Rev4] [IPoIB] Header file changes Krishna Kumar
@ 2007-08-22 8:30 ` Krishna Kumar
2007-08-22 8:30 ` [PATCH 7/10 Rev4] [IPoIB] Verbs changes Krishna Kumar
` (3 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:30 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, rick.jones2, xma, gaagaan, kumarkr,
rdreier, peter.p.waskiewicz.jr, mcarlson, jeff, general, mchan,
tgraf, netdev, Krishna Kumar, sri
IPoIB CM & Multicast changes based on header file changes.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
ipoib_cm.c | 13 +++++++++----
ipoib_multicast.c | 4 ++--
2 files changed, 11 insertions(+), 6 deletions(-)
diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_cm.c new/drivers/infiniband/ulp/ipoib/ipoib_cm.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-08-20 14:26:26.000000000 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-08-22 08:33:51.000000000 +0530
@@ -493,14 +493,19 @@ static inline int post_send(struct ipoib
unsigned int wr_id,
u64 addr, int len)
{
+ int ret;
struct ib_send_wr *bad_wr;
- priv->tx_sge.addr = addr;
- priv->tx_sge.length = len;
+ priv->tx_sge[0].addr = addr;
+ priv->tx_sge[0].length = len;
+
+ priv->tx_wr[0].wr_id = wr_id;
- priv->tx_wr.wr_id = wr_id;
+ priv->tx_wr[0].next = NULL;
+ ret = ib_post_send(tx->qp, priv->tx_wr, &bad_wr);
+ priv->tx_wr[0].next = &priv->tx_wr[1];
- return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
+ return ret;
}
void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx)
diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_multicast.c new/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-08-20 14:26:26.000000000 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-08-22 08:33:51.000000000 +0530
@@ -217,7 +217,7 @@ static int ipoib_mcast_join_finish(struc
if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid))) {
priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
- priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
+ priv->tx_wr[0].wr.ud.remote_qkey = priv->qkey;
}
if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
@@ -736,7 +736,7 @@ out:
}
}
- ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
+ ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN, 1);
}
unlock:
* [PATCH 7/10 Rev4] [IPoIB] Verbs changes
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
` (5 preceding siblings ...)
2007-08-22 8:30 ` [PATCH 6/10 Rev4] [IPoIB] CM & Multicast changes Krishna Kumar
@ 2007-08-22 8:30 ` Krishna Kumar
2007-08-22 8:31 ` [PATCH 8/10 Rev4] [IPoIB] Post and work completion handler changes Krishna Kumar
` (2 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:30 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, peter.p.waskiewicz.jr, xma, gaagaan,
kumarkr, rdreier, rick.jones2, mcarlson, jeff, general, mchan,
tgraf, netdev, Krishna Kumar, sri
IPoIB verb changes to use batching.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
ipoib_verbs.c | 23 ++++++++++++++---------
1 files changed, 14 insertions(+), 9 deletions(-)
diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_verbs.c new/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-08-20 14:26:26.000000000 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-08-22 08:33:51.000000000 +0530
@@ -152,11 +152,11 @@ int ipoib_transport_dev_init(struct net_
.max_send_sge = 1,
.max_recv_sge = 1
},
- .sq_sig_type = IB_SIGNAL_ALL_WR,
+ .sq_sig_type = IB_SIGNAL_REQ_WR, /* 11.2.4.1 */
.qp_type = IB_QPT_UD
};
-
- int ret, size;
+ struct ib_send_wr *next_wr = NULL;
+ int i, ret, size;
priv->pd = ib_alloc_pd(priv->ca);
if (IS_ERR(priv->pd)) {
@@ -197,12 +197,17 @@ int ipoib_transport_dev_init(struct net_
priv->dev->dev_addr[2] = (priv->qp->qp_num >> 8) & 0xff;
priv->dev->dev_addr[3] = (priv->qp->qp_num ) & 0xff;
- priv->tx_sge.lkey = priv->mr->lkey;
-
- priv->tx_wr.opcode = IB_WR_SEND;
- priv->tx_wr.sg_list = &priv->tx_sge;
- priv->tx_wr.num_sge = 1;
- priv->tx_wr.send_flags = IB_SEND_SIGNALED;
+ for (i = ipoib_sendq_size - 1; i >= 0; i--) {
+ priv->tx_sge[i].lkey = priv->mr->lkey;
+ priv->tx_wr[i].opcode = IB_WR_SEND;
+ priv->tx_wr[i].sg_list = &priv->tx_sge[i];
+ priv->tx_wr[i].num_sge = 1;
+ priv->tx_wr[i].send_flags = 0;
+
+ /* Link the list properly for provider to use */
+ priv->tx_wr[i].next = next_wr;
+ next_wr = &priv->tx_wr[i];
+ }
return 0;
* [PATCH 8/10 Rev4] [IPoIB] Post and work completion handler changes
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
` (6 preceding siblings ...)
2007-08-22 8:30 ` [PATCH 7/10 Rev4] [IPoIB] Verbs changes Krishna Kumar
@ 2007-08-22 8:31 ` Krishna Kumar
2007-08-22 8:31 ` [PATCH 9/10 Rev4] [IPoIB] Implement batching Krishna Kumar
2007-08-22 8:31 ` [PATCH 10/10 Rev4] [E1000] " Krishna Kumar
9 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:31 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, rick.jones2, xma, gaagaan, kumarkr,
rdreier, peter.p.waskiewicz.jr, mcarlson, jeff, general, mchan,
tgraf, netdev, Krishna Kumar, sri
IPoIB internal post and work completion handler changes.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
ipoib_ib.c | 207 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
1 files changed, 163 insertions(+), 44 deletions(-)
diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_ib.c new/drivers/infiniband/ulp/ipoib/ipoib_ib.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-08-20 14:26:26.000000000 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-08-22 08:33:51.000000000 +0530
@@ -242,6 +242,8 @@ repost:
static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
+ int i, num_completions;
+ unsigned int tx_ring_index;
unsigned int wr_id = wc->wr_id;
struct ipoib_tx_buf *tx_req;
unsigned long flags;
@@ -255,18 +257,46 @@ static void ipoib_ib_handle_tx_wc(struct
return;
}
- tx_req = &priv->tx_ring[wr_id];
+ /* Get first WC to process (no one can update tx_tail at this time) */
+ tx_ring_index = priv->tx_tail & (ipoib_sendq_size - 1);
- ib_dma_unmap_single(priv->ca, tx_req->mapping,
- tx_req->skb->len, DMA_TO_DEVICE);
+ /* Find number of WC's */
+ num_completions = wr_id - tx_ring_index + 1;
+ if (unlikely(num_completions <= 0))
+ num_completions += ipoib_sendq_size;
- ++priv->stats.tx_packets;
- priv->stats.tx_bytes += tx_req->skb->len;
+ /*
+ * Handle WC's from earlier (possibly multiple) post_sends in this
+ * iteration as we move from tx_tail to wr_id, since if the last WR
+ * (which is the one which requested completion notification) failed
+ * to be sent for any of those earlier request(s), no completion
+ * notification is generated for successful WR's of those earlier
+ * request(s).
+ */
+ tx_req = &priv->tx_ring[tx_ring_index];
+ for (i = 0; i < num_completions; i++) {
+ if (likely(tx_req->skb)) {
+ ib_dma_unmap_single(priv->ca, tx_req->mapping,
+ tx_req->skb->len, DMA_TO_DEVICE);
+
+ ++priv->stats.tx_packets;
+ priv->stats.tx_bytes += tx_req->skb->len;
- dev_kfree_skb_any(tx_req->skb);
+ dev_kfree_skb_any(tx_req->skb);
+ }
+ /*
+ * else this skb failed synchronously when posted and was
+ * freed immediately.
+ */
+
+ if (likely(++tx_ring_index != ipoib_sendq_size))
+ tx_req++;
+ else
+ tx_req = &priv->tx_ring[0];
+ }
spin_lock_irqsave(&priv->tx_lock, flags);
- ++priv->tx_tail;
+ priv->tx_tail += num_completions;
if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) &&
priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) {
clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
@@ -335,29 +365,57 @@ void ipoib_ib_completion(struct ib_cq *c
netif_rx_schedule(dev, &priv->napi);
}
-static inline int post_send(struct ipoib_dev_priv *priv,
- unsigned int wr_id,
- struct ib_ah *address, u32 qpn,
- u64 addr, int len)
+/*
+ * post_send : Post WR(s) to the device.
+ *
+ * num_skbs is the number of WR's, first_wr is the first slot in tx_wr[] (or
+ * tx_sge[]). first_wr is normally zero unless a previous post_send returned
+ * error and we are trying to post the untried WR's, in which case first_wr
+ * is the index to the first untried WR.
+ *
+ * Break the WR link before posting so that provider knows how many WR's to
+ * process, and this is set back after the post.
+ */
+static inline int post_send(struct ipoib_dev_priv *priv, u32 qpn,
+ int first_wr, int num_skbs,
+ struct ib_send_wr **bad_wr)
{
- struct ib_send_wr *bad_wr;
+ int ret;
+ struct ib_send_wr *last_wr, *next_wr;
+
+ last_wr = &priv->tx_wr[first_wr + num_skbs - 1];
- priv->tx_sge.addr = addr;
- priv->tx_sge.length = len;
+ /* Set Completion Notification for last WR */
+ last_wr->send_flags = IB_SEND_SIGNALED;
- priv->tx_wr.wr_id = wr_id;
- priv->tx_wr.wr.ud.remote_qpn = qpn;
- priv->tx_wr.wr.ud.ah = address;
+ /* Terminate the last WR */
+ next_wr = last_wr->next;
+ last_wr->next = NULL;
- return ib_post_send(priv->qp, &priv->tx_wr, &bad_wr);
+ /* Send all the WR's in one doorbell */
+ ret = ib_post_send(priv->qp, &priv->tx_wr[first_wr], bad_wr);
+
+ /* Restore send_flags & WR chain */
+ last_wr->send_flags = 0;
+ last_wr->next = next_wr;
+
+ return ret;
}
-void ipoib_send(struct net_device *dev, struct sk_buff *skb,
- struct ipoib_ah *address, u32 qpn)
+/*
+ * Map skb & store skb/mapping in tx_ring; and details of the WR in tx_wr
+ * to pass to the provider.
+ *
+ * Returns:
+ * 1: Error and the skb is freed.
+ * 0 skb processed successfully.
+ */
+int ipoib_process_skb(struct net_device *dev, struct sk_buff *skb,
+ struct ipoib_dev_priv *priv, struct ipoib_ah *address,
+ u32 qpn, int wr_num)
{
- struct ipoib_dev_priv *priv = netdev_priv(dev);
- struct ipoib_tx_buf *tx_req;
u64 addr;
+ unsigned int tx_ring_index;
if (unlikely(skb->len > priv->mcast_mtu + IPOIB_ENCAP_LEN)) {
ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n",
@@ -365,7 +423,7 @@ void ipoib_send(struct net_device *dev,
++priv->stats.tx_dropped;
++priv->stats.tx_errors;
ipoib_cm_skb_too_long(dev, skb, priv->mcast_mtu);
- return;
+ return 1;
}
ipoib_dbg_data(priv, "sending packet, length=%d address=%p qpn=0x%06x\n",
@@ -378,35 +436,96 @@ void ipoib_send(struct net_device *dev,
* means we have to make sure everything is properly recorded and
* our state is consistent before we call post_send().
*/
- tx_req = &priv->tx_ring[priv->tx_head & (ipoib_sendq_size - 1)];
- tx_req->skb = skb;
- addr = ib_dma_map_single(priv->ca, skb->data, skb->len,
- DMA_TO_DEVICE);
+ addr = ib_dma_map_single(priv->ca, skb->data, skb->len, DMA_TO_DEVICE);
if (unlikely(ib_dma_mapping_error(priv->ca, addr))) {
++priv->stats.tx_errors;
dev_kfree_skb_any(skb);
- return;
+ return 1;
}
- tx_req->mapping = addr;
- if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1),
- address->ah, qpn, addr, skb->len))) {
- ipoib_warn(priv, "post_send failed\n");
- ++priv->stats.tx_errors;
- ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE);
- dev_kfree_skb_any(skb);
- } else {
- dev->trans_start = jiffies;
+ tx_ring_index = priv->tx_head & (ipoib_sendq_size - 1);
- address->last_send = priv->tx_head;
- ++priv->tx_head;
+ /* Save till completion handler executes */
+ priv->tx_ring[tx_ring_index].skb = skb;
+ priv->tx_ring[tx_ring_index].mapping = addr;
+
+ /* Set WR values for the provider to use */
+ priv->tx_sge[wr_num].addr = addr;
+ priv->tx_sge[wr_num].length = skb->len;
+
+ priv->tx_wr[wr_num].wr_id = tx_ring_index;
+ priv->tx_wr[wr_num].wr.ud.remote_qpn = qpn;
+ priv->tx_wr[wr_num].wr.ud.ah = address->ah;
+
+ priv->tx_head++;
+
+ if (unlikely(priv->tx_head - priv->tx_tail == ipoib_sendq_size)) {
+ ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
+ netif_stop_queue(dev);
+ set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
+ }
- if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) {
- ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
- netif_stop_queue(dev);
- set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
+ return 0;
+}
+
+/*
+ * Send num_skbs to the device. If an skb is passed to this function, it is
+ * the single, unprocessed-skb send case; otherwise all skbs have already
+ * been processed and put on priv->tx_wr, tx_sge, tx_ring, etc.
+ */
+void ipoib_send(struct net_device *dev, struct sk_buff *skb,
+ struct ipoib_ah *address, u32 qpn, int num_skbs)
+{
+ struct ipoib_dev_priv *priv = netdev_priv(dev);
+ int first_wr = 0;
+
+ if (skb && ipoib_process_skb(dev, skb, priv, address, qpn, 0))
+ return;
+
+ /* Send all skb's in one post */
+ do {
+ struct ib_send_wr *bad_wr;
+
+ if (unlikely((post_send(priv, qpn, first_wr, num_skbs,
+ &bad_wr)))) {
+ int done;
+
+ ipoib_warn(priv, "post_send failed\n");
+
+ /* Get number of WR's that finished successfully */
+ done = bad_wr - &priv->tx_wr[first_wr];
+
+ /* Handle 1 error */
+ priv->stats.tx_errors++;
+ ib_dma_unmap_single(priv->ca,
+ priv->tx_sge[first_wr + done].addr,
+ priv->tx_sge[first_wr + done].length,
+ DMA_TO_DEVICE);
+
+ /* Free failed WR & reset for WC handler to recognize */
+ dev_kfree_skb_any(priv->tx_ring[bad_wr->wr_id].skb);
+ priv->tx_ring[bad_wr->wr_id].skb = NULL;
+
+ /* Handle 'n' successes */
+ if (done) {
+ dev->trans_start = jiffies;
+ address->last_send = priv->tx_head - (num_skbs -
+ done) - 1;
+ }
+
+ /* Get count of skbs that were not tried */
+ num_skbs -= (done + 1);
+ /* + 1 for WR that was tried & failed */
+
+ /* Get start index for next iteration */
+ first_wr += (done + 1);
+ } else {
+ dev->trans_start = jiffies;
+
+ address->last_send = priv->tx_head - 1;
+ num_skbs = 0;
}
- }
+ } while (num_skbs);
}
static void __ipoib_reap_ah(struct net_device *dev)
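A small worked example of the completion accounting in
ipoib_ib_handle_tx_wc() above; the numbers are illustrative and assume
ipoib_sendq_size = 64 (a power of two, as the ring masking requires):

	/*
	 * tx_tail = 62  ->  tx_ring_index = 62 & (64 - 1) = 62
	 * The signaled WC that arrives carries wr_id = 1 (ring wrapped):
	 *
	 *   num_completions = wr_id - tx_ring_index + 1 = 1 - 62 + 1 = -60
	 *   num_completions <= 0, so num_completions += 64  ->  4
	 *
	 * Slots 62, 63, 0 and 1 are reaped in this single pass, even though
	 * only the last WR of the batch asked for a completion, and tx_tail
	 * advances by 4.
	 */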
* [PATCH 9/10 Rev4] [IPoIB] Implement batching
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
` (7 preceding siblings ...)
2007-08-22 8:31 ` [PATCH 8/10 Rev4] [IPoIB] Post and work completion handler changes Krishna Kumar
@ 2007-08-22 8:31 ` Krishna Kumar
2007-08-22 8:31 ` [PATCH 10/10 Rev4] [E1000] " Krishna Kumar
9 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:31 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, peter.p.waskiewicz.jr, xma, gaagaan,
kumarkr, rdreier, rick.jones2, mcarlson, jeff, general, mchan,
tgraf, netdev, Krishna Kumar, sri
IPoIB: implement the new batching API.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
ipoib_main.c | 251 ++++++++++++++++++++++++++++++++++++++++-------------------
1 files changed, 171 insertions(+), 80 deletions(-)
diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_main.c new/drivers/infiniband/ulp/ipoib/ipoib_main.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-20 14:26:26.000000000 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-22 08:33:51.000000000 +0530
@@ -560,7 +560,8 @@ static void neigh_add_path(struct sk_buf
goto err_drop;
}
} else
- ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha));
+ ipoib_send(dev, skb, path->ah,
+ IPOIB_QPN(skb->dst->neighbour->ha), 1);
} else {
neigh->ah = NULL;
@@ -640,7 +641,7 @@ static void unicast_arp_send(struct sk_b
ipoib_dbg(priv, "Send unicast ARP to %04x\n",
be16_to_cpu(path->pathrec.dlid));
- ipoib_send(dev, skb, path->ah, IPOIB_QPN(phdr->hwaddr));
+ ipoib_send(dev, skb, path->ah, IPOIB_QPN(phdr->hwaddr), 1);
} else if ((path->query || !path_rec_start(dev, path)) &&
skb_queue_len(&path->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
/* put pseudoheader back on for next time */
@@ -654,105 +655,166 @@ static void unicast_arp_send(struct sk_b
spin_unlock(&priv->lock);
}
+#define XMIT_PROCESSED_SKBS() \
+ do { \
+ if (wr_num) { \
+ ipoib_send(dev, NULL, old_neigh->ah, old_qpn, \
+ wr_num); \
+ wr_num = 0; \
+ } \
+ } while (0)
+
static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
- struct ipoib_neigh *neigh;
+ struct sk_buff_head *blist;
+ int max_skbs, wr_num = 0;
+ u32 qpn, old_qpn = 0;
+ struct ipoib_neigh *neigh, *old_neigh = NULL;
unsigned long flags;
if (unlikely(!spin_trylock_irqsave(&priv->tx_lock, flags)))
return NETDEV_TX_LOCKED;
- /*
- * Check if our queue is stopped. Since we have the LLTX bit
- * set, we can't rely on netif_stop_queue() preventing our
- * xmit function from being called with a full queue.
- */
- if (unlikely(netif_queue_stopped(dev))) {
- spin_unlock_irqrestore(&priv->tx_lock, flags);
- return NETDEV_TX_BUSY;
- }
-
- if (likely(skb->dst && skb->dst->neighbour)) {
- if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
- ipoib_path_lookup(skb, dev);
- goto out;
- }
+ blist = dev->skb_blist;
- neigh = *to_ipoib_neigh(skb->dst->neighbour);
+ if (!skb || (blist && skb_queue_len(blist))) {
+ /*
+ * Either batching xmit call, or single skb case but there are
+ * skbs already in the batch list from previous failure to
+ * xmit - send the earlier skbs first to avoid out of order.
+ */
+
+ if (skb)
+ __skb_queue_tail(blist, skb);
+
+ /*
+ * Figure out how many skbs can be sent. This prevents the
+ * device getting full and avoids checking for a stopped queue
+ * after each iteration. Now the queue can get stopped only
+ * after xmit of the last skb.
+ */
+ max_skbs = ipoib_sendq_size - (priv->tx_head - priv->tx_tail);
+ skb = __skb_dequeue(blist);
+ } else {
+ blist = NULL;
+ max_skbs = 1;
+ }
- if (ipoib_cm_get(neigh)) {
- if (ipoib_cm_up(neigh)) {
- ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
- goto out;
- }
- } else if (neigh->ah) {
- if (unlikely(memcmp(&neigh->dgid.raw,
- skb->dst->neighbour->ha + 4,
- sizeof(union ib_gid)))) {
- spin_lock(&priv->lock);
- /*
- * It's safe to call ipoib_put_ah() inside
- * priv->lock here, because we know that
- * path->ah will always hold one more reference,
- * so ipoib_put_ah() will never do more than
- * decrement the ref count.
- */
- ipoib_put_ah(neigh->ah);
- list_del(&neigh->list);
- ipoib_neigh_free(dev, neigh);
- spin_unlock(&priv->lock);
+ do {
+ if (likely(skb->dst && skb->dst->neighbour)) {
+ if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
+ XMIT_PROCESSED_SKBS();
ipoib_path_lookup(skb, dev);
- goto out;
+ continue;
}
- ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(skb->dst->neighbour->ha));
- goto out;
- }
-
- if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
- spin_lock(&priv->lock);
- __skb_queue_tail(&neigh->queue, skb);
- spin_unlock(&priv->lock);
- } else {
- ++priv->stats.tx_dropped;
- dev_kfree_skb_any(skb);
- }
- } else {
- struct ipoib_pseudoheader *phdr =
- (struct ipoib_pseudoheader *) skb->data;
- skb_pull(skb, sizeof *phdr);
+ neigh = *to_ipoib_neigh(skb->dst->neighbour);
- if (phdr->hwaddr[4] == 0xff) {
- /* Add in the P_Key for multicast*/
- phdr->hwaddr[8] = (priv->pkey >> 8) & 0xff;
- phdr->hwaddr[9] = priv->pkey & 0xff;
+ if (ipoib_cm_get(neigh)) {
+ if (ipoib_cm_up(neigh)) {
+ XMIT_PROCESSED_SKBS();
+ ipoib_cm_send(dev, skb,
+ ipoib_cm_get(neigh));
+ continue;
+ }
+ } else if (neigh->ah) {
+ if (unlikely(memcmp(&neigh->dgid.raw,
+ skb->dst->neighbour->ha + 4,
+ sizeof(union ib_gid)))) {
+ spin_lock(&priv->lock);
+ /*
+ * It's safe to call ipoib_put_ah()
+ * inside priv->lock here, because we
+ * know that path->ah will always hold
+ * one more reference, so ipoib_put_ah()
+ * will never do more than decrement
+ * the ref count.
+ */
+ ipoib_put_ah(neigh->ah);
+ list_del(&neigh->list);
+ ipoib_neigh_free(dev, neigh);
+ spin_unlock(&priv->lock);
+ XMIT_PROCESSED_SKBS();
+ ipoib_path_lookup(skb, dev);
+ continue;
+ }
+
+ qpn = IPOIB_QPN(skb->dst->neighbour->ha);
+ if (neigh != old_neigh || qpn != old_qpn) {
+ /*
+ * Sending to a different destination
+ * from earlier skb's (or this is the
+ * first skb) - send all existing skbs.
+ */
+ XMIT_PROCESSED_SKBS();
+ old_neigh = neigh;
+ old_qpn = qpn;
+ }
+
+ if (likely(!ipoib_process_skb(dev, skb, priv,
+ neigh->ah, qpn,
+ wr_num)))
+ wr_num++;
- ipoib_mcast_send(dev, phdr->hwaddr + 4, skb);
- } else {
- /* unicast GID -- should be ARP or RARP reply */
+ continue;
+ }
- if ((be16_to_cpup((__be16 *) skb->data) != ETH_P_ARP) &&
- (be16_to_cpup((__be16 *) skb->data) != ETH_P_RARP)) {
- ipoib_warn(priv, "Unicast, no %s: type %04x, QPN %06x "
- IPOIB_GID_FMT "\n",
- skb->dst ? "neigh" : "dst",
- be16_to_cpup((__be16 *) skb->data),
- IPOIB_QPN(phdr->hwaddr),
- IPOIB_GID_RAW_ARG(phdr->hwaddr + 4));
+ if (skb_queue_len(&neigh->queue) <
+ IPOIB_MAX_PATH_REC_QUEUE) {
+ spin_lock(&priv->lock);
+ __skb_queue_tail(&neigh->queue, skb);
+ spin_unlock(&priv->lock);
+ } else {
dev_kfree_skb_any(skb);
++priv->stats.tx_dropped;
- goto out;
+ ++max_skbs;
+ }
+ } else {
+ struct ipoib_pseudoheader *phdr =
+ (struct ipoib_pseudoheader *) skb->data;
+ skb_pull(skb, sizeof *phdr);
+
+ if (phdr->hwaddr[4] == 0xff) {
+ /* Add in the P_Key for multicast*/
+ phdr->hwaddr[8] = (priv->pkey >> 8) & 0xff;
+ phdr->hwaddr[9] = priv->pkey & 0xff;
+
+ XMIT_PROCESSED_SKBS();
+ ipoib_mcast_send(dev, phdr->hwaddr + 4, skb);
+ } else {
+ /* unicast GID -- should be ARP or RARP reply */
+
+ if ((be16_to_cpup((__be16 *) skb->data) !=
+ ETH_P_ARP) &&
+ (be16_to_cpup((__be16 *) skb->data) !=
+ ETH_P_RARP)) {
+ ipoib_warn(priv, "Unicast, no %s: type %04x, QPN %06x "
+ IPOIB_GID_FMT "\n",
+ skb->dst ? "neigh" : "dst",
+ be16_to_cpup((__be16 *)
+ skb->data),
+ IPOIB_QPN(phdr->hwaddr),
+ IPOIB_GID_RAW_ARG(phdr->hwaddr
+ + 4));
+ dev_kfree_skb_any(skb);
+ ++priv->stats.tx_dropped;
+ ++max_skbs;
+ continue;
+ }
+ XMIT_PROCESSED_SKBS();
+ unicast_arp_send(skb, dev, phdr);
}
-
- unicast_arp_send(skb, dev, phdr);
}
- }
+ } while (--max_skbs && (skb = __skb_dequeue(blist)) != NULL);
+
+ /* Send out last packets (if any) */
+ XMIT_PROCESSED_SKBS();
-out:
spin_unlock_irqrestore(&priv->tx_lock, flags);
- return NETDEV_TX_OK;
+ return (!blist || !skb_queue_len(blist)) ? NETDEV_TX_OK :
+ NETDEV_TX_BUSY;
}
static struct net_device_stats *ipoib_get_stats(struct net_device *dev)
@@ -900,11 +962,35 @@ int ipoib_dev_init(struct net_device *de
/* priv->tx_head & tx_tail are already 0 */
- if (ipoib_ib_dev_init(dev, ca, port))
+ /* Allocate tx_sge */
+ priv->tx_sge = kmalloc(ipoib_sendq_size * sizeof *priv->tx_sge,
+ GFP_KERNEL);
+ if (!priv->tx_sge) {
+ printk(KERN_WARNING "%s: failed to allocate TX sge (%d entries)\n",
+ ca->name, ipoib_sendq_size);
goto out_tx_ring_cleanup;
+ }
+
+ /* Allocate tx_wr */
+ priv->tx_wr = kmalloc(ipoib_sendq_size * sizeof *priv->tx_wr,
+ GFP_KERNEL);
+ if (!priv->tx_wr) {
+ printk(KERN_WARNING "%s: failed to allocate TX wr (%d entries)\n",
+ ca->name, ipoib_sendq_size);
+ goto out_tx_sge_cleanup;
+ }
+
+ if (ipoib_ib_dev_init(dev, ca, port))
+ goto out_tx_wr_cleanup;
return 0;
+out_tx_wr_cleanup:
+ kfree(priv->tx_wr);
+
+out_tx_sge_cleanup:
+ kfree(priv->tx_sge);
+
out_tx_ring_cleanup:
kfree(priv->tx_ring);
@@ -932,9 +1018,13 @@ void ipoib_dev_cleanup(struct net_device
kfree(priv->rx_ring);
kfree(priv->tx_ring);
+ kfree(priv->tx_sge);
+ kfree(priv->tx_wr);
priv->rx_ring = NULL;
priv->tx_ring = NULL;
+ priv->tx_sge = NULL;
+ priv->tx_wr = NULL;
}
static void ipoib_setup(struct net_device *dev)
@@ -965,7 +1055,8 @@ static void ipoib_setup(struct net_devic
dev->addr_len = INFINIBAND_ALEN;
dev->type = ARPHRD_INFINIBAND;
dev->tx_queue_len = ipoib_sendq_size * 2;
- dev->features = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX;
+ dev->features = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX |
+ NETIF_F_BATCH_SKBS;
/* MTU will be reset when mcast join happens */
dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN;
* [PATCH 10/10 Rev4] [E1000] Implement batching
2007-08-22 8:28 [PATCH 0/10 Rev4] Implement skb batching and support in IPoIB Krishna Kumar
` (8 preceding siblings ...)
2007-08-22 8:31 ` [PATCH 9/10 Rev4] [IPoIB] Implement batching Krishna Kumar
@ 2007-08-22 8:31 ` Krishna Kumar
2007-08-22 14:39 ` [ofa-general] " Kok, Auke
9 siblings, 1 reply; 15+ messages in thread
From: Krishna Kumar @ 2007-08-22 8:31 UTC (permalink / raw)
To: johnpol, herbert, hadi, kaber, shemminger, davem
Cc: jagana, Robert.Olsson, rick.jones2, xma, gaagaan, kumarkr,
rdreier, peter.p.waskiewicz.jr, mcarlson, jeff, general, mchan,
tgraf, netdev, Krishna Kumar, sri
E1000: Implement batching capability (ported thanks to changes taken from
Jamal). Not all of the changes made in IPoIB are present here, e.g. handling
of out-of-order skbs (see XXX in the first mail).
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
e1000_main.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++------------
1 files changed, 121 insertions(+), 29 deletions(-)
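For reference, here is a minimal sketch (not part of the patch below) of the
shape a batching-aware xmit handler takes under this series' convention,
where the stack calls the handler with a NULL skb to ask the driver to drain
dev->skb_blist. The helpers my_queue_one() and my_kick_hw() are hypothetical
placeholders for the driver's per-skb queueing and doorbell code:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical driver-specific helpers (declarations only) */
static int my_queue_one(struct sk_buff *skb, struct net_device *dev);
static void my_kick_hw(struct net_device *dev);

static int my_xmit_frame(struct sk_buff *skb, struct net_device *dev)
{
	int done = 0;

	if (skb) {
		/* Regular path: queue a single skb and kick the hardware */
		if (my_queue_one(skb, dev))
			return NETDEV_TX_BUSY;
		my_kick_hw(dev);
		return NETDEV_TX_OK;
	}

	/* Batch path: NULL skb means "drain the batch list" */
	while ((skb = __skb_dequeue(dev->skb_blist)) != NULL) {
		if (my_queue_one(skb, dev)) {
			/* No ring space - put it back and stop */
			__skb_queue_head(dev->skb_blist, skb);
			break;
		}
		done++;
	}

	/* Ring the doorbell once for the whole batch */
	if (done)
		my_kick_hw(dev);

	return skb_queue_len(dev->skb_blist) ? NETDEV_TX_BUSY : NETDEV_TX_OK;
}

The e1000 changes below follow the same split: e1000_prep_queue_frame() and
e1000_queue_frame() do the per-skb work, and e1000_kick_DMA() rings the
doorbell once after the whole batch is queued instead of from inside
e1000_tx_queue().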
diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c
--- org/drivers/net/e1000/e1000_main.c 2007-08-20 14:26:29.000000000 +0530
+++ new/drivers/net/e1000/e1000_main.c 2007-08-22 08:33:51.000000000 +0530
@@ -157,6 +157,7 @@ static void e1000_update_phy_info(unsign
static void e1000_watchdog(unsigned long data);
static void e1000_82547_tx_fifo_stall(unsigned long data);
static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
+static int e1000_xmit_frames(struct net_device *dev);
static struct net_device_stats * e1000_get_stats(struct net_device *netdev);
static int e1000_change_mtu(struct net_device *netdev, int new_mtu);
static int e1000_set_mac(struct net_device *netdev, void *p);
@@ -990,7 +991,7 @@ e1000_probe(struct pci_dev *pdev,
if (pci_using_dac)
netdev->features |= NETIF_F_HIGHDMA;
- netdev->features |= NETIF_F_LLTX;
+ netdev->features |= NETIF_F_LLTX | NETIF_F_BATCH_SKBS;
adapter->en_mng_pt = e1000_enable_mng_pass_thru(&adapter->hw);
@@ -3098,6 +3099,18 @@ e1000_tx_map(struct e1000_adapter *adapt
return count;
}
+static void e1000_kick_DMA(struct e1000_adapter *adapter,
+ struct e1000_tx_ring *tx_ring, int i)
+{
+ wmb();
+
+ writel(i, adapter->hw.hw_addr + tx_ring->tdt);
+ /* we need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems */
+ mmiowb();
+}
+
static void
e1000_tx_queue(struct e1000_adapter *adapter, struct e1000_tx_ring *tx_ring,
int tx_flags, int count)
@@ -3144,13 +3157,7 @@ e1000_tx_queue(struct e1000_adapter *ada
* know there are new descriptors to fetch. (Only
* applicable for weak-ordered memory model archs,
* such as IA-64). */
- wmb();
-
tx_ring->next_to_use = i;
- writel(i, adapter->hw.hw_addr + tx_ring->tdt);
- /* we need this if more than one processor can write to our tail
- * at a time, it syncronizes IO on IA64/Altix systems */
- mmiowb();
}
/**
@@ -3257,21 +3264,31 @@ static int e1000_maybe_stop_tx(struct ne
}
#define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1 )
+
+struct e1000_tx_cbdata {
+ int count;
+ unsigned int max_per_txd;
+ unsigned int nr_frags;
+ unsigned int mss;
+};
+
+#define E1000_SKB_CB(__skb) ((struct e1000_tx_cbdata *)&((__skb)->cb[0]))
+#define NETDEV_TX_DROPPED -5
+
static int
-e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
+e1000_prep_queue_frame(struct sk_buff *skb, struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_tx_ring *tx_ring;
- unsigned int first, max_per_txd = E1000_MAX_DATA_PER_TXD;
+ unsigned int max_per_txd = E1000_MAX_DATA_PER_TXD;
unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
- unsigned int tx_flags = 0;
unsigned int len = skb->len;
- unsigned long flags;
- unsigned int nr_frags = 0;
- unsigned int mss = 0;
+ unsigned int nr_frags;
+ unsigned int mss;
int count = 0;
- int tso;
unsigned int f;
+ struct e1000_tx_cbdata *cb = E1000_SKB_CB(skb);
+
len -= skb->data_len;
/* This goes back to the question of how to logically map a tx queue
@@ -3282,7 +3299,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
if (unlikely(skb->len <= 0)) {
dev_kfree_skb_any(skb);
- return NETDEV_TX_OK;
+ return NETDEV_TX_DROPPED;
}
/* 82571 and newer doesn't need the workaround that limited descriptor
@@ -3328,7 +3345,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
DPRINTK(DRV, ERR,
"__pskb_pull_tail failed.\n");
dev_kfree_skb_any(skb);
- return NETDEV_TX_OK;
+ return NETDEV_TX_DROPPED;
}
len = skb->len - skb->data_len;
break;
@@ -3372,22 +3389,32 @@ e1000_xmit_frame(struct sk_buff *skb, st
(adapter->hw.mac_type == e1000_82573))
e1000_transfer_dhcp_info(adapter, skb);
- if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags))
- /* Collision - tell upper layer to requeue */
- return NETDEV_TX_LOCKED;
+ cb->count = count;
+ cb->max_per_txd = max_per_txd;
+ cb->nr_frags = nr_frags;
+ cb->mss = mss;
+
+ return NETDEV_TX_OK;
+}
+
+static int e1000_queue_frame(struct sk_buff *skb, struct net_device *netdev)
+{
+ struct e1000_adapter *adapter = netdev_priv(netdev);
+ struct e1000_tx_ring *tx_ring = adapter->tx_ring;
+ int tso;
+ unsigned int first;
+ unsigned int tx_flags = 0;
+ struct e1000_tx_cbdata *cb = E1000_SKB_CB(skb);
/* need: count + 2 desc gap to keep tail from touching
* head, otherwise try next time */
- if (unlikely(e1000_maybe_stop_tx(netdev, tx_ring, count + 2))) {
- spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
+ if (unlikely(e1000_maybe_stop_tx(netdev, tx_ring, cb->count + 2)))
return NETDEV_TX_BUSY;
- }
if (unlikely(adapter->hw.mac_type == e1000_82547)) {
if (unlikely(e1000_82547_fifo_workaround(adapter, skb))) {
netif_stop_queue(netdev);
mod_timer(&adapter->tx_fifo_stall_timer, jiffies + 1);
- spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
return NETDEV_TX_BUSY;
}
}
@@ -3402,8 +3429,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
tso = e1000_tso(adapter, tx_ring, skb);
if (tso < 0) {
dev_kfree_skb_any(skb);
- spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
- return NETDEV_TX_OK;
+ return NETDEV_TX_DROPPED;
}
if (likely(tso)) {
@@ -3420,12 +3446,78 @@ e1000_xmit_frame(struct sk_buff *skb, st
e1000_tx_queue(adapter, tx_ring, tx_flags,
e1000_tx_map(adapter, tx_ring, skb, first,
- max_per_txd, nr_frags, mss));
+ cb->max_per_txd, cb->nr_frags, cb->mss));
+
+ return NETDEV_TX_OK;
+}
- netdev->trans_start = jiffies;
+static inline int e1000_xmit_frames(struct net_device *netdev)
+{
+ struct e1000_adapter *adapter = netdev_priv(netdev);
+ struct e1000_tx_ring *tx_ring = adapter->tx_ring;
+ int ret = NETDEV_TX_OK;
+ int skbs_done = 0;
+ struct sk_buff *skb;
+ unsigned long flags;
+
+ if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags)) {
+ /* Collision - tell upper layer to requeue */
+ return NETDEV_TX_LOCKED;
+ }
- /* Make sure there is space in the ring for the next send. */
- e1000_maybe_stop_tx(netdev, tx_ring, MAX_SKB_FRAGS + 2);
+ while ((skb = __skb_dequeue(netdev->skb_blist)) != NULL) {
+ if ((ret = e1000_prep_queue_frame(skb, netdev)) != NETDEV_TX_OK)
+ continue;
+
+ ret = e1000_queue_frame(skb, netdev);
+ if (ret == NETDEV_TX_OK) {
+ skbs_done++;
+ } else {
+ if (ret == NETDEV_TX_BUSY)
+ __skb_queue_head(netdev->skb_blist, skb);
+ break;
+ }
+ }
+
+ if (skbs_done) {
+ e1000_kick_DMA(adapter, tx_ring, adapter->tx_ring->next_to_use);
+ netdev->trans_start = jiffies;
+ e1000_maybe_stop_tx(netdev, tx_ring, MAX_SKB_FRAGS + 2);
+ }
+
+ spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
+
+ if (ret == NETDEV_TX_DROPPED)
+ ret = NETDEV_TX_OK;
+
+ return ret;
+}
+
+static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
+{
+ struct e1000_adapter *adapter = netdev_priv(netdev);
+ struct e1000_tx_ring *tx_ring = adapter->tx_ring;
+ unsigned long flags;
+
+ if (!skb)
+ return e1000_xmit_frames(netdev);
+
+ if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags)) {
+ /* Collision - tell upper layer to requeue */
+ return NETDEV_TX_LOCKED;
+ }
+
+ if (unlikely(e1000_prep_queue_frame(skb, netdev) != NETDEV_TX_OK)) {
+ spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
+ return NETDEV_TX_OK;
+ }
+
+ if (e1000_queue_frame(skb, netdev) == NETDEV_TX_OK) {
+ e1000_kick_DMA(adapter, tx_ring, adapter->tx_ring->next_to_use);
+ netdev->trans_start = jiffies;
+ /* Make sure there is space in the ring for the next send. */
+ e1000_maybe_stop_tx(netdev, tx_ring, MAX_SKB_FRAGS + 2);
+ }
spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
return NETDEV_TX_OK;
* [ofa-general] Re: [PATCH 10/10 Rev4] [E1000] Implement batching
2007-08-22 8:31 ` [PATCH 10/10 Rev4] [E1000] " Krishna Kumar
@ 2007-08-22 14:39 ` Kok, Auke
2007-08-23 2:44 ` Krishna Kumar2
0 siblings, 1 reply; 15+ messages in thread
From: Kok, Auke @ 2007-08-22 14:39 UTC (permalink / raw)
To: Krishna Kumar
Cc: johnpol, jagana, peter.p.waskiewicz.jr, herbert, gaagaan,
Robert.Olsson, kumarkr, rdreier, hadi, netdev, kaber, jeff,
general, mchan, tgraf, mcarlson, sri, shemminger, davem
Krishna Kumar wrote:
> E1000: Implement batching capability (ported thanks to changes taken from
> Jamal). Not all of the changes made in IPoIB are present here, e.g. handling
> of out-of-order skbs (see XXX in the first mail).
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
> e1000_main.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++------------
> 1 files changed, 121 insertions(+), 29 deletions(-)
Krishna,
While I appreciate the patch, I would have preferred a patch against e1000e. Not only
does the e1000e driver remove a lot of the workarounds for old silicon, it is
also a good way for us to move the current e1000 driver into a more stable
maintenance mode.
Do you think you can write this patch for e1000e instead? Code-wise a lot of
things are still the same, so your patch should be relatively easy to generate.
The e1000e driver currently lives in a branch of Jeff Garzik's netdev-2.6 tree.
Thanks,
Auke
* [ofa-general] Re: [PATCH 10/10 Rev4] [E1000] Implement batching
2007-08-22 14:39 ` [ofa-general] " Kok, Auke
@ 2007-08-23 2:44 ` Krishna Kumar2
0 siblings, 0 replies; 15+ messages in thread
From: Krishna Kumar2 @ 2007-08-23 2:44 UTC (permalink / raw)
To: Kok, Auke
Cc: jagana, johnpol, herbert, gaagaan, Robert.Olsson, kumarkr,
mcarlson, peter.p.waskiewicz.jr, hadi, kaber, jeff, general,
mchan, tgraf, netdev, sri, shemminger, davem, rdreier
Hi Auke,
"Kok, Auke" <auke-jan.h.kok@intel.com> wrote on 08/22/2007 08:09:31 PM:
> Krishna,
>
> While I appreciate the patch, I would have preferred a patch against
> e1000e. Not only does the e1000e driver remove a lot of the workarounds
> for old silicon, it is also a good way for us to move the current e1000
> driver into a more stable maintenance mode.
>
> Do you think you can write this patch for e1000e instead? Code-wise a
> lot of things are still the same, so your patch should be relatively
> easy to generate.
>
> The e1000e driver currently lives in a branch of Jeff Garzik's
> netdev-2.6 tree.
Definitely, I will pick it up and generate a patch.
Thanks,
- KK