Netdev List
* Re: [PATCH 2/2] qlge: fix a eeh handler to not add a pending timer
From: David Miller @ 2010-07-03  4:59 UTC (permalink / raw)
  To: leitao; +Cc: netdev, ron.mercer
In-Reply-To: <3d4d609f65c74c269e056c5fbe2fe4b174023059.1277936929.git.root@sanx1002.austin.ibm.com>

From: leitao@linux.vnet.ibm.com
Date: Thu,  1 Jul 2010 10:00:18 -0300

> On some occasions qlge_io_resume() tries to add a timer that is
> already pending, which makes the system hit the BUG() in
> add_timer().
> 
> This patch removes the timer during the EEH recovery.
> 
> Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
> Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] qlge: Replacing add_timer() to mod_timer()
From: David Miller @ 2010-07-03  4:59 UTC (permalink / raw)
  To: leitao; +Cc: netdev, ron.mercer
In-Reply-To: <e687077281d05d3a2da49431b7c0ff0b1076f3e6.1277936929.git.root@sanx1002.austin.ibm.com>

From: leitao@linux.vnet.ibm.com
Date: Thu,  1 Jul 2010 10:00:17 -0300

> Currently qlge driver calls add_timer() instead of mod_timer().
> This patch changes add_timer() to mod_timer(), which seems a better
> solution.
> 
> Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
> Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>

Applied.

^ permalink raw reply
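
The distinction behind this pair of patches can be modelled in a few
lines of userspace C. This is only a sketch under stated assumptions:
struct toy_timer and the toy_* functions are hypothetical stand-ins, not
the kernel timer API. It shows why arming a timer that may already be
pending is safe with mod_timer()-style semantics but not with
add_timer()-style semantics.

```c
#include <stdbool.h>

/* Toy stand-in for struct timer_list; hypothetical, not the kernel type. */
struct toy_timer {
	bool pending;
	unsigned long expires;
};

/* Models add_timer(): returns -1 where the kernel would hit BUG(). */
static int toy_add_timer(struct toy_timer *t, unsigned long expires)
{
	if (t->pending)
		return -1;	/* adding a pending timer is a bug */
	t->expires = expires;
	t->pending = true;
	return 0;
}

/* Models mod_timer(): returns 1 if the timer was pending, 0 otherwise. */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = true;
	return was_pending;
}
```

A caller that cannot know whether the timer is still armed, such as a
resume path, only stays correct with the second form.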

* Re: [PATCH net-2.6 2/2] usbnet: Set parent device early for netdev_printk()
From: David Miller @ 2010-07-03  4:49 UTC (permalink / raw)
  To: ben; +Cc: netdev, pitxyoki, joe
In-Reply-To: <1278111386.4878.77.camel@localhost>

From: Ben Hutchings <ben@decadent.org.uk>
Date: Fri, 02 Jul 2010 23:56:26 +0100

> On Fri, 2010-07-02 at 23:40 +0100, Ben Hutchings wrote:
>> netdev_printk() follows the net_device's parent device pointer, so
>> we must set that earlier than we previously did.
>> 
>> Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
>> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> [...]
> 
> This should also go into a stable update for 2.6.34.


Ok, queued up.

^ permalink raw reply

* Re: [PATCH net-2.6 2/2] usbnet: Set parent device early for netdev_printk()
From: David Miller @ 2010-07-03  4:49 UTC (permalink / raw)
  To: ben; +Cc: netdev, pitxyoki, joe
In-Reply-To: <1278110450.4878.75.camel@localhost>

From: Ben Hutchings <ben@decadent.org.uk>
Date: Fri, 02 Jul 2010 23:40:50 +0100

> netdev_printk() follows the net_device's parent device pointer, so
> we must set that earlier than we previously did.
> 
> Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-2.6 1/2] Revert "rndis_host: Poll status channel before control channel"
From: David Miller @ 2010-07-03  4:48 UTC (permalink / raw)
  To: ben; +Cc: netdev, pitxyoki
In-Reply-To: <1278110361.4878.73.camel@localhost>

From: Ben Hutchings <ben@decadent.org.uk>
Date: Fri, 02 Jul 2010 23:39:21 +0100

> This reverts commit c17b274dc2aa538b68c1f02b01a3c4e124b435ba.
> 
> That change was reported to break rndis_wlan support for the WUSB54GS.
> 
> Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

So I wasn't completely crazy when I wanted more testing feedback
for this change :-)

Applied, thanks Ben.

^ permalink raw reply

* [PATCH 2/2] virtio_net: fix oom handling on tx
From: Rusty Russell @ 2010-07-03  2:34 UTC (permalink / raw)
  To: netdev; +Cc: Michael S. Tsirkin, Herbert Xu
In-Reply-To: <201007031232.56510.rusty@rustcorp.com.au>

virtio net will never try to overflow the TX ring, so the only reason
add_buf may fail is out of memory. Thus, we can not stop the
device until some request completes - there's no guarantee anything
at all is outstanding.

Make the error message clearer as well: error here does not
indicate queue full.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (...and avoid TX_BUSY)
Cc: stable@kernel.org  # .34.x (s/virtqueue_/vi->svq->vq_ops->/)
---
 drivers/net/virtio_net.c |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -562,7 +562,6 @@ static netdev_tx_t start_xmit(struct sk_
 	struct virtnet_info *vi = netdev_priv(dev);
 	int capacity;
 
-again:
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(vi);
 
@@ -571,14 +570,20 @@ again:
 
 	/* This can happen with OOM and indirect buffers. */
 	if (unlikely(capacity < 0)) {
-		netif_stop_queue(dev);
-		dev_warn(&dev->dev, "Unexpected full queue\n");
-		if (unlikely(!virtqueue_enable_cb(vi->svq))) {
-			virtqueue_disable_cb(vi->svq);
-			netif_start_queue(dev);
-			goto again;
+		if (net_ratelimit()) {
+			if (likely(capacity == -ENOMEM)) {
+				dev_warn(&dev->dev,
+					 "TX queue failure: out of memory\n");
+			} else {
+				dev->stats.tx_fifo_errors++;
+				dev_warn(&dev->dev,
+					 "Unexpected TX queue failure: %d\n",
+					 capacity);
+			}
 		}
-		return NETDEV_TX_BUSY;
+		dev->stats.tx_dropped++;
+		kfree_skb(skb);
+		return NETDEV_TX_OK;
 	}
 	virtqueue_kick(vi->svq);
 

^ permalink raw reply
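
The policy change in this patch can be summarised outside the kernel
roughly as follows. A minimal sketch, assuming a hypothetical toy_xmit()
that is handed the add_buf return value directly; all toy_* names are
illustrative. Rate limiting of the warning is left out, so the error
counter here is bumped on every unexpected failure rather than only when
a message would be printed.

```c
#include <errno.h>

/* Hypothetical mirror of the two return codes discussed in the patch. */
enum toy_tx_result { TOY_TX_OK, TOY_TX_BUSY };

struct toy_stats {
	unsigned long tx_dropped;
	unsigned long tx_fifo_errors;
};

/*
 * capacity < 0 is an add_buf-style error code, >= 0 means the packet
 * was queued. On failure the packet is dropped and TOY_TX_OK is
 * returned, so the stack does not retry and the queue is not stopped.
 */
static enum toy_tx_result toy_xmit(struct toy_stats *stats, int capacity)
{
	if (capacity < 0) {
		if (capacity != -ENOMEM)
			stats->tx_fifo_errors++;	/* unexpected failure */
		stats->tx_dropped++;
		/* the skb would be freed here */
		return TOY_TX_OK;
	}
	return TOY_TX_OK;
}
```

Returning TOY_TX_BUSY instead would ask the stack to retry, which is
only useful when a completion is guaranteed to arrive later; under
memory pressure nothing may be outstanding.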

* [PATCH 1/2] virtio_net: do not reschedule rx refill forever
From: Rusty Russell @ 2010-07-03  2:32 UTC (permalink / raw)
  To: netdev; +Cc: Michael S. Tsirkin

From: "Michael S. Tsirkin" <mst@redhat.com>

We currently fill all of the RX ring, then add_buf
returns -ENOSPC, which gets misdetected as an out-of-memory
condition and causes us to reschedule the work, and so on
forever. Fix this by setting oom only when add_buf reports
-ENOMEM: oom = err == -ENOMEM;

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .34.x
---
 drivers/net/virtio_net.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 06c30df..85615a3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -416,7 +416,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
 static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
 {
 	int err;
-	bool oom = false;
+	bool oom;
 
 	do {
 		if (vi->mergeable_rx_bufs)
@@ -426,10 +426,9 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
 		else
 			err = add_recvbuf_small(vi, gfp);
 
-		if (err < 0) {
-			oom = true;
+		oom = err == -ENOMEM;
+		if (err < 0)
 			break;
-		}
 		++vi->num;
 	} while (err > 0);
 	if (unlikely(vi->num > vi->max))

^ permalink raw reply related
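
The refill loop after this change can be exercised in userspace with a
toy ring. This is a sketch under assumptions: struct toy_ring and the
toy_* functions are hypothetical, and the return convention (remaining
capacity on success, negative errno on failure, true from the fill
function meaning "no reschedule needed") is assumed for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical fixed-size receive ring. */
struct toy_ring {
	int used;
	int size;
	bool alloc_fails;	/* simulate memory pressure */
};

/* Returns remaining capacity, or -ENOMEM / -ENOSPC on failure. */
static int toy_add_recvbuf(struct toy_ring *r)
{
	if (r->alloc_fails)
		return -ENOMEM;
	if (r->used == r->size)
		return -ENOSPC;
	r->used++;
	return r->size - r->used;
}

/* Returns false only when the caller should reschedule a refill. */
static bool toy_try_fill_recv(struct toy_ring *r)
{
	int err;
	bool oom;

	do {
		err = toy_add_recvbuf(r);
		oom = err == -ENOMEM;	/* a full ring is not OOM */
		if (err < 0)
			break;
	} while (err > 0);
	return !oom;
}
```

With the earlier logic, any negative return set oom, so a ring that was
simply full would have asked for another refill each time.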

* Re: [stable] [stable-2.6.32 PATCH] ixgbe: backport bug fix for tx panic
From: Greg KH @ 2010-07-03  2:28 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Brandeburg, Jesse, linux-kernel@vger.kernel.org,
	stable@kernel.org, Brandon, netdev@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <AANLkTimr7-8TIjuCjIbqrJSVH4VI-Pjp8plpjTubuTfE@mail.gmail.com>

On Fri, Jul 02, 2010 at 02:37:13PM -0700, Jeff Kirsher wrote:
> On Tue, May 25, 2010 at 13:18, Greg KH <greg@kroah.com> wrote:
> > On Tue, May 25, 2010 at 09:27:25AM -0700, Brandeburg, Jesse wrote:
> >>
> >>
> >> On Tue, 25 May 2010, Jeff Kirsher wrote:
> >>
> >> > On Mon, May 10, 2010 at 17:46, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> >> > > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >> > >
> >> > > backporting this commit:
> >> > >
> >> > > commit fdd3d631cddad20ad9d3e1eb7dbf26825a8a121f
> >> > > Author: Krishna Kumar <krkumar2@in.ibm.com>
> >> > > Date:   Wed Feb 3 13:13:10 2010 +0000
> >> > >
> >> > >    ixgbe: Fix return of invalid txq
> >> > >
> >> > >    a developer had complained of getting lots of warnings:
> >> > >
> >> > >    "eth16 selects TX queue 98, but real number of TX queues is 64"
> >> > >
> >> > >    http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02200.html
> >> > >
> >> > >    As there was no follow up on that bug, I am submitting this
> >> > >    patch assuming that the other return points will not return
> >> > >    invalid txq's, and also that this fixes the bug (not tested).
> >> > >
> >> > >    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> >> > >    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >> > >    Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> >> > >    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> >> > >    Signed-off-by: David S. Miller <davem@davemloft.net>
> >> > >
> >> > > CC: Brandon <brandon@ifup.org>
> >> > > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >> > > ---
> >> > >
> >> > >  drivers/net/ixgbe/ixgbe_main.c |    8 ++++++--
> >> > >  1 files changed, 6 insertions(+), 2 deletions(-)
> >> > >
> >> >
> >> > Greg - status?  Did you queue this patch for the stable release and I missed it?
> >>
> >> Maybe we didn't say (and we should have) that this fixes a panic on
> >> machines with > 64 cores.  Please apply to -stable 32.
> >
> > I'll get to it for the next release after this one, if that's ok.
> >
> > thanks,
> >
> > greg k-h
> > --
> 
> I did not see this patch in the list of patches for the next release
> of the stable kernel.  Just want to make sure this patch makes it this
> time... :)

Ick, I missed it, let me go queue it up right now, sorry about that.

greg k-h

^ permalink raw reply
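
For readers without the backported commit at hand, the class of bug can
be sketched as below. This is an illustration under assumptions, not the
ixgbe code: toy_select_txq() is a hypothetical helper, and deriving the
queue from the CPU number is assumed here only because the thread ties
the panic to machines with more than 64 cores. The point is that a
selected queue index has to be folded back into the range
[0, real_num_tx_queues).

```c
/*
 * Hypothetical queue selection: start from the CPU number and fold it
 * into the valid range. real_num_tx_queues must be non-zero.
 */
static unsigned int toy_select_txq(unsigned int cpu,
				   unsigned int real_num_tx_queues)
{
	unsigned int txq = cpu;

	while (txq >= real_num_tx_queues)
		txq -= real_num_tx_queues;
	return txq;
}
```

With the numbers from the quoted warning, an index of 98 on a device
with 64 real TX queues folds to 34 instead of pointing past the end of
the queue array.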

* Re: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access
From: Jesse Barnes @ 2010-07-02 23:16 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Michael Chan, Matthew Wilcox, linux-pci, netdev
In-Reply-To: <1276802196.2083.12.camel@achroite.uk.solarflarecom.com>

On Thu, 17 Jun 2010 20:16:36 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> During suspend on an SMP system, {read,write}_msi_msg_desc() may be
> called to mask and unmask interrupts on a device that is already in a
> reduced power state.  At this point memory-mapped registers including
> MSI-X tables are not accessible, and config space may not be fully
> functional either.
> 
> While a device is in a reduced power state its interrupts are
> effectively masked and its MSI(-X) state will be restored when it is
> brought back to D0.  Therefore these functions can simply read and
> write msi_desc::msg for devices not in D0.
> 
> Further, read_msi_msg_desc() should only ever be used to update a
> previously written message, so it can always read msi_desc::msg
> and never needs to touch the hardware.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied to my linux-next branch, thanks.

Matthew, let me know if you have an issue with this.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply
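
The approach described in the quoted changelog can be modelled as a
cached message plus a power-state check. A minimal sketch, assuming
hypothetical toy_* types; the hw field merely stands in for device
registers and hw_accesses counts how often they would be touched.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical descriptor: cached message plus simulated hardware. */
struct toy_msi_desc {
	struct toy_msi_msg msg;	/* last message written (cache) */
	struct toy_msi_msg hw;	/* stand-in for device registers */
	bool in_d0;		/* device fully powered? */
	unsigned int hw_accesses;
};

/* Write: touch the hardware only in D0, always update the cache. */
static void toy_write_msi_msg(struct toy_msi_desc *d,
			      const struct toy_msi_msg *m)
{
	if (d->in_d0) {
		d->hw = *m;
		d->hw_accesses++;
	}
	d->msg = *m;
}

/* Read: always served from the cache, never from the hardware. */
static void toy_read_msi_msg(const struct toy_msi_desc *d,
			     struct toy_msi_msg *m)
{
	*m = d->msg;
}
```

Because the cached copy is what gets restored when the device returns
to D0, skipping the register write while it is powered down loses
nothing.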

* Re: [PATCH net-2.6 2/2] usbnet: Set parent device early for netdev_printk()
From: Ben Hutchings @ 2010-07-02 22:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Luís Picciochi Oliveira, Joe Perches
In-Reply-To: <1278110450.4878.75.camel@localhost>

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

On Fri, 2010-07-02 at 23:40 +0100, Ben Hutchings wrote:
> netdev_printk() follows the net_device's parent device pointer, so
> we must set that earlier than we previously did.
> 
> Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[...]

This should also go into a stable update for 2.6.34.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* [PATCH net-2.6 2/2] usbnet: Set parent device early for netdev_printk()
From: Ben Hutchings @ 2010-07-02 22:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Luís Picciochi Oliveira, Joe Perches
In-Reply-To: <1278110361.4878.73.camel@localhost>

netdev_printk() follows the net_device's parent device pointer, so
we must set that earlier than we previously did.

Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/usb/usbnet.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index a95c73d..81c76ad 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1293,6 +1293,9 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 		goto out;
 	}
 
+	/* netdev_printk() needs this so do it as early as possible */
+	SET_NETDEV_DEV(net, &udev->dev);
+
 	dev = netdev_priv(net);
 	dev->udev = xdev;
 	dev->intf = udev;
@@ -1377,8 +1380,6 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 		dev->rx_urb_size = dev->hard_mtu;
 	dev->maxpacket = usb_maxpacket (dev->udev, dev->out, 1);
 
-	SET_NETDEV_DEV(net, &udev->dev);
-
 	if ((dev->driver_info->flags & FLAG_WLAN) != 0)
 		SET_NETDEV_DEVTYPE(net, &wlan_type);
 	if ((dev->driver_info->flags & FLAG_WWAN) != 0)
-- 
1.7.1



^ permalink raw reply related

* [PATCH net-2.6 1/2] Revert "rndis_host: Poll status channel before control channel"
From: Ben Hutchings @ 2010-07-02 22:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Luís Picciochi Oliveira

This reverts commit c17b274dc2aa538b68c1f02b01a3c4e124b435ba.

That change was reported to break rndis_wlan support for the WUSB54GS.

Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/usb/rndis_host.c |   18 ++++++------------
 1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/net/usb/rndis_host.c b/drivers/net/usb/rndis_host.c
index 28d3ee1..dd8a4ad 100644
--- a/drivers/net/usb/rndis_host.c
+++ b/drivers/net/usb/rndis_host.c
@@ -104,10 +104,8 @@ static void rndis_msg_indicate(struct usbnet *dev, struct rndis_indicate *msg,
 int rndis_command(struct usbnet *dev, struct rndis_msg_hdr *buf, int buflen)
 {
 	struct cdc_state	*info = (void *) &dev->data;
-	struct usb_cdc_notification notification;
 	int			master_ifnum;
 	int			retval;
-	int			partial;
 	unsigned		count;
 	__le32			rsp;
 	u32			xid = 0, msg_len, request_id;
@@ -135,17 +133,13 @@ int rndis_command(struct usbnet *dev, struct rndis_msg_hdr *buf, int buflen)
 	if (unlikely(retval < 0 || xid == 0))
 		return retval;
 
-	/* Some devices don't respond on the control channel until
-	 * polled on the status channel, so do that first. */
-	retval = usb_interrupt_msg(
-		dev->udev,
-		usb_rcvintpipe(dev->udev, dev->status->desc.bEndpointAddress),
-		&notification, sizeof(notification), &partial,
-		RNDIS_CONTROL_TIMEOUT_MS);
-	if (unlikely(retval < 0))
-		return retval;
+	// FIXME Seems like some devices discard responses when
+	// we time out and cancel our "get response" requests...
+	// so, this is fragile.  Probably need to poll for status.
 
-	/* Poll the control channel; the request probably completed immediately */
+	/* ignore status endpoint, just poll the control channel;
+	 * the request probably completed immediately
+	 */
 	rsp = buf->msg_type | RNDIS_MSG_COMPLETION;
 	for (count = 0; count < 10; count++) {
 		memset(buf, 0, CONTROL_BUFFER_SIZE);
-- 
1.7.1




^ permalink raw reply related

* Re: bnx2/5709: Strange interrupts spread
From: Christophe Ngo Van Duc @ 2010-07-02 22:12 UTC (permalink / raw)
  To: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1278104311.11828.12.camel@HP1>

Hi

Well that's the strange thing: it is IP traffic. The only difference
from eth0 and eth1 is that eth2 and eth3 belong to a bridge (br0).

Best Regards,
Christophe.

On 7/2/10, Michael Chan <mchan@broadcom.com> wrote:
>
> On Fri, 2010-07-02 at 13:33 -0700, Christophe Ngo Van Duc wrote:
>> On eth2 (external card) all interrupts goes to CPU0
>>
>>
>>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>>   80:   46973077          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-0
>>   81:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-1
>>   82:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-2
>>   83:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-3
>>   84:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-4
>>   85:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-5
>>   86:          0          0       2445          0         37          0       8463         13   PCI-MSI-edge      eth2-6
>>   87:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-7
>
> Reformatted your output
>
>> If I understand correctly the RSS hash is used to dispatch the packets
>> into the different queues running on the different CPU.
>
> It looks like most interrupts go to eth2-0, a few go to eth2-6.  The rx
> ring for eth2-0 is for non-IP packets.  The RSS hash will hash IP
> packets and place them on eth2-1 to eth2-7.  eth2-0 also handles tx
> interrupts for TX ring 0.  TX traffic is hashed by the stack.
>
> What kind of traffic is passing through eth2?
>
> Thanks.
>
>
>

-- 
Sent from my mobile device

^ permalink raw reply

* Re: e1000e: receives no packets after resume (2.6.35-rc3-00262-g984bc96)
From: Jeff Kirsher @ 2010-07-02 21:47 UTC (permalink / raw)
  To: Nico Schottelius, LKML; +Cc: e1000-devel, netdev
In-Reply-To: <20100702072826.GB16856@schottelius.org>

On Fri, Jul 2, 2010 at 00:28, Nico Schottelius
<nico-linux-20100702@schottelius.org> wrote:
> Good morning hackers,
>
> I've seen that in kernels before, but now suffering from it:
> e1000e does not receive any packets after third resume:
>
> --------------------------------------------------------------------------------
> [8:16] kr:clyde# dhcpcd eth0
> dhcpcd[11589]: version 5.2.5 starting
> dhcpcd[11589]: eth0: rebinding lease of 129.132.102.115
> dhcpcd[11589]: eth0: broadcasting for a lease
> ^Cdhcpcd[11589]: received SIGINT, stopping
> dhcpcd[11589]: eth0: removing interface
> [8:16] kr:clyde# dhcpcd eth0
> dhcpcd[11601]: version 5.2.5 starting
> dhcpcd[11601]: eth0: rebinding lease of 129.132.102.115
> dhcpcd[11601]: eth0: broadcasting for a lease
> ^Cdhcpcd[11601]: received SIGINT, stopping
> dhcpcd[11601]: eth0: removing interface
> [8:16] kr:clyde# dhcpcd eth0
> dhcpcd[11784]: version 5.2.5 starting
> dhcpcd[11784]: eth0: rebinding lease of 129.132.102.115
> ^Cdhcpcd[11784]: received SIGINT, stopping
> dhcpcd[11784]: eth0: removing interface
> [8:16] kr:clyde# dhcpcd eth0
> dhcpcd[11830]: version 5.2.5 starting
> dhcpcd[11830]: eth0: rebinding lease of 129.132.102.115
> dhcpcd[11830]: eth0: broadcasting for a lease
> dhcpcd[11830]: timed out
> [8:17] kr:clyde# rmmod e1000e
> [8:18] kr:clyde# modprobe e1000e
> [8:18] kr:clyde# dhcpcd eth0
> dhcpcd[11855]: version 5.2.5 starting
> dhcpcd[11855]: eth0: waiting for carrier
> ^Cdhcpcd[11855]: received SIGINT, stopping
> dhcpcd[11855]: eth0: removing interface
> [8:18] kr:clyde# dhcpcd eth0
> dhcpcd[11871]: version 5.2.5 starting
> dhcpcd[11871]: eth0: rebinding lease of 129.132.102.115
> dhcpcd[11871]: eth0: acknowledged 129.132.102.115 from 129.132.57.97
> dhcpcd[11871]: eth0: checking for 129.132.102.115
> dhcpcd[11871]: eth0: leased 129.132.102.115 for 86400 seconds
> dhcpcd[11871]: forked to background, child pid 11892
> [8:18] kr:clyde#
> --------------------------------------------------------------------------------
>
> Are you aware of this issues?
>
> Cheers,
>
> Nico
>
> --

Adding the Networking Kernel mailing list (netdev) and Intel Linux LAN
Driver mailing list (e1000-patches)...

Would it be possible to get more information about the system/setup?

For example, can you provide the following information:
     - full lspci output (lspci -vvv)
     - ethtool -i ethX
     - dmesg  - after loading the driver and configuring the interfaces
     - kernel config

-- 
Cheers,
Jeff


^ permalink raw reply

* [rft]testers for suspend/resume support for cdc-phonet
From: Oliver Neukum @ 2010-07-02 21:46 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA

Hi,

I am looking for someone who is willing to test a patch for cdc-phonet
implementing suspend/resume.

	Regards
		Oliver

diff --git a/drivers/net/usb/cdc-phonet.c b/drivers/net/usb/cdc-phonet.c
index 109751b..d182835 100644
--- a/drivers/net/usb/cdc-phonet.c
+++ b/drivers/net/usb/cdc-phonet.c
@@ -42,9 +42,12 @@ struct usbpn_dev {
 	unsigned int		tx_pipe, rx_pipe;
 	u8 active_setting;
 	u8 disconnected;
+	u8 suspended;
+	u8 opened;
 
 	unsigned		tx_queue;
 	spinlock_t		tx_lock;
+	struct usb_anchor	tx_anchor;
 
 	spinlock_t		rx_lock;
 	struct sk_buff		*rx_skb;
@@ -73,8 +76,10 @@ static netdev_tx_t usbpn_xmit(struct sk_buff *skb, struct net_device *dev)
 	usb_fill_bulk_urb(req, pnd->usb, pnd->tx_pipe, skb->data, skb->len,
 				tx_complete, skb);
 	req->transfer_flags = URB_ZERO_PACKET;
+	usb_anchor_urb(req, &pnd->tx_anchor);
 	err = usb_submit_urb(req, GFP_ATOMIC);
 	if (err) {
+		usb_unanchor_urb(req);
 		usb_free_urb(req);
 		goto drop;
 	}
@@ -235,6 +240,7 @@ static int usbpn_open(struct net_device *dev)
 		pnd->urbs[i] = req;
 	}
 
+	pnd->opened = 1;
 	netif_wake_queue(dev);
 	return 0;
 }
@@ -256,6 +262,7 @@ static int usbpn_close(struct net_device *dev)
 		usb_free_urb(req);
 		pnd->urbs[i] = NULL;
 	}
+	pnd->opened = 0;
 
 	return usb_set_interface(pnd->usb, num, !pnd->active_setting);
 }
@@ -400,6 +407,7 @@ int usbpn_probe(struct usb_interface *intf, const struct usb_device_id *id)
 	pnd->data_intf = data_intf;
 	spin_lock_init(&pnd->tx_lock);
 	spin_lock_init(&pnd->rx_lock);
+	init_usb_anchor(&pnd->tx_anchor);
 	/* Endpoints */
 	if (usb_pipein(data_desc->endpoint[0].desc.bEndpointAddress)) {
 		pnd->rx_pipe = usb_rcvbulkpipe(usbdev,
@@ -453,10 +461,52 @@ static void usbpn_disconnect(struct usb_interface *intf)
 	usb_put_dev(usb);
 }
 
+static int usbpn_suspend(struct usb_interface *intf, pm_message_t message)
+{
+	struct usbpn_dev *pnd = usb_get_intfdata(intf);
+	int i;
+
+	if (pnd->suspended++)
+		return 0;
+	if (!pnd->opened)
+		return 0;
+	
+	for (i = 0; i < rxq_size; i++) {
+		struct urb *req = pnd->urbs[i];
+		usb_kill_urb(req);
+	}
+
+	usb_kill_anchored_urbs(&pnd->tx_anchor);
+	return 0;
+}
+
+static int usbpn_resume(struct usb_interface *intf)
+{
+	struct usbpn_dev *pnd = usb_get_intfdata(intf);
+	int i, r, rv = 0;
+
+	if (--pnd->suspended)
+		return 0;
+	if (!pnd->opened)
+		return 0;
+
+	for (i = 0; i < rxq_size; i++) {
+		struct urb *req = pnd->urbs[i];
+
+		r = rx_submit(pnd, req, GFP_NOIO);
+		if (r)
+			rv++;
+	}
+
+	return rv ? -EIO : 0;
+}
+
 static struct usb_driver usbpn_driver = {
 	.name =		"cdc_phonet",
 	.probe =	usbpn_probe,
 	.disconnect =	usbpn_disconnect,
+	.suspend =	usbpn_suspend,
+	.resume =	usbpn_resume,
 	.id_table =	usbpn_ids,
 };
 

^ permalink raw reply related

* Re: [stable] [stable-2.6.32 PATCH] ixgbe: backport bug fix for tx panic
From: Jeff Kirsher @ 2010-07-02 21:37 UTC (permalink / raw)
  To: Greg KH
  Cc: Brandeburg, Jesse, linux-kernel@vger.kernel.org,
	stable@kernel.org, Brandon, netdev@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <20100525201837.GA13821@kroah.com>

On Tue, May 25, 2010 at 13:18, Greg KH <greg@kroah.com> wrote:
> On Tue, May 25, 2010 at 09:27:25AM -0700, Brandeburg, Jesse wrote:
>>
>>
>> On Tue, 25 May 2010, Jeff Kirsher wrote:
>>
>> > On Mon, May 10, 2010 at 17:46, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
>> > > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> > >
>> > > backporting this commit:
>> > >
>> > > commit fdd3d631cddad20ad9d3e1eb7dbf26825a8a121f
>> > > Author: Krishna Kumar <krkumar2@in.ibm.com>
>> > > Date:   Wed Feb 3 13:13:10 2010 +0000
>> > >
>> > >    ixgbe: Fix return of invalid txq
>> > >
>> > >    a developer had complained of getting lots of warnings:
>> > >
>> > >    "eth16 selects TX queue 98, but real number of TX queues is 64"
>> > >
>> > >    http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02200.html
>> > >
>> > >    As there was no follow up on that bug, I am submitting this
>> > >    patch assuming that the other return points will not return
>> > >    invalid txq's, and also that this fixes the bug (not tested).
>> > >
>> > >    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
>> > >    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> > >    Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
>> > >    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> > >    Signed-off-by: David S. Miller <davem@davemloft.net>
>> > >
>> > > CC: Brandon <brandon@ifup.org>
>> > > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> > > ---
>> > >
>> > >  drivers/net/ixgbe/ixgbe_main.c |    8 ++++++--
>> > >  1 files changed, 6 insertions(+), 2 deletions(-)
>> > >
>> >
>> > Greg - status?  Did you queue this patch for the stable release and I missed it?
>>
>> Maybe we didn't say (and we should have) that this fixes a panic on
>> machines with > 64 cores.  Please apply to -stable 32.
>
> I'll get to it for the next release after this one, if that's ok.
>
> thanks,
>
> greg k-h
> --

I did not see this patch in the list of patches for the next release
of the stable kernel.  Just want to make sure this patch makes it this
time... :)

-- 
Cheers,
Jeff

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Oleg Nesterov @ 2010-07-02 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sridhar Samudrala, Tejun Heo, Michael S. Tsirkin, Ingo Molnar,
	netdev, lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1278094270.1917.288.camel@laptop>

On 07/02, Peter Zijlstra wrote:
>
> On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
> >
> >  Does  it (Tejun's kthread_clone() patch) also  inherit the
> > cgroup of the caller?
>
> Of course, it's a simple do_fork() which inherits everything just as you
> would expect from a similar sys_clone()/sys_fork() call.

Yes. And I'm afraid it can inherit more than we want. IIUC, this is called
from ioctl(), right?

Then the new thread becomes the natural child of the caller, and it shares
->mm with the parent. And files, dup_fd() without CLONE_FS.

Signals. Say, if you send SIGKILL to this new thread, it can't sleep in
TASK_INTERRUPTIBLE or KILLABLE after that. And this SIGKILL can be sent
just because the parent gets SIGQUIT or another coredumpable signal.
Or the new thread can receive SIGSTOP via ^Z.

Perhaps this is OK, I do not know. Just to remind that kernel_thread()
is merely clone(CLONE_VM).

Oleg.

^ permalink raw reply

* Re: bnx2/5709: Strange interrupts spread
From: Michael Chan @ 2010-07-02 20:58 UTC (permalink / raw)
  To: Christophe Ngo Van Duc; +Cc: netdev@vger.kernel.org
In-Reply-To: <AANLkTiniNNPV9ztxXHtX4np7PIZabkm0I4v5O29chf8i@mail.gmail.com>


On Fri, 2010-07-02 at 13:33 -0700, Christophe Ngo Van Duc wrote:
> On eth2 (external card) all interrupts goes to CPU0
>
>
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>   80:   46973077          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-0
>   81:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-1
>   82:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-2
>   83:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-3
>   84:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-4
>   85:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-5
>   86:          0          0       2445          0         37          0       8463         13   PCI-MSI-edge      eth2-6
>   87:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-7

Reformatted your output

> If I understand correctly the RSS hash is used to dispatch the packets
> into the different queues running on the different CPU.

It looks like most interrupts go to eth2-0, a few go to eth2-6.  The rx
ring for eth2-0 is for non-IP packets.  The RSS hash will hash IP
packets and place them on eth2-1 to eth2-7.  eth2-0 also handles tx
interrupts for TX ring 0.  TX traffic is hashed by the stack.

What kind of traffic is passing through eth2?

Thanks.



^ permalink raw reply
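
The ring assignment described in this reply can be pictured with a toy
classifier. This is a sketch under assumptions: toy_rx_ring() is a
hypothetical function, the ring count of 8 is taken from the eth2-0 to
eth2-7 vectors in the report, and the modulo step is a placeholder for
whatever hash-to-ring mapping the hardware actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_RINGS 8	/* eth2-0 .. eth2-7 in the report */

/*
 * Non-IP packets land on ring 0; IP packets are spread over rings
 * 1..7 according to a flow hash (placeholder mapping).
 */
static unsigned int toy_rx_ring(bool is_ip, uint32_t flow_hash)
{
	if (!is_ip)
		return 0;
	return 1 + flow_hash % (TOY_NUM_RINGS - 1);
}
```

Under this model, a counter that grows only on ring 0 suggests either
traffic the hardware does not classify as IP or TX completions for TX
ring 0, which is why the kind of traffic matters.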

* bnx2/5709: Strange interrupts spread
From: Christophe Ngo Van Duc @ 2010-07-02 20:33 UTC (permalink / raw)
  To: netdev

Dear list,

I hope I am posting to the correct place...

I am facing a strange issue on a HP DL 360.

I have 2 internal ethernet cards (the ones that came by default with
the server) and 2 additional ethernet cards, for a total of 4 ethernet
cards.

The 2 internal cards are running fine as far as interrupts go (for example eth1):
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  71:        604      11933         40       1537          0          0          0       6043   PCI-MSI-edge      eth1-0
  72:      24805       9795       3606          0        128          0       3365          0   PCI-MSI-edge      eth1-1
  73:          0        279          0        429         38      16540          0      30843   PCI-MSI-edge      eth1-2
  74:          0          0      25365        267          0          0         89      15541   PCI-MSI-edge      eth1-3
  75:       7244      24108          0          0      16488          0        240          0   PCI-MSI-edge      eth1-4
  76:      21378       3628       7726          0         49        247       2871          0   PCI-MSI-edge      eth1-5
  77:          0          0      47199        459         13         46      63064         18   PCI-MSI-edge      eth1-6
  78:          0       6230         67        283        259         82       7846      27130   PCI-MSI-edge      eth1-7

On eth2 (external card) all interrupts go to CPU0:
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  80:   46973077          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-0
  81:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-1
  82:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-2
  83:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-3
  84:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-4
  85:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-5
  86:          0          0       2445          0         37          0       8463         13   PCI-MSI-edge      eth2-6
  87:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-7

If I understand correctly, the RSS hash is used to dispatch packets
to the different queues, which are serviced on different CPUs.
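
To make that understanding concrete, here is a minimal sketch of the
hash-to-queue step. It is not what the bnx2 hardware computes: the hash
is a plain FNV-1a stand-in, and struct flow_key, INDIR_SIZE and all
function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 8	/* eth1-0 .. eth1-7 in the listing above */
#define INDIR_SIZE 128	/* assumed table size, hardware dependent */

/* Hypothetical flow key: the fields an RSS hash typically covers. */
struct flow_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Mix one 32-bit word into an FNV-1a hash, byte by byte. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++)
		h = (h ^ (uint8_t)(v >> (8 * i))) * 16777619u;
	return h;
}

/* Stand-in for the hardware hash: same flow in, same value out. */
static uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t h = 2166136261u;

	h = fnv1a_u32(h, k->saddr);
	h = fnv1a_u32(h, k->daddr);
	h = fnv1a_u32(h, ((uint32_t)k->sport << 16) | k->dport);
	return h;
}

/* Fill the indirection table round-robin over the RX queues. */
static void indir_init(uint8_t table[INDIR_SIZE])
{
	size_t i;

	for (i = 0; i < INDIR_SIZE; i++)
		table[i] = (uint8_t)(i % NUM_QUEUES);
}

/* The hash selects a table slot; the slot names the RX queue. */
static unsigned int rss_queue(const uint8_t table[INDIR_SIZE],
			      const struct flow_key *k)
{
	return table[flow_hash(k) % INDIR_SIZE];
}
```

The point of the sketch is that every packet of one flow hashes to the
same slot and therefore to the same queue and interrupt vector, so
spreading only happens across distinct flows.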

Why, then, do my internal cards behave correctly while the additional
cards (eth2 and eth3) show this behavior, where all interrupts go
to one CPU?

Thanks for your help in understanding this. (see below for config details)

Christophe.

All are detected correctly at boot:
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.8e (April 13, 2010)
bnx2 0000:02:00.0: PCI INT A -> GSI 31 (level, low) -> IRQ 31
bnx2 0000:02:00.0: setting latency timer to 64
eth0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
at mem f4000000, IRQ 31, node addr f4:ce:46:86:a1:00
bnx2 0000:02:00.1: PCI INT B -> GSI 39 (level, low) -> IRQ 39
bnx2 0000:02:00.1: setting latency timer to 64
eth1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
at mem f2000000, IRQ 39, node addr f4:ce:46:86:a1:02
bnx2 0000:07:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
bnx2 0000:07:00.0: setting latency timer to 64
eth2: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
at mem fa000000, IRQ 24, node addr 00:26:55:87:17:98
bnx2 0000:07:00.1: PCI INT B -> GSI 34 (level, low) -> IRQ 34
bnx2 0000:07:00.1: setting latency timer to 64
eth3: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
at mem f8000000, IRQ 34, node addr 00:26:55:87:17:9a

Kernel is 2.6.31-13
Broadcom driver bnx2 v2.0.8e

eth0 is a normal interface with an IP address
eth1 is a normal interface with an IP address
eth2 belongs to a bridge interface without an IP address, running tc (htb)
eth3 belongs to the same bridge interface without an IP address

^ permalink raw reply

* [PATCH] ipvs: Kconfig cleanup
From: Michal Marek @ 2010-07-02 20:32 UTC (permalink / raw)
  To: lvs-devel
  Cc: netdev, Julian Anastasov, Simon Horman, Wensong Zhang,
	linux-kernel

IP_VS_PROTO_AH_ESP should be set iff either of IP_VS_PROTO_{AH,ESP} is
selected. Express this with standard kconfig syntax.

Signed-off-by: Michal Marek <mmarek@suse.cz>
---
 net/netfilter/ipvs/Kconfig |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index f2d7623..91e7373 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -83,19 +83,16 @@ config	IP_VS_PROTO_UDP
 	  protocol. Say Y if unsure.
 
 config	IP_VS_PROTO_AH_ESP
-	bool
-	depends on UNDEFINED
+	def_bool IP_VS_PROTO_ESP || IP_VS_PROTO_AH
 
 config	IP_VS_PROTO_ESP
 	bool "ESP load balancing support"
-	select IP_VS_PROTO_AH_ESP
 	---help---
 	  This option enables support for load balancing ESP (Encapsulation
 	  Security Payload) transport protocol. Say Y if unsure.
 
 config	IP_VS_PROTO_AH
 	bool "AH load balancing support"
-	select IP_VS_PROTO_AH_ESP
 	---help---
 	  This option enables support for load balancing AH (Authentication
 	  Header) transport protocol. Say Y if unsure.
-- 
1.7.1


^ permalink raw reply related

* [PATCH 8/8] net/emergency: remove locking from recycling pool if emergency pools are not used
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Right now both users (socket and network driver) access the
emergency pools with the lock held. If there is no socket user, only the
NIC driver touches the pool, from its napi callback. The locking
is not required in that case since the driver has its own locking.

Disabling the emergency pools makes emerg_skb_users drop to
zero. Once the driver notices this, further allocations are
lockless. The counter is decremented while holding the list lock of the
pool to ensure that all users which touch the struct locked are gone.
Enabling the pools for the socket user requires that the NIC driver is
not touching the pool unlocked. This is ensured by the ndo_emerg_reload
callback, which performs a lightweight disable/enable of the card.

As a side effect, the emergency mode of the NIC is now deactivated once
the last user is gone.
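
The scheme can be sketched in userspace roughly as follows. This is an
illustration only, not the kernel code: struct pool and the pool_*()
helpers are made-up names, a C11 atomic_flag spinlock stands in for
rx_recycle.lock, and the NIC reload that the patch performs when the
first user arrives is left out.

```c
#include <stdatomic.h>
#include <stddef.h>

#define POOL_MAX 4

struct pool {
	atomic_flag lock;	/* stands in for rx_recycle.lock */
	atomic_int users;	/* stands in for emerg_skb_users */
	int items[POOL_MAX];
	size_t len;
};

static void pool_init(struct pool *p)
{
	atomic_flag_clear(&p->lock);
	atomic_init(&p->users, 0);
	p->len = 0;
}

static void pool_lock(struct pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void pool_unlock(struct pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Driver-side add: take the lock only while a socket user exists.
 * Returns 1 if the item was stored, 0 if the pool was full. */
static int pool_add(struct pool *p, int item)
{
	int locked = atomic_load(&p->users) > 0;
	int stored = 0;

	if (locked)
		pool_lock(p);
	if (p->len < POOL_MAX) {
		p->items[p->len++] = item;
		stored = 1;
	}
	if (locked)
		pool_unlock(p);
	return stored;
}

/* A socket user arrives. The real patch also reloads the NIC here so
 * that no lockless access is still in flight; this sketch omits that. */
static int pool_user_get(struct pool *p)
{
	return atomic_fetch_add(&p->users, 1) + 1;
}

/* A socket user leaves. The count drops under the lock, so anyone
 * still inside a locked section is gone before lockless use resumes. */
static int pool_user_put(struct pool *p)
{
	int left;

	pool_lock(p);
	left = atomic_fetch_sub(&p->users, 1) - 1;
	pool_unlock(p);
	return left;
}
```

With users at zero pool_add() never touches the lock, which is the fast
path this patch is after.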

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/gianfar.c     |   10 ++++
 drivers/net/ucc_geth.c    |    8 ++++
 include/linux/netdevice.h |   55 ++++++++++++++++++++++--
 net/core/dev.c            |    1 +
 net/core/skbuff.c         |   19 ++++++++-
 net/core/sock.c           |  101 ++++++++++++++++++++++++++++++++++++++-------
 6 files changed, 172 insertions(+), 22 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 1a1a249..0c891ea 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -116,6 +116,7 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 		struct sk_buff *skb);
 static int gfar_set_mac_address(struct net_device *dev);
 static int gfar_change_mtu(struct net_device *dev, int new_mtu);
+static void gfar_emerg_reload(struct net_device *dev);
 static irqreturn_t gfar_error(int irq, void *dev_id);
 static irqreturn_t gfar_transmit(int irq, void *dev_id);
 static irqreturn_t gfar_interrupt(int irq, void *dev_id);
@@ -464,6 +465,7 @@ static const struct net_device_ops gfar_netdev_ops = {
 	.ndo_start_xmit = gfar_start_xmit,
 	.ndo_stop = gfar_close,
 	.ndo_change_mtu = gfar_change_mtu,
+	.ndo_emerg_reload = gfar_emerg_reload,
 	.ndo_set_multicast_list = gfar_set_multi,
 	.ndo_tx_timeout = gfar_timeout,
 	.ndo_do_ioctl = gfar_ioctl,
@@ -1918,6 +1920,8 @@ int startup_gfar(struct net_device *ndev)
 	if (err)
 		return err;
 
+	fill_emerg_pool(ndev);
+
 	gfar_init_mac(ndev);
 
 	for (i = 0; i < priv->num_grps; i++) {
@@ -2393,6 +2397,12 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+static void gfar_emerg_reload(struct net_device *dev)
+{
+	stop_gfar(dev);
+	startup_gfar(dev);
+}
+
 /* gfar_reset_task gets scheduled when a packet has not been
  * transmitted after a set amount of time.
  * For now, assume that clearing out all the structures, and
diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 9d6097b..6280226 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -1991,6 +1991,12 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 	}
 }
 
+static void ucc_emerg_reload(struct net_device *dev)
+{
+	ucc_geth_close(dev);
+	ucc_geth_open(dev);
+}
+
 static void ucc_geth_set_multi(struct net_device *dev)
 {
 	struct ucc_geth_private *ugeth;
@@ -3499,6 +3505,7 @@ static int ucc_geth_open(struct net_device *dev)
 				  dev->name);
 		goto err;
 	}
+	fill_emerg_pool(dev);
 
 	err = request_irq(ugeth->ug_info->uf_info.irq, ucc_geth_irq_handler,
 			  0, "UCC Geth", dev);
@@ -3707,6 +3714,7 @@ static const struct net_device_ops ucc_geth_netdev_ops = {
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= ucc_geth_set_mac_addr,
 	.ndo_change_mtu		= eth_change_mtu,
+	.ndo_emerg_reload	= ucc_emerg_reload,
 	.ndo_set_multicast_list	= ucc_geth_set_multi,
 	.ndo_tx_timeout		= ucc_geth_timeout,
 	.ndo_do_ioctl		= ucc_geth_ioctl,
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa7e951..2606156 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -717,6 +717,10 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
  * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
  *			  struct nlattr *port[]);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * void (*ndo_emerg_reload)(struct net_device *dev);
+ *	If the card supports emergency pools then this function will perform a
+ *	lightweight reload to ensure that the card is not accessing the
+ *	emergency pool locklessly for recycling purposes.
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -788,6 +792,7 @@ struct net_device_ops {
 	int			(*ndo_fcoe_get_wwn)(struct net_device *dev,
 						    u64 *wwn, int type);
 #endif
+	void			(*ndo_emerg_reload)(struct net_device *dev);
 };
 
 /*
@@ -1092,6 +1097,7 @@ struct net_device {
 	struct sk_buff_head rx_recycle;
 	u32 rx_rec_skbs_max;
 	u32 rx_rec_skb_size;
+	atomic_t emerg_skb_users;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -1138,24 +1144,63 @@ static inline void net_recycle_cleanup(struct net_device *dev)
 	skb_queue_purge(&dev->rx_recycle);
 }
 
+static inline int recycle_possible(struct net_device *dev, struct sk_buff *skb)
+{
+	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
+			skb_recycle_check(skb, dev->rx_rec_skb_size))
+		return 1;
+	else
+		return 0;
+}
+
 static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
 {
+	int emerg_active;
+
 	if (skb->emerg_dev) {
 		dev_put(skb->emerg_dev);
 		skb->emerg_dev = NULL;
 	}
-	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
-			skb_recycle_check(skb, dev->rx_rec_skb_size))
-		skb_queue_head(&dev->rx_recycle, skb);
-	else
+
+	emerg_active = atomic_read(&dev->emerg_skb_users);
+
+	if (recycle_possible(dev, skb)) {
+		if (emerg_active)
+			skb_queue_head(&dev->rx_recycle, skb);
+		else
+			__skb_queue_head(&dev->rx_recycle, skb);
+	} else {
 		dev_kfree_skb_any(skb);
+	}
+}
+
+static inline void fill_emerg_pool(struct net_device *dev)
+{
+	struct sk_buff *skb;
+
+	if (atomic_read(&dev->emerg_skb_users) == 0)
+		return;
+
+	do {
+		if (skb_queue_len(&dev->rx_recycle) >= dev->rx_rec_skbs_max)
+			return;
+
+		skb = __netdev_alloc_skb(dev, dev->rx_rec_skb_size, GFP_KERNEL);
+		if (!skb)
+			return;
+
+		net_recycle_add(dev, skb);
+	} while (1);
 }
 
 static inline struct sk_buff *net_recycle_get(struct net_device *dev)
 {
 	struct sk_buff *skb;
 
-	skb = skb_dequeue(&dev->rx_recycle);
+	if (atomic_read(&dev->emerg_skb_users) > 0)
+		skb = skb_dequeue(&dev->rx_recycle);
+	else
+		skb = __skb_dequeue(&dev->rx_recycle);
 	if (skb)
 		return skb;
 	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
diff --git a/net/core/dev.c b/net/core/dev.c
index db9acd5..5dbe356 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5405,6 +5405,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	netdev_init_queues(dev);
 
 	skb_queue_head_init(&dev->rx_recycle);
+	atomic_set(&dev->emerg_skb_users, 0);
 	INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
 	dev->ethtool_ntuple_list.count = 0;
 	INIT_LIST_HEAD(&dev->napi_list);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9e094fc..374d353 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -428,10 +428,25 @@ void __kfree_skb(struct sk_buff *skb)
 	struct net_device *ndev = skb->emerg_dev;
 
 	if (ndev) {
-		net_recycle_add(ndev, skb);
+		int emerg_en;
+		unsigned long flags;
+		int can_recycle;
+
+		skb->emerg_dev = NULL;
+		can_recycle = recycle_possible(ndev, skb);
+		spin_lock_irqsave(&ndev->rx_recycle.lock, flags);
+		emerg_en = atomic_read(&ndev->emerg_skb_users);
+		if (!emerg_en || !can_recycle) {
+			spin_unlock_irqrestore(&ndev->rx_recycle.lock, flags);
+			dev_put(ndev);
+			goto free_it;
+		}
+		__skb_queue_head(&ndev->rx_recycle, skb);
+		spin_unlock_irqrestore(&ndev->rx_recycle.lock, flags);
+		dev_put(ndev);
 		return;
 	}
-
+free_it:
 	skb_release_all(skb);
 	kfree_skbmem(skb);
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 33aa1a5..409d069 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -424,6 +424,10 @@ static int sock_bindtodevice(struct sock *sk, char __user *optval, int optlen)
 	if (optlen < 0)
 		goto out;
 
+	ret = -EBUSY;
+	if (sk->emerg_en)
+		goto out;
+
 	/* Bind this socket to a particular device like "eth0",
 	 * as specified in the passed interface name. If the
 	 * name is "" or the option length is zero the socket
@@ -497,10 +501,6 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 	struct net *net = sock_net(sk);
 	struct net_device *dev;
 
-	if (!val) {
-		sk->emerg_en = 0;
-		return 0;
-	}
 	if (sk->emerg_en && val)
 		return -EBUSY;
 	if (!capable(CAP_NET_ADMIN))
@@ -510,10 +510,45 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
 	if (!dev)
 		return -ENODEV;
+	if (!val) {
+		ret = 0;
+		if (!sk->emerg_en)
+			goto out;
+		/*
+		 * new skbs for this socket won't be taken from the emergency
+		 * pool anymore.
+		 */
+		sk->emerg_en = 0;
+		smp_rmb();
+		/* get rid of anyone who got into the critical section before
+		 * the flag was changed and is still there
+		 */
+		spin_lock_irq(&dev->rx_recycle.lock);
+		/*
+		 * if we fall down to 0 users, kfree will no longer add
+		 * packets to the pool but simply free them. Also the recycle
+		 * code which takes tx packets and adds them to the pool will
+		 * start working lockless since there are no further user
+		 * except the nic driver.
+		 */
+		atomic_dec(&dev->emerg_skb_users);
+		spin_unlock_irq(&dev->rx_recycle.lock);
+		goto out;
+	}
 	ret = -ENODEV;
+
 	if (!dev->rx_rec_skb_size)
 		goto out;
 
+	if (!dev->netdev_ops->ndo_emerg_reload)
+		goto out;
+
+	rtnl_lock();
+	ret = atomic_add_return(1, &dev->emerg_skb_users);
+	if (ret == 1 && (dev->flags & IFF_UP))
+		dev->netdev_ops->ndo_emerg_reload(dev);
+	rtnl_unlock();
+
 	do {
 		struct sk_buff *skb;
 
@@ -532,6 +567,8 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 
 	if (!ret)
 		sk->emerg_en = 1;
+	else
+		atomic_dec(&dev->emerg_skb_users);
 out:
 	dev_put(dev);
 	return ret;
@@ -1223,6 +1260,8 @@ EXPORT_SYMBOL(sk_alloc);
 static void __sk_free(struct sock *sk)
 {
 	struct sk_filter *filter;
+	struct net_device *dev;
+	struct net *net = sock_net(sk);
 
 	if (sk->sk_destruct)
 		sk->sk_destruct(sk);
@@ -1244,6 +1283,18 @@ static void __sk_free(struct sock *sk)
 	if (sk->sk_peer_cred)
 		put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
+
+	if (sk->emerg_en) {
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (dev) {
+			spin_lock_irq(&dev->rx_recycle.lock);
+			atomic_dec(&dev->emerg_skb_users);
+			spin_unlock_irq(&dev->rx_recycle.lock);
+			WARN_ON(atomic_read(&dev->emerg_skb_users) < 0);
+			dev_put(dev);
+		}
+	}
+
 	put_net(sock_net(sk));
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
@@ -1566,6 +1617,7 @@ static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
 {
 	struct net *net = sock_net(sk);
 	struct net_device *dev;
+	unsigned long flags;
 	int err;
 	struct sk_buff *skb;
 
@@ -1580,18 +1632,33 @@ static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
 		dev_put(dev);
 		return ERR_PTR(err);
 	}
-	skb = skb_dequeue(&dev->rx_recycle);
-	if (!skb) {
-		dev_put(dev);
-		err = -ENOBUFS;
-		return ERR_PTR(err);
+
+	spin_lock_irqsave(&dev->rx_recycle.lock, flags);
+	if (sk->emerg_en) {
+		skb = __skb_dequeue(&dev->rx_recycle);
+		spin_unlock_irqrestore(&dev->rx_recycle.lock, flags);
+		if (!skb) {
+			dev_put(dev);
+			err = -ENOBUFS;
+			return ERR_PTR(err);
+		}
+		/* remove earlier skb_reserve() */
+		skb_reserve(skb, - skb_headroom(skb));
+		skb->emerg_dev = dev;
+		/*
+		 * dev will be put once the skb is back from
+		 * its journey.
+		 */
+		return skb;
 	}
 	/*
-	 * dev will be put once the skb is back from
-	 * its journey.
+	 * We got called but the emergency pools are not activated. This might
+	 * happen if the pools got deactivated between checking the emerg_en
+	 * flag and taking the lock.
 	 */
-	skb->emerg_dev = dev;
-	return skb;
+	spin_unlock_irqrestore(&dev->rx_recycle.lock, flags);
+	dev_put(dev);
+	return NULL;
 }
 
 /*
@@ -1627,8 +1694,12 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 				if (IS_ERR(skb)) {
 					err = PTR_ERR(skb);
 					goto failure;
-				}
-				break;
+
+				} else if (skb)
+					break;
+				/*
+				 * else skb is null because emerg_en is not set
+				 */
 			}
 			skb = alloc_skb(header_len, gfp_mask);
 			if (skb) {
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH 6/8] net: implement emergency pools
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This patch implements emergency pools which are bound to a specific
network device. They can be activated via the socket interface and used
for a specific socket.
The pools are built on top of rx-recycling. The socket interface allows
setting the number of skbs in the pool and activating the pool.
The size of the skbs which are accepted by / added to the pool cannot be
changed via the socket. It is set by the network driver and is altered
on MTU change, which requires dropping the current pool and
re-allocating it. If the driver does not set the skb size, the
emergency pools cannot be used.
Once the emergency pools are activated, all rx-skb allocations by the
network driver are taken from the pool. tx-skbs are allocated from the
emergency pool only for the relevant socket, i.e. the one which
activated the emergency mode.
Since the socket _and_ the driver can add/remove skbs to/from the pool,
the list operations now use skb_queue_head() and skb_dequeue().
A later patch in the series tries to bring the old unlocked
behavior back if the emergency pools are not used by the user.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/alpha/include/asm/socket.h   |    4 +
 arch/arm/include/asm/socket.h     |    3 +
 arch/avr32/include/asm/socket.h   |    3 +
 arch/cris/include/asm/socket.h    |    5 +-
 arch/frv/include/asm/socket.h     |    4 +-
 arch/h8300/include/asm/socket.h   |    3 +
 arch/ia64/include/asm/socket.h    |    3 +
 arch/m32r/include/asm/socket.h    |    3 +
 arch/m68k/include/asm/socket.h    |    3 +
 arch/mips/include/asm/socket.h    |    3 +
 arch/mn10300/include/asm/socket.h |    3 +
 arch/parisc/include/asm/socket.h  |    3 +
 arch/powerpc/include/asm/socket.h |    3 +
 arch/s390/include/asm/socket.h    |    3 +
 arch/sparc/include/asm/socket.h   |    3 +
 arch/xtensa/include/asm/socket.h  |    3 +
 include/asm-generic/socket.h      |    4 +
 include/linux/netdevice.h         |   52 +++++++------
 include/linux/skbuff.h            |    1 +
 include/net/sock.h                |    2 +
 net/core/skbuff.c                 |    8 ++
 net/core/sock.c                   |  142 +++++++++++++++++++++++++++++++++++++
 22 files changed, 234 insertions(+), 27 deletions(-)

diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index 06edfef..ea49db3 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -69,6 +69,10 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
+
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h
index 90ffd04..b827010 100644
--- a/arch/arm/include/asm/socket.h
+++ b/arch/arm/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/avr32/include/asm/socket.h b/arch/avr32/include/asm/socket.h
index c8d1fae..64a7d45 100644
--- a/arch/avr32/include/asm/socket.h
+++ b/arch/avr32/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/asm/socket.h b/arch/cris/include/asm/socket.h
index 1a4a619..9b8e7ed 100644
--- a/arch/cris/include/asm/socket.h
+++ b/arch/cris/include/asm/socket.h
@@ -64,6 +64,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
-
-
diff --git a/arch/frv/include/asm/socket.h b/arch/frv/include/asm/socket.h
index a6b2688..15a262f 100644
--- a/arch/frv/include/asm/socket.h
+++ b/arch/frv/include/asm/socket.h
@@ -62,5 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
-
diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h
index 04c0f45..d46d64e 100644
--- a/arch/h8300/include/asm/socket.h
+++ b/arch/h8300/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/asm/socket.h b/arch/ia64/include/asm/socket.h
index 51427ea..04983aa 100644
--- a/arch/ia64/include/asm/socket.h
+++ b/arch/ia64/include/asm/socket.h
@@ -71,4 +71,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/asm/socket.h b/arch/m32r/include/asm/socket.h
index 469787c..a0e5431 100644
--- a/arch/m32r/include/asm/socket.h
+++ b/arch/m32r/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/m68k/include/asm/socket.h b/arch/m68k/include/asm/socket.h
index 9bf49c8..7018ceb 100644
--- a/arch/m68k/include/asm/socket.h
+++ b/arch/m68k/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/mips/include/asm/socket.h b/arch/mips/include/asm/socket.h
index 9de5190..9f9d93a 100644
--- a/arch/mips/include/asm/socket.h
+++ b/arch/mips/include/asm/socket.h
@@ -82,6 +82,9 @@ To add: #define SO_REUSEPORT 0x0200	/* Allow local address and port reuse.  */
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #ifdef __KERNEL__
 
 /** sock_type - Socket types
diff --git a/arch/mn10300/include/asm/socket.h b/arch/mn10300/include/asm/socket.h
index 4e60c42..70476eb 100644
--- a/arch/mn10300/include/asm/socket.h
+++ b/arch/mn10300/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/asm/socket.h b/arch/parisc/include/asm/socket.h
index 225b7d6..a4706d0 100644
--- a/arch/parisc/include/asm/socket.h
+++ b/arch/parisc/include/asm/socket.h
@@ -61,6 +61,9 @@
 
 #define SO_RXQ_OVFL             0x4021
 
+#define SO_EPOOL_QLEN		0x4022
+#define SO_EPOOL_SIZE		0x4023
+#define SO_EPOOL_MODE		0x4024
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/arch/powerpc/include/asm/socket.h b/arch/powerpc/include/asm/socket.h
index 866f760..dce10f9 100644
--- a/arch/powerpc/include/asm/socket.h
+++ b/arch/powerpc/include/asm/socket.h
@@ -69,4 +69,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/asm/socket.h b/arch/s390/include/asm/socket.h
index fdff1e9..73d0117 100644
--- a/arch/s390/include/asm/socket.h
+++ b/arch/s390/include/asm/socket.h
@@ -70,4 +70,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/asm/socket.h b/arch/sparc/include/asm/socket.h
index 9d3fefc..39eea91 100644
--- a/arch/sparc/include/asm/socket.h
+++ b/arch/sparc/include/asm/socket.h
@@ -58,6 +58,9 @@
 
 #define SO_RXQ_OVFL             0x0024
 
+#define SO_EPOOL_QLEN           0x0025
+#define SO_EPOOL_SIZE           0x0026
+#define SO_EPOOL_MODE           0x0027
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/asm/socket.h b/arch/xtensa/include/asm/socket.h
index cbdf2ff..161a2e5 100644
--- a/arch/xtensa/include/asm/socket.h
+++ b/arch/xtensa/include/asm/socket.h
@@ -73,4 +73,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 9a6115e..fa9ccbb 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -64,4 +64,8 @@
 #define SO_DOMAIN		39
 
 #define SO_RXQ_OVFL             40
+
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4fa400b..fa7e951 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1095,6 +1095,28 @@ struct net_device {
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+/**
+ *	dev_put - release reference to device
+ *	@dev: network device
+ *
+ * Release reference to device to allow it to be freed.
+ */
+static inline void dev_put(struct net_device *dev)
+{
+	atomic_dec(&dev->refcnt);
+}
+
+/**
+ *	dev_hold - get reference to device
+ *	@dev: network device
+ *
+ * Hold reference to device to keep it from being freed.
+ */
+static inline void dev_hold(struct net_device *dev)
+{
+	atomic_inc(&dev->refcnt);
+}
+
 static inline void net_recycle_init(struct net_device *dev, u32 qlen, u32 size)
 {
 	dev->rx_rec_skbs_max = qlen;
@@ -1118,9 +1140,13 @@ static inline void net_recycle_cleanup(struct net_device *dev)
 
 static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
 {
+	if (skb->emerg_dev) {
+		dev_put(skb->emerg_dev);
+		skb->emerg_dev = NULL;
+	}
 	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
 			skb_recycle_check(skb, dev->rx_rec_skb_size))
-		__skb_queue_head(&dev->rx_recycle, skb);
+		skb_queue_head(&dev->rx_recycle, skb);
 	else
 		dev_kfree_skb_any(skb);
 }
@@ -1129,7 +1155,7 @@ static inline struct sk_buff *net_recycle_get(struct net_device *dev)
 {
 	struct sk_buff *skb;
 
-	skb = __skb_dequeue(&dev->rx_recycle);
+	skb = skb_dequeue(&dev->rx_recycle);
 	if (skb)
 		return skb;
 	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
@@ -1783,28 +1809,6 @@ extern int		netdev_budget;
 /* Called by rtnetlink.c:rtnl_unlock() */
 extern void netdev_run_todo(void);
 
-/**
- *	dev_put - release reference to device
- *	@dev: network device
- *
- * Release reference to device to allow it to be freed.
- */
-static inline void dev_put(struct net_device *dev)
-{
-	atomic_dec(&dev->refcnt);
-}
-
-/**
- *	dev_hold - get reference to device
- *	@dev: network device
- *
- * Hold reference to device to keep it from being freed.
- */
-static inline void dev_hold(struct net_device *dev)
-{
-	atomic_inc(&dev->refcnt);
-}
-
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
  * and _off may be called from IRQ context, but it is caller
  * who is responsible for serialization of these calls.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac74ee0..caee62c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -319,6 +319,7 @@ struct sk_buff {
 
 	struct sock		*sk;
 	struct net_device	*dev;
+	struct net_device	*emerg_dev;
 
 	/*
 	 * This is the control buffer. It is free to use for every
diff --git a/include/net/sock.h b/include/net/sock.h
index 4f26f2f..3f3518a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -314,6 +314,8 @@ struct sock {
 #endif
 	__u32			sk_mark;
 	u32			sk_classid;
+	u32			emerg_en;
+	/* XXX 4 bytes hole on 64 bit */
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 34432b4..f02737d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -425,6 +425,13 @@ static void skb_release_all(struct sk_buff *skb)
 
 void __kfree_skb(struct sk_buff *skb)
 {
+	struct net_device *ndev = skb->emerg_dev;
+
+	if (ndev) {
+		net_recycle_add(ndev, skb);
+		return;
+	}
+
 	skb_release_all(skb);
 	kfree_skbmem(skb);
 }
@@ -563,6 +570,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 {
 #define C(x) n->x = skb->x
 
+	n->emerg_dev = NULL;
 	n->next = n->prev = NULL;
 	n->sk = NULL;
 	__copy_skb_header(n, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index fef2434..33aa1a5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -472,6 +472,71 @@ static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
 		sock_reset_flag(sk, bit);
 }
 
+static int sock_epool_set_qlen(struct sock *sk, int val)
+{
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!sk->sk_bound_dev_if)
+		return -ENODEV;
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+
+	net_recycle_qlen(dev, val);
+	dev_put(dev);
+	return 0;
+}
+
+static int sock_epool_set_mode(struct sock *sk, int val)
+{
+	int ret;
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+
+	if (!val) {
+		sk->emerg_en = 0;
+		return 0;
+	}
+	if (sk->emerg_en && val)
+		return -EBUSY;
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+	if (!sk->sk_bound_dev_if)
+		return -ENODEV;
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+	ret = -ENODEV;
+	if (!dev->rx_rec_skb_size)
+		goto out;
+
+	do {
+		struct sk_buff *skb;
+
+		if (skb_queue_len(&dev->rx_recycle) >= dev->rx_rec_skbs_max) {
+			ret = 0;
+			break;
+		}
+
+		skb = __netdev_alloc_skb(dev, dev->rx_rec_skb_size, GFP_KERNEL);
+		if (!skb) {
+			ret = -ENOMEM;
+			break;
+		}
+		net_recycle_add(dev, skb);
+	} while (1);
+
+	if (!ret)
+		sk->emerg_en = 1;
+out:
+	dev_put(dev);
+	return ret;
+}
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -740,6 +805,15 @@ set_rcvbuf:
 		else
 			sock_reset_flag(sk, SOCK_RXQ_OVFL);
 		break;
+	case SO_EPOOL_QLEN:
+		ret = sock_epool_set_qlen(sk, val);
+		break;
+	case SO_EPOOL_SIZE:
+		ret = -EINVAL;
+		break;
+	case SO_EPOOL_MODE:
+		ret = sock_epool_set_mode(sk, valbool);
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -961,6 +1035,35 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = !!sock_flag(sk, SOCK_RXQ_OVFL);
 		break;
 
+	case SO_EPOOL_QLEN:
+	{
+		struct net *net = sock_net(sk);
+		struct net_device *dev;
+
+		if (!sk->sk_bound_dev_if)
+			return -ENODEV;
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		v.val = dev->rx_rec_skbs_max;
+		break;
+	}
+	case SO_EPOOL_SIZE:
+	{
+		struct net *net = sock_net(sk);
+		struct net_device *dev;
+
+		if (!sk->sk_bound_dev_if)
+			return -ENODEV;
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		v.val = dev->rx_rec_skb_size;
+		break;
+	}
+	case SO_EPOOL_MODE:
+		v.val = sk->emerg_en;
+		break;
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1459,6 +1562,37 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 	return timeo;
 }
 
+static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
+{
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+	int err;
+	struct sk_buff *skb;
+
+	err = -ENODEV;
+	if (!sk->sk_bound_dev_if)
+		return ERR_PTR(err);
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return ERR_PTR(err);
+	err = -EINVAL;
+	if (dev->rx_rec_skb_size < skb_len) {
+		dev_put(dev);
+		return ERR_PTR(err);
+	}
+	skb = skb_dequeue(&dev->rx_recycle);
+	if (!skb) {
+		dev_put(dev);
+		err = -ENOBUFS;
+		return ERR_PTR(err);
+	}
+	/*
+	 * dev will be put once the skb is back from
+	 * its journey.
+	 */
+	skb->emerg_dev = dev;
+	return skb;
+}
 
 /*
  *	Generic send/receive buffer handlers
@@ -1488,6 +1622,14 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 			goto failure;
 
 		if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
+			if (sk->emerg_en) {
+				skb = alloc_emerg_skb(sk, header_len + data_len);
+				if (IS_ERR(skb)) {
+					err = PTR_ERR(skb);
+					goto failure;
+				}
+				break;
+			}
 			skb = alloc_skb(header_len, gfp_mask);
 			if (skb) {
 				int npages;
-- 
1.6.6.1



* [PATCH 7/8] net/emergency_skb: create a deep copy on clone
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

skb_clone() creates a clone of the skb: a new head is allocated from the
slab cache and the reference counter for the data part is incremented.
For skbs from the emergency pool, we don't want to clone them that way:
- allocating from the slab cache may lead to lock contention, which in
  turn increases latency.
- the original (with the data part) may return to the pool earlier than
  the clone. In that case we would "lose" the skb from the emergency
  pool.

Instead, we copy head and data into another skb taken from the emergency
pool. If the pool is empty or the copy fails, skb_clone() falls back to
the regular clone path.
This patch splits pskb_copy() into a helper function, __pskb_copy(),
which does the bare copy into an already allocated skb; the remaining
pskb_copy() allocates a new skb and calls the helper.
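
For illustration only, the split described above can be modelled in user
space. This is a minimal sketch, not kernel code: struct toy_buf, struct
toy_pool and all toy_*/pool_* helpers are hypothetical stand-ins for
sk_buff, the rx_recycle queue, __pskb_copy(), pskb_copy() and
skb_clone(). The fallback here is a plain heap copy, whereas the kernel
falls back to its normal reference-counted clone.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an sk_buff; not the kernel structure. */
struct toy_buf {
	struct toy_buf *next;	/* pool linkage */
	size_t cap;		/* size of data[] */
	size_t len;		/* bytes in use */
	int from_pool;		/* plays the role of skb->emerg_dev */
	unsigned char data[];
};

/* Toy stand-in for the per-device rx_recycle queue. */
struct toy_pool {
	struct toy_buf *head;
	size_t count;
};

static struct toy_buf *toy_alloc(size_t cap)
{
	struct toy_buf *b = calloc(1, sizeof(*b) + cap);

	if (b)
		b->cap = cap;
	return b;
}

static void pool_put(struct toy_pool *p, struct toy_buf *b)
{
	b->len = 0;
	b->from_pool = 0;
	b->next = p->head;
	p->head = b;
	p->count++;
}

static struct toy_buf *pool_get(struct toy_pool *p)
{
	struct toy_buf *b = p->head;

	if (b) {
		p->head = b->next;
		b->next = NULL;
		p->count--;
	}
	return b;
}

/* Analogue of __pskb_copy(): copy into a buffer the caller provides. */
static int toy_copy_into(const struct toy_buf *src, struct toy_buf *dst)
{
	if (src->cap > dst->cap)
		return -EMSGSIZE;
	assert(src->len <= src->cap);
	memcpy(dst->data, src->data, src->len);
	dst->len = src->len;
	return 0;
}

/* Analogue of pskb_copy(): allocate, then call the helper. */
static struct toy_buf *toy_copy(const struct toy_buf *src)
{
	struct toy_buf *n = toy_alloc(src->cap);

	if (!n)
		return NULL;
	if (!toy_copy_into(src, n))
		return n;
	free(n);
	return NULL;
}

/* Analogue of the new skb_clone() path for emergency buffers. */
static struct toy_buf *toy_clone(struct toy_pool *p, const struct toy_buf *src)
{
	if (src->from_pool) {
		struct toy_buf *n = pool_get(p);

		if (n) {
			if (!toy_copy_into(src, n)) {
				n->from_pool = 1;
				return n;
			}
			pool_put(p, n);	/* too small, hand it back */
		}
	}
	/* The kernel falls back to a refcounted clone; the toy just copies. */
	return toy_copy(src);
}
```

In this model a clone of a pool buffer never shares data with the
original, so either one can go back to the pool without affecting the
other.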

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/skbuff.c |   80 +++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f02737d..9e094fc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -613,6 +613,7 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
 }
 EXPORT_SYMBOL_GPL(skb_morph);
 
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n);
 /**
  *	skb_clone	-	duplicate an sk_buff
  *	@skb: buffer to clone
@@ -631,6 +632,20 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	struct sk_buff *n;
 
+	if (skb->emerg_dev) {
+		n = skb_dequeue(&skb->emerg_dev->rx_recycle);
+		if (!n)
+			goto norm_clone;
+		/* undo any earlier skb_reserve() */
+		skb_reserve(n, - skb_headroom(n));
+		if (!__pskb_copy(skb, n)) {
+			n->emerg_dev = skb->emerg_dev;
+			dev_hold(skb->emerg_dev);
+			return n;
+		}
+		net_recycle_add(skb->emerg_dev, n);
+	}
+norm_clone:
 	n = skb + 1;
 	if (skb->fclone == SKB_FCLONE_ORIG &&
 	    n->fclone == SKB_FCLONE_UNAVAILABLE) {
@@ -720,31 +735,22 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
 EXPORT_SYMBOL(skb_copy);
 
 /**
- *	pskb_copy	-	create copy of an sk_buff with private head.
- *	@skb: buffer to copy
- *	@gfp_mask: allocation priority
+ *	__pskb_copy	-	copy an sk_buff with private head into a given skb.
+ *	@skb: buffer to copy
+ *	@n: already allocated skb to copy into
  *
- *	Make a copy of both an &sk_buff and part of its data, located
- *	in header. Fragmented data remain shared. This is used when
- *	the caller wishes to modify only header of &sk_buff and needs
- *	private copy of the header to alter. Returns %NULL on failure
- *	or the pointer to the buffer on success.
- *	The returned buffer has a reference count of 1.
+ *	This function behaves like pskb_copy() except that it takes
+ *	an already allocated skb into which it copies head and data.
+ *	Returns 0 on success or -EMSGSIZE if @n is too small.
  */
-
-struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n)
 {
-	/*
-	 *	Allocate the copy buffer
-	 */
-	struct sk_buff *n;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
-	n = alloc_skb(skb->end, gfp_mask);
+	if (skb->end > n->end)
 #else
-	n = alloc_skb(skb->end - skb->head, gfp_mask);
+	if ((skb->end - skb->head) > (n->end - n->head))
 #endif
-	if (!n)
-		goto out;
+		return -EMSGSIZE;
 
 	/* Set the data pointer */
 	skb_reserve(n, skb->data - skb->head);
@@ -773,8 +779,40 @@ struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
 	}
 
 	copy_skb_header(n, skb);
-out:
-	return n;
+	return 0;
+}
+
+/**
+ *	pskb_copy	-	create copy of an sk_buff with private head.
+ *	@skb: buffer to copy
+ *	@gfp_mask: allocation priority
+ *
+ *	Make a copy of both an &sk_buff and part of its data, located
+ *	in header. Fragmented data remain shared. This is used when
+ *	the caller wishes to modify only header of &sk_buff and needs
+ *	private copy of the header to alter. Returns %NULL on failure
+ *	or the pointer to the buffer on success.
+ *	The returned buffer has a reference count of 1.
+ */
+
+struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
+{
+	/*
+	 *	Allocate the copy buffer
+	 */
+	struct sk_buff *n;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+	n = alloc_skb(skb->end, gfp_mask);
+#else
+	n = alloc_skb(skb->end - skb->head, gfp_mask);
+#endif
+	if (!n)
+		return NULL;
+	if (!__pskb_copy(skb, n))
+		return n;
+	kfree_skb(n);
+	return NULL;
+
 }
 EXPORT_SYMBOL(pskb_copy);
 
-- 
1.6.6.1



* [PATCH 4/8] net/stmmac: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/stmmac/stmmac.h      |    1 -
 drivers/net/stmmac/stmmac_main.c |   26 +++++++-------------------
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/net/stmmac/stmmac.h b/drivers/net/stmmac/stmmac.h
index ebebc64..dbf9f95 100644
--- a/drivers/net/stmmac/stmmac.h
+++ b/drivers/net/stmmac/stmmac.h
@@ -44,7 +44,6 @@ struct stmmac_priv {
 	unsigned int dirty_rx;
 	struct sk_buff **rx_skbuff;
 	dma_addr_t *rx_skbuff_dma;
-	struct sk_buff_head rx_recycle;
 
 	struct net_device *dev;
 	int is_gmac;
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index a31d580..722a5e6 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -636,18 +636,7 @@ static void stmmac_tx(struct stmmac_priv *priv)
 			p->des3 = 0;
 
 		if (likely(skb != NULL)) {
-			/*
-			 * If there's room in the queue (limit it to size)
-			 * we add this skb back into the pool,
-			 * if it's the right size.
-			 */
-			if ((skb_queue_len(&priv->rx_recycle) <
-				priv->dma_rx_size) &&
-				skb_recycle_check(skb, priv->dma_buf_sz))
-				__skb_queue_head(&priv->rx_recycle, skb);
-			else
-				dev_kfree_skb(skb);
-
+			net_recycle_add(priv->dev, skb);
 			priv->tx_skbuff[entry] = NULL;
 		}
 
@@ -843,6 +832,9 @@ static int stmmac_open(struct net_device *dev)
 	priv->dma_buf_sz = STMMAC_ALIGN(buf_sz);
 	init_dma_desc_rings(dev);
 
+	net_recycle_init(priv->dev, priv->dma_rx_size, priv->dma_buf_sz +
+			NET_IP_ALIGN);
+
 	/* DMA initialization and SW reset */
 	if (unlikely(priv->hw->dma->init(ioaddr, priv->pbl, priv->dma_tx_phy,
 					 priv->dma_rx_phy) < 0)) {
@@ -894,7 +886,6 @@ static int stmmac_open(struct net_device *dev)
 		phy_start(priv->phydev);
 
 	napi_enable(&priv->napi);
-	skb_queue_head_init(&priv->rx_recycle);
 	netif_start_queue(dev);
 	return 0;
 }
@@ -925,7 +916,7 @@ static int stmmac_release(struct net_device *dev)
 		kfree(priv->tm);
 #endif
 	napi_disable(&priv->napi);
-	skb_queue_purge(&priv->rx_recycle);
+	net_recycle_cleanup(priv->dev);
 
 	/* Free the IRQ lines */
 	free_irq(dev->irq, dev);
@@ -1157,13 +1148,10 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv)
 		if (likely(priv->rx_skbuff[entry] == NULL)) {
 			struct sk_buff *skb;
 
-			skb = __skb_dequeue(&priv->rx_recycle);
-			if (skb == NULL)
-				skb = netdev_alloc_skb_ip_align(priv->dev,
-								bfsize);
-
+			skb = net_recycle_get(priv->dev);
 			if (unlikely(skb == NULL))
 				break;
+			skb_reserve(skb, NET_IP_ALIGN);
 
 			priv->rx_skbuff[entry] = skb;
 			priv->rx_skbuff_dma[entry] =
-- 
1.6.6.1



* [PATCH 5/8] net/ucc_geth: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ucc_geth.c |   30 ++++++++----------------------
 drivers/net/ucc_geth.h |    2 --
 2 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index dc32a62..9d6097b 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -210,10 +210,7 @@ static struct sk_buff *get_new_skb(struct ucc_geth_private *ugeth,
 {
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&ugeth->rx_recycle);
-	if (!skb)
-		skb = dev_alloc_skb(ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+	skb = net_recycle_get(ugeth->ndev);
 	if (skb == NULL)
 		return NULL;
 
@@ -1992,8 +1989,6 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 		iounmap(ugeth->ug_regs);
 		ugeth->ug_regs = NULL;
 	}
-
-	skb_queue_purge(&ugeth->rx_recycle);
 }
 
 static void ucc_geth_set_multi(struct net_device *dev)
@@ -2069,6 +2064,7 @@ static void ucc_geth_stop(struct ucc_geth_private *ugeth)
 	ugeth->phydev = NULL;
 
 	ucc_geth_memclean(ugeth);
+	net_recycle_cleanup(ugeth->ndev);
 }
 
 static int ucc_struct_init(struct ucc_geth_private *ugeth)
@@ -2205,9 +2201,6 @@ static int ucc_struct_init(struct ucc_geth_private *ugeth)
 			ugeth_err("%s: Failed to ioremap regs.", __func__);
 		return -ENOMEM;
 	}
-
-	skb_queue_head_init(&ugeth->rx_recycle);
-
 	return 0;
 }
 
@@ -3213,12 +3206,8 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 			if (netif_msg_rx_err(ugeth))
 				ugeth_err("%s, %d: ERROR!!! skb - 0x%08x",
 					   __func__, __LINE__, (u32) skb);
-			if (skb) {
-				skb->data = skb->head + NET_SKB_PAD;
-				skb->len = 0;
-				skb_reset_tail_pointer(skb);
-				__skb_queue_head(&ugeth->rx_recycle, skb);
-			}
+			if (skb)
+				net_recycle_add(dev, skb);
 
 			ugeth->rx_skbuff[rxQ][ugeth->skb_currx[rxQ]] = NULL;
 			dev->stats.rx_dropped++;
@@ -3288,13 +3277,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
 
 		dev->stats.tx_packets++;
 
-		if (skb_queue_len(&ugeth->rx_recycle) < RX_BD_RING_LEN &&
-			     skb_recycle_check(skb,
-				    ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT))
-			__skb_queue_head(&ugeth->rx_recycle, skb);
-		else
-			dev_kfree_skb(skb);
+		net_recycle_add(dev, skb);
 
 		ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]] = NULL;
 		ugeth->skb_dirtytx[txQ] =
@@ -3929,6 +3912,9 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma
 	netif_napi_add(dev, &ugeth->napi, ucc_geth_poll, 64);
 	dev->mtu = 1500;
 
+	net_recycle_init(dev, RX_BD_RING_LEN, ug_info->uf_info.max_rx_buf_length
+			+ UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+
 	ugeth->msg_enable = netif_msg_init(debug.msg_enable, UGETH_MSG_DEFAULT);
 	ugeth->phy_interface = phy_interface;
 	ugeth->max_speed = max_speed;
diff --git a/drivers/net/ucc_geth.h b/drivers/net/ucc_geth.h
index 05a9558..07c0816 100644
--- a/drivers/net/ucc_geth.h
+++ b/drivers/net/ucc_geth.h
@@ -1213,8 +1213,6 @@ struct ucc_geth_private {
 	/* index of the first skb which hasn't been transmitted yet. */
 	u16 skb_dirtytx[NUM_TX_QUEUES];
 
-	struct sk_buff_head rx_recycle;
-
 	struct ugeth_mii_info *mii_info;
 	struct phy_device *phydev;
 	phy_interface_t phy_interface;
-- 
1.6.6.1

