Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [stable] [stable-2.6.32 PATCH] ixgbe: backport bug fix for tx panic
From: Greg KH @ 2010-07-03  2:28 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Brandeburg, Jesse, linux-kernel@vger.kernel.org,
	stable@kernel.org, Brandon, netdev@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <AANLkTimr7-8TIjuCjIbqrJSVH4VI-Pjp8plpjTubuTfE@mail.gmail.com>

On Fri, Jul 02, 2010 at 02:37:13PM -0700, Jeff Kirsher wrote:
> On Tue, May 25, 2010 at 13:18, Greg KH <greg@kroah.com> wrote:
> > On Tue, May 25, 2010 at 09:27:25AM -0700, Brandeburg, Jesse wrote:
> >>
> >>
> >> On Tue, 25 May 2010, Jeff Kirsher wrote:
> >>
> >> > On Mon, May 10, 2010 at 17:46, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> >> > > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >> > >
> >> > > backporting this commit:
> >> > >
> >> > > commit fdd3d631cddad20ad9d3e1eb7dbf26825a8a121f
> >> > > Author: Krishna Kumar <krkumar2@in.ibm.com>
> >> > > Date:   Wed Feb 3 13:13:10 2010 +0000
> >> > >
> >> > >    ixgbe: Fix return of invalid txq
> >> > >
> >> > >    a developer had complained of getting lots of warnings:
> >> > >
> >> > >    "eth16 selects TX queue 98, but real number of TX queues is 64"
> >> > >
> >> > >    http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02200.html
> >> > >
> >> > >    As there was no follow up on that bug, I am submitting this
> >> > >    patch assuming that the other return points will not return
> >> > >    invalid txq's, and also that this fixes the bug (not tested).
> >> > >
> >> > >    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> >> > >    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >> > >    Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> >> > >    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> >> > >    Signed-off-by: David S. Miller <davem@davemloft.net>
> >> > >
> >> > > CC: Brandon <brandon@ifup.org>
> >> > > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >> > > ---
> >> > >
> >> > >  drivers/net/ixgbe/ixgbe_main.c |    8 ++++++--
> >> > >  1 files changed, 6 insertions(+), 2 deletions(-)
> >> > >
> >> >
> >> > Greg - status?  Did you queue this patch for the stable release and I missed it?
> >>
> >> Maybe we didn't say (and we should have) that this fixes a panic on
> >> machines with > 64 cores.  Please apply to -stable 32.
> >
> > I'll get to it for the next release after this one, if that's ok.
> >
> > thanks,
> >
> > greg k-h
> > --
> 
> I did not see this patch in the list of patches for the next release
> of the stable kernel.  Just want to make sure this patch makes it this
> time... :)

Ick, I missed it, let me go queue it up right now, sorry about that.

greg k-h

^ permalink raw reply

* Re: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access
From: Jesse Barnes @ 2010-07-02 23:16 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Michael Chan, Matthew Wilcox, linux-pci, netdev
In-Reply-To: <1276802196.2083.12.camel@achroite.uk.solarflarecom.com>

On Thu, 17 Jun 2010 20:16:36 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> During suspend on an SMP system, {read,write}_msi_msg_desc() may be
> called to mask and unmask interrupts on a device that is already in a
> reduced power state.  At this point memory-mapped registers including
> MSI-X tables are not accessible, and config space may not be fully
> functional either.
> 
> While a device is in a reduced power state its interrupts are
> effectively masked and its MSI(-X) state will be restored when it is
> brought back to D0.  Therefore these functions can simply read and
> write msi_desc::msg for devices not in D0.
> 
> Further, read_msi_msg_desc() should only ever be used to update a
> previously written message, so it can always read msi_desc::msg
> and never needs to touch the hardware.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied to my linux-next branch, thanks.

Matthew, let me know if you have an issue with this.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply

* Re: [PATCH net-2.6 2/2] usbnet: Set parent device early for netdev_printk()
From: Ben Hutchings @ 2010-07-02 22:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Luís Picciochi Oliveira, Joe Perches
In-Reply-To: <1278110450.4878.75.camel@localhost>

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

On Fri, 2010-07-02 at 23:40 +0100, Ben Hutchings wrote:
> netdev_printk() follows the net_device's parent device pointer, so
> we must set that earlier than we previously did.
> 
> Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[...]

This should also go into a stable update for 2.6.34.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* [PATCH net-2.6 2/2] usbnet: Set parent device early for netdev_printk()
From: Ben Hutchings @ 2010-07-02 22:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Luís Picciochi Oliveira, Joe Perches
In-Reply-To: <1278110361.4878.73.camel@localhost>

netdev_printk() follows the net_device's parent device pointer, so
we must set that earlier than we previously did.

Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/usb/usbnet.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index a95c73d..81c76ad 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1293,6 +1293,9 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 		goto out;
 	}
 
+	/* netdev_printk() needs this so do it as early as possible */
+	SET_NETDEV_DEV(net, &udev->dev);
+
 	dev = netdev_priv(net);
 	dev->udev = xdev;
 	dev->intf = udev;
@@ -1377,8 +1380,6 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 		dev->rx_urb_size = dev->hard_mtu;
 	dev->maxpacket = usb_maxpacket (dev->udev, dev->out, 1);
 
-	SET_NETDEV_DEV(net, &udev->dev);
-
 	if ((dev->driver_info->flags & FLAG_WLAN) != 0)
 		SET_NETDEV_DEVTYPE(net, &wlan_type);
 	if ((dev->driver_info->flags & FLAG_WWAN) != 0)
-- 
1.7.1



^ permalink raw reply related

* [PATCH net-2.6 1/2] Revert "rndis_host: Poll status channel before control channel"
From: Ben Hutchings @ 2010-07-02 22:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Luís Picciochi Oliveira

This reverts commit c17b274dc2aa538b68c1f02b01a3c4e124b435ba.

That change was reported to break rndis_wlan support for the WUSB54GS.

Reported-by: Luís Picciochi Oliveira <pitxyoki@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/usb/rndis_host.c |   18 ++++++------------
 1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/net/usb/rndis_host.c b/drivers/net/usb/rndis_host.c
index 28d3ee1..dd8a4ad 100644
--- a/drivers/net/usb/rndis_host.c
+++ b/drivers/net/usb/rndis_host.c
@@ -104,10 +104,8 @@ static void rndis_msg_indicate(struct usbnet *dev, struct rndis_indicate *msg,
 int rndis_command(struct usbnet *dev, struct rndis_msg_hdr *buf, int buflen)
 {
 	struct cdc_state	*info = (void *) &dev->data;
-	struct usb_cdc_notification notification;
 	int			master_ifnum;
 	int			retval;
-	int			partial;
 	unsigned		count;
 	__le32			rsp;
 	u32			xid = 0, msg_len, request_id;
@@ -135,17 +133,13 @@ int rndis_command(struct usbnet *dev, struct rndis_msg_hdr *buf, int buflen)
 	if (unlikely(retval < 0 || xid == 0))
 		return retval;
 
-	/* Some devices don't respond on the control channel until
-	 * polled on the status channel, so do that first. */
-	retval = usb_interrupt_msg(
-		dev->udev,
-		usb_rcvintpipe(dev->udev, dev->status->desc.bEndpointAddress),
-		&notification, sizeof(notification), &partial,
-		RNDIS_CONTROL_TIMEOUT_MS);
-	if (unlikely(retval < 0))
-		return retval;
+	// FIXME Seems like some devices discard responses when
+	// we time out and cancel our "get response" requests...
+	// so, this is fragile.  Probably need to poll for status.
 
-	/* Poll the control channel; the request probably completed immediately */
+	/* ignore status endpoint, just poll the control channel;
+	 * the request probably completed immediately
+	 */
 	rsp = buf->msg_type | RNDIS_MSG_COMPLETION;
 	for (count = 0; count < 10; count++) {
 		memset(buf, 0, CONTROL_BUFFER_SIZE);
-- 
1.7.1




^ permalink raw reply related

* Re: bnx2/5709: Strange interrupts spread
From: Christophe Ngo Van Duc @ 2010-07-02 22:12 UTC (permalink / raw)
  To: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1278104311.11828.12.camel@HP1>

Hi

Well that's the strange thing: it is IP traffic. The only difference
with eth0 and eth1 is that eth2 and eth3 belongs to a bridge (br0).

Best Regards,
Christophe.

On 7/2/10, Michael Chan <mchan@broadcom.com> wrote:
>
> On Fri, 2010-07-02 at 13:33 -0700, Christophe Ngo Van Duc wrote:
>> On eth2 (external card) all interrupts goes to CPU0
>>
>>
>>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
>>   CPU6       CPU7
>>   80:   46973077          0          0          0          0          0
>>       0          0   PCI-MSI-edge      eth2-0
>>   81:          0          0          0          0          0          0
>>       0          0   PCI-MSI-edge      eth2-1
>>   82:          0          0          0          0          0          0
>>       0          0   PCI-MSI-edge      eth2-2
>>   83:          0          0          0          0          0          0
>>       0          0   PCI-MSI-edge      eth2-3
>>   84:          0          0          0          0          0          0
>>       0          0   PCI-MSI-edge      eth2-4
>>   85:          0          0          0          0          0          0
>>       0          0   PCI-MSI-edge      eth2-5
>>   86:          0          0       2445          0         37          0
>>    8463         13   PCI-MSI-edge      eth2-6
>>   87:          0          0          0          0          0          0
>>       0          0   PCI-MSI-edge      eth2-7
>
> Reformatted your output
>
>> If I understand correctly the RSS hash is used to dispatch the packets
>> into the different queues running on the different CPU.
>
> It looks like most interrupts go to eth2-0, a few go to eth2-6.  The rx
> ring for eth2-0 is for non-IP packets.  The RSS hash will hash IP
> packets and place them on eth2-1 to eth2-7.  eth2-0 also handles tx
> interrupts for TX ring 0.  TX traffic is hashed by the stack.
>
> What kind of traffic is passing through eth2?
>
> Thanks.
>
>
>

-- 
Sent from my mobile device

^ permalink raw reply

* Re: e1000e: receives no packets after resume (2.6.35-rc3-00262-g984bc96)
From: Jeff Kirsher @ 2010-07-02 21:47 UTC (permalink / raw)
  To: Nico Schottelius, LKML; +Cc: e1000-devel, netdev
In-Reply-To: <20100702072826.GB16856@schottelius.org>

On Fri, Jul 2, 2010 at 00:28, Nico Schottelius
<nico-linux-20100702@schottelius.org> wrote:
> Good morning hackers,
>
> I've seen that in kernels before, but now suffering from it:
> e1000e does not receive any packets after third resume:
>
> --------------------------------------------------------------------------------
> [8:16] kr:clyde# dhcpcd eth0
> dhcpcd[11589]: version 5.2.5 starting
> dhcpcd[11589]: eth0: rebinding lease of 129.132.102.115
> dhcpcd[11589]: eth0: broadcasting for a lease
> ^Cdhcpcd[11589]: received SIGINT, stopping
> dhcpcd[11589]: eth0: removing interface
> [8:16] kr:clyde# dhcpcd eth0
> dhcpcd[11601]: version 5.2.5 starting
> dhcpcd[11601]: eth0: rebinding lease of 129.132.102.115
> dhcpcd[11601]: eth0: broadcasting for a lease
> ^Cdhcpcd[11601]: received SIGINT, stopping
> dhcpcd[11601]: eth0: removing interface
> [8:16] kr:clyde# dhcpcd eth0
> dhcpcd[11784]: version 5.2.5 starting
> dhcpcd[11784]: eth0: rebinding lease of 129.132.102.115
> ^Cdhcpcd[11784]: received SIGINT, stopping
> dhcpcd[11784]: eth0: removing interface
> [8:16] kr:clyde# dhcpcd eth0
> dhcpcd[11830]: version 5.2.5 starting
> dhcpcd[11830]: eth0: rebinding lease of 129.132.102.115
> dhcpcd[11830]: eth0: broadcasting for a lease
> dhcpcd[11830]: timed out
> [8:17] kr:clyde# rmmod e1000e
> [8:18] kr:clyde# modprobe e1000e
> [8:18] kr:clyde# dhcpcd eth0
> dhcpcd[11855]: version 5.2.5 starting
> dhcpcd[11855]: eth0: waiting for carrier
> ^Cdhcpcd[11855]: received SIGINT, stopping
> dhcpcd[11855]: eth0: removing interface
> [8:18] kr:clyde# dhcpcd eth0
> dhcpcd[11871]: version 5.2.5 starting
> dhcpcd[11871]: eth0: rebinding lease of 129.132.102.115
> dhcpcd[11871]: eth0: acknowledged 129.132.102.115 from 129.132.57.97
> dhcpcd[11871]: eth0: checking for 129.132.102.115
> dhcpcd[11871]: eth0: leased 129.132.102.115 for 86400 seconds
> dhcpcd[11871]: forked to background, child pid 11892
> [8:18] kr:clyde#
> --------------------------------------------------------------------------------
>
> Are you aware of this issues?
>
> Cheers,
>
> Nico
>
> --

Adding the Networking Kernel mailing list (netdev) and Intel Linux LAN
Driver mailing list (e1000-patches)...

Would it be possible to get more information about the system/setup?

For example, can you provide the following information:
     - full lspci output (lspci -vvv)
     - ethtool -i ethX
     - dmesg  - after loading the driver and configuring the interfaces
     - kernel config

-- 
Cheers,
Jeff

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* [rft]testers for suspend/resume support for cdc-phonet
From: Oliver Neukum @ 2010-07-02 21:46 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA

Hi,

I am looking for someone who is willing to test a patch for cdc-phonet
implementing suspend/resume.

	Regards
		Oliver

diff --git a/drivers/net/usb/cdc-phonet.c b/drivers/net/usb/cdc-phonet.c
index 109751b..d182835 100644
--- a/drivers/net/usb/cdc-phonet.c
+++ b/drivers/net/usb/cdc-phonet.c
@@ -42,9 +42,12 @@ struct usbpn_dev {
 	unsigned int		tx_pipe, rx_pipe;
 	u8 active_setting;
 	u8 disconnected;
+	u8 suspended;
+	u8 opened;
 
 	unsigned		tx_queue;
 	spinlock_t		tx_lock;
+	struct usb_anchor	tx_anchor;
 
 	spinlock_t		rx_lock;
 	struct sk_buff		*rx_skb;
@@ -73,8 +76,10 @@ static netdev_tx_t usbpn_xmit(struct sk_buff *skb, struct net_device *dev)
 	usb_fill_bulk_urb(req, pnd->usb, pnd->tx_pipe, skb->data, skb->len,
 				tx_complete, skb);
 	req->transfer_flags = URB_ZERO_PACKET;
+	usb_anchor_urb(req, &pnd->tx_anchor);
 	err = usb_submit_urb(req, GFP_ATOMIC);
 	if (err) {
+		usb_unanchor_urb(req);
 		usb_free_urb(req);
 		goto drop;
 	}
@@ -235,6 +240,7 @@ static int usbpn_open(struct net_device *dev)
 		pnd->urbs[i] = req;
 	}
 
+	pnd->opened = 1;
 	netif_wake_queue(dev);
 	return 0;
 }
@@ -256,6 +262,7 @@ static int usbpn_close(struct net_device *dev)
 		usb_free_urb(req);
 		pnd->urbs[i] = NULL;
 	}
+	pnd->opened = 0;
 
 	return usb_set_interface(pnd->usb, num, !pnd->active_setting);
 }
@@ -400,6 +407,7 @@ int usbpn_probe(struct usb_interface *intf, const struct usb_device_id *id)
 	pnd->data_intf = data_intf;
 	spin_lock_init(&pnd->tx_lock);
 	spin_lock_init(&pnd->rx_lock);
+	init_usb_anchor(&pnd->tx_anchor);
 	/* Endpoints */
 	if (usb_pipein(data_desc->endpoint[0].desc.bEndpointAddress)) {
 		pnd->rx_pipe = usb_rcvbulkpipe(usbdev,
@@ -453,10 +461,52 @@ static void usbpn_disconnect(struct usb_interface *intf)
 	usb_put_dev(usb);
 }
 
+static int usbpn_suspend(struct usb_interface *intf, pm_message_t message)
+{
+	struct usbpn_dev *pnd = usb_get_intfdata(intf);
+	int i;
+
+	if (pnd->suspended++)
+		return 0;
+	if (!pnd->opened)
+		return 0;
+	
+	for (i = 0; i < rxq_size; i++) {
+		struct urb *req = pnd->urbs[i];
+		usb_kill_urb(req);
+	}
+
+	usb_kill_anchored_urbs(&pnd->tx_anchor);
+	return 0;
+}
+
+static int usbpn_resume(struct usb_interface *intf)
+{
+	struct usbpn_dev *pnd = usb_get_intfdata(intf);
+	int i, r, rv = 0;
+
+	if (--pnd->suspended)
+		return 0;
+	if (!pnd->opened)
+		return 0;
+
+	for (i = 0; i < rxq_size; i++) {
+		struct urb *req = pnd->urbs[i];
+
+		r = rx_submit(pnd, req, GFP_NOIO);
+		if (r)
+			rv++;
+	}
+
+	return rv ? -EIO : 0;
+}
+
 static struct usb_driver usbpn_driver = {
 	.name =		"cdc_phonet",
 	.probe =	usbpn_probe,
 	.disconnect =	usbpn_disconnect,
+	.suspend =	usbpn_suspend,
+	.resume =	usbpn_resume,
 	.id_table =	usbpn_ids,
 };
 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [stable] [stable-2.6.32 PATCH] ixgbe: backport bug fix for tx panic
From: Jeff Kirsher @ 2010-07-02 21:37 UTC (permalink / raw)
  To: Greg KH
  Cc: Brandeburg, Jesse, linux-kernel@vger.kernel.org,
	stable@kernel.org, Brandon, netdev@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <20100525201837.GA13821@kroah.com>

On Tue, May 25, 2010 at 13:18, Greg KH <greg@kroah.com> wrote:
> On Tue, May 25, 2010 at 09:27:25AM -0700, Brandeburg, Jesse wrote:
>>
>>
>> On Tue, 25 May 2010, Jeff Kirsher wrote:
>>
>> > On Mon, May 10, 2010 at 17:46, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
>> > > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> > >
>> > > backporting this commit:
>> > >
>> > > commit fdd3d631cddad20ad9d3e1eb7dbf26825a8a121f
>> > > Author: Krishna Kumar <krkumar2@in.ibm.com>
>> > > Date:   Wed Feb 3 13:13:10 2010 +0000
>> > >
>> > >    ixgbe: Fix return of invalid txq
>> > >
>> > >    a developer had complained of getting lots of warnings:
>> > >
>> > >    "eth16 selects TX queue 98, but real number of TX queues is 64"
>> > >
>> > >    http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02200.html
>> > >
>> > >    As there was no follow up on that bug, I am submitting this
>> > >    patch assuming that the other return points will not return
>> > >    invalid txq's, and also that this fixes the bug (not tested).
>> > >
>> > >    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
>> > >    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> > >    Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
>> > >    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> > >    Signed-off-by: David S. Miller <davem@davemloft.net>
>> > >
>> > > CC: Brandon <brandon@ifup.org>
>> > > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> > > ---
>> > >
>> > >  drivers/net/ixgbe/ixgbe_main.c |    8 ++++++--
>> > >  1 files changed, 6 insertions(+), 2 deletions(-)
>> > >
>> >
>> > Greg - status?  Did you queue this patch for the stable release and I missed it?
>>
>> Maybe we didn't say (and we should have) that this fixes a panic on
>> machines with > 64 cores.  Please apply to -stable 32.
>
> I'll get to it for the next release after this one, if that's ok.
>
> thanks,
>
> greg k-h
> --

I did not see this patch in the list of patches for the next release
of the stable kernel.  Just want to make sure this patch makes it this
time... :)

-- 
Cheers,
Jeff

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Oleg Nesterov @ 2010-07-02 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sridhar Samudrala, Tejun Heo, Michael S. Tsirkin, Ingo Molnar,
	netdev, lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1278094270.1917.288.camel@laptop>

On 07/02, Peter Zijlstra wrote:
>
> On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
> >
> >  Does  it (Tejun's kthread_clone() patch) also  inherit the
> > cgroup of the caller?
>
> Of course, its a simple do_fork() which inherits everything just as you
> would expect from a similar sys_clone()/sys_fork() call.

Yes. And I'm afraid it can inherit more than we want. IIUC, this is called
from ioctl(), right?

Then the new thread becomes the natural child of the caller, and it shares
->mm with the parent. And files, dup_fd() without CLONE_FS.

Signals. Say, if you send SIGKILL to this new thread, it can't sleep in
TASK_INTERRUPTIBLE or KILLABLE after that. And this SIGKILL can be sent
just because the parent gets SIGQUIT or abother coredumpable signal.
Or the new thread can recieve SIGSTOP via ^Z.

Perhaps this is OK, I do not know. Just to remind that kernel_thread()
is merely clone(CLONE_VM).

Oleg.

^ permalink raw reply

* Re: bnx2/5709: Strange interrupts spread
From: Michael Chan @ 2010-07-02 20:58 UTC (permalink / raw)
  To: Christophe Ngo Van Duc; +Cc: netdev@vger.kernel.org
In-Reply-To: <AANLkTiniNNPV9ztxXHtX4np7PIZabkm0I4v5O29chf8i@mail.gmail.com>


On Fri, 2010-07-02 at 13:33 -0700, Christophe Ngo Van Duc wrote:
> On eth2 (external card) all interrupts goes to CPU0
>
>
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>   80:   46973077          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-0
>   81:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-1
>   82:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-2
>   83:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-3
>   84:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-4
>   85:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-5
>   86:          0          0       2445          0         37          0       8463         13   PCI-MSI-edge      eth2-6
>   87:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-7

Reformatted your output

> If I understand correctly the RSS hash is used to dispatch the packets
> into the different queues running on the different CPU.

It looks like most interrupts go to eth2-0, a few go to eth2-6.  The rx
ring for eth2-0 is for non-IP packets.  The RSS hash will hash IP
packets and place them on eth2-1 to eth2-7.  eth2-0 also handles tx
interrupts for TX ring 0.  TX traffic is hashed by the stack.

What kind of traffic is passing through eth2?

Thanks.



^ permalink raw reply

* bnx2/5709: Strange interrupts spread
From: Christophe Ngo Van Duc @ 2010-07-02 20:33 UTC (permalink / raw)
  To: netdev

Dear list,

I hope I am posting to the correct place...

I am facing a strange issue on a HP DL 360.

I have 2 internal ethernet cards (the one that came by default with
the server) and 2 additional ethernet cards for a total for 4 ethernet
cards.

The 2 internal cards are running fine as of interrupts (for example eth1):
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
      CPU6       CPU7

  71:        604      11933         40       1537          0
0          0       6043   PCI-MSI-edge      eth1-0
  72:      24805       9795       3606          0        128
0       3365          0   PCI-MSI-edge      eth1-1
  73:          0        279          0        429         38
16540          0      30843   PCI-MSI-edge      eth1-2
  74:          0          0      25365        267          0
0         89      15541   PCI-MSI-edge      eth1-3
  75:       7244      24108          0          0      16488
0        240          0   PCI-MSI-edge      eth1-4
  76:      21378       3628       7726          0         49
247       2871          0   PCI-MSI-edge      eth1-5
  77:          0          0      47199        459         13
46      63064         18   PCI-MSI-edge      eth1-6
  78:          0       6230         67        283        259
82       7846      27130   PCI-MSI-edge      eth1-7

On eth2 (external card) all interrupts goes to CPU0
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
      CPU6       CPU7
  80:   46973077          0          0            0            0
   0          0   PCI-MSI-edge      eth2-0
  81:          0          0          0          0          0
0          0          0   PCI-MSI-edge      eth2-1
  82:          0          0          0          0          0
0          0          0   PCI-MSI-edge      eth2-2
  83:          0          0          0          0          0
0          0          0   PCI-MSI-edge      eth2-3
  84:          0          0          0          0          0
0          0          0   PCI-MSI-edge      eth2-4
  85:          0          0          0          0          0
0          0          0   PCI-MSI-edge      eth2-5
  86:          0          0       2445          0         37
0       8463         13   PCI-MSI-edge      eth2-6
  87:          0          0          0          0          0
0          0          0   PCI-MSI-edge      eth2-7

If I understand correctly the RSS hash is used to dispatch the packets
into the different queues running on the different CPU.

Why then my internal cards are running fine but the additional cards
(eth2 and eth3) are presenting this behavior where all interrupts goes
to one CPU?

Thanks for your help in understanding this. (see below for config details)

Christophe.

All are detected correctly at boot:
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.8e (April 13, 2010)
bnx2 0000:02:00.0: PCI INT A -> GSI 31 (level, low) -> IRQ 31
bnx2 0000:02:00.0: setting latency timer to 64
eth0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
at mem f4000000, IRQ 31, node addr f4:ce:46:86:a1:00
bnx2 0000:02:00.1: PCI INT B -> GSI 39 (level, low) -> IRQ 39
bnx2 0000:02:00.1: setting latency timer to 64
eth1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
at mem f2000000, IRQ 39, node addr f4:ce:46:86:a1:02
bnx2 0000:07:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
bnx2 0000:07:00.0: setting latency timer to 64
eth2: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
at mem fa000000, IRQ 24, node addr 00:26:55:87:17:98
bnx2 0000:07:00.1: PCI INT B -> GSI 34 (level, low) -> IRQ 34
bnx2 0000:07:00.1: setting latency timer to 64
eth3: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
at mem f8000000, IRQ 34, node addr 00:26:55:87:17:9a

Kernel is 2.6.31-13
Broadcom driver bnx2 v2.0.8e

eth0 is a normal interface with an Ip address
eth1 is a normal interface with an Ip address
eth2 belongs to a bridge interface without an ip address, running tc (htb)
eth3 belongs to the same bridge interface without an ip address

^ permalink raw reply

* [PATCH] ipvs: Kconfig cleanup
From: Michal Marek @ 2010-07-02 20:32 UTC (permalink / raw)
  To: lvs-devel
  Cc: netdev, Julian Anastasov, Simon Horman, Wensong Zhang,
	linux-kernel

IP_VS_PROTO_AH_ESP should be set iff either of IP_VS_PROTO_{AH,ESP} is
selected. Express this with standard kconfig syntax.

Signed-off-by: Michal Marek <mmarek@suse.cz>
---
 net/netfilter/ipvs/Kconfig |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index f2d7623..91e7373 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -83,19 +83,16 @@ config	IP_VS_PROTO_UDP
 	  protocol. Say Y if unsure.
 
 config	IP_VS_PROTO_AH_ESP
-	bool
-	depends on UNDEFINED
+	def_bool IP_VS_PROTO_ESP || IP_VS_PROTO_AH
 
 config	IP_VS_PROTO_ESP
 	bool "ESP load balancing support"
-	select IP_VS_PROTO_AH_ESP
 	---help---
 	  This option enables support for load balancing ESP (Encapsulation
 	  Security Payload) transport protocol. Say Y if unsure.
 
 config	IP_VS_PROTO_AH
 	bool "AH load balancing support"
-	select IP_VS_PROTO_AH_ESP
 	---help---
 	  This option enables support for load balancing AH (Authentication
 	  Header) transport protocol. Say Y if unsure.
-- 
1.7.1


^ permalink raw reply related

* [PATCH 8/8] net/emergency: remove locking from reycling pool if emergncy pools are not used
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Right now both users (socket and network driver) are accessing the
emergency pools locked. If there is no socket user then we have only the
NIC driver which is touching the pool in its napi callback. The locking
is not required since the driver has its own locking.

Disabling the emergency pools results in emerg_skb_users going down to
zero. Once the driver notices this then further allocations will be lock
less. This is performed while holding the list lock of the pool to
ensure that all users which touch the struct locked are gone.
Enabling the pools for the socket user requires that the NIC driver is
not touching the pool unlocked. This is ensured by the ndo_emerg_reload
callback which performs a lightweight disable/enable of the card.

As a side effect, the emergency mode of the nic is now deactivated once
the last user is gone.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/gianfar.c     |   10 ++++
 drivers/net/ucc_geth.c    |    8 ++++
 include/linux/netdevice.h |   55 ++++++++++++++++++++++--
 net/core/dev.c            |    1 +
 net/core/skbuff.c         |   19 ++++++++-
 net/core/sock.c           |  101 ++++++++++++++++++++++++++++++++++++++-------
 6 files changed, 172 insertions(+), 22 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 1a1a249..0c891ea 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -116,6 +116,7 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 		struct sk_buff *skb);
 static int gfar_set_mac_address(struct net_device *dev);
 static int gfar_change_mtu(struct net_device *dev, int new_mtu);
+static int gfar_emerg_reload(struct net_device *dev);
 static irqreturn_t gfar_error(int irq, void *dev_id);
 static irqreturn_t gfar_transmit(int irq, void *dev_id);
 static irqreturn_t gfar_interrupt(int irq, void *dev_id);
@@ -464,6 +465,7 @@ static const struct net_device_ops gfar_netdev_ops = {
 	.ndo_start_xmit = gfar_start_xmit,
 	.ndo_stop = gfar_close,
 	.ndo_change_mtu = gfar_change_mtu,
+	.ndo_emerg_reload = gfar_emerg_reload,
 	.ndo_set_multicast_list = gfar_set_multi,
 	.ndo_tx_timeout = gfar_timeout,
 	.ndo_do_ioctl = gfar_ioctl,
@@ -1918,6 +1920,8 @@ int startup_gfar(struct net_device *ndev)
 	if (err)
 		return err;
 
+	fill_emerg_pool(ndev);
+
 	gfar_init_mac(ndev);
 
 	for (i = 0; i < priv->num_grps; i++) {
@@ -2393,6 +2397,12 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+static int gfar_emerg_reload(struct net_device *dev)
+{
+	stop_gfar(dev);
+	startup_gfar(dev);
+}
+
 /* gfar_reset_task gets scheduled when a packet has not been
  * transmitted after a set amount of time.
  * For now, assume that clearing out all the structures, and
diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 9d6097b..6280226 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -1991,6 +1991,12 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 	}
 }
 
+static void ucc_emerg_reload(struct net_device *dev)
+{
+	ucc_geth_close(dev);
+	ucc_geth_open(dev);
+}
+
 static void ucc_geth_set_multi(struct net_device *dev)
 {
 	struct ucc_geth_private *ugeth;
@@ -3499,6 +3505,7 @@ static int ucc_geth_open(struct net_device *dev)
 				  dev->name);
 		goto err;
 	}
+	fill_emerg_pool(dev);
 
 	err = request_irq(ugeth->ug_info->uf_info.irq, ucc_geth_irq_handler,
 			  0, "UCC Geth", dev);
@@ -3707,6 +3714,7 @@ static const struct net_device_ops ucc_geth_netdev_ops = {
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= ucc_geth_set_mac_addr,
 	.ndo_change_mtu		= eth_change_mtu,
+	.ndo_emerg_reload	= ucc_emerg_reload,
 	.ndo_set_multicast_list	= ucc_geth_set_multi,
 	.ndo_tx_timeout		= ucc_geth_timeout,
 	.ndo_do_ioctl		= ucc_geth_ioctl,
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa7e951..2606156 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -717,6 +717,10 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
  * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
  *			  struct nlattr *port[]);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * void (*ndo_emerg_reload)(struct net_device *dev);
+ *	If the card supports emergency pools then this function will perform a
+ *	lightweight reload to ensure that the card is not lockless accessing
+ *	the emergency pool for recycling purpose.
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -788,6 +792,7 @@ struct net_device_ops {
 	int			(*ndo_fcoe_get_wwn)(struct net_device *dev,
 						    u64 *wwn, int type);
 #endif
+	void			(*ndo_emerg_reload)(struct net_device *dev);
 };
 
 /*
@@ -1092,6 +1097,7 @@ struct net_device {
 	struct sk_buff_head rx_recycle;
 	u32 rx_rec_skbs_max;
 	u32 rx_rec_skb_size;
+	atomic_t emerg_skb_users;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -1138,24 +1144,63 @@ static inline void net_recycle_cleanup(struct net_device *dev)
 	skb_queue_purge(&dev->rx_recycle);
 }
 
+static inline int recycle_possible(struct net_device *dev, struct sk_buff *skb)
+{
+	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
+			skb_recycle_check(skb, dev->rx_rec_skb_size))
+		return 1;
+	else
+		return 0;
+}
+
 static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
 {
+	int emerg_active;
+
 	if (skb->emerg_dev) {
 		dev_put(skb->emerg_dev);
 		skb->emerg_dev = NULL;
 	}
-	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
-			skb_recycle_check(skb, dev->rx_rec_skb_size))
-		skb_queue_head(&dev->rx_recycle, skb);
-	else
+
+	emerg_active = atomic_read(&dev->emerg_skb_users);
+
+	if (recycle_possible(dev, skb)) {
+		if (emerg_active)
+			skb_queue_head(&dev->rx_recycle, skb);
+		else
+			__skb_queue_head(&dev->rx_recycle, skb);
+	} else {
 		dev_kfree_skb_any(skb);
+	}
+}
+
+static inline void fill_emerg_pool(struct net_device *dev)
+{
+	struct sk_buff *skb;
+
+	if (atomic_read(&dev->emerg_skb_users) == 0)
+		return;
+
+	do {
+		if (skb_queue_len(&dev->rx_recycle) >= dev->rx_rec_skbs_max)
+			return;
+
+		skb = __netdev_alloc_skb(dev, dev->rx_rec_skb_size, GFP_KERNEL);
+		if (!skb)
+			return;
+
+		net_recycle_add(dev, skb);
+	} while (1);
 }
 
 static inline struct sk_buff *net_recycle_get(struct net_device *dev)
 {
 	struct sk_buff *skb;
 
-	skb = skb_dequeue(&dev->rx_recycle);
+	if (atomic_read(&dev->emerg_skb_users) > 0)
+		skb = skb_dequeue(&dev->rx_recycle);
+	else
+		skb = __skb_dequeue(&dev->rx_recycle);
 	if (skb)
 		return skb;
 	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
diff --git a/net/core/dev.c b/net/core/dev.c
index db9acd5..5dbe356 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5405,6 +5405,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	netdev_init_queues(dev);
 
 	skb_queue_head_init(&dev->rx_recycle);
+	atomic_set(&dev->emerg_skb_users, 0);
 	INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
 	dev->ethtool_ntuple_list.count = 0;
 	INIT_LIST_HEAD(&dev->napi_list);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9e094fc..374d353 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -428,10 +428,25 @@ void __kfree_skb(struct sk_buff *skb)
 	struct net_device *ndev = skb->emerg_dev;
 
 	if (ndev) {
-		net_recycle_add(ndev, skb);
+		int emerg_en;
+		unsigned long flags;
+		int can_recycle;
+
+		skb->emerg_dev = NULL;
+		can_recycle = recycle_possible(ndev, skb);
+		spin_lock_irqsave(&ndev->rx_recycle.lock, flags);
+		emerg_en = atomic_read(&ndev->emerg_skb_users);
+		if (!emerg_en || !can_recycle) {
+			spin_unlock_irqrestore(&ndev->rx_recycle.lock, flags);
+			dev_put(ndev);
+			goto free_it;
+		}
+		__skb_queue_head(&ndev->rx_recycle, skb);
+		spin_unlock_irqrestore(&ndev->rx_recycle.lock, flags);
+		dev_put(ndev);
 		return;
 	}
-
+free_it:
 	skb_release_all(skb);
 	kfree_skbmem(skb);
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 33aa1a5..409d069 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -424,6 +424,10 @@ static int sock_bindtodevice(struct sock *sk, char __user *optval, int optlen)
 	if (optlen < 0)
 		goto out;
 
+	ret = -EBUSY;
+	if (sk->emerg_en)
+		goto out;
+
 	/* Bind this socket to a particular device like "eth0",
 	 * as specified in the passed interface name. If the
 	 * name is "" or the option length is zero the socket
@@ -497,10 +501,6 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 	struct net *net = sock_net(sk);
 	struct net_device *dev;
 
-	if (!val) {
-		sk->emerg_en = 0;
-		return 0;
-	}
 	if (sk->emerg_en && val)
 		return -EBUSY;
 	if (!capable(CAP_NET_ADMIN))
@@ -510,10 +510,45 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
 	if (!dev)
 		return -ENODEV;
+	if (!val) {
+		ret = 0;
+		if (!sk->emerg_en)
+			goto out;
+		/*
+		 * new skbs for this socket won't be taken from the emergency
+		 * pool anymore.
+		 */
+		sk->emerg_en = 0;
+		smp_rmb();
+		/* get rid of anyone who got into the critical section before
+		 * the flag was changed and is still there
+		 */
+		spin_lock_irq(&dev->rx_recycle.lock);
+		/*
+		 * if we fall down to 0 users, kfree will no longer add
+		 * packets to the pool but simply free them. Also the recycle
+		 * code which takes tx packets and adds them to the pool will
+		 * start working lockless since there are no further user
+		 * except the nic driver.
+		 */
+		atomic_dec(&dev->emerg_skb_users);
+		spin_unlock_irq(&dev->rx_recycle.lock);
+		goto out;
+	}
 	ret = -ENODEV;
+
 	if (!dev->rx_rec_skb_size)
 		goto out;
 
+	if (!dev->netdev_ops->ndo_emerg_reload)
+		goto out;
+
+	rtnl_lock();
+	ret = atomic_add_return(1, &dev->emerg_skb_users);
+	if (ret == 1 && (dev->flags & IFF_UP))
+		dev->netdev_ops->ndo_emerg_reload(dev);
+	rtnl_unlock();
+
 	do {
 		struct sk_buff *skb;
 
@@ -532,6 +567,8 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 
 	if (!ret)
 		sk->emerg_en = 1;
+	else
+		atomic_dec(&dev->emerg_skb_users);
 out:
 	dev_put(dev);
 	return ret;
@@ -1223,6 +1260,8 @@ EXPORT_SYMBOL(sk_alloc);
 static void __sk_free(struct sock *sk)
 {
 	struct sk_filter *filter;
+	struct net_device *dev;
+	struct net *net = sock_net(sk);
 
 	if (sk->sk_destruct)
 		sk->sk_destruct(sk);
@@ -1244,6 +1283,18 @@ static void __sk_free(struct sock *sk)
 	if (sk->sk_peer_cred)
 		put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
+
+	if (sk->emerg_en) {
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (dev) {
+			spin_lock_irq(&dev->rx_recycle.lock);
+			atomic_dec(&dev->emerg_skb_users);
+			spin_unlock_irq(&dev->rx_recycle.lock);
+			WARN_ON(atomic_read(&dev->emerg_skb_users) < 0);
+			dev_put(dev);
+		}
+	}
+
 	put_net(sock_net(sk));
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
@@ -1566,6 +1617,7 @@ static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
 {
 	struct net *net = sock_net(sk);
 	struct net_device *dev;
+	unsigned long flags;
 	int err;
 	struct sk_buff *skb;
 
@@ -1580,18 +1632,33 @@ static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
 		dev_put(dev);
 		return ERR_PTR(err);
 	}
-	skb = skb_dequeue(&dev->rx_recycle);
-	if (!skb) {
-		dev_put(dev);
-		err = -ENOBUFS;
-		return ERR_PTR(err);
+
+	spin_lock_irqsave(&dev->rx_recycle.lock, flags);
+	if (sk->emerg_en) {
+		skb = __skb_dequeue(&dev->rx_recycle);
+		spin_unlock_irqrestore(&dev->rx_recycle.lock, flags);
+		if (!skb) {
+			dev_put(dev);
+			err = -ENOBUFS;
+			return ERR_PTR(err);
+		}
+		/* remove earlier skb_reserve() */
+		skb_reserve(skb, - skb_headroom(skb));
+		skb->emerg_dev = dev;
+		/*
+		 * dev will be put once the skb is back from
+		 * its journey.
+		 */
+		return skb;
 	}
 	/*
-	 * dev will be put once the skb is back from
-	 * its journey.
+	 * We got called but the emergency pools are not activated. This might
+	 * happen if the pools got deactivated between checkecing the emerg_en
+	 * flag and taking the lock.
 	 */
-	skb->emerg_dev = dev;
-	return skb;
+	spin_unlock_irqrestore(&dev->rx_recycle.lock, flags);
+	dev_put(dev);
+	return NULL;
 }
 
 /*
@@ -1627,8 +1694,12 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 				if (IS_ERR(skb)) {
 					err = PTR_ERR(skb);
 					goto failure;
-				}
-				break;
+
+				} else if (skb)
+					break;
+				/*
+				 * else skb is null because emerg_en is not set
+				 */
 			}
 			skb = alloc_skb(header_len, gfp_mask);
 			if (skb) {
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH 6/8] net: implement emergency pools
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This patch implements emergency pools which are bound to a specific
network device. They can be activated via the socket interface and used
for a specific socket.
The pools are built on top of rx-recycling. The socket interface allows
to set the number of skbs in the pool and to active the pool.
The size of the skb which are accepted / added to the pool can not be
changed. It is set by the network driver and get altered on MTU change.
This requires to drop the current pool and re-allocate it. If the driver
does not set the skb size, the emergency pools can not be used.
Once the emergency pools are activated all rx-skbs allocation by the
network driver are taken from the pool. tx-skbs are allocated from the
emergency pool only for the relevant socket, i.e. that one which
activated the emergency mode.
Since the socket _and_ the driver can add/remove skbs to/from the pool
the list operations are using now skb_queue_head() and skb_dequeue().
There is patch later in the series which tries to bring the old unlock
behavior back if the emergency pools are not used by the user.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/alpha/include/asm/socket.h   |    4 +
 arch/arm/include/asm/socket.h     |    3 +
 arch/avr32/include/asm/socket.h   |    3 +
 arch/cris/include/asm/socket.h    |    5 +-
 arch/frv/include/asm/socket.h     |    4 +-
 arch/h8300/include/asm/socket.h   |    3 +
 arch/ia64/include/asm/socket.h    |    3 +
 arch/m32r/include/asm/socket.h    |    3 +
 arch/m68k/include/asm/socket.h    |    3 +
 arch/mips/include/asm/socket.h    |    3 +
 arch/mn10300/include/asm/socket.h |    3 +
 arch/parisc/include/asm/socket.h  |    3 +
 arch/powerpc/include/asm/socket.h |    3 +
 arch/s390/include/asm/socket.h    |    3 +
 arch/sparc/include/asm/socket.h   |    3 +
 arch/xtensa/include/asm/socket.h  |    3 +
 include/asm-generic/socket.h      |    4 +
 include/linux/netdevice.h         |   52 +++++++------
 include/linux/skbuff.h            |    1 +
 include/net/sock.h                |    2 +
 net/core/skbuff.c                 |    8 ++
 net/core/sock.c                   |  142 +++++++++++++++++++++++++++++++++++++
 22 files changed, 234 insertions(+), 27 deletions(-)

diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index 06edfef..ea49db3 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -69,6 +69,10 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
+
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h
index 90ffd04..b827010 100644
--- a/arch/arm/include/asm/socket.h
+++ b/arch/arm/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/avr32/include/asm/socket.h b/arch/avr32/include/asm/socket.h
index c8d1fae..64a7d45 100644
--- a/arch/avr32/include/asm/socket.h
+++ b/arch/avr32/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/asm/socket.h b/arch/cris/include/asm/socket.h
index 1a4a619..9b8e7ed 100644
--- a/arch/cris/include/asm/socket.h
+++ b/arch/cris/include/asm/socket.h
@@ -64,6 +64,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
-
-
diff --git a/arch/frv/include/asm/socket.h b/arch/frv/include/asm/socket.h
index a6b2688..15a262f 100644
--- a/arch/frv/include/asm/socket.h
+++ b/arch/frv/include/asm/socket.h
@@ -62,5 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
-
diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h
index 04c0f45..d46d64e 100644
--- a/arch/h8300/include/asm/socket.h
+++ b/arch/h8300/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/asm/socket.h b/arch/ia64/include/asm/socket.h
index 51427ea..04983aa 100644
--- a/arch/ia64/include/asm/socket.h
+++ b/arch/ia64/include/asm/socket.h
@@ -71,4 +71,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/asm/socket.h b/arch/m32r/include/asm/socket.h
index 469787c..a0e5431 100644
--- a/arch/m32r/include/asm/socket.h
+++ b/arch/m32r/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/m68k/include/asm/socket.h b/arch/m68k/include/asm/socket.h
index 9bf49c8..7018ceb 100644
--- a/arch/m68k/include/asm/socket.h
+++ b/arch/m68k/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/mips/include/asm/socket.h b/arch/mips/include/asm/socket.h
index 9de5190..9f9d93a 100644
--- a/arch/mips/include/asm/socket.h
+++ b/arch/mips/include/asm/socket.h
@@ -82,6 +82,9 @@ To add: #define SO_REUSEPORT 0x0200	/* Allow local address and port reuse.  */
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #ifdef __KERNEL__
 
 /** sock_type - Socket types
diff --git a/arch/mn10300/include/asm/socket.h b/arch/mn10300/include/asm/socket.h
index 4e60c42..70476eb 100644
--- a/arch/mn10300/include/asm/socket.h
+++ b/arch/mn10300/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/asm/socket.h b/arch/parisc/include/asm/socket.h
index 225b7d6..a4706d0 100644
--- a/arch/parisc/include/asm/socket.h
+++ b/arch/parisc/include/asm/socket.h
@@ -61,6 +61,9 @@
 
 #define SO_RXQ_OVFL             0x4021
 
+#define SO_EPOOL_QLEN		0x4022
+#define SO_EPOOL_SIZE		0x4023
+#define SO_EPOOL_MODE		0x4024
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/arch/powerpc/include/asm/socket.h b/arch/powerpc/include/asm/socket.h
index 866f760..dce10f9 100644
--- a/arch/powerpc/include/asm/socket.h
+++ b/arch/powerpc/include/asm/socket.h
@@ -69,4 +69,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/asm/socket.h b/arch/s390/include/asm/socket.h
index fdff1e9..73d0117 100644
--- a/arch/s390/include/asm/socket.h
+++ b/arch/s390/include/asm/socket.h
@@ -70,4 +70,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/asm/socket.h b/arch/sparc/include/asm/socket.h
index 9d3fefc..39eea91 100644
--- a/arch/sparc/include/asm/socket.h
+++ b/arch/sparc/include/asm/socket.h
@@ -58,6 +58,9 @@
 
 #define SO_RXQ_OVFL             0x0024
 
+#define SO_EPOOL_QLEN           0x0025
+#define SO_EPOOL_SIZE           0x0026
+#define SO_EPOOL_MODE           0x0027
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/asm/socket.h b/arch/xtensa/include/asm/socket.h
index cbdf2ff..161a2e5 100644
--- a/arch/xtensa/include/asm/socket.h
+++ b/arch/xtensa/include/asm/socket.h
@@ -73,4 +73,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 9a6115e..fa9ccbb 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -64,4 +64,8 @@
 #define SO_DOMAIN		39
 
 #define SO_RXQ_OVFL             40
+
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4fa400b..fa7e951 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1095,6 +1095,28 @@ struct net_device {
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+/**
+ *	dev_put - release reference to device
+ *	@dev: network device
+ *
+ * Release reference to device to allow it to be freed.
+ */
+static inline void dev_put(struct net_device *dev)
+{
+	atomic_dec(&dev->refcnt);
+}
+
+/**
+ *	dev_hold - get reference to device
+ *	@dev: network device
+ *
+ * Hold reference to device to keep it from being freed.
+ */
+static inline void dev_hold(struct net_device *dev)
+{
+	atomic_inc(&dev->refcnt);
+}
+
 static inline void net_recycle_init(struct net_device *dev, u32 qlen, u32 size)
 {
 	dev->rx_rec_skbs_max = qlen;
@@ -1118,9 +1140,13 @@ static inline void net_recycle_cleanup(struct net_device *dev)
 
 static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
 {
+	if (skb->emerg_dev) {
+		dev_put(skb->emerg_dev);
+		skb->emerg_dev = NULL;
+	}
 	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
 			skb_recycle_check(skb, dev->rx_rec_skb_size))
-		__skb_queue_head(&dev->rx_recycle, skb);
+		skb_queue_head(&dev->rx_recycle, skb);
 	else
 		dev_kfree_skb_any(skb);
 }
@@ -1129,7 +1155,7 @@ static inline struct sk_buff *net_recycle_get(struct net_device *dev)
 {
 	struct sk_buff *skb;
 
-	skb = __skb_dequeue(&dev->rx_recycle);
+	skb = skb_dequeue(&dev->rx_recycle);
 	if (skb)
 		return skb;
 	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
@@ -1783,28 +1809,6 @@ extern int		netdev_budget;
 /* Called by rtnetlink.c:rtnl_unlock() */
 extern void netdev_run_todo(void);
 
-/**
- *	dev_put - release reference to device
- *	@dev: network device
- *
- * Release reference to device to allow it to be freed.
- */
-static inline void dev_put(struct net_device *dev)
-{
-	atomic_dec(&dev->refcnt);
-}
-
-/**
- *	dev_hold - get reference to device
- *	@dev: network device
- *
- * Hold reference to device to keep it from being freed.
- */
-static inline void dev_hold(struct net_device *dev)
-{
-	atomic_inc(&dev->refcnt);
-}
-
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
  * and _off may be called from IRQ context, but it is caller
  * who is responsible for serialization of these calls.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac74ee0..caee62c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -319,6 +319,7 @@ struct sk_buff {
 
 	struct sock		*sk;
 	struct net_device	*dev;
+	struct net_device	*emerg_dev;
 
 	/*
 	 * This is the control buffer. It is free to use for every
diff --git a/include/net/sock.h b/include/net/sock.h
index 4f26f2f..3f3518a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -314,6 +314,8 @@ struct sock {
 #endif
 	__u32			sk_mark;
 	u32			sk_classid;
+	u32			emerg_en;
+	/* XXX 4 bytes hole on 64 bit */
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 34432b4..f02737d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -425,6 +425,13 @@ static void skb_release_all(struct sk_buff *skb)
 
 void __kfree_skb(struct sk_buff *skb)
 {
+	struct net_device *ndev = skb->emerg_dev;
+
+	if (ndev) {
+		net_recycle_add(ndev, skb);
+		return;
+	}
+
 	skb_release_all(skb);
 	kfree_skbmem(skb);
 }
@@ -563,6 +570,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 {
 #define C(x) n->x = skb->x
 
+	n->emerg_dev = NULL;
 	n->next = n->prev = NULL;
 	n->sk = NULL;
 	__copy_skb_header(n, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index fef2434..33aa1a5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -472,6 +472,71 @@ static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
 		sock_reset_flag(sk, bit);
 }
 
+static int sock_epool_set_qlen(struct sock *sk, int val)
+{
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!sk->sk_bound_dev_if)
+		return -ENODEV;
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+
+	net_recycle_qlen(dev, val);
+	dev_put(dev);
+	return 0;
+}
+
+static int sock_epool_set_mode(struct sock *sk, int val)
+{
+	int ret;
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+
+	if (!val) {
+		sk->emerg_en = 0;
+		return 0;
+	}
+	if (sk->emerg_en && val)
+		return -EBUSY;
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+	if (!sk->sk_bound_dev_if)
+		return -ENODEV;
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+	ret = -ENODEV;
+	if (!dev->rx_rec_skb_size)
+		goto out;
+
+	do {
+		struct sk_buff *skb;
+
+		if (skb_queue_len(&dev->rx_recycle) >= dev->rx_rec_skbs_max) {
+			ret = 0;
+			break;
+		}
+
+		skb = __netdev_alloc_skb(dev, dev->rx_rec_skb_size, GFP_KERNEL);
+		if (!skb) {
+			ret = -ENOMEM;
+			break;
+		}
+		net_recycle_add(dev, skb);
+	} while (1);
+
+	if (!ret)
+		sk->emerg_en = 1;
+out:
+	dev_put(dev);
+	return ret;
+}
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -740,6 +805,15 @@ set_rcvbuf:
 		else
 			sock_reset_flag(sk, SOCK_RXQ_OVFL);
 		break;
+	case SO_EPOOL_QLEN:
+		ret = sock_epool_set_qlen(sk, val);
+		break;
+	case SO_EPOOL_SIZE:
+		ret = -EINVAL;
+		break;
+	case SO_EPOOL_MODE:
+		ret = sock_epool_set_mode(sk, valbool);
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -961,6 +1035,35 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = !!sock_flag(sk, SOCK_RXQ_OVFL);
 		break;
 
+	case SO_EPOOL_QLEN:
+	{
+		struct net *net = sock_net(sk);
+		struct net_device *dev;
+
+		if (!sk->sk_bound_dev_if)
+			return -ENODEV;
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		v.val = dev->rx_rec_skbs_max;
+		break;
+	}
+	case SO_EPOOL_SIZE:
+	{
+		struct net *net = sock_net(sk);
+		struct net_device *dev;
+
+		if (!sk->sk_bound_dev_if)
+			return -ENODEV;
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		v.val = dev->rx_rec_skb_size;
+		break;
+	}
+	case SO_EPOOL_MODE:
+		v.val = sk->emerg_en;
+		break;
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1459,6 +1562,37 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 	return timeo;
 }
 
+static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
+{
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+	int err;
+	struct sk_buff *skb;
+
+	err = -ENODEV;
+	if (!sk->sk_bound_dev_if)
+		return ERR_PTR(err);
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return ERR_PTR(err);
+	err = -EINVAL;
+	if (dev->rx_rec_skb_size < skb_len) {
+		dev_put(dev);
+		return ERR_PTR(err);
+	}
+	skb = skb_dequeue(&dev->rx_recycle);
+	if (!skb) {
+		dev_put(dev);
+		err = -ENOBUFS;
+		return ERR_PTR(err);
+	}
+	/*
+	 * dev will be put once the skb is back from
+	 * its journey.
+	 */
+	skb->emerg_dev = dev;
+	return skb;
+}
 
 /*
  *	Generic send/receive buffer handlers
@@ -1488,6 +1622,14 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 			goto failure;
 
 		if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
+			if (sk->emerg_en) {
+				skb = alloc_emerg_skb(sk, header_len + data_len);
+				if (IS_ERR(skb)) {
+					err = PTR_ERR(skb);
+					goto failure;
+				}
+				break;
+			}
 			skb = alloc_skb(header_len, gfp_mask);
 			if (skb) {
 				int npages;
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH 7/8] net/emergency_skb: create a deep copy on clone
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

skb_clone() creates a clone of the skb: a new head is allocated from the
slab cache and the reference counter for the data part is incremented.
For the skbs from the emergency pool, we don't really want to clone
them that way:
- talking to slab may lead to lock contention which in turn increases
  the latency.
- the original (with the data part) may return earlier to the pool than
  the clone. In that case we would "lose" the skb from the emergency
  pool.

Instead we do a copy of head and data into a skb from the emergency
pool.
This patch cuts pskb_copy() into a helper function which does
the bare work and the remaining pskb_copy() allocates a new skb and
calls it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/skbuff.c |   80 +++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f02737d..9e094fc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -613,6 +613,7 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
 }
 EXPORT_SYMBOL_GPL(skb_morph);
 
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n);
 /**
  *	skb_clone	-	duplicate an sk_buff
  *	@skb: buffer to clone
@@ -631,6 +632,20 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	struct sk_buff *n;
 
+	if (skb->emerg_dev) {
+		n = skb_dequeue(&skb->emerg_dev->rx_recycle);
+		if (!n)
+			goto norm_clone;
+		/* remove earlier reservers */
+		skb_reserve(n, - skb_headroom(n));
+		if (!__pskb_copy(skb, n)) {
+			n->emerg_dev = skb->emerg_dev;
+			dev_hold(skb->emerg_dev);
+			return n;
+		}
+		net_recycle_add(skb->emerg_dev, n);
+	}
+norm_clone:
 	n = skb + 1;
 	if (skb->fclone == SKB_FCLONE_ORIG &&
 	    n->fclone == SKB_FCLONE_UNAVAILABLE) {
@@ -720,31 +735,22 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
 EXPORT_SYMBOL(skb_copy);
 
 /**
- *	pskb_copy	-	create copy of an sk_buff with private head.
- *	@skb: buffer to copy
- *	@gfp_mask: allocation priority
+ *      __pskb_copy     -       create copy of an sk_buff with private head.
+ *      @skb: buffer to copy
+ *      @n: skb to copy it
  *
- *	Make a copy of both an &sk_buff and part of its data, located
- *	in header. Fragmented data remain shared. This is used when
- *	the caller wishes to modify only header of &sk_buff and needs
- *	private copy of the header to alter. Returns %NULL on failure
- *	or the pointer to the buffer on success.
- *	The returned buffer has a reference count of 1.
+ *      This functions behaves like pskb_copy() except that it takes
+ *      an allready allocated skb where it will copy head and data.
+ *      The returned buffer has a reference count of 1.
  */
-
-struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n)
 {
-	/*
-	 *	Allocate the copy buffer
-	 */
-	struct sk_buff *n;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
-	n = alloc_skb(skb->end, gfp_mask);
+	if (skb->end > n->end)
 #else
-	n = alloc_skb(skb->end - skb->head, gfp_mask);
+	if ((skb->end - skb->head) > (n->end - n->head))
 #endif
-	if (!n)
-		goto out;
+		return -EMSGSIZE;
 
 	/* Set the data pointer */
 	skb_reserve(n, skb->data - skb->head);
@@ -773,8 +779,40 @@ struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
 	}
 
 	copy_skb_header(n, skb);
-out:
-	return n;
+	return 0;
+}
+
+/**
+ *	pskb_copy	-	create copy of an sk_buff with private head.
+ *	@skb: buffer to copy
+ *	@gfp_mask: allocation priority
+ *
+ *	Make a copy of both an &sk_buff and part of its data, located
+ *	in header. Fragmented data remain shared. This is used when
+ *	the caller wishes to modify only header of &sk_buff and needs
+ *	private copy of the header to alter. Returns %NULL on failure
+ *	or the pointer to the buffer on success.
+ *	The returned buffer has a reference count of 1.
+ */
+
+struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
+{
+	/*
+	 *	Allocate the copy buffer
+	 */
+	struct sk_buff *n;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+	n = alloc_skb(skb->end, gfp_mask);
+#else
+	n = alloc_skb(skb->end - skb->head, gfp_mask);
+#endif
+	if (!n)
+		return NULL;
+	if (!__pskb_copy(skb, n))
+		return n;
+	kfree_skb(n);
+	return NULL;
+
 }
 EXPORT_SYMBOL(pskb_copy);
 
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH 4/8] net/stmmac: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/stmmac/stmmac.h      |    1 -
 drivers/net/stmmac/stmmac_main.c |   26 +++++++-------------------
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/net/stmmac/stmmac.h b/drivers/net/stmmac/stmmac.h
index ebebc64..dbf9f95 100644
--- a/drivers/net/stmmac/stmmac.h
+++ b/drivers/net/stmmac/stmmac.h
@@ -44,7 +44,6 @@ struct stmmac_priv {
 	unsigned int dirty_rx;
 	struct sk_buff **rx_skbuff;
 	dma_addr_t *rx_skbuff_dma;
-	struct sk_buff_head rx_recycle;
 
 	struct net_device *dev;
 	int is_gmac;
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index a31d580..722a5e6 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -636,18 +636,7 @@ static void stmmac_tx(struct stmmac_priv *priv)
 			p->des3 = 0;
 
 		if (likely(skb != NULL)) {
-			/*
-			 * If there's room in the queue (limit it to size)
-			 * we add this skb back into the pool,
-			 * if it's the right size.
-			 */
-			if ((skb_queue_len(&priv->rx_recycle) <
-				priv->dma_rx_size) &&
-				skb_recycle_check(skb, priv->dma_buf_sz))
-				__skb_queue_head(&priv->rx_recycle, skb);
-			else
-				dev_kfree_skb(skb);
-
+			net_recycle_add(priv->dev, skb);
 			priv->tx_skbuff[entry] = NULL;
 		}
 
@@ -843,6 +832,9 @@ static int stmmac_open(struct net_device *dev)
 	priv->dma_buf_sz = STMMAC_ALIGN(buf_sz);
 	init_dma_desc_rings(dev);
 
+	net_recycle_init(priv->dev, priv->dma_rx_size, priv->dma_buf_sz +
+			NET_IP_ALIGN);
+
 	/* DMA initialization and SW reset */
 	if (unlikely(priv->hw->dma->init(ioaddr, priv->pbl, priv->dma_tx_phy,
 					 priv->dma_rx_phy) < 0)) {
@@ -894,7 +886,6 @@ static int stmmac_open(struct net_device *dev)
 		phy_start(priv->phydev);
 
 	napi_enable(&priv->napi);
-	skb_queue_head_init(&priv->rx_recycle);
 	netif_start_queue(dev);
 	return 0;
 }
@@ -925,7 +916,7 @@ static int stmmac_release(struct net_device *dev)
 		kfree(priv->tm);
 #endif
 	napi_disable(&priv->napi);
-	skb_queue_purge(&priv->rx_recycle);
+	net_recycle_cleanup(priv->dev);
 
 	/* Free the IRQ lines */
 	free_irq(dev->irq, dev);
@@ -1157,13 +1148,10 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv)
 		if (likely(priv->rx_skbuff[entry] == NULL)) {
 			struct sk_buff *skb;
 
-			skb = __skb_dequeue(&priv->rx_recycle);
-			if (skb == NULL)
-				skb = netdev_alloc_skb_ip_align(priv->dev,
-								bfsize);
-
+			skb = net_recycle_get(priv->dev);
 			if (unlikely(skb == NULL))
 				break;
+			skb_reserve(skb, NET_IP_ALIGN);
 
 			priv->rx_skbuff[entry] = skb;
 			priv->rx_skbuff_dma[entry] =
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH 5/8] net/ucc_geth: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ucc_geth.c |   30 ++++++++----------------------
 drivers/net/ucc_geth.h |    2 --
 2 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index dc32a62..9d6097b 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -210,10 +210,7 @@ static struct sk_buff *get_new_skb(struct ucc_geth_private *ugeth,
 {
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&ugeth->rx_recycle);
-	if (!skb)
-		skb = dev_alloc_skb(ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+	skb = net_recycle_get(ugeth->ndev);
 	if (skb == NULL)
 		return NULL;
 
@@ -1992,8 +1989,6 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 		iounmap(ugeth->ug_regs);
 		ugeth->ug_regs = NULL;
 	}
-
-	skb_queue_purge(&ugeth->rx_recycle);
 }
 
 static void ucc_geth_set_multi(struct net_device *dev)
@@ -2069,6 +2064,7 @@ static void ucc_geth_stop(struct ucc_geth_private *ugeth)
 	ugeth->phydev = NULL;
 
 	ucc_geth_memclean(ugeth);
+	net_recycle_cleanup(ugeth->ndev);
 }
 
 static int ucc_struct_init(struct ucc_geth_private *ugeth)
@@ -2205,9 +2201,6 @@ static int ucc_struct_init(struct ucc_geth_private *ugeth)
 			ugeth_err("%s: Failed to ioremap regs.", __func__);
 		return -ENOMEM;
 	}
-
-	skb_queue_head_init(&ugeth->rx_recycle);
-
 	return 0;
 }
 
@@ -3213,12 +3206,8 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 			if (netif_msg_rx_err(ugeth))
 				ugeth_err("%s, %d: ERROR!!! skb - 0x%08x",
 					   __func__, __LINE__, (u32) skb);
-			if (skb) {
-				skb->data = skb->head + NET_SKB_PAD;
-				skb->len = 0;
-				skb_reset_tail_pointer(skb);
-				__skb_queue_head(&ugeth->rx_recycle, skb);
-			}
+			if (skb)
+				net_recycle_add(dev, skb);
 
 			ugeth->rx_skbuff[rxQ][ugeth->skb_currx[rxQ]] = NULL;
 			dev->stats.rx_dropped++;
@@ -3288,13 +3277,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
 
 		dev->stats.tx_packets++;
 
-		if (skb_queue_len(&ugeth->rx_recycle) < RX_BD_RING_LEN &&
-			     skb_recycle_check(skb,
-				    ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT))
-			__skb_queue_head(&ugeth->rx_recycle, skb);
-		else
-			dev_kfree_skb(skb);
+		net_recycle_add(dev, skb);
 
 		ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]] = NULL;
 		ugeth->skb_dirtytx[txQ] =
@@ -3929,6 +3912,9 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma
 	netif_napi_add(dev, &ugeth->napi, ucc_geth_poll, 64);
 	dev->mtu = 1500;
 
+	net_recycle_init(dev, RX_BD_RING_LEN, ug_info->uf_info.max_rx_buf_length
+			+ UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+
 	ugeth->msg_enable = netif_msg_init(debug.msg_enable, UGETH_MSG_DEFAULT);
 	ugeth->phy_interface = phy_interface;
 	ugeth->max_speed = max_speed;
diff --git a/drivers/net/ucc_geth.h b/drivers/net/ucc_geth.h
index 05a9558..07c0816 100644
--- a/drivers/net/ucc_geth.h
+++ b/drivers/net/ucc_geth.h
@@ -1213,8 +1213,6 @@ struct ucc_geth_private {
 	/* index of the first skb which hasn't been transmitted yet. */
 	u16 skb_dirtytx[NUM_TX_QUEUES];
 
-	struct sk_buff_head rx_recycle;
-
 	struct ugeth_mii_info *mii_info;
 	struct phy_device *phydev;
 	phy_interface_t phy_interface;
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH 2/8] net/gianfar: use generic recycling infrasstructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/gianfar.c         |   28 ++++++++--------------------
 drivers/net/gianfar.h         |    2 --
 drivers/net/gianfar_ethtool.c |    1 +
 3 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index fccb7a3..1a1a249 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1148,6 +1148,9 @@ static int gfar_probe(struct of_device *ofdev,
 		priv->rx_queue[i]->rxic = DEFAULT_RXIC;
 	}
 
+	net_recycle_init(dev, DEFAULT_RX_RING_SIZE,
+			priv->rx_buffer_size + RXBUF_ALIGNMENT);
+
 	/* enable filer if using multiple RX queues*/
 	if(priv->num_rx_queues > 1)
 		priv->rx_filer_enable = 1;
@@ -1768,7 +1771,7 @@ static void free_skb_resources(struct gfar_private *priv)
 			sizeof(struct rxbd8) * priv->total_rx_ring_size,
 			priv->tx_queue[0]->tx_bd_base,
 			priv->tx_queue[0]->tx_bd_dma_base);
-	skb_queue_purge(&priv->rx_recycle);
+	net_recycle_cleanup(priv->ndev);
 }
 
 void gfar_start(struct net_device *dev)
@@ -1949,8 +1952,6 @@ static int gfar_enet_open(struct net_device *dev)
 
 	enable_napi(priv);
 
-	skb_queue_head_init(&priv->rx_recycle);
-
 	/* Initialize a bunch of registers */
 	init_registers(dev);
 
@@ -2366,6 +2367,7 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 		stop_gfar(dev);
 
 	priv->rx_buffer_size = tempsize;
+	net_recycle_size(dev, tempsize + RXBUF_ALIGNMENT);
 
 	dev->mtu = new_mtu;
 
@@ -2498,16 +2500,7 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 			bdp = next_txbd(bdp, base, tx_ring_size);
 		}
 
-		/*
-		 * If there's room in the queue (limit it to rx_buffer_size)
-		 * we add this skb back into the pool, if it's the right size
-		 */
-		if (skb_queue_len(&priv->rx_recycle) < rx_queue->rx_ring_size &&
-				skb_recycle_check(skb, priv->rx_buffer_size +
-					RXBUF_ALIGNMENT))
-			__skb_queue_head(&priv->rx_recycle, skb);
-		else
-			dev_kfree_skb_any(skb);
+		net_recycle_add(dev, skb);
 
 		tx_queue->tx_skbuff[skb_dirtytx] = NULL;
 
@@ -2573,14 +2566,9 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 struct sk_buff * gfar_new_skb(struct net_device *dev)
 {
 	unsigned int alignamount;
-	struct gfar_private *priv = netdev_priv(dev);
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&priv->rx_recycle);
-	if (!skb)
-		skb = netdev_alloc_skb(dev,
-				priv->rx_buffer_size + RXBUF_ALIGNMENT);
-
+	skb = net_recycle_get(dev);
 	if (!skb)
 		return NULL;
 
@@ -2753,7 +2741,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
 				 * recycle list.
 				 */
 				skb_reserve(skb, -GFAR_CB(skb)->alignamount);
-				__skb_queue_head(&priv->rx_recycle, skb);
+				net_recycle_add(dev, skb);
 			}
 		} else {
 			/* Increment the number of packets */
diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h
index 710810e..66f8c04 100644
--- a/drivers/net/gianfar.h
+++ b/drivers/net/gianfar.h
@@ -1068,8 +1068,6 @@ struct gfar_private {
 
 	u32 cur_filer_idx;
 
-	struct sk_buff_head rx_recycle;
-
 	struct vlan_group *vlgrp;
 
 
diff --git a/drivers/net/gianfar_ethtool.c b/drivers/net/gianfar_ethtool.c
index 9bda023..8a6d567 100644
--- a/drivers/net/gianfar_ethtool.c
+++ b/drivers/net/gianfar_ethtool.c
@@ -508,6 +508,7 @@ static int gfar_sringparam(struct net_device *dev, struct ethtool_ringparam *rva
 		priv->tx_queue[i]->tx_ring_size = rvals->tx_pending;
 		priv->tx_queue[i]->num_txbdfree = priv->tx_queue[i]->tx_ring_size;
 	}
+	net_recycle_qlen(dev, rvals->rx_pending);
 
 	/* Rebuild the rings with the new size */
 	if (dev->flags & IFF_UP) {
-- 
1.6.6.1


^ permalink raw reply related

* Generic rx-recycling and emergency skb pool
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx

This is version two of generic rx-recycling followed by version one of
emergency skb pools which are built on top of rx-recycling.
The change from v1 of generic rx-recycling is that the list access is
unlocked instead of locked.
Patch six which introduces the emergency pools adds the locking back.
This is required since we now have two not serialized users. In order
not to punish everyone patch eight removes this locking again. That
patch converts only two drivers so you have an idea what I think is
required to get the locking removed.

The idea behind emergency pools is to have pre-allocated skbs for TX and
RX. Using the memory allocator for it leads to latencies during memory
pressure. The pre-allocated skb are "tagged" and should get back to the
pool once they are through the stack so the pool should never get
exhausted.

While it was easy to convert the drivers which share the same concept of
rx-recycling to use the emergency pools it was difficult to hook up the
more complex drivers like e1000e. The e1000e can use split skbs / a frag
list which is different from the allocation currently used. So instead of
forcing all drivers to use the same way of doing things I've been thinking
about providing a dedicated callback for skb allocation and checking if
this skb is "good enough". This is not yet implemented.

I would be glad to receive some feedback on this patch series before I go
any further. Unfortunately I'm on vacation for the next two weeks so I
can't respond earlier. tglx is on Cc and should be able respond earlier :)

Sebastian

^ permalink raw reply

* [PATCH 3/8] net/mv643xx: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/mv643xx_eth.c |   27 +++++++++------------------
 1 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 82b720f..a58ba48 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -404,8 +404,6 @@ struct mv643xx_eth_private {
 	u8 work_rx_refill;
 
 	int skb_size;
-	struct sk_buff_head rx_recycle;
-
 	/*
 	 * RX state.
 	 */
@@ -649,6 +647,7 @@ err:
 static int rxq_refill(struct rx_queue *rxq, int budget)
 {
 	struct mv643xx_eth_private *mp = rxq_to_mp(rxq);
+	struct net_device *dev = mp->dev;
 	int refilled;
 
 	refilled = 0;
@@ -658,10 +657,7 @@ static int rxq_refill(struct rx_queue *rxq, int budget)
 		struct rx_desc *rx_desc;
 		int size;
 
-		skb = __skb_dequeue(&mp->rx_recycle);
-		if (skb == NULL)
-			skb = dev_alloc_skb(mp->skb_size);
-
+		skb = net_recycle_get(dev);
 		if (skb == NULL) {
 			mp->oom = 1;
 			goto oom;
@@ -921,6 +917,7 @@ out:
 static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 {
 	struct mv643xx_eth_private *mp = txq_to_mp(txq);
+	struct net_device *dev = mp->dev;
 	struct netdev_queue *nq = netdev_get_tx_queue(mp->dev, txq->index);
 	int reclaimed;
 
@@ -967,14 +964,8 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 				       desc->byte_cnt, DMA_TO_DEVICE);
 		}
 
-		if (skb != NULL) {
-			if (skb_queue_len(&mp->rx_recycle) <
-					mp->rx_ring_size &&
-			    skb_recycle_check(skb, mp->skb_size))
-				__skb_queue_head(&mp->rx_recycle, skb);
-			else
-				dev_kfree_skb(skb);
-		}
+		if (skb)
+			net_recycle_add(dev, skb);
 	}
 
 	__netif_tx_unlock(nq);
@@ -1563,7 +1554,7 @@ mv643xx_eth_set_ringparam(struct net_device *dev, struct ethtool_ringparam *er)
 
 	mp->rx_ring_size = er->rx_pending < 4096 ? er->rx_pending : 4096;
 	mp->tx_ring_size = er->tx_pending < 4096 ? er->tx_pending : 4096;
-
+	net_recycle_qlen(dev, mp->rx_ring_size);
 	if (netif_running(dev)) {
 		mv643xx_eth_stop(dev);
 		if (mv643xx_eth_open(dev)) {
@@ -2344,9 +2335,9 @@ static int mv643xx_eth_open(struct net_device *dev)
 
 	mv643xx_eth_recalc_skb_size(mp);
 
-	napi_enable(&mp->napi);
+	net_recycle_init(mp->dev, mp->rx_ring_size, mp->skb_size);
 
-	skb_queue_head_init(&mp->rx_recycle);
+	napi_enable(&mp->napi);
 
 	mp->int_mask = INT_EXT;
 
@@ -2442,7 +2433,7 @@ static int mv643xx_eth_stop(struct net_device *dev)
 	mib_counters_update(mp);
 	del_timer_sync(&mp->mib_counters_timer);
 
-	skb_queue_purge(&mp->rx_recycle);
+	net_recycle_cleanup(dev);
 
 	for (i = 0; i < mp->rxq_count; i++)
 		rxq_deinit(mp->rxq + i);
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH 1/8] net: implement generic rx recycling
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

the code is basically what other drivers like gianfar are using right
now.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h |   67 +++++++++++++++++++++++++++++++++++++--------
 net/core/dev.c            |    1 +
 2 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8fa5e5a..4fa400b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -595,6 +595,18 @@ struct netdev_rx_queue {
 } ____cacheline_aligned_in_smp;
 #endif /* CONFIG_RPS */
 
+/* Use this variant when it is known for sure that it
+ * is executing from hardware interrupt context or with hardware interrupts
+ * disabled.
+ */
+extern void dev_kfree_skb_irq(struct sk_buff *skb);
+
+/* Use this variant in places where it could be invoked
+ * from either hardware interrupt or other context, with hardware interrupts
+ * either disabled or enabled.
+ */
+extern void dev_kfree_skb_any(struct sk_buff *skb);
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1077,9 +1089,52 @@ struct net_device {
 #endif
 	/* n-tuple filter list attached to this device */
 	struct ethtool_rx_ntuple_list ethtool_ntuple_list;
+	struct sk_buff_head rx_recycle;
+	u32 rx_rec_skbs_max;
+	u32 rx_rec_skb_size;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+static inline void net_recycle_init(struct net_device *dev, u32 qlen, u32 size)
+{
+	dev->rx_rec_skbs_max = qlen;
+	dev->rx_rec_skb_size = size;
+}
+
+static inline void net_recycle_size(struct net_device *dev, u32 size)
+{
+	dev->rx_rec_skb_size = size;
+}
+
+static inline void net_recycle_qlen(struct net_device *dev, u32 qlen)
+{
+	dev->rx_rec_skbs_max = qlen;
+}
+
+static inline void net_recycle_cleanup(struct net_device *dev)
+{
+	skb_queue_purge(&dev->rx_recycle);
+}
+
+static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
+{
+	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
+			skb_recycle_check(skb, dev->rx_rec_skb_size))
+		__skb_queue_head(&dev->rx_recycle, skb);
+	else
+		dev_kfree_skb_any(skb);
+}
+
+static inline struct sk_buff *net_recycle_get(struct net_device *dev)
+{
+	struct sk_buff *skb;
+
+	skb = __skb_dequeue(&dev->rx_recycle);
+	if (skb)
+		return skb;
+	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
+}
+
 #define	NETDEV_ALIGN		32
 
 static inline
@@ -1672,18 +1727,6 @@ static inline int netif_is_multiqueue(const struct net_device *dev)
 	return (dev->num_tx_queues > 1);
 }
 
-/* Use this variant when it is known for sure that it
- * is executing from hardware interrupt context or with hardware interrupts
- * disabled.
- */
-extern void dev_kfree_skb_irq(struct sk_buff *skb);
-
-/* Use this variant in places where it could be invoked
- * from either hardware interrupt or other context, with hardware interrupts
- * either disabled or enabled.
- */
-extern void dev_kfree_skb_any(struct sk_buff *skb);
-
 #define HAVE_NETIF_RX 1
 extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index e85cc5f..db9acd5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5404,6 +5404,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	netdev_init_queues(dev);
 
+	skb_queue_head_init(&dev->rx_recycle);
 	INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
 	dev->ethtool_ntuple_list.count = 0;
 	INIT_LIST_HEAD(&dev->napi_list);
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH] s2io: resolve statistics issues
From: Jon Mason @ 2010-07-02 19:13 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Ramkrishna Vepa, Sivakumar Subramani,
	Sreenivasa Honnur, Michal Schmidt
In-Reply-To: <20100630005454.GJ21324@exar.com>

This patch resolves a number of issues in the statistics gathering of
the s2io driver.

On Xframe adapters, the received multicast statistics counter includes
pause frames which are not indicated to the driver.  This can cause
issues where the multicast packet count is higher than what has actually
been received, possibly higher than the number of packets received.
    
The driver software counters are replaced with the adapter hardware
statistics for rx_packets, rx_bytes, and tx_bytes.  It also uses the
overflow registers to determine if the statistics wrapped the 32bit
register (removing the window of having a statistic value less than the
previous call).  rx_length_errors statistic now includes undersized
packets in addition to oversized packets in its counting.  Finally,
rx_crc_errors are now being counted.
    
Signed-off-by: Jon Mason <jon.mason@exar.com>

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 22371f1..d0af924 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -3129,7 +3129,6 @@ static void tx_intr_handler(struct fifo_info *fifo_data)
 		pkt_cnt++;
 
 		/* Updating the statistics block */
-		nic->dev->stats.tx_bytes += skb->len;
 		swstats->mem_freed += skb->truesize;
 		dev_kfree_skb_irq(skb);
 
@@ -4900,48 +4899,81 @@ static void s2io_updt_stats(struct s2io_nic *sp)
  *  Return value:
  *  pointer to the updated net_device_stats structure.
  */
-
 static struct net_device_stats *s2io_get_stats(struct net_device *dev)
 {
 	struct s2io_nic *sp = netdev_priv(dev);
-	struct config_param *config = &sp->config;
 	struct mac_info *mac_control = &sp->mac_control;
 	struct stat_block *stats = mac_control->stats_info;
-	int i;
+	u64 delta;
 
 	/* Configure Stats for immediate updt */
 	s2io_updt_stats(sp);
 
-	/* Using sp->stats as a staging area, because reset (due to mtu
-	   change, for example) will clear some hardware counters */
-	dev->stats.tx_packets += le32_to_cpu(stats->tmac_frms) -
-		sp->stats.tx_packets;
-	sp->stats.tx_packets = le32_to_cpu(stats->tmac_frms);
-
-	dev->stats.tx_errors += le32_to_cpu(stats->tmac_any_err_frms) -
-		sp->stats.tx_errors;
-	sp->stats.tx_errors = le32_to_cpu(stats->tmac_any_err_frms);
-
-	dev->stats.rx_errors += le64_to_cpu(stats->rmac_drop_frms) -
-		sp->stats.rx_errors;
-	sp->stats.rx_errors = le64_to_cpu(stats->rmac_drop_frms);
-
-	dev->stats.multicast = le32_to_cpu(stats->rmac_vld_mcst_frms) -
-		sp->stats.multicast;
-	sp->stats.multicast = le32_to_cpu(stats->rmac_vld_mcst_frms);
-
-	dev->stats.rx_length_errors = le64_to_cpu(stats->rmac_long_frms) -
-		sp->stats.rx_length_errors;
-	sp->stats.rx_length_errors = le64_to_cpu(stats->rmac_long_frms);
+	/* A device reset will cause the on-adapter statistics to be zero'ed.
+	 * This can be done while running by changing the MTU.  To prevent the
+	 * system from having the stats zero'ed, the driver keeps a copy of the
+	 * last update to the system (which is also zero'ed on reset).  This
+	 * enables the driver to accurately know the delta between the last
+	 * update and the current update.
+	 */
+	delta = ((u64) le32_to_cpu(stats->rmac_vld_frms_oflow) << 32 |
+		le32_to_cpu(stats->rmac_vld_frms)) - sp->stats.rx_packets;
+	sp->stats.rx_packets += delta;
+	dev->stats.rx_packets += delta;
+
+	delta = ((u64) le32_to_cpu(stats->tmac_frms_oflow) << 32 |
+		le32_to_cpu(stats->tmac_frms)) - sp->stats.tx_packets;
+	sp->stats.tx_packets += delta;
+	dev->stats.tx_packets += delta;
+
+	delta = ((u64) le32_to_cpu(stats->rmac_data_octets_oflow) << 32 |
+		le32_to_cpu(stats->rmac_data_octets)) - sp->stats.rx_bytes;
+	sp->stats.rx_bytes += delta;
+	dev->stats.rx_bytes += delta;
+
+	delta = ((u64) le32_to_cpu(stats->tmac_data_octets_oflow) << 32 |
+		le32_to_cpu(stats->tmac_data_octets)) - sp->stats.tx_bytes;
+	sp->stats.tx_bytes += delta;
+	dev->stats.tx_bytes += delta;
+
+	delta = le64_to_cpu(stats->rmac_drop_frms) - sp->stats.rx_errors;
+	sp->stats.rx_errors += delta;
+	dev->stats.rx_errors += delta;
+
+	delta = ((u64) le32_to_cpu(stats->tmac_any_err_frms_oflow) << 32 |
+		le32_to_cpu(stats->tmac_any_err_frms)) - sp->stats.tx_errors;
+	sp->stats.tx_errors += delta;
+	dev->stats.tx_errors += delta;
+
+	delta = le64_to_cpu(stats->rmac_drop_frms) - sp->stats.rx_dropped;
+	sp->stats.rx_dropped += delta;
+	dev->stats.rx_dropped += delta;
+
+	delta = le64_to_cpu(stats->tmac_drop_frms) - sp->stats.tx_dropped;
+	sp->stats.tx_dropped += delta;
+	dev->stats.tx_dropped += delta;
+
+	/* The adapter MAC interprets pause frames as multicast packets, but
+	 * does not pass them up.  This erroneously increases the multicast
+	 * packet count and needs to be deducted when the multicast frame count
+	 * is queried.
+	 */
+	delta = (u64) le32_to_cpu(stats->rmac_vld_mcst_frms_oflow) << 32 |
+		le32_to_cpu(stats->rmac_vld_mcst_frms);
+	delta -= le64_to_cpu(stats->rmac_pause_ctrl_frms);
+	delta -= sp->stats.multicast;
+	sp->stats.multicast += delta;
+	dev->stats.multicast += delta;
 
-	/* collect per-ring rx_packets and rx_bytes */
-	dev->stats.rx_packets = dev->stats.rx_bytes = 0;
-	for (i = 0; i < config->rx_ring_num; i++) {
-		struct ring_info *ring = &mac_control->rings[i];
+	delta = ((u64) le32_to_cpu(stats->rmac_usized_frms_oflow) << 32 |
+		le32_to_cpu(stats->rmac_usized_frms)) +
+		le64_to_cpu(stats->rmac_long_frms) - sp->stats.rx_length_errors;
+	sp->stats.rx_length_errors += delta;
+	dev->stats.rx_length_errors += delta;
 
-		dev->stats.rx_packets += ring->rx_packets;
-		dev->stats.rx_bytes += ring->rx_bytes;
-	}
+	delta = le64_to_cpu(stats->rmac_fcs_err_frms) - sp->stats.rx_crc_errors;
+	sp->stats.rx_crc_errors += delta;
+	dev->stats.rx_crc_errors += delta;
 
 	return &dev->stats;
 }
@@ -7494,15 +7526,11 @@ static int rx_osm_handler(struct ring_info *ring_data, struct RxD_t * rxdp)
 		}
 	}
 
-	/* Updating statistics */
-	ring_data->rx_packets++;
 	rxdp->Host_Control = 0;
 	if (sp->rxd_mode == RXD_MODE_1) {
 		int len = RXD_GET_BUFFER0_SIZE_1(rxdp->Control_2);
 
-		ring_data->rx_bytes += len;
 		skb_put(skb, len);
-
 	} else if (sp->rxd_mode == RXD_MODE_3B) {
 		int get_block = ring_data->rx_curr_get_info.block_index;
 		int get_off = ring_data->rx_curr_get_info.offset;
@@ -7511,7 +7539,6 @@ static int rx_osm_handler(struct ring_info *ring_data, struct RxD_t * rxdp)
 		unsigned char *buff = skb_push(skb, buf0_len);
 
 		struct buffAdd *ba = &ring_data->ba[get_block][get_off];
-		ring_data->rx_bytes += buf0_len + buf2_len;
 		memcpy(buff, ba->ba_0, buf0_len);
 		skb_put(skb, buf2_len);
 	}
diff --git a/drivers/net/s2io.h b/drivers/net/s2io.h
index 47c36e0..5e52c75 100644
--- a/drivers/net/s2io.h
+++ b/drivers/net/s2io.h
@@ -745,10 +745,6 @@ struct ring_info {
 
 	/* Buffer Address store. */
 	struct buffAdd **ba;
-
-	/* per-Ring statistics */
-	unsigned long rx_packets;
-	unsigned long rx_bytes;
 } ____cacheline_aligned;
 
 /* Fifo specific structure */

^ permalink raw reply related

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-02 18:11 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Tejun Heo, Oleg Nesterov, Michael S. Tsirkin, Ingo Molnar, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <4C2E2987.9040702@us.ibm.com>

On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
>  
>  Does  it (Tejun's kthread_clone() patch) also  inherit the 
> cgroup of the caller?

Of course, its a simple do_fork() which inherits everything just as you
would expect from a similar sys_clone()/sys_fork() call.

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Sridhar Samudrala @ 2010-07-02 18:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Oleg Nesterov, Michael S. Tsirkin, Ingo Molnar, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277996135.1917.198.camel@laptop>

On 7/1/2010 7:55 AM, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 16:53 +0200, Tejun Heo wrote:
>    
>> Hello,
>>
>> On 07/01/2010 04:46 PM, Oleg Nesterov wrote:
>>      
>>>> It might be a good idea to make the function take extra clone flags
>>>> but anyways once created cloned task can be treated the same way as
>>>> other kthreads, so nothing else needs to be changed.
>>>>          
>>> This makes kthread_stop() work. Otherwise the new thread is just
>>> the CLONE_VM child of the caller, and the caller is the user-mode
>>> task doing ioctl() ?
>>>        
>> Hmmm, indeed.  It makes the attribute inheritance work but circumvents
>> the whole reason there is kthreadd.
>>      
> I thought the whole reason there was threadd was to avoid the
> inheritance? So avoiding the avoiding of inheritance seems like the goal
> here, no?
>    
I think so. Does  it (Tejun's kthread_clone() patch) also  inherit the 
cgroup of the caller? or do we still need the explicit
call to attach the thread to the current task's cgroup?

I am on vacation next week and cannot look into this until Jul 12. Hope 
this will be resoved by then.
If not, i will look into after i am back.

Thanks
Sridhar


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox