Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] virtio_net: indicate oom when addbuf returns failure
From: Herbert Xu @ 2010-06-06 22:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, stable, Bruce Rogers, netdev, virtualization,
	Shirley Ma
In-Reply-To: <20100606201258.GA21196@redhat.com>

On Sun, Jun 06, 2010 at 11:13:00PM +0300, Michael S. Tsirkin wrote:
>
> Actually this code looks strange:
> Note that add_buf inicates out of memory
> condition with a positive return value, and ring full
> (which is not an error!) with -ENOSPC.

Indeed, this ultimately came from

commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
Author: Shirley Ma <mashirle@us.ibm.com>
Date:   Fri Jan 29 03:20:04 2010 +0000

    virtio_net: Defer skb allocation in receive path Date: Wed, 13 Jan 2010 12:53:38 -0800

(Greg, please don't apply this even though I've just given you
the upstream commit ID that you were asking for :)

where it confuses a memory allocation error with an add_buf failure.

> Possibly the right thing to do is to
> 1. handle ENOMEM specially
> 2. fix add_buf to return ENOMEM on error

I think we should make it so that only a memory allocation error
is returned as before.  There is no need for returning the add_buf
error unless add_buf is now doing an allocation itself that needs
to be retried.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] r8169: fix random mdio_write failures
From: David Miller @ 2010-06-06 22:39 UTC (permalink / raw)
  To: romieu; +Cc: timo.teras, netdev, edward_hsu, hayeswang
In-Reply-To: <20100605124103.GA3213@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Sat, 5 Jun 2010 14:41:03 +0200

> Timo Teräs <timo.teras@iki.fi> :
> [...]
>> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
>> index 217e709..03a8318 100644
>> --- a/drivers/net/r8169.c
>> +++ b/drivers/net/r8169.c
>> @@ -559,6 +559,11 @@ static void mdio_write(void __iomem *ioaddr, int reg_addr, int value)
>>  			break;
>>  		udelay(25);
>>  	}
>> +	/*
>> +	 * Some configurations require a small delay even after the write
>> +	 * completed indication or the next write might fail.
>> +	 */
>> +	udelay(25);
> 
> Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [RFC] pm_qos: get rid of the allocation in pm_qos_add_request()
From: mark gross @ 2010-06-06 23:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: pm list, markgross, netdev, alsa-devel, Takashi Iwai,
	Jaroslav Kysela
In-Reply-To: <1275765614.7227.42.camel@mulgrave.site>

On Sat, Jun 05, 2010 at 02:20:14PM -0500, James Bottomley wrote:
> [alsa-devel says it's a moderated list, so feel free to drop before
> replying]
> 
> Since every caller has to squirrel away the returned pointer anyway,
> they might as well supply the memory area.  This fixes a bug in a few of
> the call sites where the returned pointer was dereferenced without
> checking it for NULL (which gets returned if the kzalloc failed).
> 
> I'd like to hear how sound and netdev feels about this: it will add
> about two more pointers worth of data to struct netdev and struct
> snd_pcm_substream .. but I think it's worth it.  If you're OK, I'll add
> your acks and send through the pm tree.
> 
> This also looks to me like an android independent clean up (even though
> it renders the request_add atomically callable).  I also added include
> guards to include/linux/pm_qos_params.h
> 
> James
> 
> ---
> 
>  drivers/net/e1000e/netdev.c            |   17 ++++------
>  drivers/net/igbvf/netdev.c             |    9 ++---
>  drivers/net/wireless/ipw2x00/ipw2100.c |   12 +++---
>  include/linux/netdevice.h              |    2 +-
>  include/linux/pm_qos_params.h          |   12 +++++--
>  include/sound/pcm.h                    |    2 +-
>  kernel/pm_qos_params.c                 |   55 ++++++++++++++++---------------
>  sound/core/pcm_native.c                |   12 ++-----
>  8 files changed, 60 insertions(+), 61 deletions(-)
> 
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index 24507f3..47ea62f 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -2901,10 +2901,10 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
>  			 * dropped transactions.
>  			 */
>  			pm_qos_update_request(
> -				adapter->netdev->pm_qos_req, 55);
> +				&adapter->netdev->pm_qos_req, 55);
>  		} else {
>  			pm_qos_update_request(
> -				adapter->netdev->pm_qos_req,
> +				&adapter->netdev->pm_qos_req,
>  				PM_QOS_DEFAULT_VALUE);
>  		}
>  	}
> @@ -3196,9 +3196,9 @@ int e1000e_up(struct e1000_adapter *adapter)
>  
>  	/* DMA latency requirement to workaround early-receive/jumbo issue */
>  	if (adapter->flags & FLAG_HAS_ERT)
> -		adapter->netdev->pm_qos_req =
> -			pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY,
> -				       PM_QOS_DEFAULT_VALUE);
> +		pm_qos_add_request(&adapter->netdev->pm_qos_req,
> +				   PM_QOS_CPU_DMA_LATENCY,
> +				   PM_QOS_DEFAULT_VALUE);
>  
>  	/* hardware has been reset, we need to reload some things */
>  	e1000_configure(adapter);
> @@ -3263,11 +3263,8 @@ void e1000e_down(struct e1000_adapter *adapter)
>  	e1000_clean_tx_ring(adapter);
>  	e1000_clean_rx_ring(adapter);
>  
> -	if (adapter->flags & FLAG_HAS_ERT) {
> -		pm_qos_remove_request(
> -			      adapter->netdev->pm_qos_req);
> -		adapter->netdev->pm_qos_req = NULL;
> -	}
> +	if (adapter->flags & FLAG_HAS_ERT)
> +		pm_qos_remove_request(&adapter->netdev->pm_qos_req);
>  
>  	/*
>  	 * TODO: for power management, we could drop the link and
> diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
> index 5e2b2a8..5da569f 100644
> --- a/drivers/net/igbvf/netdev.c
> +++ b/drivers/net/igbvf/netdev.c
> @@ -48,7 +48,7 @@
>  #define DRV_VERSION "1.0.0-k0"
>  char igbvf_driver_name[] = "igbvf";
>  const char igbvf_driver_version[] = DRV_VERSION;
> -struct pm_qos_request_list *igbvf_driver_pm_qos_req;
> +struct pm_qos_request_list igbvf_driver_pm_qos_req;
>  static const char igbvf_driver_string[] =
>  				"Intel(R) Virtual Function Network Driver";
>  static const char igbvf_copyright[] = "Copyright (c) 2009 Intel Corporation.";
> @@ -2902,8 +2902,8 @@ static int __init igbvf_init_module(void)
>  	printk(KERN_INFO "%s\n", igbvf_copyright);
>  
>  	ret = pci_register_driver(&igbvf_driver);
> -	igbvf_driver_pm_qos_req = pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY,
> -	                       PM_QOS_DEFAULT_VALUE);
> +	pm_qos_add_request(igbvf_driver_pm_qos_req, PM_QOS_CPU_DMA_LATENCY,
should be:

+	pm_qos_add_request(&igbvf_driver_pm_qos_req, PM_QOS_CPU_DMA_LATENCY,


--mgross

> +			   PM_QOS_DEFAULT_VALUE);
>  
>  	return ret;
>  }
> @@ -2918,8 +2918,7 @@ module_init(igbvf_init_module);
>  static void __exit igbvf_exit_module(void)
>  {
>  	pci_unregister_driver(&igbvf_driver);
> -	pm_qos_remove_request(igbvf_driver_pm_qos_req);
> -	igbvf_driver_pm_qos_req = NULL;
> +	pm_qos_remove_request(&igbvf_driver_pm_qos_req);
>  }
>  module_exit(igbvf_exit_module);
>  
> diff --git a/drivers/net/wireless/ipw2x00/ipw2100.c b/drivers/net/wireless/ipw2x00/ipw2100.c
> index 0bd4dfa..7f0d98b 100644
> --- a/drivers/net/wireless/ipw2x00/ipw2100.c
> +++ b/drivers/net/wireless/ipw2x00/ipw2100.c
> @@ -174,7 +174,7 @@ that only one external action is invoked at a time.
>  #define DRV_DESCRIPTION	"Intel(R) PRO/Wireless 2100 Network Driver"
>  #define DRV_COPYRIGHT	"Copyright(c) 2003-2006 Intel Corporation"
>  
> -struct pm_qos_request_list *ipw2100_pm_qos_req;
> +struct pm_qos_request_list ipw2100_pm_qos_req;
>  
>  /* Debugging stuff */
>  #ifdef CONFIG_IPW2100_DEBUG
> @@ -1741,7 +1741,7 @@ static int ipw2100_up(struct ipw2100_priv *priv, int deferred)
>  	/* the ipw2100 hardware really doesn't want power management delays
>  	 * longer than 175usec
>  	 */
> -	pm_qos_update_request(ipw2100_pm_qos_req, 175);
> +	pm_qos_update_request(&ipw2100_pm_qos_req, 175);
>  
>  	/* If the interrupt is enabled, turn it off... */
>  	spin_lock_irqsave(&priv->low_lock, flags);
> @@ -1889,7 +1889,7 @@ static void ipw2100_down(struct ipw2100_priv *priv)
>  	ipw2100_disable_interrupts(priv);
>  	spin_unlock_irqrestore(&priv->low_lock, flags);
>  
> -	pm_qos_update_request(ipw2100_pm_qos_req, PM_QOS_DEFAULT_VALUE);
> +	pm_qos_update_request(&ipw2100_pm_qos_req, PM_QOS_DEFAULT_VALUE);
>  
>  	/* We have to signal any supplicant if we are disassociating */
>  	if (associated)
> @@ -6669,8 +6669,8 @@ static int __init ipw2100_init(void)
>  	if (ret)
>  		goto out;
>  
> -	ipw2100_pm_qos_req = pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY,
> -			PM_QOS_DEFAULT_VALUE);
> +	pm_qos_add_request(&ipw2100_pm_qos_req, PM_QOS_CPU_DMA_LATENCY,
> +			   PM_QOS_DEFAULT_VALUE);
>  #ifdef CONFIG_IPW2100_DEBUG
>  	ipw2100_debug_level = debug;
>  	ret = driver_create_file(&ipw2100_pci_driver.driver,
> @@ -6692,7 +6692,7 @@ static void __exit ipw2100_exit(void)
>  			   &driver_attr_debug_level);
>  #endif
>  	pci_unregister_driver(&ipw2100_pci_driver);
> -	pm_qos_remove_request(ipw2100_pm_qos_req);
> +	pm_qos_remove_request(&ipw2100_pm_qos_req);
>  }
>  
>  module_init(ipw2100_init);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 40291f3..393555a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -779,7 +779,7 @@ struct net_device {
>  	 */
>  	char			name[IFNAMSIZ];
>  
> -	struct pm_qos_request_list *pm_qos_req;
> +	struct pm_qos_request_list pm_qos_req;
>  
>  	/* device name hash chain */
>  	struct hlist_node	name_hlist;
> diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
> index 8ba440e..d823cc0 100644
> --- a/include/linux/pm_qos_params.h
> +++ b/include/linux/pm_qos_params.h
> @@ -1,8 +1,10 @@
> +#ifndef _LINUX_PM_QOS_PARAMS_H
> +#define _LINUX_PM_QOS_PARAMS_H
>  /* interface for the pm_qos_power infrastructure of the linux kernel.
>   *
>   * Mark Gross <mgross@linux.intel.com>
>   */
> -#include <linux/list.h>
> +#include <linux/plist.h>
>  #include <linux/notifier.h>
>  #include <linux/miscdevice.h>
>  
> @@ -14,9 +16,12 @@
>  #define PM_QOS_NUM_CLASSES 4
>  #define PM_QOS_DEFAULT_VALUE -1
>  
> -struct pm_qos_request_list;
> +struct pm_qos_request_list {
> +	struct plist_node list;
> +	int pm_qos_class;
> +};
>  
> -struct pm_qos_request_list *pm_qos_add_request(int pm_qos_class, s32 value);
> +void pm_qos_add_request(struct pm_qos_request_list *l, int pm_qos_class, s32 value);
>  void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
>  		s32 new_value);
>  void pm_qos_remove_request(struct pm_qos_request_list *pm_qos_req);
> @@ -25,3 +30,4 @@ int pm_qos_request(int pm_qos_class);
>  int pm_qos_add_notifier(int pm_qos_class, struct notifier_block *notifier);
>  int pm_qos_remove_notifier(int pm_qos_class, struct notifier_block *notifier);
>  
> +#endif
> diff --git a/include/sound/pcm.h b/include/sound/pcm.h
> index dd76cde..6e3a297 100644
> --- a/include/sound/pcm.h
> +++ b/include/sound/pcm.h
> @@ -366,7 +366,7 @@ struct snd_pcm_substream {
>  	int number;
>  	char name[32];			/* substream name */
>  	int stream;			/* stream (direction) */
> -	struct pm_qos_request_list *latency_pm_qos_req; /* pm_qos request */
> +	struct pm_qos_request_list latency_pm_qos_req; /* pm_qos request */
>  	size_t buffer_bytes_max;	/* limit ring buffer size */
>  	struct snd_dma_buffer dma_buffer;
>  	unsigned int dma_buf_id;
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index 241fa79..f1d3d23 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -30,7 +30,6 @@
>  /*#define DEBUG*/
>  
>  #include <linux/pm_qos_params.h>
> -#include <linux/plist.h>
>  #include <linux/sched.h>
>  #include <linux/spinlock.h>
>  #include <linux/slab.h>
> @@ -49,11 +48,6 @@
>   * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
>   * held, taken with _irqsave.  One lock to rule them all
>   */
> -struct pm_qos_request_list {
> -	struct plist_node list;
> -	int pm_qos_class;
> -};
> -
>  enum pm_qos_type {
>  	PM_QOS_MAX,		/* return the largest value */
>  	PM_QOS_MIN,		/* return the smallest value */
> @@ -195,6 +189,11 @@ int pm_qos_request(int pm_qos_class)
>  }
>  EXPORT_SYMBOL_GPL(pm_qos_request);
>  
> +static int pm_qos_request_active(struct pm_qos_request_list *req)
> +{
> +	return req->pm_qos_class != 0;
> +}
> +
>  /**
>   * pm_qos_add_request - inserts new qos request into the list
>   * @pm_qos_class: identifies which list of qos request to us
> @@ -206,25 +205,22 @@ EXPORT_SYMBOL_GPL(pm_qos_request);
>   * element as a handle for use in updating and removal.  Call needs to save
>   * this handle for later use.
>   */
> -struct pm_qos_request_list *pm_qos_add_request(int pm_qos_class, s32 value)
> +void pm_qos_add_request(struct pm_qos_request_list *dep,
> +			int pm_qos_class, s32 value)
>  {
> -	struct pm_qos_request_list *dep;
> -
> -	dep = kzalloc(sizeof(struct pm_qos_request_list), GFP_KERNEL);
> -	if (dep) {
> -		struct pm_qos_object *o =  pm_qos_array[pm_qos_class];
> -		int new_value;
> -
> -		if (value == PM_QOS_DEFAULT_VALUE)
> -			new_value = o->default_value;
> -		else
> -			new_value = value;
> -		plist_node_init(&dep->list, new_value);
> -		dep->pm_qos_class = pm_qos_class;
> -		update_target(o, &dep->list, 0);
> -	}
> +	struct pm_qos_object *o =  pm_qos_array[pm_qos_class];
> +	int new_value;
> +
> +	if (pm_qos_request_active(dep))
> +		return;
>  
> -	return dep;
> +	if (value == PM_QOS_DEFAULT_VALUE)
> +		new_value = o->default_value;
> +	else
> +		new_value = value;
> +	plist_node_init(&dep->list, new_value);
> +	dep->pm_qos_class = pm_qos_class;
> +	update_target(o, &dep->list, 0);
>  }
>  EXPORT_SYMBOL_GPL(pm_qos_add_request);
>  
> @@ -286,7 +282,7 @@ void pm_qos_remove_request(struct pm_qos_request_list *pm_qos_req)
>  
>  	o = pm_qos_array[pm_qos_req->pm_qos_class];
>  	update_target(o, &pm_qos_req->list, 1);
> -	kfree(pm_qos_req);
> +	memset(pm_qos_req, 0, sizeof(*pm_qos_req));
>  }
>  EXPORT_SYMBOL_GPL(pm_qos_remove_request);
>  
> @@ -334,8 +330,12 @@ static int pm_qos_power_open(struct inode *inode, struct file *filp)
>  
>  	pm_qos_class = find_pm_qos_object_by_minor(iminor(inode));
>  	if (pm_qos_class >= 0) {
> -		filp->private_data = (void *) pm_qos_add_request(pm_qos_class,
> -				PM_QOS_DEFAULT_VALUE);
> +		struct pm_qos_request_list *req = kzalloc(GFP_KERNEL, sizeof(*req));
> +		if (!req)
> +			return -ENOMEM;
> +
> +		pm_qos_add_request(req, pm_qos_class, PM_QOS_DEFAULT_VALUE);
> +		filp->private_data = req;
>  
>  		if (filp->private_data)
>  			return 0;
> @@ -347,8 +347,9 @@ static int pm_qos_power_release(struct inode *inode, struct file *filp)
>  {
>  	struct pm_qos_request_list *req;
>  
> -	req = (struct pm_qos_request_list *)filp->private_data;
> +	req = filp->private_data;
>  	pm_qos_remove_request(req);
> +	kfree(req);
>  
>  	return 0;
>  }
> diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> index 303ac04..d3b8b51 100644
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -451,13 +451,10 @@ static int snd_pcm_hw_params(struct snd_pcm_substream *substream,
>  	snd_pcm_timer_resolution_change(substream);
>  	runtime->status->state = SNDRV_PCM_STATE_SETUP;
>  
> -	if (substream->latency_pm_qos_req) {
> -		pm_qos_remove_request(substream->latency_pm_qos_req);
> -		substream->latency_pm_qos_req = NULL;
> -	}
> +	pm_qos_remove_request(&substream->latency_pm_qos_req);
>  	if ((usecs = period_to_usecs(runtime)) >= 0)
> -		substream->latency_pm_qos_req = pm_qos_add_request(
> -					PM_QOS_CPU_DMA_LATENCY, usecs);
> +		pm_qos_add_request(&substream->latency_pm_qos_req,
> +				   PM_QOS_CPU_DMA_LATENCY, usecs);
>  	return 0;
>   _error:
>  	/* hardware might be unuseable from this time,
> @@ -512,8 +509,7 @@ static int snd_pcm_hw_free(struct snd_pcm_substream *substream)
>  	if (substream->ops->hw_free)
>  		result = substream->ops->hw_free(substream);
>  	runtime->status->state = SNDRV_PCM_STATE_OPEN;
> -	pm_qos_remove_request(substream->latency_pm_qos_req);
> -	substream->latency_pm_qos_req = NULL;
> +	pm_qos_remove_request(&substream->latency_pm_qos_req);
>  	return result;
>  }
>  
> 
> 

^ permalink raw reply

* Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Stephen Hemminger @ 2010-06-06 23:13 UTC (permalink / raw)
  To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike
In-Reply-To: <1275732899-5423-1-git-send-email-xiaohui.xin@intel.com>

Still not sure this is a good idea for a couple of reasons:

1. We already have lots of special cases with skb's (frags and fraglist),
   and skb's travel through a lot of different parts of the kernel.  So any
   new change like this creates lots of exposed points for new bugs. Look
   at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
   and ppp and ...

2. SKB's can have infinite lifetime in the kernel. If these buffers come from
   a fixed size pool in an external device, they can easily all get tied up
   if you have a slow listener. What happens then?

^ permalink raw reply

* [PATCH] asix: check packet size against mtu+ETH_HLEN instead of ETH_FRAME_LEN
From: Jussi Kivilinna @ 2010-06-06 23:35 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, David Hollis

Driver checks received packet is too large in asix_rx_fixup() and fails if it is. Problem is
that MTU might be set larger than 1500 and asix fails to work correctly with VLAN tagged
packets. The check should be 'dev->net->mtu + ETH_HLEN' instead.

Tested with AX88772.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
---
 drivers/net/usb/asix.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 7e797ed..aea4645 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -344,7 +344,7 @@ static int asix_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 			return 2;
 		}

-		if (size > ETH_FRAME_LEN) {
+		if (size > dev->net->mtu + ETH_HLEN) {
 			netdev_err(dev->net, "asix_rx_fixup() Bad RX Length %d\n",
 				   size);
 			return 0;

^ permalink raw reply related

* Re: [PATCH 1/2] net: ll_temac: fix interrupt bug when interrupt 0 is used
From: Grant Likely @ 2010-06-07  0:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, netdev, Brian Hill, michal.simek, John Linn,
	john.williams
In-Reply-To: <1275861445.1931.2974.camel@pasglop>

On Sun, Jun 6, 2010 at 3:57 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Wed, 2010-05-26 at 11:29 -0600, John Linn wrote:
>> The code is not checking the interrupt for DMA correctly so that an
>> interrupt number of 0 will cause a false error.
>>
>> Signed-off-by: Brian Hill <brian.hill@xilinx.com>
>> Signed-off-by: John Linn <john.linn@xilinx.com>
>> ---
>>  drivers/net/ll_temac_main.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
>> index fa7620e..0615737 100644
>> --- a/drivers/net/ll_temac_main.c
>> +++ b/drivers/net/ll_temac_main.c
>> @@ -950,7 +950,7 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
>>
>>       lp->rx_irq = irq_of_parse_and_map(np, 0);
>>       lp->tx_irq = irq_of_parse_and_map(np, 1);
>> -     if (!lp->rx_irq || !lp->tx_irq) {
>> +     if ((lp->rx_irq == NO_IRQ) || (lp->tx_irq == NO_IRQ)) {
>>               dev_err(&op->dev, "could not determine irqs\n");
>>               rc = -ENOMEM;
>>               goto nodev;
>
> Hasn't NO_IRQ been deprecated ?

ARM still uses it, plus less than a handful of other arches.  There is
no reason for Microblaze to use NO_IRQ as -1.

In fact, on ARM, as part of the device tree work I'm hoping to piggy
back in irq remapping and make 0 no longer a valid irq.

> Linus made it clear a while back that interrupt 0 was not valid and
> that's the way it should be.
>
> We now have an interrupt remapping scheme on powerpc, so we ensure that
> 0 always mean no interrupt. Other archs might need some fixups. Which
> are specifically needs this patch ?

Microblaze.

BTW jlinn, for xilinx ARM support we should make sure 0 is never used
as a valid IRQ from day one.  It will result it far less pain in the
future.

g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* Re: [PATCH net-next 00/11] tg3: Bugfixes and 5719 support
From: David Miller @ 2010-06-07  1:00 UTC (permalink / raw)
  To: mcarlson; +Cc: netdev, andy
In-Reply-To: <1275794679-11085-1-git-send-email-mcarlson@broadcom.com>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Sat, 5 Jun 2010 20:24:29 -0700

> This patchset adds some bugfixes and adds 5719 device support.

All applied to net-next-2.6 but there are two things I'm very disappointed
with in this series:

1) Naming register bits things like "TX_MBUF_FIX" isn't descriptive,
   and I suspect the actual bit name used in your programming manuals
   is very different and would be more helpful to someone reading the
   code and trying to understand exactly what that bit does.

   How does it change the chips internal MBUF handling behavior?  Does
   it insert a delay in accesses or state changes?  Does it change
   the MBUF arbitration?  What the heck does that bit do exactly?

2) Removing register definitions is something we really shouldn't do.
   Just because you're not using the register any more in the driver,
   doesn't mean you should remove it's definition from tg3.h

   What if some other developer wants to play with that register and
   use it for some other purpose or experiment?

You really have to handle situations like #1 and #2 better especially
since you guys do not public post the full PDF hardware programming
manuals of your chips online for other developers to use.

I wouldn't, therefore, impose these rules on Intel and their drivers
because they do public the programming manuals for their networking
chips.

^ permalink raw reply

* Re: [PATCH] virtio_net: indicate oom when addbuf returns failure
From: Rusty Russell @ 2010-06-07  2:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: stable, Bruce Rogers, Herbert Xu, netdev, virtualization
In-Reply-To: <20100606201258.GA21196@redhat.com>

On Mon, 7 Jun 2010 05:43:00 am Michael S. Tsirkin wrote:
> On Fri, Jun 04, 2010 at 10:28:56AM +0930, Rusty Russell wrote:
> > This patch is a subset of an already upstream patch, but this portion
> > is useful in earlier releases.
> > 
> > Please consider for the 2.6.32 and 2.6.33 stable trees.
> > 
> > If the add_buf operation fails, indicate failure to the caller.
> > 
> > Signed-off-by: Bruce Rogers <brogers@novell.com>
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> Actually this code looks strange:
> Note that add_buf inicates out of memory
> condition with a positive return value, and ring full
> (which is not an error!) with -ENOSPC.
> 
> So it seems that this patch (and upstream code) will fill
> the ring and then end up setting oom = true and rescheduling the work
> forever.  And I suspect I actually saw this at some point
> on one of my systems: observed BW would drop
> with high CPU usage until reboot.
> Can't reproduce it now anymore ..

I thought that at first too, but it's subtler than that.

When the ring is exactly filled, err = 0.  With mergeable bufs and small
bufs that's the.  With big buffers, it's probably not, and the code will
indeed respond as if always out of memory, always trying to refill.

Our current code has the same error.  Probably a combination of noone using
big buffers, and noone noticing one timer every 1/2 second.

Want to fix that properly?  And comment it? :)

Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCH] virtio_net: indicate oom when addbuf returns failure
From: Rusty Russell @ 2010-06-07  2:24 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S. Tsirkin, stable, Bruce Rogers, netdev, virtualization,
	Shirley Ma
In-Reply-To: <20100606222441.GA5992@gondor.apana.org.au>

On Mon, 7 Jun 2010 07:54:41 am Herbert Xu wrote:
> On Sun, Jun 06, 2010 at 11:13:00PM +0300, Michael S. Tsirkin wrote:
> >
> > Actually this code looks strange:
> > Note that add_buf inicates out of memory
> > condition with a positive return value, and ring full
> > (which is not an error!) with -ENOSPC.
> 
> Indeed, this ultimately came from
> 
> commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
> Author: Shirley Ma <mashirle@us.ibm.com>
> Date:   Fri Jan 29 03:20:04 2010 +0000
> 
>     virtio_net: Defer skb allocation in receive path Date: Wed, 13 Jan 2010 12:53:38 -0800
> 
> (Greg, please don't apply this even though I've just given you
> the upstream commit ID that you were asking for :)
> 
> where it confuses a memory allocation error with an add_buf failure.
> 
> > Possibly the right thing to do is to
> > 1. handle ENOMEM specially
> > 2. fix add_buf to return ENOMEM on error
> 
> I think we should make it so that only a memory allocation error
> is returned as before.  There is no need for returning the add_buf
> error unless add_buf is now doing an allocation itself that needs
> to be retried.

With indirect bufs, this is indeed the case.  The code works except
for the bigpackets !mergable case, but should be clarified anyway.

See my other mail...

Thanks,
Rusty.

^ permalink raw reply

* Re: [RFC] pm_qos: get rid of the allocation in pm_qos_add_request()
From: James Bottomley @ 2010-06-07  2:47 UTC (permalink / raw)
  To: markgross; +Cc: pm list, netdev, alsa-devel, Takashi Iwai, Jaroslav Kysela
In-Reply-To: <20100606213224.GC8610@gvim.org>

On Sun, 2010-06-06 at 14:32 -0700, mark gross wrote:
> so the test for an un-registerd or in-initialized request is if list == null.

Actually I was using pm_qos_class == 0, but all current users use NULL
as the pointer test, so they must all be allocated in zero initialised
memory.

> >  
> > -struct pm_qos_request_list *pm_qos_add_request(int pm_qos_class, s32 value);
> > +void pm_qos_add_request(struct pm_qos_request_list *l, int pm_qos_class, s32 value);
> >  void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
> >  		s32 new_value);
> >  void pm_qos_remove_request(struct pm_qos_request_list *pm_qos_req);
> > @@ -25,3 +30,4 @@ int pm_qos_request(int pm_qos_class);
> >  int pm_qos_add_notifier(int pm_qos_class, struct notifier_block *notifier);
> >  int pm_qos_remove_notifier(int pm_qos_class, struct notifier_block *notifier);
> >  
> > +#endif
> > diff --git a/include/sound/pcm.h b/include/sound/pcm.h
> > index dd76cde..6e3a297 100644
> > --- a/include/sound/pcm.h
> > +++ b/include/sound/pcm.h
> > @@ -366,7 +366,7 @@ struct snd_pcm_substream {
> >  	int number;
> >  	char name[32];			/* substream name */
> >  	int stream;			/* stream (direction) */
> > -	struct pm_qos_request_list *latency_pm_qos_req; /* pm_qos request */
> > +	struct pm_qos_request_list latency_pm_qos_req; /* pm_qos request */
> >  	size_t buffer_bytes_max;	/* limit ring buffer size */
> >  	struct snd_dma_buffer dma_buffer;
> >  	unsigned int dma_buf_id;
> > diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> > index 241fa79..f1d3d23 100644
> > --- a/kernel/pm_qos_params.c
> > +++ b/kernel/pm_qos_params.c
> > @@ -30,7 +30,6 @@
> >  /*#define DEBUG*/
> >  
> >  #include <linux/pm_qos_params.h>
> > -#include <linux/plist.h>
> ? plist pm_qos isn't yet in the code base yet. ;)
> Is this patch assumed after your RFC patch?
> It must be....

Yes, they're stacked.

> >  #include <linux/sched.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/slab.h>
> > @@ -49,11 +48,6 @@
> >   * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
> >   * held, taken with _irqsave.  One lock to rule them all
> >   */
> > -struct pm_qos_request_list {
> > -	struct plist_node list;
> > -	int pm_qos_class;
> > -};
> > -
> >  enum pm_qos_type {
> >  	PM_QOS_MAX,		/* return the largest value */
> >  	PM_QOS_MIN,		/* return the smallest value */
> > @@ -195,6 +189,11 @@ int pm_qos_request(int pm_qos_class)
> >  }
> >  EXPORT_SYMBOL_GPL(pm_qos_request);
> >  
> > +static int pm_qos_request_active(struct pm_qos_request_list *req)
> > +{
> > +	return req->pm_qos_class != 0;
> > +}
> > +
> >  /**
> >   * pm_qos_add_request - inserts new qos request into the list
> >   * @pm_qos_class: identifies which list of qos request to us
> > @@ -206,25 +205,22 @@ EXPORT_SYMBOL_GPL(pm_qos_request);
> >   * element as a handle for use in updating and removal.  Call needs to save
> >   * this handle for later use.
> >   */
> > -struct pm_qos_request_list *pm_qos_add_request(int pm_qos_class, s32 value)
> > +void pm_qos_add_request(struct pm_qos_request_list *dep,
> > +			int pm_qos_class, s32 value)
> >  {
> > -	struct pm_qos_request_list *dep;
> > -
> > -	dep = kzalloc(sizeof(struct pm_qos_request_list), GFP_KERNEL);
> > -	if (dep) {
> > -		struct pm_qos_object *o =  pm_qos_array[pm_qos_class];
> > -		int new_value;
> > -
> > -		if (value == PM_QOS_DEFAULT_VALUE)
> > -			new_value = o->default_value;
> > -		else
> > -			new_value = value;
> > -		plist_node_init(&dep->list, new_value);
> > -		dep->pm_qos_class = pm_qos_class;
> > -		update_target(o, &dep->list, 0);
> > -	}
> > +	struct pm_qos_object *o =  pm_qos_array[pm_qos_class];
> > +	int new_value;
> > +
> > +	if (pm_qos_request_active(dep))
> > +		return;
> >  
> > -	return dep;
> > +	if (value == PM_QOS_DEFAULT_VALUE)
> > +		new_value = o->default_value;
> > +	else
> > +		new_value = value;
> > +	plist_node_init(&dep->list, new_value);
> > +	dep->pm_qos_class = pm_qos_class;
> > +	update_target(o, &dep->list, 0);
> >  }
> >  EXPORT_SYMBOL_GPL(pm_qos_add_request);
> >  
> > @@ -286,7 +282,7 @@ void pm_qos_remove_request(struct pm_qos_request_list *pm_qos_req)
> >  
> >  	o = pm_qos_array[pm_qos_req->pm_qos_class];
> >  	update_target(o, &pm_qos_req->list, 1);
> > -	kfree(pm_qos_req);
> > +	memset(pm_qos_req, 0, sizeof(*pm_qos_req));
> 
> I was wondering how to tell if a pm_qos_request was un-initialized
> un-registered request.

The assumption is zero initialised.

> >  }
> >  EXPORT_SYMBOL_GPL(pm_qos_remove_request);
> >  
> > @@ -334,8 +330,12 @@ static int pm_qos_power_open(struct inode *inode, struct file *filp)
> >  
> >  	pm_qos_class = find_pm_qos_object_by_minor(iminor(inode));
> >  	if (pm_qos_class >= 0) {
> > -		filp->private_data = (void *) pm_qos_add_request(pm_qos_class,
> > -				PM_QOS_DEFAULT_VALUE);
> > +		struct pm_qos_request_list *req = kzalloc(GFP_KERNEL, sizeof(*req));
> > +		if (!req)
> > +			return -ENOMEM;
> > +
> > +		pm_qos_add_request(req, pm_qos_class, PM_QOS_DEFAULT_VALUE);
> > +		filp->private_data = req;
> >  
> >  		if (filp->private_data)
> >  			return 0;
> > @@ -347,8 +347,9 @@ static int pm_qos_power_release(struct inode *inode, struct file *filp)
> >  {
> >  	struct pm_qos_request_list *req;
> >  
> > -	req = (struct pm_qos_request_list *)filp->private_data;
> > +	req = filp->private_data;
> >  	pm_qos_remove_request(req);
> > +	kfree(req);
> >  
> >  	return 0;
> >  }
> > diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> > index 303ac04..d3b8b51 100644
> > --- a/sound/core/pcm_native.c
> > +++ b/sound/core/pcm_native.c
> > @@ -451,13 +451,10 @@ static int snd_pcm_hw_params(struct snd_pcm_substream *substream,
> >  	snd_pcm_timer_resolution_change(substream);
> >  	runtime->status->state = SNDRV_PCM_STATE_SETUP;
> >  
> > -	if (substream->latency_pm_qos_req) {
> > -		pm_qos_remove_request(substream->latency_pm_qos_req);
> > -		substream->latency_pm_qos_req = NULL;
> > -	}
> > +	pm_qos_remove_request(&substream->latency_pm_qos_req);
> >  	if ((usecs = period_to_usecs(runtime)) >= 0)
> > -		substream->latency_pm_qos_req = pm_qos_add_request(
> > -					PM_QOS_CPU_DMA_LATENCY, usecs);
> > +		pm_qos_add_request(&substream->latency_pm_qos_req,
> > +				   PM_QOS_CPU_DMA_LATENCY, usecs);
> How are we going to avoid re-adding the latency_pm_qos_request multiple
> times to the list?  the last time I looked at this I really wanted to
> change this to update_request.  But, I got pushback so I added the file
> gard in pmqos_params.c instead. 

There's a guard check in add that does this ... it seemed better putting
it there than replicating through the subsystems.

James



^ permalink raw reply

* Re: [RFC] pm_qos: get rid of the allocation in pm_qos_add_request()
From: James Bottomley @ 2010-06-07  2:50 UTC (permalink / raw)
  To: markgross; +Cc: pm list, netdev, alsa-devel, Takashi Iwai, Jaroslav Kysela
In-Reply-To: <20100606231021.GA17031@gvim.org>

On Sun, 2010-06-06 at 16:10 -0700, mark gross wrote:
> On Sat, Jun 05, 2010 at 02:20:14PM -0500, James Bottomley wrote:
> > [alsa-devel says it's a moderated list, so feel free to drop before
> > replying]
> > 
> > Since every caller has to squirrel away the returned pointer anyway,
> > they might as well supply the memory area.  This fixes a bug in a few of
> > the call sites where the returned pointer was dereferenced without
> > checking it for NULL (which gets returned if the kzalloc failed).
> > 
> > I'd like to hear how sound and netdev feels about this: it will add
> > about two more pointers worth of data to struct netdev and struct
> > snd_pcm_substream .. but I think it's worth it.  If you're OK, I'll add
> > your acks and send through the pm tree.
> > 
> > This also looks to me like an android independent clean up (even though
> > it renders the request_add atomically callable).  I also added include
> > guards to include/linux/pm_qos_params.h
> > 
> > James
> > 
> > ---
> > 
> >  drivers/net/e1000e/netdev.c            |   17 ++++------
> >  drivers/net/igbvf/netdev.c             |    9 ++---
> >  drivers/net/wireless/ipw2x00/ipw2100.c |   12 +++---
> >  include/linux/netdevice.h              |    2 +-
> >  include/linux/pm_qos_params.h          |   12 +++++--
> >  include/sound/pcm.h                    |    2 +-
> >  kernel/pm_qos_params.c                 |   55 ++++++++++++++++---------------
> >  sound/core/pcm_native.c                |   12 ++-----
> >  8 files changed, 60 insertions(+), 61 deletions(-)
> > 
> > diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> > index 24507f3..47ea62f 100644
> > --- a/drivers/net/e1000e/netdev.c
> > +++ b/drivers/net/e1000e/netdev.c
> > @@ -2901,10 +2901,10 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
> >  			 * dropped transactions.
> >  			 */
> >  			pm_qos_update_request(
> > -				adapter->netdev->pm_qos_req, 55);
> > +				&adapter->netdev->pm_qos_req, 55);
> >  		} else {
> >  			pm_qos_update_request(
> > -				adapter->netdev->pm_qos_req,
> > +				&adapter->netdev->pm_qos_req,
> >  				PM_QOS_DEFAULT_VALUE);
> >  		}
> >  	}
> > @@ -3196,9 +3196,9 @@ int e1000e_up(struct e1000_adapter *adapter)
> >  
> >  	/* DMA latency requirement to workaround early-receive/jumbo issue */
> >  	if (adapter->flags & FLAG_HAS_ERT)
> > -		adapter->netdev->pm_qos_req =
> > -			pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY,
> > -				       PM_QOS_DEFAULT_VALUE);
> > +		pm_qos_add_request(&adapter->netdev->pm_qos_req,
> > +				   PM_QOS_CPU_DMA_LATENCY,
> > +				   PM_QOS_DEFAULT_VALUE);
> >  
> >  	/* hardware has been reset, we need to reload some things */
> >  	e1000_configure(adapter);
> > @@ -3263,11 +3263,8 @@ void e1000e_down(struct e1000_adapter *adapter)
> >  	e1000_clean_tx_ring(adapter);
> >  	e1000_clean_rx_ring(adapter);
> >  
> > -	if (adapter->flags & FLAG_HAS_ERT) {
> > -		pm_qos_remove_request(
> > -			      adapter->netdev->pm_qos_req);
> > -		adapter->netdev->pm_qos_req = NULL;
> > -	}
> > +	if (adapter->flags & FLAG_HAS_ERT)
> > +		pm_qos_remove_request(&adapter->netdev->pm_qos_req);
> >  
> >  	/*
> >  	 * TODO: for power management, we could drop the link and
> > diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
> > index 5e2b2a8..5da569f 100644
> > --- a/drivers/net/igbvf/netdev.c
> > +++ b/drivers/net/igbvf/netdev.c
> > @@ -48,7 +48,7 @@
> >  #define DRV_VERSION "1.0.0-k0"
> >  char igbvf_driver_name[] = "igbvf";
> >  const char igbvf_driver_version[] = DRV_VERSION;
> > -struct pm_qos_request_list *igbvf_driver_pm_qos_req;
> > +struct pm_qos_request_list igbvf_driver_pm_qos_req;
> >  static const char igbvf_driver_string[] =
> >  				"Intel(R) Virtual Function Network Driver";
> >  static const char igbvf_copyright[] = "Copyright (c) 2009 Intel Corporation.";
> > @@ -2902,8 +2902,8 @@ static int __init igbvf_init_module(void)
> >  	printk(KERN_INFO "%s\n", igbvf_copyright);
> >  
> >  	ret = pci_register_driver(&igbvf_driver);
> > -	igbvf_driver_pm_qos_req = pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY,
> > -	                       PM_QOS_DEFAULT_VALUE);
> > +	pm_qos_add_request(igbvf_driver_pm_qos_req, PM_QOS_CPU_DMA_LATENCY,
> should be:
> 
> +	pm_qos_add_request(&igbvf_driver_pm_qos_req, PM_QOS_CPU_DMA_LATENCY,

Yes, thanks ... must be not compiling this driver for some reason in the
test case.

James



^ permalink raw reply

* Read Mail
From: Andrew Lu Sang @ 2010-06-06 10:40 UTC (permalink / raw)


My Dear Beloved,
I have an obscured business suggestion for you of $24,500,000USD.I will update you with further information 
as soon as I receive your complete name and phone number.Sincerely,Andrew

^ permalink raw reply

* [PATCH] tcp: Fix slowness in read /proc/net/tcp
From: Tom Herbert @ 2010-06-07  3:27 UTC (permalink / raw)
  To: davem; +Cc: netdev

This patch address a serious performance issue in reading the
TCP sockets table (/proc/net/tcp).

Reading the full table is done by a number of sequential read
operations.  At each read operation, a seek is done to find the
last socket that was previously read.  This seek operation requires
that the sockets in the table need to be counted up to the current
file position, and to count each of these requires taking a lock for
each non-empty bucket.  The whole algorithm is O(n^2).

The fix is to cache the last bucket value, offset within the bucket,
and the file position returned by the last read operation.   On the
next sequential read, the bucket and offset are used to find the
last read socket immediately without needing ot scan the previous
buckets  the table.  This algorithm t read the whole table is O(n).

The improvement offered by this patch is easily show by performing
cat'ing /proc/net/tcp on a machine with a lot of connections.  With
about 182K connections in the table, I see the following:

- Without patch
time cat /proc/net/tcp > /dev/null

real	1m56.729s
user	0m0.214s
sys	1m56.344s

- With patch
time cat /proc/net/tcp > /dev/null

real	0m0.894s
user	0m0.290s
sys	0m0.594s

Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a144914..5731664 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1413,7 +1413,8 @@ struct tcp_iter_state {
 	sa_family_t		family;
 	enum tcp_seq_states	state;
 	struct sock		*syn_wait_sk;
-	int			bucket, sbucket, num, uid;
+	int			bucket, offset, sbucket, num, uid;
+	loff_t			last_pos;
 };
 
 extern int tcp_proc_register(struct net *net, struct tcp_seq_afinfo *afinfo);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 202cf09..4af6f20 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1977,6 +1977,11 @@ static inline struct inet_timewait_sock *tw_next(struct inet_timewait_sock *tw)
 		hlist_nulls_entry(tw->tw_node.next, typeof(*tw), tw_node) : NULL;
 }
 
+/*
+ * Get next listener socket follow cur.  If cur is NULL, get first socket
+ * starting from bucket given in st->bucket; when st->bucket is zero the
+ * very first socket in the hash table is returned.
+ */
 static void *listening_get_next(struct seq_file *seq, void *cur)
 {
 	struct inet_connection_sock *icsk;
@@ -1987,14 +1992,15 @@ static void *listening_get_next(struct seq_file *seq, void *cur)
 	struct net *net = seq_file_net(seq);
 
 	if (!sk) {
-		st->bucket = 0;
-		ilb = &tcp_hashinfo.listening_hash[0];
+		ilb = &tcp_hashinfo.listening_hash[st->bucket];
 		spin_lock_bh(&ilb->lock);
 		sk = sk_nulls_head(&ilb->head);
+		st->offset = 0;
 		goto get_sk;
 	}
 	ilb = &tcp_hashinfo.listening_hash[st->bucket];
 	++st->num;
+	++st->offset;
 
 	if (st->state == TCP_SEQ_STATE_OPENREQ) {
 		struct request_sock *req = cur;
@@ -2009,6 +2015,7 @@ static void *listening_get_next(struct seq_file *seq, void *cur)
 				}
 				req = req->dl_next;
 			}
+			st->offset = 0;
 			if (++st->sbucket >= icsk->icsk_accept_queue.listen_opt->nr_table_entries)
 				break;
 get_req:
@@ -2044,6 +2051,7 @@ start_req:
 		read_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
 	}
 	spin_unlock_bh(&ilb->lock);
+	st->offset = 0;
 	if (++st->bucket < INET_LHTABLE_SIZE) {
 		ilb = &tcp_hashinfo.listening_hash[st->bucket];
 		spin_lock_bh(&ilb->lock);
@@ -2057,7 +2065,12 @@ out:
 
 static void *listening_get_idx(struct seq_file *seq, loff_t *pos)
 {
-	void *rc = listening_get_next(seq, NULL);
+	struct tcp_iter_state *st = seq->private;
+	void *rc;
+
+	st->bucket = 0;
+	st->offset = 0;
+	rc = listening_get_next(seq, NULL);
 
 	while (rc && *pos) {
 		rc = listening_get_next(seq, rc);
@@ -2072,13 +2085,18 @@ static inline int empty_bucket(struct tcp_iter_state *st)
 		hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].twchain);
 }
 
+/*
+ * Get first established socket starting from bucket given in st->bucket.
+ * If st->bucket is zero, the very first socket in the hash is returned.
+ */
 static void *established_get_first(struct seq_file *seq)
 {
 	struct tcp_iter_state *st = seq->private;
 	struct net *net = seq_file_net(seq);
 	void *rc = NULL;
 
-	for (st->bucket = 0; st->bucket <= tcp_hashinfo.ehash_mask; ++st->bucket) {
+	st->offset = 0;
+	for (; st->bucket <= tcp_hashinfo.ehash_mask; ++st->bucket) {
 		struct sock *sk;
 		struct hlist_nulls_node *node;
 		struct inet_timewait_sock *tw;
@@ -2123,6 +2141,7 @@ static void *established_get_next(struct seq_file *seq, void *cur)
 	struct net *net = seq_file_net(seq);
 
 	++st->num;
+	++st->offset;
 
 	if (st->state == TCP_SEQ_STATE_TIME_WAIT) {
 		tw = cur;
@@ -2139,6 +2158,7 @@ get_tw:
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 
 		/* Look for next non empty bucket */
+		st->offset = 0;
 		while (++st->bucket <= tcp_hashinfo.ehash_mask &&
 				empty_bucket(st))
 			;
@@ -2166,7 +2186,11 @@ out:
 
 static void *established_get_idx(struct seq_file *seq, loff_t pos)
 {
-	void *rc = established_get_first(seq);
+	struct tcp_iter_state *st = seq->private;
+	void *rc;
+
+	st->bucket = 0;
+	rc = established_get_first(seq);
 
 	while (rc && pos) {
 		rc = established_get_next(seq, rc);
@@ -2191,24 +2215,72 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
 	return rc;
 }
 
+static void *tcp_seek_last_pos(struct seq_file *seq)
+{
+	struct tcp_iter_state *st = seq->private;
+	int offset = st->offset;
+	int orig_num = st->num;
+	void *rc = NULL;
+	
+	switch (st->state) {
+	case TCP_SEQ_STATE_OPENREQ:
+	case TCP_SEQ_STATE_LISTENING:
+		if (st->bucket >= INET_LHTABLE_SIZE)
+			break;
+		st->state = TCP_SEQ_STATE_LISTENING;
+		rc = listening_get_next(seq, NULL);
+		while (offset-- && rc)
+			rc = listening_get_next(seq, rc);
+		if (rc)
+			break;
+		st->bucket = 0;
+		/* Fallthrough */
+	case TCP_SEQ_STATE_ESTABLISHED:
+	case TCP_SEQ_STATE_TIME_WAIT:
+		st->state = TCP_SEQ_STATE_ESTABLISHED;
+		if (st->bucket > tcp_hashinfo.ehash_mask)
+			break;
+		rc = established_get_first(seq);
+		while (offset-- && rc)
+			rc = established_get_next(seq, rc);
+	}
+
+	st->num = orig_num;
+
+	return rc;
+}
+
 static void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct tcp_iter_state *st = seq->private;
+	void *rc;
+
+	if (*pos && *pos == st->last_pos) {
+		rc = tcp_seek_last_pos(seq);
+		if (rc)
+			goto out;
+	}
+
 	st->state = TCP_SEQ_STATE_LISTENING;
 	st->num = 0;
-	return *pos ? tcp_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
+	st->bucket = 0;
+	st->offset = 0;
+	rc = *pos ? tcp_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
+
+out:
+	st->last_pos = *pos;
+	return rc;
 }
 
 static void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	struct tcp_iter_state *st = seq->private;
 	void *rc = NULL;
-	struct tcp_iter_state *st;
 
 	if (v == SEQ_START_TOKEN) {
 		rc = tcp_get_idx(seq, 0);
 		goto out;
 	}
-	st = seq->private;
 
 	switch (st->state) {
 	case TCP_SEQ_STATE_OPENREQ:
@@ -2216,6 +2288,8 @@ static void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 		rc = listening_get_next(seq, v);
 		if (!rc) {
 			st->state = TCP_SEQ_STATE_ESTABLISHED;
+			st->bucket = 0;
+			st->offset = 0;
 			rc	  = established_get_first(seq);
 		}
 		break;
@@ -2226,6 +2300,7 @@ static void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	}
 out:
 	++*pos;
+	st->last_pos = *pos;
 	return rc;
 }
 
@@ -2264,6 +2339,7 @@ static int tcp_seq_open(struct inode *inode, struct file *file)
 
 	s = ((struct seq_file *)file->private_data)->private;
 	s->family		= afinfo->family;
+	s->last_pos 		= 0;
 	return 0;
 }
 


^ permalink raw reply related

* Re: [PATCH net-next 00/11] tg3: Bugfixes and 5719 support
From: Matt Carlson @ 2010-06-07  3:59 UTC (permalink / raw)
  To: David Miller; +Cc: Matthew Carlson, netdev@vger.kernel.org, andy@greyhouse.net
In-Reply-To: <20100606.180029.42789560.davem@davemloft.net>

On Sun, Jun 06, 2010 at 06:00:29PM -0700, David Miller wrote:
> From: "Matt Carlson" <mcarlson@broadcom.com>
> Date: Sat, 5 Jun 2010 20:24:29 -0700
> 
> > This patchset adds some bugfixes and adds 5719 device support.
> 
> All applied to net-next-2.6 but there are two things I'm very disappointed
> with in this series:
> 
> 1) Naming register bits things like "TX_MBUF_FIX" isn't descriptive,
>    and I suspect the actual bit name used in your programming manuals
>    is very different and would be more helpful to someone reading the
>    code and trying to understand exactly what that bit does.

Actually, the programming manual describes this register just as
tersely.

>    How does it change the chips internal MBUF handling behavior?  Does
>    it insert a delay in accesses or state changes?  Does it change
>    the MBUF arbitration?  What the heck does that bit do exactly?

The programming manual says this bit prevents a tx state machine lockup
due to tx mbuf corruption.  The conditions under which the tx mbufs get
corrupted is complicated, but the net effect of this bit is that the
RDMA engine acts a little more conservatively.

> 2) Removing register definitions is something we really shouldn't do.
>    Just because you're not using the register any more in the driver,
>    doesn't mean you should remove it's definition from tg3.h
> 
>    What if some other developer wants to play with that register and
>    use it for some other purpose or experiment?

The problem is that register definitions can change from asic rev to
asic rev.  Someone interested in playing with their hardware would have
to know more about their chip before it could even be considered safe to
use these bits.  Rather than have a bit do something other than appears
advertised, it would be safer to just remove it from view.

The patch that removed the register definition exists precisely because
that register definition was changing.  The motivating reason for the
patch was to make the code cleaner and more maintainable, but anyone
attempting to use that definition on a non-5717 device would be using it
incorrectly.

> You really have to handle situations like #1 and #2 better especially
> since you guys do not public post the full PDF hardware programming
> manuals of your chips online for other developers to use.
> 
> I wouldn't, therefore, impose these rules on Intel and their drivers
> because they do public the programming manuals for their networking
> chips.

Understood.

^ permalink raw reply

* Re: [PATCH] tcp: Fix slowness in read /proc/net/tcp
From: Eric Dumazet @ 2010-06-07  7:19 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, Yakov Lerner
In-Reply-To: <alpine.DEB.1.00.1006062020180.24064@pokey.mtv.corp.google.com>

Le dimanche 06 juin 2010 à 20:27 -0700, Tom Herbert a écrit :
> This patch address a serious performance issue in reading the
> TCP sockets table (/proc/net/tcp).
> 
> Reading the full table is done by a number of sequential read
> operations.  At each read operation, a seek is done to find the
> last socket that was previously read.  This seek operation requires
> that the sockets in the table need to be counted up to the current
> file position, and to count each of these requires taking a lock for
> each non-empty bucket.  The whole algorithm is O(n^2).
> 
> The fix is to cache the last bucket value, offset within the bucket,
> and the file position returned by the last read operation.   On the
> next sequential read, the bucket and offset are used to find the
> last read socket immediately without needing ot scan the previous
> buckets  the table.  This algorithm t read the whole table is O(n).
> 
> The improvement offered by this patch is easily show by performing
> cat'ing /proc/net/tcp on a machine with a lot of connections.  With
> about 182K connections in the table, I see the following:
> 
> - Without patch
> time cat /proc/net/tcp > /dev/null
> 
> real	1m56.729s
> user	0m0.214s
> sys	1m56.344s
> 
> - With patch
> time cat /proc/net/tcp > /dev/null
> 
> real	0m0.894s
> user	0m0.290s
> sys	0m0.594s
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---

This problem raises every year,  (last attempt from Yakov Lerner :
http://kerneltrap.org/mailarchive/linux-netdev/2009/9/26/6256119 )

And finally, someone motivated enough to use /proc/net/tcp found the
right answer ;)

Most netdev people tend to push inet_diag (netlink) interface instead of
old /proc/net/tcp, but it seems old interface will still be used in
2030, so :

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

BTW, another problem of /proc/net/tcp is the buffer size used by netstat
utility : 1024 bytes instead of PAGE_SIZE, making O(N^2) behavior even
more palpable.




^ permalink raw reply

* Re: [PATCH net-next 00/11] tg3: Bugfixes and 5719 support
From: David Miller @ 2010-06-07  7:28 UTC (permalink / raw)
  To: mcarlson; +Cc: netdev, andy
In-Reply-To: <20100607035934.GB11599@mcarlson.broadcom.com>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Sun, 6 Jun 2010 20:59:34 -0700

> On Sun, Jun 06, 2010 at 06:00:29PM -0700, David Miller wrote:
> The programming manual says this bit prevents a tx state machine lockup
> due to tx mbuf corruption.  The conditions under which the tx mbufs get
> corrupted is complicated, but the net effect of this bit is that the
> RDMA engine acts a little more conservatively.

That last phrase is a good description and could have been used to
better name the register bit macro in question.  You're basically
confirming that you named the register bit too tersely.

> The problem is that register definitions can change from asic rev to
> asic rev.

Frankly, this kind of justification really ticks me off.

How the heck does that make a difference?  Describe which chip revs
the register bit is valid for in a comment for crying out loud.

Document your hardware properly, not selectively or where you see
it fit to do so.

It's bad enough that you guys don't publish your hardware specs.
As a consequence as much knowledge as possible must go into the
driver sources.  Any piece of information you take away is bad.

There are tons of chip register bits in this driver already which are
only valid on certain hardware revs.  In fact, there are many.  Yet
they are all still there in tg3.h whether they are used by the driver
or not.  In fact, if I recall correctly, the DMA burst size controls
are pretty much different on every single rev of the chip.

Documenting where for which chips a register bit is valid is
pervasively done elsewhere, and is nothing new.  Your driver and your
hardware is not special.

^ permalink raw reply

* Re: [PATCH] [ath5k][leds] Ability to disable leds support. If leds support enabled do not force mac802.11 leds layer selection.
From: Dmytro Milinevskyy @ 2010-06-07  7:39 UTC (permalink / raw)
  To: Pavel Roskin
  Cc: ath5k-devel, Jiri Slaby, Nick Kossifidis, Luis R. Rodriguez,
	Bob Copeland, John W. Linville, GeunSik Lim, Greg Kroah-Hartman,
	Lukas Turek, Mark Hindley, Johannes Berg, Jiri Kosina, Kalle Valo,
	Keng-Yu Lin, Luca Verdesca, Shahar Or, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <1275668535.17251.20.camel@mj>

Pavel, thank you for the response here.

I agree with all your comments and will adjust the patch according to them.

I'm new to the submitting patches into the community and I appreciate
telling criticism so that in future I will not cause that much
troubles.

Regards,

-- Dima

On Fri, Jun 4, 2010 at 7:22 PM, Pavel Roskin <proski@gnu.org> wrote:
> On Fri, 2010-06-04 at 16:43 +0300, Dmytro Milinevskyy wrote:
>> Hi!
>>
>> Here is the patch to disable ath5k leds support on build stage.
>> However if the leds support was enabled do not force selection of 802.11 leds layer.
>> Depency on LEDS_CLASS is kept.
>>
>> Suggestion given by Pavel Roskin and Bob Copeland applied.
>
> It's great that you did it.  The patch is much clearer now.  That makes
> smaller issues visible.  Please don't be discouraged by the criticism,
> you are on the right track.
>
> First of all, your patch doesn't apply cleanly to the current
> wireless-testing because of formatting changes in Makefile.  Please
> update.
>
>> +config ATH5K_LEDS
>> +       tristate "Atheros 5xxx wireless cards LEDs support"
>> +       depends on ATH5K
>> +       select NEW_LEDS
>> +       select LEDS_CLASS
>> +       ---help---
>> +         Atheros 5xxx LED support.
>
> "tristate" is wrong here.  "tristate" would allow users select "m",
> which is wrong, since LED support is not a separate module.  I think you
> want "bool" here.
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>  /*
>>   * State for LED triggers
>>   */
>> @@ -95,6 +96,7 @@ struct ath5k_led
>>       struct ath5k_softc *sc;                 /* driver state */
>>       struct led_classdev led_dev;            /* led classdev */
>>  };
>> +#endif
>
> This shouldn't be needed.  I'll rather see a structure that is not used
> in some cases than an extra pair of preprocessor conditionals.
>
>> diff --git a/drivers/net/wireless/ath/ath5k/gpio.c b/drivers/net/wireless/ath/ath5k/gpio.c
>> index 64a27e7..9e757b3 100644
>> --- a/drivers/net/wireless/ath/ath5k/gpio.c
>> +++ b/drivers/net/wireless/ath/ath5k/gpio.c
>> @@ -25,6 +25,7 @@
>>  #include "debug.h"
>>  #include "base.h"
>>
>> +#ifdef CONFIG_ATH5K_LEDS
>>  /*
>>   * Set led state
>>   */
>> @@ -76,6 +77,7 @@ void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state)
>>       else
>>               AR5K_REG_ENABLE_BITS(ah, AR5K_PCICFG, led_5210);
>>  }
>> +#endif
>
> I would just move that function to led.c (and don't forget to include
> reg.h).  The Makefile should take care of the rest.
>
> --
> Regards,
> Pavel Roskin
>

^ permalink raw reply

* Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Andi Kleen @ 2010-06-07  7:51 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: xiaohui.xin, netdev, kvm, linux-kernel, mst, mingo, davem,
	herbert, jdike
In-Reply-To: <20100606161348.427822fb@nehalam>

Stephen Hemminger <shemminger@vyatta.com> writes:

> Still not sure this is a good idea for a couple of reasons:
>
> 1. We already have lots of special cases with skb's (frags and fraglist),
>    and skb's travel through a lot of different parts of the kernel.  So any
>    new change like this creates lots of exposed points for new bugs. Look
>    at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>    and ppp and ...
>
> 2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>    a fixed size pool in an external device, they can easily all get tied up
>    if you have a slow listener. What happens then?

3. If they come from an internal pool what happens when the kernel runs 
low on memory? How is that pool balanced against other kernel
memory users?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCH] tcp: Fix slowness in read /proc/net/tcp
From: David Miller @ 2010-06-07  7:55 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev, iler.ml
In-Reply-To: <1275895167.2545.8.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 07 Jun 2010 09:19:27 +0200

> Le dimanche 06 juin 2010 à 20:27 -0700, Tom Herbert a écrit :
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
> 
> This problem raises every year,  (last attempt from Yakov Lerner :
> http://kerneltrap.org/mailarchive/linux-netdev/2009/9/26/6256119 )
> 
> And finally, someone motivated enough to use /proc/net/tcp found the
> right answer ;)
> 
> Most netdev people tend to push inet_diag (netlink) interface instead of
> old /proc/net/tcp, but it seems old interface will still be used in
> 2030, so :
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Indeed this is the best attempt of this I've seen so far,
applied to net-next-2.6, thanks Tom.

> BTW, another problem of /proc/net/tcp is the buffer size used by netstat
> utility : 1024 bytes instead of PAGE_SIZE, making O(N^2) behavior even
> more palpable.

Probably cap it at 8K like we do for netlink atomic reads, because
there are all sorts of crazy PAGE_SIZE configurations possible out
there, even as high as 512K.

^ permalink raw reply

* Re: [PATCH] asix: check packet size against mtu+ETH_HLEN instead of ETH_FRAME_LEN
From: David Miller @ 2010-06-07  7:57 UTC (permalink / raw)
  To: jussi.kivilinna; +Cc: netdev, dhollis
In-Reply-To: <20100606233531.23585.10292.stgit@fate.lan>

From: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Date: Mon, 07 Jun 2010 02:35:31 +0300

> Driver checks received packet is too large in asix_rx_fixup() and fails if it is. Problem is
> that MTU might be set larger than 1500 and asix fails to work correctly with VLAN tagged
> packets. The check should be 'dev->net->mtu + ETH_HLEN' instead.
> 
> Tested with AX88772.
> 
> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net-caif: Added missing lock validator constants
From: David Miller @ 2010-06-07  8:01 UTC (permalink / raw)
  To: alex.lorca; +Cc: sjur.brandeland, netdev
In-Reply-To: <1275761704-8758-1-git-send-email-alex.lorca@gmail.com>

From: Alex Lorca <alex.lorca@gmail.com>
Date: Sat,  5 Jun 2010 18:15:04 +0000

> CAIF is using "xxx-AF_MAX" strings for the lock validator. It should use
> its own strings.
> 
> Signed-off-by: Alex Lorca <alex.lorca@gmail.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next-2.6] raw: avoid two atomics in xmit
From: David Miller @ 2010-06-07  8:09 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1275639837.2482.17.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 04 Jun 2010 10:23:57 +0200

> Avoid two atomic ops per raw_send_hdrinc() call
> 
> Avoid two atomic ops per raw6_send_hdrinc() call
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

I'm starting to feel like we now use the implicit skb dst ref stuff
more than the usual cloning, which is great! :-)

^ permalink raw reply

* Re: [PATCH net-next-2.6] net sched: make pedit check for clones instead
From: David Miller @ 2010-06-07  8:09 UTC (permalink / raw)
  To: herbert; +Cc: hadi, jpirko, netdev
In-Reply-To: <20100604130511.GA10925@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 4 Jun 2010 23:05:11 +1000

> On Fri, Jun 04, 2010 at 08:43:06AM -0400, jamal wrote:
>> And here's the second one. I noticed Changli's changes are not yet in
>> net-next. I will wait for that change to propagate and then send the
>> fix to pedit that i discussed with Herbert.
> 
> Thanks Jamal!
> 
> BTW, I think this patch should go in first (before the one that
> removes the OK2MUNGE setting) to be on the safe side.

I applied them this way to net-next-2.6, thanks!

^ permalink raw reply

* Re: [PATCH] 8139too: fix buffer overrun in rtl8139_init_board
From: David Miller @ 2010-06-07  8:14 UTC (permalink / raw)
  To: dkirjanov; +Cc: jeff, netdev
In-Reply-To: <20100605094549.GA11559@hera.kernel.org>

From: Denis Kirjanov <dkirjanov@hera.kernel.org>
Date: Sat, 5 Jun 2010 09:45:49 +0000

> Fix rtl_chip_info buffer overrun when we can't identify the chip.
> (i = ARRAY_SIZE (rtl_chip_info) in this case)
> 
> Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>

Applied, thanks!

^ permalink raw reply

* Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Mitchell Erblich @ 2010-06-07  8:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Stephen Hemminger, xiaohui.xin, netdev, kvm, linux-kernel, mst,
	mingo, davem, herbert, jdike
In-Reply-To: <87pr03gu1c.fsf@basil.nowhere.org>


On Jun 7, 2010, at 12:51 AM, Andi Kleen wrote:

> Stephen Hemminger <shemminger@vyatta.com> writes:
> 
>> Still not sure this is a good idea for a couple of reasons:
>> 
>> 1. We already have lots of special cases with skb's (frags and fraglist),
>>   and skb's travel through a lot of different parts of the kernel.  So any
>>   new change like this creates lots of exposed points for new bugs. Look
>>   at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>>   and ppp and ...
>> 
>> 2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>>   a fixed size pool in an external device, they can easily all get tied up
>>   if you have a slow listener. What happens then?
> 
> 3. If they come from an internal pool what happens when the kernel runs 
> low on memory? How is that pool balanced against other kernel
> memory users?
> 
> -Andi
> 
> -- 
> ak@linux.intel.com -- Speaking for myself only.

In general,

When an internal pool is created/used, their SHOULD be a reason.
Maybe, to keep allocation latency to a min, OR ...

Now IMO, 

internal pool objects should have a ref count and
if that count is 0, then under memory pressure and/or num 
of objects are above a high water mark, then they are freed,

OR if there is a last reference age field, then the object is to be 
cleaned if dirty, then freed,  

Else, the pool is allowed to grow if the number of objects in the
pool is below a set max (max COULD equal Infinity).



Mitchell Erblich

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox