Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net-next] iproute2: Add new command to IP link to enable/disable VF spoof check
From: Stephen Hemminger @ 2011-11-18  0:40 UTC (permalink / raw)
  To: Greg Rose; +Cc: Jeff Kirsher, davem, netdev, gospo
In-Reply-To: <4EC59B34.4080501@intel.com>

On Thu, 17 Nov 2011 15:39:32 -0800
Greg Rose <gregory.v.rose@intel.com> wrote:

> 
> 
> On 9/25/2011 1:23 AM, Jeff Kirsher wrote:
> > From: Greg Rose <gregory.v.rose@intel.com>
> >
> > Add IP link command parsing for VF spoof checking enable/disable
> >
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

I am limiting patches at this point to those that will build against 3.0.
The 3.1 version of iproute2 has not been released because kernel.org tool
to do the release is still in beta (kup), and they are working on it.

After 3.1 version of iproute2 is released, any fixes that require stuff
from 3.2 will go in.

^ permalink raw reply

* Re: [net-next] iproute2: Add new command to IP link to enable/disable VF spoof check
From: Greg Rose @ 2011-11-18  0:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com
In-Reply-To: <20111117164044.21b916dd@nehalam.linuxnetplumber.net>

On 11/17/2011 4:40 PM, Stephen Hemminger wrote:
> On Thu, 17 Nov 2011 15:39:32 -0800
> Greg Rose<gregory.v.rose@intel.com>  wrote:
>
>>
>>
>> On 9/25/2011 1:23 AM, Jeff Kirsher wrote:
>>> From: Greg Rose<gregory.v.rose@intel.com>
>>>
>>> Add IP link command parsing for VF spoof checking enable/disable
>>>
>>> Signed-off-by: Greg Rose<gregory.v.rose@intel.com>
>>> Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>
> I am limiting patches at this point to those that will build against 3.0.
> The 3.1 version of iproute2 has not been released because kernel.org tool
> to do the release is still in beta (kup), and they are working on it.
>
> After 3.1 version of iproute2 is released, any fixes that require stuff
> from 3.2 will go in.

Great, sounds good.

Would you prefer that I Jeff (Kirsher) and I repost the patch at that time?

thanks,

- Greg

>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
From: Ben Hutchings @ 2011-11-18  0:44 UTC (permalink / raw)
  To: Greg Rose
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com
In-Reply-To: <4EC5A785.3060108@intel.com>

On Thu, 2011-11-17 at 16:32 -0800, Greg Rose wrote:
> On 11/17/2011 4:15 PM, Ben Hutchings wrote:
> > Sorry to come to this rather late.
> >
> > On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> > [...]
> >> v2 ->  v3
> >> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> >> - Support for SRIOV VFs.
> >>          [Note: The get filters msg (in the way current get rtnetlink handles
> >>          it) might get too big for SRIOV vfs. This patch follows existing sriov
> >>          vf get code and tries to accomodate filters for all VF's in a PF.
> >>          And for the SRIOV case I have only tested the fact that the VF
> >>          arguments are getting delivered to rtnetlink correctly. The code
> >>          follows existing sriov vf handling code so rest of it should work fine]
> > [...]
> >
> > This is already broken for large numbers of VFs, and increasing the
> > amount of information per VF is going to make the situation worse.  I am
> > no netlink expert but I think that the current approach of bundling all
> > information about an interface in a single message may not be
> > sustainable.
> >
> > Also, I'm unclear on why this interface is to be used to set filtering
> > for the (PF) net device as well as for related VFs.  Doesn't that
> > duplicate the functionality of ndo_set_rx_mode and
> > ndo_vlan_rx_{add,kill}_vid?
> 
> Functionally yes but contextually no.  This allows the PF driver to know 
> that it is setting these filters in the context of the existence of VFs, 
> allowing it to take appropriate action.  The other two functions may be 
> called without the presence of SR-IOV enablement and the existence of VFs.
> 
> Anyway, that's why I asked Roopa to add that capability.

I don't follow.  The PF driver already knows whether it has enabled VFs.

How do filters set this way interact with filters set through the
existing operations?  Should they override promiscuous mode?  None of
this has been specified.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [net-next] iproute2: Add new command to IP link to enable/disable VF spoof check
From: Stephen Hemminger @ 2011-11-18  0:45 UTC (permalink / raw)
  To: Greg Rose
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com
In-Reply-To: <4EC5AA1B.4040009@intel.com>

On Thu, 17 Nov 2011 16:43:07 -0800
Greg Rose <gregory.v.rose@intel.com> wrote:

> On 11/17/2011 4:40 PM, Stephen Hemminger wrote:
> > On Thu, 17 Nov 2011 15:39:32 -0800
> > Greg Rose<gregory.v.rose@intel.com>  wrote:
> >
> >>
> >>
> >> On 9/25/2011 1:23 AM, Jeff Kirsher wrote:
> >>> From: Greg Rose<gregory.v.rose@intel.com>
> >>>
> >>> Add IP link command parsing for VF spoof checking enable/disable
> >>>
> >>> Signed-off-by: Greg Rose<gregory.v.rose@intel.com>
> >>> Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
> >
> > I am limiting patches at this point to those that will build against 3.0.
> > The 3.1 version of iproute2 has not been released because kernel.org tool
> > to do the release is still in beta (kup), and they are working on it.
> >
> > After 3.1 version of iproute2 is released, any fixes that require stuff
> > from 3.2 will go in.
> 
> Great, sounds good.
> 
> Would you prefer that I Jeff (Kirsher) and I repost the patch at that time?

not required, if they show up in patchwork

^ permalink raw reply

* Re: [RFC] [ver3 PATCH 3/6] virtio_net: virtio_net driver changes
From: Ben Hutchings @ 2011-11-18  1:08 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: rusty, mst, netdev, kvm, davem, virtualization
In-Reply-To: <20111111130420.9878.19729.sendpatchset@krkumar2.in.ibm.com>

On Fri, 2011-11-11 at 18:34 +0530, Krishna Kumar wrote:
> Changes for multiqueue virtio_net driver.
[...]
> @@ -677,25 +730,35 @@ static struct rtnl_link_stats64 *virtnet
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	int cpu;
> -	unsigned int start;
>  
>  	for_each_possible_cpu(cpu) {
> -		struct virtnet_stats __percpu *stats
> -			= per_cpu_ptr(vi->stats, cpu);
> -		u64 tpackets, tbytes, rpackets, rbytes;
> -
> -		do {
> -			start = u64_stats_fetch_begin(&stats->syncp);
> -			tpackets = stats->tx_packets;
> -			tbytes   = stats->tx_bytes;
> -			rpackets = stats->rx_packets;
> -			rbytes   = stats->rx_bytes;
> -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> -
> -		tot->rx_packets += rpackets;
> -		tot->tx_packets += tpackets;
> -		tot->rx_bytes   += rbytes;
> -		tot->tx_bytes   += tbytes;
> +		int qpair;
> +
> +		for (qpair = 0; qpair < vi->num_queue_pairs; qpair++) {
> +			struct virtnet_send_stats __percpu *tx_stat;
> +			struct virtnet_recv_stats __percpu *rx_stat;

While you're at it, you can drop the per-CPU stats and make them only
per-queue.  There is unlikely to be any benefit in maintaining them
per-CPU while receive and transmit processing is serialised per-queue.

[...]
> +static int invoke_find_vqs(struct virtnet_info *vi)
> +{
> +	vq_callback_t **callbacks;
> +	struct virtqueue **vqs;
> +	int ret = -ENOMEM;
> +	int i, total_vqs;
> +	char **names;
> +
> +	/*
> +	 * We expect 1 RX virtqueue followed by 1 TX virtqueue, followed
> +	 * by the same 'vi->num_queue_pairs-1' more times, and optionally
> +	 * one control virtqueue.
> +	 */
> +	total_vqs = vi->num_queue_pairs * 2 +
> +		    virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
> +
> +	/* Allocate space for find_vqs parameters */
> +	vqs = kmalloc(total_vqs * sizeof(*vqs), GFP_KERNEL);
> +	callbacks = kmalloc(total_vqs * sizeof(*callbacks), GFP_KERNEL);
> +	names = kmalloc(total_vqs * sizeof(*names), GFP_KERNEL);
> +	if (!vqs || !callbacks || !names)
> +		goto err;
> +
> +	/* Allocate/initialize parameters for recv virtqueues */
> +	for (i = 0; i < vi->num_queue_pairs * 2; i += 2) {
> +		callbacks[i] = skb_recv_done;
> +		names[i] = kasprintf(GFP_KERNEL, "input.%d", i / 2);
> +		if (!names[i])
> +			goto err;
> +	}
> +
> +	/* Allocate/initialize parameters for send virtqueues */
> +	for (i = 1; i < vi->num_queue_pairs * 2; i += 2) {
> +		callbacks[i] = skb_xmit_done;
> +		names[i] = kasprintf(GFP_KERNEL, "output.%d", i / 2);
> +		if (!names[i])
> +			goto err;
> +	}
[...]

The RX and TX interrupt names for a multiqueue device should follow the
formats "<device>-rx-<index>" and "<device>-tx-<index>".

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH] iwl-debug: Shrink object by using dev_err and deduplicating formats
From: Joe Perches @ 2011-11-18  1:46 UTC (permalink / raw)
  To: Wey-Yi Guy, Intel Linux Wireless
  Cc: John W. Linville, linux-wireless, netdev, linux-kernel

Using dev_err instead of dev_printk(KERN_ERR uses fewer
arguments and is a bit smaller.

Deduplicating formats used by IWL_DEBUG_QUIET_RFKILL also
makes the object a bit smaller.
 
Neatened the macros, used ##__VA_ARGS__.

$ size drivers/net/wireless/iwlwifi/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 462652	   8646	  92576	 563874	  89aa2	drivers/net/wireless/iwlwifi/built-in.o.new
 467557	   8646	  92592	 568795	  8addb	drivers/net/wireless/iwlwifi/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/wireless/iwlwifi/iwl-debug.h |   37 +++++++++++++++++-------------
 1 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-debug.h b/drivers/net/wireless/iwlwifi/iwl-debug.h
index 40ef97b..44a7bdd 100644
--- a/drivers/net/wireless/iwlwifi/iwl-debug.h
+++ b/drivers/net/wireless/iwlwifi/iwl-debug.h
@@ -47,20 +47,21 @@ do {									\
 } while (0)
 
 #ifdef CONFIG_IWLWIFI_DEBUG
-#define IWL_DEBUG(m, level, fmt, args...)				\
+#define IWL_DEBUG(m, level, fmt, ...)					\
 do {									\
 	if (iwl_get_debug_level((m)->shrd) & (level))			\
-		dev_printk(KERN_ERR, bus(m)->dev,			\
-			 "%c %s " fmt, in_interrupt() ? 'I' : 'U',	\
-			__func__ , ## args);				\
+		dev_err(bus(m)->dev, "%c %s " fmt,			\
+			in_interrupt() ? 'I' : 'U', __func__,		\
+			##__VA_ARGS__);					\
 } while (0)
 
-#define IWL_DEBUG_LIMIT(m, level, fmt, args...)				\
+#define IWL_DEBUG_LIMIT(m, level, fmt, ...)				\
 do {									\
-	if (iwl_get_debug_level((m)->shrd) & (level) && net_ratelimit())\
-		dev_printk(KERN_ERR, bus(m)->dev,			\
-			"%c %s " fmt, in_interrupt() ? 'I' : 'U',	\
-			 __func__ , ## args);				\
+	if (iwl_get_debug_level((m)->shrd) & (level) &&			\
+	    net_ratelimit())						\
+		dev_err(bus(m)->dev, "%c %s " fmt,			\
+			in_interrupt() ? 'I' : 'U', __func__,		\
+			##__VA_ARGS__);					\
 } while (0)
 
 #define iwl_print_hex_dump(m, level, p, len)				\
@@ -70,14 +71,18 @@ do {                                            			\
 			       DUMP_PREFIX_OFFSET, 16, 1, p, len, 1);	\
 } while (0)
 
-#define IWL_DEBUG_QUIET_RFKILL(p, fmt, args...)			\
+#define IWL_DEBUG_QUIET_RFKILL(p, fmt, ...)				\
 do {									\
-	if (!iwl_is_rfkill(p->shrd))				\
-		dev_printk(KERN_ERR, bus(p)->dev, "%c %s " fmt,	\
-		(in_interrupt() ? 'I' : 'U'), __func__ , ##args);	\
-	else if	(iwl_get_debug_level(p->shrd) & IWL_DL_RADIO)	\
-		dev_printk(KERN_ERR, bus(p)->dev, "(RFKILL) %c %s " fmt, \
-		(in_interrupt() ? 'I' : 'U'), __func__ , ##args);	\
+	if (!iwl_is_rfkill(p->shrd))					\
+		dev_err(bus(p)->dev, "%s%c %s " fmt,			\
+			"",						\
+			in_interrupt() ? 'I' : 'U', __func__,		\
+			##__VA_ARGS__);					\
+	else if	(iwl_get_debug_level(p->shrd) & IWL_DL_RADIO)		\
+		dev_err(bus(p)->dev, "%s%c %s " fmt,			\
+			"(RFKILL) ",					\
+			in_interrupt() ? 'I' : 'U', __func__,		\
+			##__VA_ARGS__);					\
 } while (0)
 
 #else

^ permalink raw reply related

* Re: [PATCH net-next] sky2: fix hang in napi_disable
From: David Miller @ 2011-11-18  1:52 UTC (permalink / raw)
  To: shemminger; +Cc: svenjoac, netdev
In-Reply-To: <20111117163735.0a16660b@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 17 Nov 2011 16:37:35 -0800

> If IRQ was never initialized, then calling napi_disable() would hang.
> Add more bookkeeping to track whether IRQ was ever initialized.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Stephen, your other 6 patches went into 'net' so you have to target
these fixes there too.

^ permalink raw reply

* Re: [PATCH net-next] sky2: fix hang in napi_disable
From: Stephen Hemminger @ 2011-11-18  2:10 UTC (permalink / raw)
  To: David Miller; +Cc: svenjoac, netdev
In-Reply-To: <20111117.205234.394141688162787655.davem@davemloft.net>


> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Thu, 17 Nov 2011 16:37:35 -0800
> 
> > If IRQ was never initialized, then calling napi_disable() would
> > hang.
> > Add more bookkeeping to track whether IRQ was ever initialized.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> Stephen, your other 6 patches went into 'net' so you have to target
> these fixes there too.
> 

Do you need me to respin them?

^ permalink raw reply

* Re: [PATCH net-next] sky2: fix hang in napi_disable
From: David Miller @ 2011-11-18  2:15 UTC (permalink / raw)
  To: stephen.hemminger; +Cc: svenjoac, netdev
In-Reply-To: <f582e628-7770-43c5-aa90-51cf74141462@tahiti.vyatta.com>

From: Stephen Hemminger <stephen.hemminger@vyatta.com>
Date: Thu, 17 Nov 2011 18:10:49 -0800 (PST)

> 
>> From: Stephen Hemminger <shemminger@vyatta.com>
>> Date: Thu, 17 Nov 2011 16:37:35 -0800
>> 
>> > If IRQ was never initialized, then calling napi_disable() would
>> > hang.
>> > Add more bookkeeping to track whether IRQ was ever initialized.
>> > 
>> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>> 
>> Stephen, your other 6 patches went into 'net' so you have to target
>> these fixes there too.
>> 
> 
> Do you need me to respin them?

If they apply cleanly to 'net', no.  Do they?

^ permalink raw reply

* [PATCH net-next v2] r8169: Add 64bit statistics
From: Junchang Wang @ 2011-11-18  2:15 UTC (permalink / raw)
  To: nic_swsd, romieu, shemminger, eric.dumazet; +Cc: netdev


Switch to use ndo_get_stats64 to get 64bit statistics.
Two sync entries are used (one for Rx and one for Tx).


Signed-off-by: Junchang Wang <junchangwang@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c |   61 +++++++++++++++++++++++++++++-----
 1 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index cdf66d6..b9f8240 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -670,6 +670,12 @@ struct rtl8169_counters {
 	__le16	tx_underun;
 };
 
+struct rtl8169_stats {
+	struct u64_stats_sync	syncp;
+	u64			packets;
+	u64			bytes;
+};
+
 struct rtl8169_private {
 	void __iomem *mmio_addr;	/* memory map physical address */
 	struct pci_dev *pci_dev;
@@ -685,6 +691,8 @@ struct rtl8169_private {
 	u32 dirty_tx;
 	struct TxDesc *TxDescArray;	/* 256-aligned Tx descriptor ring */
 	struct RxDesc *RxDescArray;	/* 256-aligned Rx descriptor ring */
+	struct rtl8169_stats rx_stats;
+	struct rtl8169_stats tx_stats;
 	dma_addr_t TxPhyAddr;
 	dma_addr_t RxPhyAddr;
 	void *Rx_databuff[NUM_RX_DESC];	/* Rx data buffers */
@@ -766,7 +774,9 @@ static void rtl_hw_start(struct net_device *dev);
 static int rtl8169_close(struct net_device *dev);
 static void rtl_set_rx_mode(struct net_device *dev);
 static void rtl8169_tx_timeout(struct net_device *dev);
-static struct net_device_stats *rtl8169_get_stats(struct net_device *dev);
+static struct rtnl_link_stats64 *rtl8169_get_stats64(struct net_device *dev,
+						     struct rtnl_link_stats64
+						     *stats);
 static int rtl8169_rx_interrupt(struct net_device *, struct rtl8169_private *,
 				void __iomem *, u32 budget);
 static int rtl8169_change_mtu(struct net_device *dev, int new_mtu);
@@ -3454,7 +3464,7 @@ static void rtl_disable_msi(struct pci_dev *pdev, struct rtl8169_private *tp)
 static const struct net_device_ops rtl8169_netdev_ops = {
 	.ndo_open		= rtl8169_open,
 	.ndo_stop		= rtl8169_close,
-	.ndo_get_stats		= rtl8169_get_stats,
+	.ndo_get_stats64	= rtl8169_get_stats64,
 	.ndo_start_xmit		= rtl8169_start_xmit,
 	.ndo_tx_timeout		= rtl8169_tx_timeout,
 	.ndo_validate_addr	= eth_validate_addr,
@@ -5641,8 +5651,10 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
 		rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
-			dev->stats.tx_packets++;
-			dev->stats.tx_bytes += tx_skb->skb->len;
+			u64_stats_update_begin(&tp->tx_stats.syncp);
+			tp->tx_stats.packets++;
+			tp->tx_stats.bytes += tx_skb->skb->len;
+			u64_stats_update_end(&tp->tx_stats.syncp);
 			dev_kfree_skb(tx_skb->skb);
 			tx_skb->skb = NULL;
 		}
@@ -5771,8 +5783,10 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 
 			napi_gro_receive(&tp->napi, skb);
 
-			dev->stats.rx_bytes += pkt_size;
-			dev->stats.rx_packets++;
+			u64_stats_update_begin(&tp->rx_stats.syncp);
+			tp->rx_stats.packets++;
+			tp->rx_stats.bytes += pkt_size;
+			u64_stats_update_end(&tp->rx_stats.syncp);
 		}
 
 		/* Work around for AMD plateform. */
@@ -6034,16 +6048,19 @@ static void rtl_set_rx_mode(struct net_device *dev)
 }
 
 /**
- *  rtl8169_get_stats - Get rtl8169 read/write statistics
+ *  rtl8169_get_stats64 - Get rtl8169 read/write statistics
  *  @dev: The Ethernet Device to get statistics for
  *
  *  Get TX/RX statistics for rtl8169
  */
-static struct net_device_stats *rtl8169_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *
+rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
+	u64 _packets, _bytes;
 	unsigned long flags;
+	unsigned int start;
 
 	if (netif_running(dev)) {
 		spin_lock_irqsave(&tp->lock, flags);
@@ -6051,7 +6068,33 @@ static struct net_device_stats *rtl8169_get_stats(struct net_device *dev)
 		spin_unlock_irqrestore(&tp->lock, flags);
 	}
 
-	return &dev->stats;
+	do {
+		start = u64_stats_fetch_begin_bh(&tp->rx_stats.syncp);
+		_packets = tp->rx_stats.packets;
+		_bytes = tp->rx_stats.bytes;
+	} while (u64_stats_fetch_retry_bh(&tp->rx_stats.syncp, start));
+
+	stats->rx_packets = _packets;
+	stats->rx_bytes	= _bytes;
+
+	do {
+		start = u64_stats_fetch_begin_bh(&tp->tx_stats.syncp);
+		_packets = tp->tx_stats.packets;
+		_bytes= tp->tx_stats.bytes;
+	} while (u64_stats_fetch_retry_bh(&tp->tx_stats.syncp, start));
+
+	stats->tx_packets = _packets;
+	stats->tx_bytes	= _bytes;
+
+	stats->rx_dropped	= dev->stats.rx_dropped;
+	stats->tx_dropped	= dev->stats.tx_dropped;
+	stats->rx_length_errors = dev->stats.rx_length_errors;
+	stats->rx_errors	= dev->stats.rx_errors;
+	stats->rx_crc_errors	= dev->stats.rx_crc_errors;
+	stats->rx_fifo_errors	= dev->stats.rx_fifo_errors;
+	stats->rx_missed_errors = dev->stats.rx_missed_errors;
+
+	return stats;
 }
 
 static void rtl8169_net_suspend(struct net_device *dev)
--

^ permalink raw reply related

* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Junchang Wang @ 2011-11-18  2:24 UTC (permalink / raw)
  To: Francois Romieu; +Cc: nic_swsd, eric.dumazet, netdev
In-Reply-To: <20111117093635.GA9112@electric-eye.fr.zoreil.com>

> The 816x chipsets have hardware stats registers. The driver already
> use them. Please use them more.

Hi Francois,

I'm not sure which hardware stats registers are accurate. Besides, it
seem r8169.c are also designed for other chipsets (8129?).

I would like other people who are quite familiar with those chipsets
to do this job.

Is that OK? Thanks.

-- 
--Junchang

^ permalink raw reply

* Re: [PATCH net-next] sky2: fix hang in napi_disable
From: David Miller @ 2011-11-18  2:44 UTC (permalink / raw)
  To: shemminger; +Cc: svenjoac, netdev
In-Reply-To: <20111117163735.0a16660b@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 17 Nov 2011 16:37:35 -0800

> If IRQ was never initialized, then calling napi_disable() would hang.
> Add more bookkeeping to track whether IRQ was ever initialized.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied to 'net'.

^ permalink raw reply

* Re: [PATCH net-next] sky2: enforce minimum ring size
From: David Miller @ 2011-11-18  2:44 UTC (permalink / raw)
  To: shemminger; +Cc: svenjoac, netdev
In-Reply-To: <20111117163723.67964841@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 17 Nov 2011 16:37:23 -0800

> The hardware has a restriction that the minimum ring size possible
> is 128. The number of elements used is controlled by tx_pending and
> the overall number of elements in the ring tx_ring_size, therefore it
> is okay to limit the number of elements in use to a small value (63)
> but still provide a bigger ring.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied to 'net'.

^ permalink raw reply

* Re: [PATCH 1/2] net: add network priority cgroup infrastructure (v2)
From: John Fastabend @ 2011-11-18  3:17 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev@vger.kernel.org, Love, Robert W, David S. Miller
In-Reply-To: <1321566472-28969-2-git-send-email-nhorman@tuxdriver.com>

On 11/17/2011 1:47 PM, Neil Horman wrote:
> This patch adds in the infrastructure code to create the network priority
> cgroup.  The cgroup, in addition to the standard processes file creates two
> control files:
> 
> 1) prioidx - This is a read-only file that exports the index of this cgroup.
> This is a value that is both arbitrary and unique to a cgroup in this subsystem,
> and is used to index the per-device priority map
> 
> 2) priomap - This is a writeable file.  On read it reports a table of 2-tuples
> <name:priority> where name is the name of a network interface and priority is
> indicates the priority assigned to frames egresessing on the named interface and
> originating from a pid in this cgroup
> 
> This cgroup allows for skb priority to be set prior to a root qdisc getting
> selected. This is benenficial for DCB enabled systems, in that it allows for any
> application to use dcb configured priorities so without application modification
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> CC: Robert Love <robert.w.love@intel.com>
> CC: "David S. Miller" <davem@davemloft.net>
> ---

one more nit... can we convert the rcu_dereference() into rtnl_dereference()
where it is relevant?

/**
 * rtnl_dereference - fetch RCU pointer when updates are prevented by RTNL
 * @p: The pointer to read, prior to dereferencing
 *
 * Return the value of the specified RCU-protected pointer, but omit
 * both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
 * caller holds RTNL.
 */
#define rtnl_dereference(p)                                     \
        rcu_dereference_protected(p, lockdep_rtnl_is_held())

[...]

> +
> +static void extend_netdev_table(struct net_device *dev, u32 new_len)
> +{
> +       size_t new_size = sizeof(struct netprio_map) +
> +                          ((sizeof(u32) * new_len));
> +       struct netprio_map *new_priomap = kzalloc(new_size, GFP_KERNEL);
> +       struct netprio_map *old_priomap;
> +       int i;
> +
> +       old_priomap  = rcu_dereference(dev->priomap);
> +

This could be rtnl_dereference(dev->priomap) to annotate that we always
have the rtnl lock here.

> +       if (!new_priomap) {
> +               printk(KERN_WARNING "Unable to alloc new priomap!\n");
> +               return;
> +       }
> +
> +       for (i = 0;
> +            old_priomap && (i < old_priomap->priomap_len);
> +            i++)
> +               new_priomap->priomap[i] = old_priomap->priomap[i];
> +
> +       new_priomap->priomap_len = new_len;
> +
> +       rcu_assign_pointer(dev->priomap, new_priomap);
> +       if (old_priomap)
> +               kfree_rcu(old_priomap, rcu);
> +}
> +
> +static void update_netdev_tables(void)
> +{
> +       struct net_device *dev;
> +       u32 max_len = atomic_read(&max_prioidx);
> +       struct netprio_map *map;
> +
> +       rtnl_lock();
          ^^^^^^^^^^^
> +
> +
> +       for_each_netdev(&init_net, dev) {
> +               map = rcu_dereference(dev->priomap);

same here rtnl_dereference(dev->priomap);

> +               if ((!map) ||
> +                   (map->priomap_len < max_len))
> +                       extend_netdev_table(dev, max_len);
> +       }
> +
> +       rtnl_unlock();
> +}
> +
> +static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
> +                                                struct cgroup *cgrp)
> +{
> +       struct cgroup_netprio_state *cs;
> +       int ret;
> +
> +       cs = kzalloc(sizeof(*cs), GFP_KERNEL);
> +       if (!cs)
> +               return ERR_PTR(-ENOMEM);
> +
> +       if (cgrp->parent && cgrp_netprio_state(cgrp->parent)->prioidx) {
> +               kfree(cs);
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       ret = get_prioidx(&cs->prioidx);
> +       if (ret != 0) {
> +               printk(KERN_WARNING "No space in priority index array\n");
> +               kfree(cs);
> +               return ERR_PTR(ret);
> +       }
> +
> +       return &cs->css;
> +}
> +
> +static void cgrp_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
> +{
> +       struct cgroup_netprio_state *cs;
> +       struct net_device *dev;
> +       struct netprio_map *map;
> +
> +       cs = cgrp_netprio_state(cgrp);
> +       rtnl_lock();
> +       for_each_netdev(&init_net, dev) {
> +               map = rcu_dereference(dev->priomap);

map = rtnl_dereference(dev->priomap)

> +               if (map)
> +                       map->priomap[cs->prioidx] = 0;
> +       }
> +       rtnl_unlock();
> +       put_prioidx(cs->prioidx);
> +       kfree(cs);
> +}
> +
> +static u64 read_prioidx(struct cgroup *cgrp, struct cftype *cft)
> +{
> +       return (u64)cgrp_netprio_state(cgrp)->prioidx;
> +}
> +
> +static int read_priomap(struct cgroup *cont, struct cftype *cft,
> +                       struct cgroup_map_cb *cb)
> +{
> +       struct net_device *dev;
> +       u32 prioidx = cgrp_netprio_state(cont)->prioidx;
> +       u32 priority;
> +       struct netprio_map *map;
> +
> +       rcu_read_lock();
> +
> +       for_each_netdev_rcu(&init_net, dev) {
> +               map = rcu_dereference(dev->priomap);
> +               priority = map ? map->priomap[prioidx] : 0;
> +               cb->fill(cb, dev->name, priority);
> +       }
> +       rcu_read_unlock();
> +       return 0;
> +}
> +

[...]

> +static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
> +{
> +       return cgroup_add_files(cgrp, ss, ss_files, ARRAY_SIZE(ss_files));
> +}
> +
> +static int netprio_device_event(struct notifier_block *unused,
> +                               unsigned long event, void *ptr)
> +{
> +       struct net_device *dev = ptr;
> +       struct netprio_map *old;
> +       u32 max_len = atomic_read(&max_prioidx);
> +
> +       old = rcu_dereference_protected(dev->priomap, 1);

This is protected because of the rtnl lock so use,

old = rtnl_dereference(dev->priomap);

> +       /*
> +        * Note this is called with rtnl_lock held so we have update side
> +        * protection on our rcu assignments
> +        */
> +
> +       switch (event) {
> +
> +       case NETDEV_REGISTER:
> +               if (max_len)
> +                       extend_netdev_table(dev, max_len);
> +               break;
> +       case NETDEV_UNREGISTER:
> +               rcu_assign_pointer(dev->priomap, NULL);
> +               if (old)
> +                       kfree_rcu(old, rcu);
> +               break;
> +       }
> +       return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block netprio_device_notifier = {
> +       .notifier_call = netprio_device_event
> +};
> +

I can roll an update if you want, just let me know.

Thanks,
John

^ permalink raw reply

* ipv4 udplite broken in >=linux-3.0 ?
From: Jinxin Zheng @ 2011-11-18  4:09 UTC (permalink / raw)
  To: gerrit; +Cc: linux-kernel, netdev

Hi,

I recently migrated my work from udp to udplite. Thanks for your great effort.

But after I upgraded the kernel from 2.6.38 to 3.0, the ipv4 version
of udplite stops working.
I tested the applications downloaded from Gerrit's udplite wiki, and
found that the ipv6 version works well, but not ipv4.

I did the tests on a Gentoo. The kernels are downloaded from the
official web site.

I don't know how to debug udplite. Can any one of you give me a tip on
how to get more debug info, then I can do further test and provide
with it?

Thanks!

--
Zheng Jinxin

^ permalink raw reply

* Re: [PATCH] iwl-debug: Shrink object by using dev_err and deduplicating formats
From: Guy, Wey-Yi @ 2011-11-18  3:14 UTC (permalink / raw)
  To: Joe Perches
  Cc: Intel Linux Wireless, John W. Linville,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1321580775.22117.8.camel@Joe-Laptop>

On Thu, 2011-11-17 at 17:46 -0800, Joe Perches wrote:
> Using dev_err instead of dev_printk(KERN_ERR uses fewer
> arguments and is a bit smaller.
> 
> Deduplicating formats used by IWL_DEBUG_QUIET_RFKILL also
> makes the object a bit smaller.
>  
> Neatened the macros, used ##__VA_ARGS__.
> 
> $ size drivers/net/wireless/iwlwifi/built-in.o*
>    text	   data	    bss	    dec	    hex	filename
>  462652	   8646	  92576	 563874	  89aa2	drivers/net/wireless/iwlwifi/built-in.o.new
>  467557	   8646	  92592	 568795	  8addb	drivers/net/wireless/iwlwifi/built-in.o.old
> 
> Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
> ---
> \

very interesting. Thanks

Wey

^ permalink raw reply

* Re: [PATCH net-next] bnx2: switch to build_skb() infrastructure
From: Michael Chan @ 2011-11-18  4:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Eilon Greenstein
In-Reply-To: <1321378205.2856.24.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>


On Tue, 2011-11-15 at 09:30 -0800, Eric Dumazet wrote:
> This is very similar to bnx2x conversion, but bnx2 only requires 16bytes
> alignement at start of the received frame to store its l2_fhdr, so goal
> was not to reduce skb truesize (in fact it should not change after this
> patch)
> 
> Using build_skb() reduces cache line misses in the driver, since we
> use cache hot skb instead of cold ones. Number of in-flight sk_buff
> structures is lower, they are more likely recycled in SLUB caches
> while still hot.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Michael Chan <mchan@broadcom.com>
> CC: Eilon Greenstein <eilong@broadcom.com>

Looks good.  Thanks.

Reviewed-by: Michael Chan <mchan@broadcom.com>

^ permalink raw reply

* Re: root_lock vs. device's TX lock
From: Eric Dumazet @ 2011-11-18  5:02 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Andy Fleming, Dave Taht, Linux Netdev List
In-Reply-To: <CA+mtBx8KfVQYcZFCJrQF__LJh7e0cRSPqSk1nn537Z_ATMjD9Q@mail.gmail.com>

Le jeudi 17 novembre 2011 à 16:35 -0800, Tom Herbert a écrit :
> > Actually, I'm interested in circumventing *both* locks. Our SoC has
> > some quite-versatile queueing infrastructure, such that (for many
> > queueing setups) we can do all of the queueing in hardware, using
> > per-cpu access portals. By hacking around the qdisc lock, and using a
> > tx queue per core, we were able to achieve a significant speedup.

If packet reordering is not a concern (or taken into account in the
hardware)... A task sending tcp flow can migrate from cpu1 to cpu2...

> This was actually one of the motivations for my question.  If we have
> a one TX queue per core, and and use a trivial mq aware qdisc for
> instance, the locking becomes mostly overhead.  I don't mind taking a
> lock once per TX, but right now were taking three! (root lock twice,
> and device lock once).
> 

> Even without one queue per TX, I think the overhead savings may still
> be present.  Eric, I realize that a point of dropping the root lock in
> sch_direct_xmit is to possibly allow queuing to to qdisc and device
> xmit in parallel, but if you're using a trivial qdisc then the time in
> qdisc may be << time in device xmit, so the overhead of locking could
> mitigate the gains in parallelism.  At the very least, this benefit is
> hugely variable depending on the qdisc used.

Andy speaks more of a way to bypass qdisc (direct to device), while I
thought Tom wanted to optimize htb/cbq/..complex qdisc handling...

Note we have LLTX thing to avoid taking dev->lock for some devices.

We could have a qdisc->fast_enqueue() method for some (qdiscs/device)
combinations, able to short cut __dev_xmit_skb() and do their own stuff
(using rcu locking or other synch method, percpu bytes/counter stats...)

But if device is really that smart, why even use a qdisc in the first
place ?

If qdisc not needed : We take only one lock (dev lock) to transmit a
packet. If device is LLTX, no lock taken at all.

If qdisc is needed : We need to take qdisc lock to enqueue packet.
   Then, _if_ we own the __QDISC___STATE_RUNNING flag, we enter the loop
to dequeue and xmit packets (__qdisc_run() / sch_direct_xmit() doing the
insane lock flips)

Its a tough choice : Do we want to avoid false sharing in the device
itself or qdisc...

Current handling was designed so that one cpu was feeding the device and
other cpus feeding the qdisc.

^ permalink raw reply

* RE: [PATCH v7] Phonet: set the pipe handle using setsockopt
From: Dinesh Kumar SHARMA (STE) @ 2011-11-18  5:27 UTC (permalink / raw)
  To: Hemant-vilas RAMDASI, netdev-owner@vger.kernel.org,
	davem@davemloft.net
  Cc: netdev@vger.kernel.org, remi.denis-courmont@nokia.com,
	Hemant-vilas RAMDASI
In-Reply-To: <1321522140-10133-1-git-send-email-hemant.ramdasi@stericsson.com>

Hi,

Can this be applied now?

Thanks.

Regards,
-Dinesh

-----Original Message-----
From: Hemant Vilas RAMDASI [mailto:hemant.ramdasi@stericsson.com] 
Sent: Thursday, November 17, 2011 2:59 PM
To: netdev-owner@vger.kernel.org
Cc: netdev@vger.kernel.org; remi.denis-courmont@nokia.com; Dinesh Kumar SHARMA (STE); Hemant-vilas RAMDASI
Subject: [PATCH v7] Phonet: set the pipe handle using setsockopt

From: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>

This provides flexibility to set the pipe handle
using setsockopt. The pipe can be enabled (if disabled) later
using ioctl.

Signed-off-by: Hemant Ramdasi <hemant.ramdasi@stericsson.com>
Signed-off-by: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>
---
 include/linux/phonet.h |    2 +
 net/phonet/pep.c       |   96 ++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/include/linux/phonet.h b/include/linux/phonet.h
index 6fb1384..1847ef9 100644
--- a/include/linux/phonet.h
+++ b/include/linux/phonet.h
@@ -37,6 +37,7 @@
 #define PNPIPE_ENCAP		1
 #define PNPIPE_IFINDEX		2
 #define PNPIPE_HANDLE		3
+#define PNPIPE_INITSTATE	4
 
 #define PNADDR_ANY		0
 #define PNADDR_BROADCAST	0xFC
@@ -48,6 +49,7 @@
 
 /* ioctls */
 #define SIOCPNGETOBJECT		(SIOCPROTOPRIVATE + 0)
+#define SIOCPNENABLEPIPE	(SIOCPROTOPRIVATE + 13)
 #define SIOCPNADDRESOURCE	(SIOCPROTOPRIVATE + 14)
 #define SIOCPNDELRESOURCE	(SIOCPROTOPRIVATE + 15)
 
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index f17fd84..7acd262 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -533,6 +533,29 @@ static int pep_connresp_rcv(struct sock *sk, struct sk_buff *skb)
 	return pipe_handler_send_created_ind(sk);
 }
 
+static int pep_enableresp_rcv(struct sock *sk, struct sk_buff *skb)
+{
+	struct pnpipehdr *hdr = pnp_hdr(skb);
+
+	if (hdr->error_code != PN_PIPE_NO_ERROR)
+		return -ECONNREFUSED;
+
+	return pep_indicate(sk, PNS_PIPE_ENABLED_IND, 0 /* sub-blocks */,
+		NULL, 0, GFP_ATOMIC);
+
+}
+
+static void pipe_start_flow_control(struct sock *sk)
+{
+	struct pep_sock *pn = pep_sk(sk);
+
+	if (!pn_flow_safe(pn->tx_fc)) {
+		atomic_set(&pn->tx_credits, 1);
+		sk->sk_write_space(sk);
+	}
+	pipe_grant_credits(sk, GFP_ATOMIC);
+}
+
 /* Queue an skb to an actively connected sock.
  * Socket lock must be held. */
 static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
@@ -578,13 +601,25 @@ static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
 			sk->sk_state = TCP_CLOSE_WAIT;
 			break;
 		}
+		if (pn->init_enable == PN_PIPE_DISABLE)
+			sk->sk_state = TCP_SYN_RECV;
+		else {
+			sk->sk_state = TCP_ESTABLISHED;
+			pipe_start_flow_control(sk);
+		}
+		break;
 
-		sk->sk_state = TCP_ESTABLISHED;
-		if (!pn_flow_safe(pn->tx_fc)) {
-			atomic_set(&pn->tx_credits, 1);
-			sk->sk_write_space(sk);
+	case PNS_PEP_ENABLE_RESP:
+		if (sk->sk_state != TCP_SYN_SENT)
+			break;
+
+		if (pep_enableresp_rcv(sk, skb)) {
+			sk->sk_state = TCP_CLOSE_WAIT;
+			break;
 		}
-		pipe_grant_credits(sk, GFP_ATOMIC);
+
+		sk->sk_state = TCP_ESTABLISHED;
+		pipe_start_flow_control(sk);
 		break;
 
 	case PNS_PEP_DISCONNECT_RESP:
@@ -863,14 +898,32 @@ static int pep_sock_connect(struct sock *sk, struct sockaddr *addr, int len)
 	int err;
 	u8 data[4] = { 0 /* sub-blocks */, PAD, PAD, PAD };
 
-	pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
+	if (pn->pipe_handle == PN_PIPE_INVALID_HANDLE)
+		pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
+
 	err = pipe_handler_request(sk, PNS_PEP_CONNECT_REQ,
-					PN_PIPE_ENABLE, data, 4);
+				pn->init_enable, data, 4);
 	if (err) {
 		pn->pipe_handle = PN_PIPE_INVALID_HANDLE;
 		return err;
 	}
+
 	sk->sk_state = TCP_SYN_SENT;
+
+	return 0;
+}
+
+static int pep_sock_enable(struct sock *sk, struct sockaddr *addr, int len)
+{
+	int err;
+
+	err = pipe_handler_request(sk, PNS_PEP_ENABLE_REQ, PAD,
+				NULL, 0);
+	if (err)
+		return err;
+
+	sk->sk_state = TCP_SYN_SENT;
+
 	return 0;
 }
 
@@ -894,6 +947,19 @@ static int pep_ioctl(struct sock *sk, int cmd, unsigned long arg)
 			answ = 0;
 		release_sock(sk);
 		return put_user(answ, (int __user *)arg);
+		break;
+
+	case SIOCPNENABLEPIPE:
+		lock_sock(sk);
+		if (sk->sk_state == TCP_SYN_SENT)
+			answ =  -EBUSY;
+		else if (sk->sk_state == TCP_ESTABLISHED)
+			answ = -EISCONN;
+		else
+			answ = pep_sock_enable(sk, NULL, 0);
+		release_sock(sk);
+		return answ;
+		break;
 	}
 
 	return -ENOIOCTLCMD;
@@ -959,6 +1025,18 @@ static int pep_setsockopt(struct sock *sk, int level, int optname,
 		}
 		goto out_norel;
 
+	case PNPIPE_HANDLE:
+		if ((sk->sk_state == TCP_CLOSE) &&
+			(val >= 0) && (val < PN_PIPE_INVALID_HANDLE))
+			pn->pipe_handle = val;
+		else
+			err = -EINVAL;
+		break;
+
+	case PNPIPE_INITSTATE:
+		pn->init_enable = !!val;
+		break;
+
 	default:
 		err = -ENOPROTOOPT;
 	}
@@ -994,6 +1072,10 @@ static int pep_getsockopt(struct sock *sk, int level, int optname,
 			return -EINVAL;
 		break;
 
+	case PNPIPE_INITSTATE:
+		val = pn->init_enable;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
1.7.4.3

^ permalink raw reply related

* Re: [RFC] [ver3 PATCH 3/6] virtio_net: virtio_net driver changes
From: Sasha Levin @ 2011-11-18  6:24 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Krishna Kumar, rusty, mst, netdev, kvm, davem, virtualization
In-Reply-To: <1321578488.2749.66.camel@bwh-desktop>

On Fri, 2011-11-18 at 01:08 +0000, Ben Hutchings wrote:
> On Fri, 2011-11-11 at 18:34 +0530, Krishna Kumar wrote:
> > Changes for multiqueue virtio_net driver.
> [...]
> > @@ -677,25 +730,35 @@ static struct rtnl_link_stats64 *virtnet
> >  {
> >  	struct virtnet_info *vi = netdev_priv(dev);
> >  	int cpu;
> > -	unsigned int start;
> >  
> >  	for_each_possible_cpu(cpu) {
> > -		struct virtnet_stats __percpu *stats
> > -			= per_cpu_ptr(vi->stats, cpu);
> > -		u64 tpackets, tbytes, rpackets, rbytes;
> > -
> > -		do {
> > -			start = u64_stats_fetch_begin(&stats->syncp);
> > -			tpackets = stats->tx_packets;
> > -			tbytes   = stats->tx_bytes;
> > -			rpackets = stats->rx_packets;
> > -			rbytes   = stats->rx_bytes;
> > -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> > -
> > -		tot->rx_packets += rpackets;
> > -		tot->tx_packets += tpackets;
> > -		tot->rx_bytes   += rbytes;
> > -		tot->tx_bytes   += tbytes;
> > +		int qpair;
> > +
> > +		for (qpair = 0; qpair < vi->num_queue_pairs; qpair++) {
> > +			struct virtnet_send_stats __percpu *tx_stat;
> > +			struct virtnet_recv_stats __percpu *rx_stat;
> 
> While you're at it, you can drop the per-CPU stats and make them only
> per-queue.  There is unlikely to be any benefit in maintaining them
> per-CPU while receive and transmit processing is serialised per-queue.

It allows you to update stats without a lock.

Whats the benefit of having them per queue?

-- 

Sasha.


^ permalink raw reply

* [PATCH 1/1]  PHY configuration for compatible issue
From: AriesLee @ 2011-11-18 14:54 UTC (permalink / raw)
  To: Aries Lee, Guo-Fu Tseng, netdev; +Cc: AriesLee

To perform PHY calibration and set a different EA value by chip ID,
Whenever the NIC chip power on, ie booting or resuming, we need to
force HW to calibrate PHY parameter again, and also set a proper EA
value which gather from experiment.

Those procedures help to reduce compatible issues(NIC is unable to link
up in some special case) in giga speed.

Signed-off-by: AriesLee <AriesLee@jmicron.com>
---
 drivers/net/ethernet/jme.c |  119 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/ethernet/jme.h |   19 +++++++
 2 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
index df3ab83..46c68a4 100644
--- a/drivers/net/ethernet/jme.c
+++ b/drivers/net/ethernet/jme.c
@@ -1744,6 +1744,118 @@ jme_phy_off(struct jme_adapter *jme)
 		jme_new_phy_off(jme);
 }
 
+static void
+jme_phy_specreg_read(struct jme_adapter *jme, u32 specreg, u32 *phy_data)
+{
+	u32 phy_addr;
+
+	phy_addr = JM_PHY_SPEC_REG_READ | specreg;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
+			phy_addr);
+	*phy_data = jme_mdio_read(jme->dev, jme->mii_if.phy_id,
+			JM_PHY_SPEC_DATA_REG);
+}
+
+static void
+jme_phy_specreg_write(struct jme_adapter *jme, u32 ext_reg, u32 phy_data)
+{
+	u32 phy_addr;
+
+	phy_addr = JM_PHY_SPEC_REG_WRITE | ext_reg;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_DATA_REG,
+			phy_data);
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
+			phy_addr);
+}
+
+static int
+jme_phy_calibration(struct jme_adapter *jme)
+{
+	u32 ctrl1000, bmcr, phy_addr, phy_data;
+
+	/*  Turn PHY off */
+	bmcr = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_BMCR);
+	bmcr |= BMCR_PDOWN;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_BMCR, bmcr);
+	/*  Turn PHY on */
+	bmcr = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_BMCR);
+	bmcr &= ~BMCR_PDOWN;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_BMCR, bmcr);
+	/*  Enabel PHY test mode 1 */
+	ctrl1000 = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_CTRL1000);
+	ctrl1000 &= ~PHY_GAD_TEST_MODE_MSK;
+	ctrl1000 |= PHY_GAD_TEST_MODE_1;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_CTRL1000, ctrl1000);
+
+	jme_phy_specreg_read(jme, JM_PHY_EXT_COMM_2_REG, &phy_data);
+	phy_data &= ~JM_PHY_EXT_COMM_2_CALI_MODE_0;
+	phy_data |= JM_PHY_EXT_COMM_2_CALI_LATCH |
+			JM_PHY_EXT_COMM_2_CALI_ENABLE;
+	jme_phy_specreg_write(jme, JM_PHY_EXT_COMM_2_REG, phy_data);
+	msleep(20);
+	jme_phy_specreg_read(jme, JM_PHY_EXT_COMM_2_REG, &phy_data);
+	phy_data &= ~(JM_PHY_EXT_COMM_2_CALI_ENABLE |
+			JM_PHY_EXT_COMM_2_CALI_MODE_0 |
+			JM_PHY_EXT_COMM_2_CALI_LATCH);
+	jme_phy_specreg_write(jme, JM_PHY_EXT_COMM_2_REG, phy_data);
+
+	/*  Disable PHY test mode */
+	ctrl1000 = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_CTRL1000);
+	ctrl1000 &= ~PHY_GAD_TEST_MODE_MSK;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_CTRL1000, ctrl1000);
+	return 0;
+}
+
+static int
+jme_phy_setEA(struct jme_adapter *jme)
+{
+	u32 phy_addr, phy_comm0 = 0, phy_comm1 = 0;
+	u8 nic_ctrl;
+
+	pci_read_config_byte(jme->pdev, PCI_PRIV_SHARE_NICCTRL, &nic_ctrl);
+	if ((nic_ctrl & 0x3) == JME_FLAG_PHYEA_ENABLE)
+		return 0;
+
+	switch (jme->pdev->device) {
+	case PCI_DEVICE_ID_JMICRON_JMC250:
+		if (((jme->chip_main_rev == 5) &&
+			((jme->chip_sub_rev == 0) || (jme->chip_sub_rev == 1) ||
+			(jme->chip_sub_rev == 3))) ||
+			(jme->chip_main_rev >= 6)) {
+			phy_comm0 = 0x008A;
+			phy_comm1 = 0x4109;
+		}
+		if ((jme->chip_main_rev == 3) &&
+			((jme->chip_sub_rev == 1) || (jme->chip_sub_rev == 2)))
+			phy_comm0 = 0xE088;
+		break;
+	case PCI_DEVICE_ID_JMICRON_JMC260:
+		if (((jme->chip_main_rev == 5) &&
+			((jme->chip_sub_rev == 0) || (jme->chip_sub_rev == 1) ||
+			(jme->chip_sub_rev == 3))) ||
+			(jme->chip_main_rev >= 6)) {
+			phy_comm0 = 0x008A;
+			phy_comm1 = 0x4109;
+		}
+		if ((jme->chip_main_rev == 3) &&
+			((jme->chip_sub_rev == 1) || (jme->chip_sub_rev == 2)))
+			phy_comm0 = 0xE088;
+		if ((jme->chip_main_rev == 2) && (jme->chip_sub_rev == 0))
+			phy_comm0 = 0x608A;
+		if ((jme->chip_main_rev == 2) && (jme->chip_sub_rev == 2))
+			phy_comm0 = 0x408A;
+		break;
+	default:
+		return -ENODEV;
+	}
+	if (phy_comm0)
+		jme_phy_specreg_write(jme, JM_PHY_EXT_COMM_0_REG, phy_comm0);
+	if (phy_comm1)
+		jme_phy_specreg_write(jme, JM_PHY_EXT_COMM_1_REG, phy_comm1);
+
+	return 0;
+}
+
 static int
 jme_open(struct net_device *netdev)
 {
@@ -1769,7 +1881,8 @@ jme_open(struct net_device *netdev)
 		jme_set_settings(netdev, &jme->old_ecmd);
 	else
 		jme_reset_phy_processor(jme);
-
+	jme_phy_calibration(jme);
+	jme_phy_setEA(jme);
 	jme_reset_link(jme);
 
 	return 0;
@@ -3184,7 +3297,8 @@ jme_resume(struct device *dev)
 		jme_set_settings(netdev, &jme->old_ecmd);
 	else
 		jme_reset_phy_processor(jme);
-
+	jme_phy_calibration(jme);
+	jme_phy_setEA(jme);
 	jme_start_irq(jme);
 	netif_device_attach(netdev);
 
@@ -3239,4 +3353,3 @@ MODULE_DESCRIPTION("JMicron JMC2x0 PCI Express Ethernet driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_VERSION);
 MODULE_DEVICE_TABLE(pci, jme_pci_tbl);
-
diff --git a/drivers/net/ethernet/jme.h b/drivers/net/ethernet/jme.h
index 02ea27c..4304072 100644
--- a/drivers/net/ethernet/jme.h
+++ b/drivers/net/ethernet/jme.h
@@ -760,6 +760,25 @@ enum jme_rxmcs_bits {
 				  RXMCS_CHECKSUM,
 };
 
+/*	Extern PHY common register 2	*/
+
+#define PHY_GAD_TEST_MODE_1			0x00002000
+#define PHY_GAD_TEST_MODE_MSK			0x0000E000
+#define JM_PHY_SPEC_REG_READ			0x00004000
+#define JM_PHY_SPEC_REG_WRITE			0x00008000
+#define PHY_CALIBRATION_DELAY			20
+#define JM_PHY_SPEC_ADDR_REG			0x1E
+#define JM_PHY_SPEC_DATA_REG			0x1F
+
+#define JM_PHY_EXT_COMM_0_REG			0x30
+#define JM_PHY_EXT_COMM_1_REG			0x31
+#define JM_PHY_EXT_COMM_2_REG			0x32
+#define JM_PHY_EXT_COMM_2_CALI_ENABLE		0x01
+#define JM_PHY_EXT_COMM_2_CALI_MODE_0		0x02
+#define JM_PHY_EXT_COMM_2_CALI_LATCH		0x10
+#define PCI_PRIV_SHARE_NICCTRL			0xF5
+#define JME_FLAG_PHYEA_ENABLE			0x2
+
 /*
  * Wakeup Frame setup interface registers
  */
-- 
1.7.4.4

^ permalink raw reply related

* Re: [PATCH net-next] bnx2: switch to build_skb() infrastructure
From: David Miller @ 2011-11-18  7:09 UTC (permalink / raw)
  To: mchan; +Cc: eric.dumazet, netdev, eilong
In-Reply-To: <1321591644.8833.1.camel@HP1>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 17 Nov 2011 20:47:24 -0800

> 
> On Tue, 2011-11-15 at 09:30 -0800, Eric Dumazet wrote:
>> This is very similar to bnx2x conversion, but bnx2 only requires 16bytes
>> alignement at start of the received frame to store its l2_fhdr, so goal
>> was not to reduce skb truesize (in fact it should not change after this
>> patch)
>> 
>> Using build_skb() reduces cache line misses in the driver, since we
>> use cache hot skb instead of cold ones. Number of in-flight sk_buff
>> structures is lower, they are more likely recycled in SLUB caches
>> while still hot.
>> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> CC: Michael Chan <mchan@broadcom.com>
>> CC: Eilon Greenstein <eilong@broadcom.com>
> 
> Looks good.  Thanks.
> 
> Reviewed-by: Michael Chan <mchan@broadcom.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH FIX net-next v1] net-forcedeth: fix possible stats inaccuracies on 32b hosts
From: David Miller @ 2011-11-18  7:09 UTC (permalink / raw)
  To: eric.dumazet
  Cc: david.decotigny, netdev, linux-kernel, ian.campbell, rick.jones2
In-Reply-To: <1321560495.2444.1.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 17 Nov 2011 21:08:15 +0100

> Le jeudi 17 novembre 2011 à 11:38 -0800, David Decotigny a écrit :
>> The software stats are updated from BH, this change ensures that 32b
>> UP hosts use appropriate protection.
>> 
>> Tested:
>>   - HW/SW stats consistent with pktgen (in particular tx=rx)
>>   - HW/SW stats consistent when tx/rx offloads disabled
>>   - no problem with+without lockdep (SMP 16-way)
>> 
>> 
>> 
>> Signed-off-by: David Decotigny <david.decotigny@google.com>
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks.

^ permalink raw reply

* RE: [PATCH 1/1]  PHY configuration for compatible issue
From: Aries Lee @ 2011-11-18  7:13 UTC (permalink / raw)
  To: 'Guo-Fu Tseng', netdev; +Cc: 'AriesLee'
In-Reply-To: <20111117070537.M4900@cooldavid.org>

Hi Guo-Fu and All

	Because jme_phy_on() and jme_phy_off() just turn on/off the PHY, the
value of extern register is still the power on default value, not the most
robust value which we collect in the LAB.
It still need to config those registers to the proper value when the chip
lost the power and power on again.

B.R
Aries
-----Original Message-----
From: Guo-Fu Tseng [mailto:cooldavid@cooldavid.org] 
Sent: Thursday, November 17, 2011 3:15 PM
To: AriesLee; netdev@vger.kernel.org
Cc: AriesLee
Subject: Re: [PATCH 1/1] PHY configuration for compatible issue

On Thu, 17 Nov 2011 22:05:42 +0800, AriesLee wrote
> From: Aries Lee <AriesLee@jmicron.com>
> 
> To perform PHY calibration and set a different EA value by chip ID,
> Whenever the NIC chip power on, ie booting or resuming, we need to
> force HW to calibrate PHY parameter again, and also set a proper EA
> value which gathered from experiment.
> 
> That process resolve the compatible issues(NIC is unable to link
> up in some special case) in giga speed.
Thank you Aries.

Here is some suggestions after a quick review:

It would be better if you implement the read/write function
for extended-phy-register, instead of using JM_PHY_SPEC_ADDR_REG
and JM_PHY_SPEC_DATA_REG directly all the time.

There are jme_phy_on() and jme_phy_off() function in place.
Should you simply using it?

^ permalink raw reply

* lock-warning seen on giving rds-info command
From: Kumar Sanghvi @ 2011-11-18  8:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, venkat.x.venkatsubra, Kumar Sanghvi

Hi netdev team,

I am trying to investigate one issue with RDS.
On giving command rds-info, we get below trace in dmesg:

------------[ cut here ]------------
WARNING: at /root/chelsio/git_repos/linux-2.6/kernel/softirq.c:159 local_bh_enable_ip+0x7a/0xa0()
Hardware name: X7DB8
Modules linked in: rds_rdma rds rdma_cm iw_cm ib_addr autofs4 sunrpc
p4_clockmod freq_table speedstep_lib ib_ipoib ib_cm ib_sa ipv6 ib_uverbs
ib_umad ib_mad iw_nes libcrc32c ib_core dm_mirror dm_region_hash dm_log
uinput ppdev sg parport_pc parport pcspkr serio_raw i2c_i801 iTCO_wdt
iTCO_vendor_support ioatdma dca i5k_amb i5000_edac edac_core e1000e shpchp
ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix
floppy radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod
[last unloaded: mdio]
Pid: 1937, comm: rds-info Not tainted 3.1.0 #1
Call Trace:
 [<ffffffff8106446f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810644ca>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8106af6a>] local_bh_enable_ip+0x7a/0xa0
 [<ffffffff814d3126>] _raw_read_unlock_bh+0x16/0x20
 [<ffffffff813f83b4>] sock_i_ino+0x44/0x60
 [<ffffffffa046ab02>] rds_sock_info+0xd2/0x140 [rds]
 [<ffffffffa046c7a7>] rds_info_getsockopt+0x197/0x200 [rds]
 [<ffffffffa046a1b4>] rds_getsockopt+0x84/0xe0 [rds]
 [<ffffffff813f3453>] sys_getsockopt+0x73/0xe0
 [<ffffffff814dafd7>] tracesys+0xd9/0xde
---[ end trace 4e7838d2e1e53231 ]---

I tested this on 3.1 kernel. However, this problem seems to be present on most
recent kernels. I tried to investigate, and the RDS problem seems to stem(I might be wrong)
after this upstream commit:

----
commit f064af1e500a2bf4607706f0f458163bdb2a6ea5
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Wed Sep 22 12:43:39 2010 +0000

    net: fix a lockdep splat
----

In the above commit, sock_i_ino is modified to use read_lock/unlock_bh instead
of plain read_lock/unlock.

Now, everywhere in Linux network stack, it seems that callers of sock_i_ino
are not disabling interrupts and are also not holding any lock while calling
sock_i_ino function. However, RDS seems to be only one who is calling sock_i_ino
with spin_lock_irqsave i.e. with interrupts disabled.
As a result, when sock_i_ino finally calls read_unlock_bh -  which in turn will lead to
local_bh_enable_ip - which in turn throws the WARN-ON-ONCE since interrupts are
disabled at this stage.

Below patch is a quick fix which seems to make this lock-warning disappear.
With this patch, the WARN-ON-ONCE is no longer seen on giving rds-info command.

However, I am not much aware of RDS stack, and I don't know the motive of using
spin-lock-irqsave in rds code for the sock-lock. I am also not aware if the below
patch could introduce regressions in RDS code. So, it would be helpful if someone
who knows RDS can review the patch and comment on what should be the proper fix.

Thanks,
Kumar.

----------------------
[PATCH] net/rds: Use spin_lock for rds_sock_lock

On giving rds-info command, we get below lock-warning in dmesg:
....
Call Trace:
 [<ffffffff8106446f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810644ca>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8106af6a>] local_bh_enable_ip+0x7a/0xa0
 [<ffffffff814d3126>] _raw_read_unlock_bh+0x16/0x20
 [<ffffffff813f83b4>] sock_i_ino+0x44/0x60
 [<ffffffffa046ab02>] rds_sock_info+0xd2/0x140 [rds]
 [<ffffffffa046c7a7>] rds_info_getsockopt+0x197/0x200 [rds]
 [<ffffffffa046a1b4>] rds_getsockopt+0x84/0xe0 [rds]
 [<ffffffff813f3453>] sys_getsockopt+0x73/0xe0
 [<ffffffff814dafd7>] tracesys+0xd9/0xde
....

Fix this by using spin_lock for rds_sock_lock, instead of using
spin_lock_irqsave.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 net/rds/af_rds.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index bb6ad81..5080ae2 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -68,7 +68,6 @@ static int rds_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	struct rds_sock *rs;
-	unsigned long flags;

 	if (!sk)
 		goto out;
@@ -94,10 +93,10 @@ static int rds_release(struct socket *sock)
 	rds_rdma_drop_keys(rs);
 	rds_notify_queue_get(rs, NULL);

-	spin_lock_irqsave(&rds_sock_lock, flags);
+	spin_lock(&rds_sock_lock);
 	list_del_init(&rs->rs_item);
 	rds_sock_count--;
-	spin_unlock_irqrestore(&rds_sock_lock, flags);
+	spin_unlock(&rds_sock_lock);

 	rds_trans_put(rs->rs_transport);

@@ -409,7 +408,6 @@ static const struct proto_ops rds_proto_ops = {

 static int __rds_create(struct socket *sock, struct sock *sk, int protocol)
 {
-	unsigned long flags;
 	struct rds_sock *rs;

 	sock_init_data(sock, sk);
@@ -426,10 +424,10 @@ static int __rds_create(struct socket *sock, struct sock *sk, int protocol)
 	spin_lock_init(&rs->rs_rdma_lock);
 	rs->rs_rdma_keys = RB_ROOT;

-	spin_lock_irqsave(&rds_sock_lock, flags);
+	spin_lock(&rds_sock_lock);
 	list_add_tail(&rs->rs_item, &rds_sock_list);
 	rds_sock_count++;
-	spin_unlock_irqrestore(&rds_sock_lock, flags);
+	spin_unlock(&rds_sock_lock);

 	return 0;
 }
@@ -471,12 +469,11 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
 {
 	struct rds_sock *rs;
 	struct rds_incoming *inc;
-	unsigned long flags;
 	unsigned int total = 0;

 	len /= sizeof(struct rds_info_message);

-	spin_lock_irqsave(&rds_sock_lock, flags);
+	spin_lock(&rds_sock_lock);

 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
 		read_lock(&rs->rs_recv_lock);
@@ -492,7 +489,7 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
 		read_unlock(&rs->rs_recv_lock);
 	}

-	spin_unlock_irqrestore(&rds_sock_lock, flags);
+	spin_unlock(&rds_sock_lock);

 	lens->nr = total;
 	lens->each = sizeof(struct rds_info_message);
@@ -504,11 +501,10 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
 {
 	struct rds_info_socket sinfo;
 	struct rds_sock *rs;
-	unsigned long flags;

 	len /= sizeof(struct rds_info_socket);

-	spin_lock_irqsave(&rds_sock_lock, flags);
+	spin_lock(&rds_sock_lock);

 	if (len < rds_sock_count)
 		goto out;
@@ -529,7 +525,7 @@ out:
 	lens->nr = rds_sock_count;
 	lens->each = sizeof(struct rds_info_socket);

-	spin_unlock_irqrestore(&rds_sock_lock, flags);
+	spin_unlock(&rds_sock_lock);
 }

 static void rds_exit(void)
-- 
1.7.6.2

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox