Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/9] forcedeth: stats & debug enhancements
From: David Decotigny @ 2011-11-15  0:11 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny

These changes implement the ndo_get_stats64 API and add a few more
stats and debugging features for forcedeth. They also ensure that
stats updates are correct in SMP systems, 32 or 64-bits.

Changes since v2:
  - patch 1/9 is the cherry-pick of 898bdf2cb43e ("forcedeth: fix
    stats on hardware without extended stats support")
  - removed patch 5/10 "stats for rx_packets based on hardware
    registers" because packets&bytes stats are updated in software
    only (898bdf2cb43e)

Changes since v1:
  - patch 1/10 is the same as
    http://patchwork.ozlabs.org/patch/125017/ (targetting net)
  - other patches updated to take patch 1/10 into account
  - various commit message updates


Tested:
  ~150Mbps incoming TCP, ethtool -S in a loop, x86_64 16-way:
     tx_bytes: 5441989419
     rx_packets: 5439224
     tx_timeout: 0
     tx_packets: 5456705
     rx_bytes: 5566763850

Tested:
  pktgen + loopback report same RX/TX packets and bytes stats

Tested:
  tests above with Kconfig DEBUG_PAGEALLOC DEBUG_MUTEXES
  DEBUG_SPINLOCK LOCKUP_DETECTOR DEBUG_RT_MUTEXES DEBUG_LOCK_ALLOC
  PROVE_LOCKING DEBUG_ATOMIC_SLEEP DEBUG_STACK_USAGE DEBUG_KOBJECT
  DEBUG_VM DEBUG_LIST DEBUG_SG DEBUG_NOTIFIERS TEST_KSTRTOX
  STRICT_DEVMEM DEBUG_STACKOVERFLOW


############################################
# Patch Set Summary:

David Decotigny (6):
  forcedeth: expose module parameters in /sys/module
  forcedeth: implement ndo_get_stats64() API
  forcedeth: account for dropped RX frames
  forcedeth: new ethtool stat counter for TX timeouts
  forcedeth: stats updated with a deferrable timer
  forcedeth: whitespace/indentation fixes

Mike Ditto (1):
  forcedeth: Add messages to indicate using MSI or MSI-X

Sameer Nanda (1):
  forcedeth: allow to silence "TX timeout" debug messages

david decotigny (1):
  forcedeth: fix stats on hardware without extended stats support

 drivers/net/ethernet/nvidia/forcedeth.c |  342 ++++++++++++++++++++++---------
 1 files changed, 246 insertions(+), 96 deletions(-)

-- 
1.7.3.1

^ permalink raw reply

* [PATCH net-next v3 4/9] forcedeth: expose module parameters in /sys/module
From: David Decotigny @ 2011-11-15  0:11 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321313729.git.david.decotigny@google.com>

In particular, debug_tx_timeout can be updated at runtime.



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 9b917ff..ee8cce5 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -6153,23 +6153,23 @@ static void __exit exit_nic(void)
 	pci_unregister_driver(&driver);
 }
 
-module_param(max_interrupt_work, int, 0);
+module_param(max_interrupt_work, int, S_IRUGO);
 MODULE_PARM_DESC(max_interrupt_work, "forcedeth maximum events handled per interrupt");
-module_param(optimization_mode, int, 0);
+module_param(optimization_mode, int, S_IRUGO);
 MODULE_PARM_DESC(optimization_mode, "In throughput mode (0), every tx & rx packet will generate an interrupt. In CPU mode (1), interrupts are controlled by a timer. In dynamic mode (2), the mode toggles between throughput and CPU mode based on network load.");
-module_param(poll_interval, int, 0);
+module_param(poll_interval, int, S_IRUGO);
 MODULE_PARM_DESC(poll_interval, "Interval determines how frequent timer interrupt is generated by [(time_in_micro_secs * 100) / (2^10)]. Min is 0 and Max is 65535.");
-module_param(msi, int, 0);
+module_param(msi, int, S_IRUGO);
 MODULE_PARM_DESC(msi, "MSI interrupts are enabled by setting to 1 and disabled by setting to 0.");
-module_param(msix, int, 0);
+module_param(msix, int, S_IRUGO);
 MODULE_PARM_DESC(msix, "MSIX interrupts are enabled by setting to 1 and disabled by setting to 0.");
-module_param(dma_64bit, int, 0);
+module_param(dma_64bit, int, S_IRUGO);
 MODULE_PARM_DESC(dma_64bit, "High DMA is enabled by setting to 1 and disabled by setting to 0.");
-module_param(phy_cross, int, 0);
+module_param(phy_cross, int, S_IRUGO);
 MODULE_PARM_DESC(phy_cross, "Phy crossover detection for Realtek 8201 phy is enabled by setting to 1 and disabled by setting to 0.");
-module_param(phy_power_down, int, 0);
+module_param(phy_power_down, int, S_IRUGO);
 MODULE_PARM_DESC(phy_power_down, "Power down phy and disable link when interface is down (1), or leave phy powered up (0).");
-module_param(debug_tx_timeout, bool, 0);
+module_param(debug_tx_timeout, bool, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(debug_tx_timeout,
 		 "Dump tx related registers and ring when tx_timeout happens");
 
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v3 8/9] forcedeth: stats updated with a deferrable timer
From: David Decotigny @ 2011-11-15  0:11 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321313729.git.david.decotigny@google.com>

Mark stats timer as deferrable: punctuality in waking the stats timer
callback doesn't matter much, as it is responsible only to avoid
integer wraparound.

We need at least 1 other timer to fire within 17s (fully loaded 1Gbps)
to avoid wrap-arounds. Desired period is still 10s.



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index dd24035..50133f1 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -5540,7 +5540,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	init_timer(&np->nic_poll);
 	np->nic_poll.data = (unsigned long) dev;
 	np->nic_poll.function = nv_do_nic_poll;	/* timer handler */
-	init_timer(&np->stats_poll);
+	init_timer_deferrable(&np->stats_poll);
 	np->stats_poll.data = (unsigned long) dev;
 	np->stats_poll.function = nv_do_stats_poll;	/* timer handler */
 
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v3 9/9] forcedeth: whitespace/indentation fixes
From: David Decotigny @ 2011-11-15  0:11 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321313729.git.david.decotigny@google.com>



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 50133f1..3b4b706 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -65,7 +65,7 @@
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/prefetch.h>
-#include  <linux/io.h>
+#include <linux/io.h>
 
 #include <asm/irq.h>
 #include <asm/system.h>
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v3 6/9] forcedeth: account for dropped RX frames
From: David Decotigny @ 2011-11-15  0:11 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321313729.git.david.decotigny@google.com>

This adds the stats counter for dropped RX frames.



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index a67c1f4..84e8d17 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -824,6 +824,7 @@ struct fe_priv {
 	struct nv_driver_stat stat_rx_packets;
 	struct nv_driver_stat stat_rx_bytes; /* not always available in HW */
 	struct nv_driver_stat stat_rx_missed_errors;
+	struct nv_driver_stat stat_rx_dropped;
 
 	/* media detection workaround.
 	 * Locking: Within irq hander or disable_irq+spin_lock(&np->lock);
@@ -1740,6 +1741,7 @@ static void nv_update_stats(struct net_device *dev)
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_packets);
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_bytes);
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_missed_errors);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_dropped);
 
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_packets);
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_bytes);
@@ -1786,6 +1788,8 @@ nv_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
 			&np->stat_tx_bytes);
 		storage->rx_errors  = np->estats.rx_errors_total;
 		storage->tx_errors  = np->estats.tx_errors_total;
+		storage->rx_dropped = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_rx_dropped);
 		storage->tx_dropped = NV_DRIVER_STAT_GET_TOTAL(
 			&np->stat_tx_dropped);
 
@@ -1841,8 +1845,10 @@ static int nv_alloc_rx(struct net_device *dev)
 				np->put_rx.orig = np->first_rx.orig;
 			if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
 				np->put_rx_ctx = np->first_rx_ctx;
-		} else
+		} else {
+			NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_dropped);
 			return 1;
+		}
 	}
 	return 0;
 }
@@ -1873,8 +1879,10 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 				np->put_rx.ex = np->first_rx.ex;
 			if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
 				np->put_rx_ctx = np->first_rx_ctx;
-		} else
+		} else {
+			NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_dropped);
 			return 1;
+		}
 	}
 	return 0;
 }
-- 
1.7.3.1

^ permalink raw reply related

* Re: [RFC PATCH net-next] enable virtio_net to return bus_info in ethtool -i consistent with emulated NICs
From: Rick Jones @ 2011-11-15  0:14 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, virtualization, Michael Tsirkin
In-Reply-To: <1321315839.2827.25.camel@bwh-desktop>

>>> Please use the existing 'not implemented' value, which is the empty
>>> string.   If you think ethtool should print some helpful message instead
>>> of an empty string, please submit a patch for ethtool.
>>
>>
>> One question - will those actually be called via an ethtool path?  In my
>> poking about through the virtio code, I got the impression those modules
>> were for "other than networking" sorts of things.
>
> I don't know; I just assumed that was why you were adding them!  In
> other contexts such as dev_printk() this string would make even less
> sense.

Those were added to make sure there were no dangling references in the 
config_ops structure defined in those files and that the code calling 
through wouldn't go off into la-la land.  Perhaps it isn't necessary 
with Rusty's suggestion that I check ".bus_info" against NULL? But that 
is why those were there, and not simply the instance in virtio_pci.c. 
I'll spin a v2 regardless.

rick

^ permalink raw reply

* [RFC PATCH net-next v2] enable virtio_net to return bus_info in ethtool -i consistent with emulated NICs
From: Rick Jones @ 2011-11-15  0:17 UTC (permalink / raw)
  To: netdev, rusty, mst, virtualization

From: Rick Jones <rick.jones2@hp.com>

Add a new .bus_name to virtio_config_ops then modify virtio_net to 
call through to it in an ethtool .get_drvinfo routine to report
bus_info in ethtool -i output which is consistent with other
emulated NICs and the output of lspci.  

Signed-off-by: Rick Jones <rick.jones2@hp.com>

---

The changes to drivers/lguest/lguest_device.c, drivers/s390/kvm/kvm_virtio.c,
and drivers/virtio/virtio_mmio.c code inspected only, not compiled.

j-ubuntu-guest:~$ ethtool -i eth0
driver: virtio_net
version: 1.0.0
firmware-version: 
bus-info: 0000:00:03.0
raj@raj-ubuntu-guest:~$ lspci | grep Ether
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device

 drivers/lguest/lguest_device.c |    6 ++++++
 drivers/net/virtio_net.c       |   15 +++++++++++++++
 drivers/s390/kvm/kvm_virtio.c  |    6 ++++++
 drivers/virtio/virtio_mmio.c   |    6 ++++++
 drivers/virtio/virtio_pci.c    |    8 ++++++++
 include/linux/virtio_config.h  |   14 ++++++++++++++
 6 files changed, 55 insertions(+), 0 deletions(-)


diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 0dc30ff..595d731 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -381,6 +381,11 @@ error:
 	return PTR_ERR(vqs[i]);
 }
 
+static const char *lg_bus_name(struct virtio_device *vdev)
+{
+	return "";
+}
+
 /* The ops structure which hooks everything together. */
 static struct virtio_config_ops lguest_config_ops = {
 	.get_features = lg_get_features,
@@ -392,6 +397,7 @@ static struct virtio_config_ops lguest_config_ops = {
 	.reset = lg_reset,
 	.find_vqs = lg_find_vqs,
 	.del_vqs = lg_del_vqs,
+	.bus_name = lg_bus_name,
 };
 
 /*
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6ee8410..4dc9d84 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -39,6 +39,7 @@ module_param(gso, bool, 0444);
 #define GOOD_COPY_LEN	128
 
 #define VIRTNET_SEND_COMMAND_SG_MAX    2
+#define VIRTNET_DRIVER_VERSION "1.0.0"
 
 struct virtnet_stats {
 	struct u64_stats_sync syncp;
@@ -889,7 +890,21 @@ static void virtnet_get_ringparam(struct net_device *dev,
 
 }
 
+
+static void virtnet_get_drvinfo(struct net_device *dev,
+				struct ethtool_drvinfo *info)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
+	strlcpy(info->version, VIRTNET_DRIVER_VERSION, sizeof(info->version));
+	strlcpy(info->bus_info, virtio_bus_name(vdev), sizeof(info->bus_info));
+
+}
+
 static const struct ethtool_ops virtnet_ethtool_ops = {
+	.get_drvinfo = virtnet_get_drvinfo,
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = virtnet_get_ringparam,
 };
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 94f49ff..8af868b 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -263,6 +263,11 @@ error:
 	return PTR_ERR(vqs[i]);
 }
 
+static const char *kvm_bus_name(struct virtio_device *vdev)
+{
+	return "";
+}
+
 /*
  * The config ops structure as defined by virtio config
  */
@@ -276,6 +281,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = {
 	.reset = kvm_reset,
 	.find_vqs = kvm_find_vqs,
 	.del_vqs = kvm_del_vqs,
+	.bus_name = kvm_bus_name,
 };
 
 /*
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index acc5e43..2f57380 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -361,7 +361,12 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 	return 0;
 }
 
+static const char *vm_bus_name(struct virtio_device *vdev)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
 
+	return vm_dev->pdev->name;
+}
 
 static struct virtio_config_ops virtio_mmio_config_ops = {
 	.get		= vm_get,
@@ -373,6 +378,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = {
 	.del_vqs	= vm_del_vqs,
 	.get_features	= vm_get_features,
 	.finalize_features = vm_finalize_features,
+	.bus_name	= vm_bus_name,
 };
 
 
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 79a31e5..764ec05 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -580,6 +580,13 @@ static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 				  false, false);
 }
 
+static const char *vp_bus_name(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	return pci_name(vp_dev->pci_dev);
+}
+
 static struct virtio_config_ops virtio_pci_config_ops = {
 	.get		= vp_get,
 	.set		= vp_set,
@@ -590,6 +597,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
 	.del_vqs	= vp_del_vqs,
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
+	.bus_name	= vp_bus_name,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index add4790..63f98d0 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -100,6 +100,10 @@
  *	vdev: the virtio_device
  *	This gives the final feature bits for the device: it can change
  *	the dev->feature bits if it wants.
+ * @bus_name: return the bus name associated with the device
+ *	vdev: the virtio_device
+ *      This returns a pointer to the bus name a la pci_name from which
+ *      the caller can then copy.
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -117,6 +121,7 @@ struct virtio_config_ops {
 	void (*del_vqs)(struct virtio_device *);
 	u32 (*get_features)(struct virtio_device *vdev);
 	void (*finalize_features)(struct virtio_device *vdev);
+	const char *(*bus_name)(struct virtio_device *vdev);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
@@ -182,5 +187,14 @@ struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev,
 		return ERR_PTR(err);
 	return vq;
 }
+
+static inline
+const char *virtio_bus_name(struct virtio_device *vdev)
+{
+	if (!vdev->config->bus_name)
+		return "virtio";
+	return vdev->config->bus_name(vdev);
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_VIRTIO_CONFIG_H */

^ permalink raw reply related

* [PATCH net-next v3 5/9] forcedeth: implement ndo_get_stats64() API
From: David Decotigny @ 2011-11-15  0:11 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321313729.git.david.decotigny@google.com>

This commit implements the ndo_get_stats64() API for forcedeth. Since
these stats are being updated from different contexts (process and
timer), this commit adds protection (locking + atomic variables).

Tested:
  - 16-way SMP x86_64 ->
    RX bytes:7244556582 (7.2 GB)  TX bytes:181904254 (181.9 MB)
  - pktgen + loopback: identical rx_bytes/tx_bytes and rx_packets/tx_packets



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |  182 ++++++++++++++++++++++++-------
 1 files changed, 141 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index ee8cce5..a67c1f4 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -692,6 +692,21 @@ struct nv_ethtool_stats {
 #define NV_DEV_STATISTICS_V2_COUNT (NV_DEV_STATISTICS_V3_COUNT - 3)
 #define NV_DEV_STATISTICS_V1_COUNT (NV_DEV_STATISTICS_V2_COUNT - 6)
 
+/* driver statistics */
+struct nv_driver_stat {
+	atomic_t delta;  /* increase since last nv_update_stats() */
+	u64 total;  /* cumulative, requires netdev_priv(dev)->stats_lock */
+};
+
+#define NV_DRIVER_STAT_ATOMIC_INC(ptr_stat) /* atomic */ \
+	({ atomic_inc(&(ptr_stat)->delta); })
+#define NV_DRIVER_STAT_ATOMIC_ADD(ptr_stat,increment) /* atomic */	\
+	({ atomic_add((increment), &(ptr_stat)->delta); })
+#define NV_DRIVER_STAT_UPDATE_TOTAL(ptr_stat) /* requires stats_lock */ \
+	({ (ptr_stat)->total += atomic_xchg(&(ptr_stat)->delta, 0); })
+#define NV_DRIVER_STAT_GET_TOTAL(ptr_stat) /* requires stats_lock */ \
+	((ptr_stat)->total)
+
 /* diagnostics */
 #define NV_TEST_COUNT_BASE 3
 #define NV_TEST_COUNT_EXTENDED 4
@@ -736,6 +751,12 @@ struct nv_skb_map {
  * - tx setup is lockless: it relies on netif_tx_lock. Actual submission
  *	needs netdev_priv(dev)->lock :-(
  * - set_multicast_list: preparation lockless, relies on netif_tx_lock.
+ *
+ * Stats are protected with stats_lock:
+ * - updated by nv_do_stats_poll (timer). This is meant to avoid
+ *   integer wraparound in the NIC stats registers, at low frequency
+ *   (0.1 Hz)
+ * - updated by nv_get_ethtool_stats + nv_get_stats64
  */
 
 /* in dev: base, irq */
@@ -745,9 +766,10 @@ struct fe_priv {
 	struct net_device *dev;
 	struct napi_struct napi;
 
-	/* General data:
-	 * Locking: spin_lock(&np->lock); */
+	/* stats are updated in syscall and timer */
+	spinlock_t stats_lock;
 	struct nv_ethtool_stats estats;
+
 	int in_shutdown;
 	u32 linkspeed;
 	int duplex;
@@ -798,6 +820,11 @@ struct fe_priv {
 	u32 nic_poll_irq;
 	int rx_ring_size;
 
+	/* RX software stats */
+	struct nv_driver_stat stat_rx_packets;
+	struct nv_driver_stat stat_rx_bytes; /* not always available in HW */
+	struct nv_driver_stat stat_rx_missed_errors;
+
 	/* media detection workaround.
 	 * Locking: Within irq hander or disable_irq+spin_lock(&np->lock);
 	 */
@@ -820,6 +847,11 @@ struct fe_priv {
 	struct nv_skb_map *tx_end_flip;
 	int tx_stop;
 
+	/* TX software stats */
+	struct nv_driver_stat stat_tx_packets; /* not always available in HW */
+	struct nv_driver_stat stat_tx_bytes;
+	struct nv_driver_stat stat_tx_dropped;
+
 	/* msi/msi-x fields */
 	u32 msi_flags;
 	struct msix_entry msi_x_entry[NV_MSI_X_MAX_VECTORS];
@@ -1635,11 +1667,19 @@ static void nv_mac_reset(struct net_device *dev)
 	pci_push(base);
 }
 
-static void nv_get_hw_stats(struct net_device *dev)
+/* Caller must appropriately lock netdev_priv(dev)->stats_lock */
+static void nv_update_stats(struct net_device *dev)
 {
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 
+	/* If it happens that this is run in top-half context, then
+	 * replace the spin_lock of stats_lock with
+	 * spin_lock_irqsave() in calling functions. */
+	WARN_ONCE(in_irq(), "forcedeth: estats spin_lock(_bh) from top-half");
+	assert_spin_locked(&np->stats_lock);
+
+	/* query hardware */
 	np->estats.tx_bytes += readl(base + NvRegTxCnt);
 	np->estats.tx_zero_rexmt += readl(base + NvRegTxZeroReXmt);
 	np->estats.tx_one_rexmt += readl(base + NvRegTxOneReXmt);
@@ -1695,21 +1735,35 @@ static void nv_get_hw_stats(struct net_device *dev)
 		np->estats.tx_multicast += readl(base + NvRegTxMulticast);
 		np->estats.tx_broadcast += readl(base + NvRegTxBroadcast);
 	}
+
+	/* update software stats */
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_packets);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_bytes);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_missed_errors);
+
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_packets);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_bytes);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_dropped);
 }
 
 /*
- * nv_get_stats: dev->get_stats function
+ * nv_get_stats64: dev->ndo_get_stats64 function
  * Get latest stats value from the nic.
  * Called with read_lock(&dev_base_lock) held for read -
  * only synchronized against unregister_netdevice.
  */
-static struct net_device_stats *nv_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64*
+nv_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
+	__acquires(&netdev_priv(dev)->stats_lock)
+	__releases(&netdev_priv(dev)->stats_lock)
 {
 	struct fe_priv *np = netdev_priv(dev);
 
 	/* If the nic supports hw counters then retrieve latest values */
-	if (np->driver_data & (DEV_HAS_STATISTICS_V1|DEV_HAS_STATISTICS_V2|DEV_HAS_STATISTICS_V3)) {
-		nv_get_hw_stats(dev);
+	if (np->driver_data & DEV_HAS_STATISTICS_V123) {
+		spin_lock_bh(&np->stats_lock);
+
+		nv_update_stats(dev);
 
 		/*
 		 * Note: because HW stats are not always available and
@@ -1721,17 +1775,40 @@ static struct net_device_stats *nv_get_stats(struct net_device *dev)
 		 * packet (Ethernet FCS CRC).
 		 */
 
-		/* copy to net_device stats */
-		dev->stats.tx_fifo_errors = np->estats.tx_fifo_errors;
-		dev->stats.tx_carrier_errors = np->estats.tx_carrier_errors;
-		dev->stats.rx_crc_errors = np->estats.rx_crc_errors;
-		dev->stats.rx_over_errors = np->estats.rx_over_errors;
-		dev->stats.rx_fifo_errors = np->estats.rx_drop_frame;
-		dev->stats.rx_errors = np->estats.rx_errors_total;
-		dev->stats.tx_errors = np->estats.tx_errors_total;
-	}
-
-	return &dev->stats;
+		/* generic stats */
+		storage->rx_packets = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_rx_packets);
+		storage->tx_packets = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_tx_packets);
+		storage->rx_bytes   = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_rx_bytes);
+		storage->tx_bytes   = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_tx_bytes);
+		storage->rx_errors  = np->estats.rx_errors_total;
+		storage->tx_errors  = np->estats.tx_errors_total;
+		storage->tx_dropped = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_tx_dropped);
+
+		/* meaningful only when NIC supports stats v3 */
+		storage->multicast  = np->estats.rx_multicast;
+
+		/* detailed rx_errors */
+		storage->rx_length_errors = np->estats.rx_length_error;
+		storage->rx_over_errors   = np->estats.rx_over_errors;
+		storage->rx_crc_errors    = np->estats.rx_crc_errors;
+		storage->rx_frame_errors  = np->estats.rx_frame_align_error;
+		storage->rx_fifo_errors   = np->estats.rx_drop_frame;
+		storage->rx_missed_errors = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_rx_missed_errors);
+
+		/* detailed tx_errors */
+		storage->tx_carrier_errors = np->estats.tx_carrier_errors;
+		storage->tx_fifo_errors    = np->estats.tx_fifo_errors;
+
+		spin_unlock_bh(&np->stats_lock);
+	}
+
+	return storage;
 }
 
 /*
@@ -1933,7 +2010,7 @@ static void nv_drain_tx(struct net_device *dev)
 			np->tx_ring.ex[i].buflow = 0;
 		}
 		if (nv_release_txskb(np, &np->tx_skb[i]))
-			dev->stats.tx_dropped++;
+			NV_DRIVER_STAT_ATOMIC_INC(&np->stat_tx_dropped);
 		np->tx_skb[i].dma = 0;
 		np->tx_skb[i].dma_len = 0;
 		np->tx_skb[i].dma_single = 0;
@@ -2390,11 +2467,15 @@ static int nv_tx_done(struct net_device *dev, int limit)
 		if (np->desc_ver == DESC_VER_1) {
 			if (flags & NV_TX_LASTPACKET) {
 				if (flags & NV_TX_ERROR) {
-					if ((flags & NV_TX_RETRYERROR) && !(flags & NV_TX_RETRYCOUNT_MASK))
+					if ((flags & NV_TX_RETRYERROR)
+					    && !(flags & NV_TX_RETRYCOUNT_MASK))
 						nv_legacybackoff_reseed(dev);
 				} else {
-					dev->stats.tx_packets++;
-					dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
+					NV_DRIVER_STAT_ATOMIC_INC(
+						&np->stat_tx_packets);
+					NV_DRIVER_STAT_ATOMIC_ADD(
+						&np->stat_tx_bytes,
+						np->get_tx_ctx->skb->len);
 				}
 				dev_kfree_skb_any(np->get_tx_ctx->skb);
 				np->get_tx_ctx->skb = NULL;
@@ -2403,11 +2484,15 @@ static int nv_tx_done(struct net_device *dev, int limit)
 		} else {
 			if (flags & NV_TX2_LASTPACKET) {
 				if (flags & NV_TX2_ERROR) {
-					if ((flags & NV_TX2_RETRYERROR) && !(flags & NV_TX2_RETRYCOUNT_MASK))
+					if ((flags & NV_TX2_RETRYERROR)
+					    && !(flags & NV_TX2_RETRYCOUNT_MASK))
 						nv_legacybackoff_reseed(dev);
 				} else {
-					dev->stats.tx_packets++;
-					dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
+					NV_DRIVER_STAT_ATOMIC_INC(
+						&np->stat_tx_packets);
+					NV_DRIVER_STAT_ATOMIC_ADD(
+						&np->stat_tx_bytes,
+						np->get_tx_ctx->skb->len);
 				}
 				dev_kfree_skb_any(np->get_tx_ctx->skb);
 				np->get_tx_ctx->skb = NULL;
@@ -2441,15 +2526,18 @@ static int nv_tx_done_optimized(struct net_device *dev, int limit)
 
 		if (flags & NV_TX2_LASTPACKET) {
 			if (flags & NV_TX2_ERROR) {
-				if ((flags & NV_TX2_RETRYERROR) && !(flags & NV_TX2_RETRYCOUNT_MASK)) {
+				if ((flags & NV_TX2_RETRYERROR)
+				    && !(flags & NV_TX2_RETRYCOUNT_MASK)) {
 					if (np->driver_data & DEV_HAS_GEAR_MODE)
 						nv_gear_backoff_reseed(dev);
 					else
 						nv_legacybackoff_reseed(dev);
 				}
 			} else {
-				dev->stats.tx_packets++;
-				dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
+				NV_DRIVER_STAT_ATOMIC_INC(&np->stat_tx_packets);
+				NV_DRIVER_STAT_ATOMIC_ADD(
+					&np->stat_tx_bytes,
+					np->get_tx_ctx->skb->len);
 			}
 
 			dev_kfree_skb_any(np->get_tx_ctx->skb);
@@ -2663,7 +2751,7 @@ static int nv_rx_process(struct net_device *dev, int limit)
 					/* the rest are hard errors */
 					else {
 						if (flags & NV_RX_MISSEDFRAME)
-							dev->stats.rx_missed_errors++;
+							NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_missed_errors);
 						dev_kfree_skb(skb);
 						goto next_pkt;
 					}
@@ -2706,8 +2794,8 @@ static int nv_rx_process(struct net_device *dev, int limit)
 		skb_put(skb, len);
 		skb->protocol = eth_type_trans(skb, dev);
 		napi_gro_receive(&np->napi, skb);
-		dev->stats.rx_packets++;
-		dev->stats.rx_bytes += len;
+		NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_packets);
+		NV_DRIVER_STAT_ATOMIC_ADD(&np->stat_rx_bytes, len);
 next_pkt:
 		if (unlikely(np->get_rx.orig++ == np->last_rx.orig))
 			np->get_rx.orig = np->first_rx.orig;
@@ -2790,8 +2878,8 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 				__vlan_hwaccel_put_tag(skb, vid);
 			}
 			napi_gro_receive(&np->napi, skb);
-			dev->stats.rx_packets++;
-			dev->stats.rx_bytes += len;
+			NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_packets);
+			NV_DRIVER_STAT_ATOMIC_ADD(&np->stat_rx_bytes, len);
 		} else {
 			dev_kfree_skb(skb);
 		}
@@ -4000,11 +4088,18 @@ static void nv_poll_controller(struct net_device *dev)
 #endif
 
 static void nv_do_stats_poll(unsigned long data)
+	__acquires(&netdev_priv(dev)->stats_lock)
+	__releases(&netdev_priv(dev)->stats_lock)
 {
 	struct net_device *dev = (struct net_device *) data;
 	struct fe_priv *np = netdev_priv(dev);
 
-	nv_get_hw_stats(dev);
+	/* If lock is currently taken, the stats are being refreshed
+	 * and hence fresh enough */
+	if (spin_trylock(&np->stats_lock)) {
+		nv_update_stats(dev);
+		spin_unlock(&np->stats_lock);
+	}
 
 	if (!np->in_shutdown)
 		mod_timer(&np->stats_poll,
@@ -4711,14 +4806,18 @@ static int nv_get_sset_count(struct net_device *dev, int sset)
 	}
 }
 
-static void nv_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *estats, u64 *buffer)
+static void nv_get_ethtool_stats(struct net_device *dev,
+				 struct ethtool_stats *estats, u64 *buffer)
+	__acquires(&netdev_priv(dev)->stats_lock)
+	__releases(&netdev_priv(dev)->stats_lock)
 {
 	struct fe_priv *np = netdev_priv(dev);
 
-	/* update stats */
-	nv_get_hw_stats(dev);
-
-	memcpy(buffer, &np->estats, nv_get_sset_count(dev, ETH_SS_STATS)*sizeof(u64));
+	spin_lock_bh(&np->stats_lock);
+	nv_update_stats(dev);
+	memcpy(buffer, &np->estats,
+	       nv_get_sset_count(dev, ETH_SS_STATS)*sizeof(u64));
+	spin_unlock_bh(&np->stats_lock);
 }
 
 static int nv_link_test(struct net_device *dev)
@@ -5362,7 +5461,7 @@ static int nv_close(struct net_device *dev)
 static const struct net_device_ops nv_netdev_ops = {
 	.ndo_open		= nv_open,
 	.ndo_stop		= nv_close,
-	.ndo_get_stats		= nv_get_stats,
+	.ndo_get_stats64	= nv_get_stats64,
 	.ndo_start_xmit		= nv_start_xmit,
 	.ndo_tx_timeout		= nv_tx_timeout,
 	.ndo_change_mtu		= nv_change_mtu,
@@ -5379,7 +5478,7 @@ static const struct net_device_ops nv_netdev_ops = {
 static const struct net_device_ops nv_netdev_ops_optimized = {
 	.ndo_open		= nv_open,
 	.ndo_stop		= nv_close,
-	.ndo_get_stats		= nv_get_stats,
+	.ndo_get_stats64	= nv_get_stats64,
 	.ndo_start_xmit		= nv_start_xmit_optimized,
 	.ndo_tx_timeout		= nv_tx_timeout,
 	.ndo_change_mtu		= nv_change_mtu,
@@ -5418,6 +5517,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	np->dev = dev;
 	np->pci_dev = pci_dev;
 	spin_lock_init(&np->lock);
+	spin_lock_init(&np->stats_lock);
 	SET_NETDEV_DEV(dev, &pci_dev->dev);
 
 	init_timer(&np->oom_kick);
-- 
1.7.3.1

^ permalink raw reply related

* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: Grant Grundler @ 2011-11-15  0:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, allan, freddy
In-Reply-To: <20111114.184240.1832866874941847383.davem@davemloft.net>

On Mon, Nov 14, 2011 at 3:42 PM, David Miller <davem@davemloft.net> wrote:
> From: Grant Grundler <grundler@chromium.org>
> Date: Mon, 14 Nov 2011 15:22:01 -0800
>
>> No functional changes. Three things targeted:
>> o update VERSION to reflect previous changes.
>> o fix up white space/formatting of some lines
>> o define 150 to be AX_SWRESET_MDELAY (self documenting code)
>>
>> Signed-off-by: Grant Grundler <grundler@chromium.org>
>
> Why have such a wonderful set of real bug fixes that I can apply right
> now and send off to Linus's tree, only to sabotage it with whitespace
> changes at the end which are no appropriate?
>
> Please, just submit this without the whitespace and other cleanup
> stuff.  Restrict it to real bug fixes, and I'll push it around.

No Problem! :)

I wasn't expecting this to go to linus immediately but it makes that it should.

> The whitespace and other bits can be submitted for net-next afterwards.

Will do.

cheers,
grant

^ permalink raw reply

* [PATCH 5/5] net-next:asix: v2 update VERSION only
From: Grant Grundler @ 2011-11-15  0:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler

Update VERSION to reflect previous changes.

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -36,7 +36,7 @@
 #include <linux/usb/usbnet.h>
 #include <linux/slab.h>
 
-#define DRIVER_VERSION "26-Sep-2011"
+#define DRIVER_VERSION "08-Nov-2011"
 #define DRIVER_NAME "asix"
 
 /* ASIX AX8817X based USB 2.0 Ethernet Devices */

^ permalink raw reply

* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: Grant Grundler @ 2011-11-15  0:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, allan, freddy
In-Reply-To: <20111114.184240.1832866874941847383.davem@davemloft.net>

On Mon, Nov 14, 2011 at 3:42 PM, David Miller <davem@davemloft.net> wrote:
...
> Please, just submit this without the whitespace and other cleanup
> stuff.  Restrict it to real bug fixes, and I'll push it around.

I just sent a "v2" of 5/5 patch which only has the VERSION change. Is
that sufficient or should I resend all 5?

thanks!
grant

^ permalink raw reply

* Re: [PATCH net-next v3 5/9] forcedeth: implement ndo_get_stats64() API
From: Stephen Hemminger @ 2011-11-15  1:10 UTC (permalink / raw)
  To: David Decotigny
  Cc: netdev, linux-kernel, David S. Miller, Ian Campbell, Eric Dumazet,
	Jeff Kirsher, Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla
In-Reply-To: <e2dd16618bc1bd5108e6dea3c7531ee229e4c699.1321313729.git.david.decotigny@google.com>

On Mon, 14 Nov 2011 16:11:15 -0800
David Decotigny <david.decotigny@google.com> wrote:

> This commit implements the ndo_get_stats64() API for forcedeth. Since
> these stats are being updated from different contexts (process and
> timer), this commit adds protection (locking + atomic variables).
> 
> Tested:
>   - 16-way SMP x86_64 ->
>     RX bytes:7244556582 (7.2 GB)  TX bytes:181904254 (181.9 MB)
>   - pktgen + loopback: identical rx_bytes/tx_bytes and rx_packets/tx_packets
> 
> 
> 
> Signed-off-by: David Decotigny <david.decotigny@google.com>
> ---
>  drivers/net/ethernet/nvidia/forcedeth.c |  182 ++++++++++++++++++++++++-------
>  1 files changed, 141 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
> index ee8cce5..a67c1f4 100644
> --- a/drivers/net/ethernet/nvidia/forcedeth.c
> +++ b/drivers/net/ethernet/nvidia/forcedeth.c
> @@ -692,6 +692,21 @@ struct nv_ethtool_stats {
>  #define NV_DEV_STATISTICS_V2_COUNT (NV_DEV_STATISTICS_V3_COUNT - 3)
>  #define NV_DEV_STATISTICS_V1_COUNT (NV_DEV_STATISTICS_V2_COUNT - 6)
>  
> +/* driver statistics */
> +struct nv_driver_stat {
> +	atomic_t delta;  /* increase since last nv_update_stats() */
> +	u64 total;  /* cumulative, requires netdev_priv(dev)->stats_lock */
> +};

Please existing u64_stats_sync rather than inventing your own
method. The u64_stats_sync is faster and does require locking.

^ permalink raw reply

* Re: [patch net-next V8] net: introduce ethernet teaming device
From: Andy Gospodarek @ 2011-11-15  1:56 UTC (permalink / raw)
  To: Rick Jones
  Cc: Jiri Pirko, netdev, davem, eric.dumazet, bhutchings, shemminger,
	fubar, andy, tgraf, ebiederm, mirqus, kaber, greearb, jesse, fbl,
	benjamin.poirier, jzupka, ivecera
In-Reply-To: <4EC160B4.2040709@hp.com>

On Mon, Nov 14, 2011 at 10:40:52AM -0800, Rick Jones wrote:
> On 11/12/2011 12:16 AM, Jiri Pirko wrote:
>> This patch introduces new network device called team. It supposes to be
>> very fast, simple, userspace-driven alternative to existing bonding
>> driver.
>
> What is the definition of "very" here - relative to bonding I presume?  
> Are there actual performance figures available at this point?  Something  
> along the lines of test through both on the same hardware.  I don't have  
> HW on which to run myself but would be quite happy to help with say  
> netperf command selection to demonstrate the difference between the two.
>
> happy benchmrking,
>

On most modern systems I suspect there will be little to no difference
between bonding RX peformance and team performance.

If there is any now, I suspect team and bond performance to be similar
by the time team has to account for the corner-cases bonding has already
resolved.  :-)

Benchmarks may prove otherwise, but I've yet to see Jiri produce
anything.  My initial testing doesn't demonstrate any measureable
differences with 1Gbps interfaces on a multi-core, multi-socket system.
 

^ permalink raw reply

* [PATCH] ipv4: return NET_RX_DROP when arp_rcv drops the received packet.
From: roy.qing.li @ 2011-11-15  2:09 UTC (permalink / raw)
  To: netdev

From: RongQing.Li <roy.qing.li@gmail.com>

return NET_RX_DROP when arp_rcv drops the received packet.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
---
 net/ipv4/arp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index d732827..9967385 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -969,7 +969,7 @@ static int arp_rcv(struct sk_buff *skb, struct net_device *dev,
 freeskb:
 	kfree_skb(skb);
 out_of_mem:
-	return 0;
+	return NET_RX_DROP;
 }
 
 /*
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] tcp: fixes for DSACK-based undo of cwnd reduction during fast recovery
From: Ilpo Järvinen @ 2011-11-15  2:05 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, Netdev, nanditad, ycheng, therbert
In-Reply-To: <CADVnQyn7czRocYV3-goTtwqCDg9MBv+bTHeZ-A8R4n45LH8L=g@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3531 bytes --]

On Mon, 14 Nov 2011, Neal Cardwell wrote:

> On Mon, Nov 14, 2011 at 3:23 PM, David Miller <davem@davemloft.net> wrote:
> > From: Neal Cardwell <ncardwell@google.com>
> > Date: Tue,  8 Nov 2011 13:07:11 -0500
> >
> >> (2)

When you need to say "fixes 1), 2), and 3)", please stop for a moment 
thinking if the fixes really need to be sent as one fix or 3 fixes (a 
series PATCH x/3, see netdev for examples). It would make review much 
simpler if unrelated fixes are not put together even if all of them 
would be needed to get DSACK working as intented.

> >>  When the ACK field is below snd_una (the "old_ack" goto label),
> >> process any DSACKs and try to send out more packets based on
> >> newly-SACKed packets.
> >
> > This seems dubious.

Indeed. However, there's nothing wrong in the processing of DSACKs nor 
SACKs in the given case but we lack call to tcp_fastretrans_alert. But the 
commit message does not say that, in fact, I failed to understand what 2) 
tried to say as I knew that DSACK and SACK processing itself is ok and 
done (it also mixes "DSACKs"/"SACKed" illogically). This commit message 
must be fixed so that guys who read it later won't get confused too. BUT
I'd on the same time split the whole fix to 3 different patches unless 
there's some very good reason why the changes are interlocked so that the 
splitting is not possible.

> > An older ACK should not have more uptodate SACK information than a newer
> > ACK.

This is not true with DSACK that are not cumulative. And also the 
reported SACK state is cumulative up to a limit.

> I grant that it is a rare corner case, but it is possible if there is
> reordering and packet loss in both directions.
> 
> In fact, AFAICT the code path to handle this corner case is already
> there: AFAICT we call tcp_sacktag_write_queue() in the old_ack code
> path exactly because it is possible for old ACKs to carry SACK ranges
> that we haven't seen yet.

I agree. ...And also DSACKs that can only be seen in that particular ACK.

> Let me try to lay out a specific example:
> 
> Suppose a sender sends packets 1-12, and there is lots of loss and
> reordering in both directions.
> 
> Suppose the receiver sends the following ACKs with SACK blocks, in the
> following order (using packet numbers rather than sequence numbers):
> 
> (1) ack 1, sack <3-4>
> (2) ack 1, sack <3-4 6>
> (3) ack 1, sack <3-4 6 8>
> (4) ack 1, sack <3-4 6 8 10>
> (5) ack 1, sack <6 8 10 12>
> (6) ack 2, sack <6 8 10 12>
> 
> Note that starting at ACK (5) the receiver is limited by the fact that
> TCP options can only hold 4 SACK ranges, so the SACK for 3-4 "falls
> off" the SACK block set.
> 
> Now suppose the ACKs are reordered, and some are dropped, so that the
> sender receives the ACKs in this order:
> 
> (6) ack 2, sack <6 8 10 12>
> (1) ack 1, sack <3-4>
> 
> The arrival of (6) advances snd_una to 2. Now the question is what to
> do when (1), with an ACK field below snd_una, arrives at the
> sender. The existing code tags packets 3-4 as SACKed, but does not use
> this newly SACK-tagged data to send out new data. This commit proposes 
> to use the newly SACK-tagged 3-4 to send out new data.

Contrary to your commit message, new data sending is already done outside 
of tcp_ack() in tcp_data_snd_check? ...It's tcp_fastretrans_alert call 
that is lacking so no rexmits could be done currently?

In addition, this given scenarios was not DSACK related!?! I think your 
commit message is just bogus discussing (only) DSACK processing for this 
case.


-- 
 i.

^ permalink raw reply

* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: David Miller @ 2011-11-15  2:32 UTC (permalink / raw)
  To: grundler; +Cc: netdev, linux-kernel, allan, freddy
In-Reply-To: <CANEJEGuzdxDuVysuAJPcdC7-mywa8B8JggVHe_VMTK_1j4QguQ@mail.gmail.com>

From: Grant Grundler <grundler@chromium.org>
Date: Mon, 14 Nov 2011 16:54:35 -0800

> On Mon, Nov 14, 2011 at 3:42 PM, David Miller <davem@davemloft.net> wrote:
> ...
>> Please, just submit this without the whitespace and other cleanup
>> stuff.  Restrict it to real bug fixes, and I'll push it around.
> 
> I just sent a "v2" of 5/5 patch which only has the VERSION change. Is
> that sufficient or should I resend all 5?

Just respinning #5 is fine, thanks.

^ permalink raw reply

* [RFC PATCH 1/2] powerpc: Remove duplicate cacheable_memcpy/memzero functions
From: Kyle Moffett @ 2011-11-15  2:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-kernel, benh, galak, scottwood, B04825, paul.gortmaker,
	Kyle Moffett, Paul Mackerras, Andrew Morton, Milton Miller,
	Mike Frysinger, Oleg Nesterov, Anton Blanchard, David S. Miller,
	Ian Campbell, Eric Dumazet, Jeff Kirsher, Jiri Pirko,
	linuxppc-dev, netdev
In-Reply-To: <1320986410.21206.48.camel@pasglop>

These functions are only used from one place each.  If the cacheable_*
versions really are more efficient, then those changes should be
migrated into the common code instead.

NOTE: The old routines are just flat buggy on kernels that support
      hardware with different cacheline sizes.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
---
 arch/powerpc/include/asm/system.h    |    2 -
 arch/powerpc/kernel/ppc_ksyms.c      |    2 -
 arch/powerpc/lib/copy_32.S           |  127 ----------------------------------
 arch/powerpc/mm/ppc_mmu_32.c         |    2 +-
 drivers/net/ethernet/ibm/emac/core.c |   12 +---
 5 files changed, 3 insertions(+), 142 deletions(-)

diff --git a/arch/powerpc/include/asm/system.h b/arch/powerpc/include/asm/system.h
index e30a13d..25389d1 100644
--- a/arch/powerpc/include/asm/system.h
+++ b/arch/powerpc/include/asm/system.h
@@ -189,8 +189,6 @@ static inline void flush_spe_to_thread(struct task_struct *t)
 #endif
 
 extern int call_rtas(const char *, int, int, unsigned long *, ...);
-extern void cacheable_memzero(void *p, unsigned int nb);
-extern void *cacheable_memcpy(void *, const void *, unsigned int);
 extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
 extern void bad_page_fault(struct pt_regs *, unsigned long, int);
 extern int die(const char *, struct pt_regs *, long);
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index d3114a7..acba8ce 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -159,8 +159,6 @@ EXPORT_SYMBOL(screen_info);
 #ifdef CONFIG_PPC32
 EXPORT_SYMBOL(timer_interrupt);
 EXPORT_SYMBOL(tb_ticks_per_jiffy);
-EXPORT_SYMBOL(cacheable_memcpy);
-EXPORT_SYMBOL(cacheable_memzero);
 #endif
 
 #ifdef CONFIG_PPC32
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index 55f19f9..6813f80 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -69,54 +69,6 @@ CACHELINE_BYTES = L1_CACHE_BYTES
 LG_CACHELINE_BYTES = L1_CACHE_SHIFT
 CACHELINE_MASK = (L1_CACHE_BYTES-1)
 
-/*
- * Use dcbz on the complete cache lines in the destination
- * to set them to zero.  This requires that the destination
- * area is cacheable.  -- paulus
- */
-_GLOBAL(cacheable_memzero)
-	mr	r5,r4
-	li	r4,0
-	addi	r6,r3,-4
-	cmplwi	0,r5,4
-	blt	7f
-	stwu	r4,4(r6)
-	beqlr
-	andi.	r0,r6,3
-	add	r5,r0,r5
-	subf	r6,r0,r6
-	clrlwi	r7,r6,32-LG_CACHELINE_BYTES
-	add	r8,r7,r5
-	srwi	r9,r8,LG_CACHELINE_BYTES
-	addic.	r9,r9,-1	/* total number of complete cachelines */
-	ble	2f
-	xori	r0,r7,CACHELINE_MASK & ~3
-	srwi.	r0,r0,2
-	beq	3f
-	mtctr	r0
-4:	stwu	r4,4(r6)
-	bdnz	4b
-3:	mtctr	r9
-	li	r7,4
-10:	dcbz	r7,r6
-	addi	r6,r6,CACHELINE_BYTES
-	bdnz	10b
-	clrlwi	r5,r8,32-LG_CACHELINE_BYTES
-	addi	r5,r5,4
-2:	srwi	r0,r5,2
-	mtctr	r0
-	bdz	6f
-1:	stwu	r4,4(r6)
-	bdnz	1b
-6:	andi.	r5,r5,3
-7:	cmpwi	0,r5,0
-	beqlr
-	mtctr	r5
-	addi	r6,r6,3
-8:	stbu	r4,1(r6)
-	bdnz	8b
-	blr
-
 _GLOBAL(memset)
 	rlwimi	r4,r4,8,16,23
 	rlwimi	r4,r4,16,0,15
@@ -142,85 +94,6 @@ _GLOBAL(memset)
 	bdnz	8b
 	blr
 
-/*
- * This version uses dcbz on the complete cache lines in the
- * destination area to reduce memory traffic.  This requires that
- * the destination area is cacheable.
- * We only use this version if the source and dest don't overlap.
- * -- paulus.
- */
-_GLOBAL(cacheable_memcpy)
-	add	r7,r3,r5		/* test if the src & dst overlap */
-	add	r8,r4,r5
-	cmplw	0,r4,r7
-	cmplw	1,r3,r8
-	crand	0,0,4			/* cr0.lt &= cr1.lt */
-	blt	memcpy			/* if regions overlap */
-
-	addi	r4,r4,-4
-	addi	r6,r3,-4
-	neg	r0,r3
-	andi.	r0,r0,CACHELINE_MASK	/* # bytes to start of cache line */
-	beq	58f
-
-	cmplw	0,r5,r0			/* is this more than total to do? */
-	blt	63f			/* if not much to do */
-	andi.	r8,r0,3			/* get it word-aligned first */
-	subf	r5,r0,r5
-	mtctr	r8
-	beq+	61f
-70:	lbz	r9,4(r4)		/* do some bytes */
-	stb	r9,4(r6)
-	addi	r4,r4,1
-	addi	r6,r6,1
-	bdnz	70b
-61:	srwi.	r0,r0,2
-	mtctr	r0
-	beq	58f
-72:	lwzu	r9,4(r4)		/* do some words */
-	stwu	r9,4(r6)
-	bdnz	72b
-
-58:	srwi.	r0,r5,LG_CACHELINE_BYTES /* # complete cachelines */
-	clrlwi	r5,r5,32-LG_CACHELINE_BYTES
-	li	r11,4
-	mtctr	r0
-	beq	63f
-53:
-	dcbz	r11,r6
-	COPY_16_BYTES
-#if L1_CACHE_BYTES >= 32
-	COPY_16_BYTES
-#if L1_CACHE_BYTES >= 64
-	COPY_16_BYTES
-	COPY_16_BYTES
-#if L1_CACHE_BYTES >= 128
-	COPY_16_BYTES
-	COPY_16_BYTES
-	COPY_16_BYTES
-	COPY_16_BYTES
-#endif
-#endif
-#endif
-	bdnz	53b
-
-63:	srwi.	r0,r5,2
-	mtctr	r0
-	beq	64f
-30:	lwzu	r0,4(r4)
-	stwu	r0,4(r6)
-	bdnz	30b
-
-64:	andi.	r0,r5,3
-	mtctr	r0
-	beq+	65f
-40:	lbz	r0,4(r4)
-	stb	r0,4(r6)
-	addi	r4,r4,1
-	addi	r6,r6,1
-	bdnz	40b
-65:	blr
-
 _GLOBAL(memmove)
 	cmplw	0,r3,r4
 	bgt	backwards_memcpy
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 11571e1..9f16b9f 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -224,7 +224,7 @@ void __init MMU_init_hw(void)
 	 */
 	if ( ppc_md.progress ) ppc_md.progress("hash:find piece", 0x322);
 	Hash = __va(memblock_alloc(Hash_size, Hash_size));
-	cacheable_memzero(Hash, Hash_size);
+	memset(Hash, 0, Hash_size);
 	_SDR1 = __pa(Hash) | SDR1_LOW_BITS;
 
 	Hash_end = (struct hash_pte *) ((unsigned long)Hash + Hash_size);
diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index ed79b2d..be214ad 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -77,13 +77,6 @@ MODULE_AUTHOR
     ("Eugene Surovegin <eugene.surovegin@zultys.com> or <ebs@ebshome.net>");
 MODULE_LICENSE("GPL");
 
-/*
- * PPC64 doesn't (yet) have a cacheable_memcpy
- */
-#ifdef CONFIG_PPC64
-#define cacheable_memcpy(d,s,n) memcpy((d),(s),(n))
-#endif
-
 /* minimum number of free TX descriptors required to wake up TX process */
 #define EMAC_TX_WAKEUP_THRESH		(NUM_TX_BUFF / 4)
 
@@ -1637,7 +1630,7 @@ static inline int emac_rx_sg_append(struct emac_instance *dev, int slot)
 			dev_kfree_skb(dev->rx_sg_skb);
 			dev->rx_sg_skb = NULL;
 		} else {
-			cacheable_memcpy(skb_tail_pointer(dev->rx_sg_skb),
+			memcpy(skb_tail_pointer(dev->rx_sg_skb),
 					 dev->rx_skb[slot]->data, len);
 			skb_put(dev->rx_sg_skb, len);
 			emac_recycle_rx_skb(dev, slot, len);
@@ -1694,8 +1687,7 @@ static int emac_poll_rx(void *param, int budget)
 				goto oom;
 
 			skb_reserve(copy_skb, EMAC_RX_SKB_HEADROOM + 2);
-			cacheable_memcpy(copy_skb->data - 2, skb->data - 2,
-					 len + 2);
+			memcpy(copy_skb->data - 2, skb->data - 2, len + 2);
 			emac_recycle_rx_skb(dev, slot, len);
 			skb = copy_skb;
 		} else if (unlikely(emac_alloc_rx_skb(dev, slot, GFP_ATOMIC)))
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: David Miller @ 2011-11-15  2:41 UTC (permalink / raw)
  To: grundler; +Cc: netdev, linux-kernel, allan, freddy
In-Reply-To: <20111114.213255.2061756136146824209.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Mon, 14 Nov 2011 21:32:55 -0500 (EST)

> From: Grant Grundler <grundler@chromium.org>
> Date: Mon, 14 Nov 2011 16:54:35 -0800
> 
>> On Mon, Nov 14, 2011 at 3:42 PM, David Miller <davem@davemloft.net> wrote:
>> ...
>>> Please, just submit this without the whitespace and other cleanup
>>> stuff.  Restrict it to real bug fixes, and I'll push it around.
>> 
>> I just sent a "v2" of 5/5 patch which only has the VERSION change. Is
>> that sufficient or should I resend all 5?
> 
> Just respinning #5 is fine, thanks.

But the fact that the other bits don't even compile... that's not fine.

drivers/net/usb/asix.c: In function ‘ax88178_bind’:
drivers/net/usb/asix.c:1463:2: error: implicit declaration of function ‘ax88178_sw_reset’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors

[davem@boricha net]$ git grep ax88178_sw_reset
drivers/net/usb/asix.c:	ax88178_sw_reset(dev, 0);

Come on man... are you kidding me?

^ permalink raw reply

* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: David Miller @ 2011-11-15  2:45 UTC (permalink / raw)
  To: grundler; +Cc: netdev, linux-kernel, allan, freddy, kernel
In-Reply-To: <20111114.214151.1988775203701808195.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Mon, 14 Nov 2011 21:41:51 -0500 (EST)

> Come on man... are you kidding me?

Want to know what really pisses me off about this?

All of Mark Lord's hard work to bring the entire vendor driver over
was thrown out.

And it was thrown out in favor of this!  Code that doesn't even
compile.

^ permalink raw reply

* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: Krishna Kumar2 @ 2011-11-15  4:44 UTC (permalink / raw)
  To: Sasha Levin
  Cc: penberg, kvm, Michael S. Tsirkin, Asias He, virtualization,
	gorcunov, netdev, mingo
In-Reply-To: <1321265740.2425.7.camel@sasha>

Sasha Levin <levinsasha928@gmail.com> wrote on 11/14/2011 03:45:40 PM:

> > Why both the bandwidth and latency performance are dropping so
> > dramatically with multiple VQ?
>
> It looks like theres no hash sync between host and guest, which makes
> the RX VQ change for every packet. This is my guess.

Yes, I confirmed this happens for macvtap. I am
using ixgbe - it calls skb_record_rx_queue when
a skb is allocated, but sets rxhash when a packet
arrives. Macvtap is relying on record_rx_queue
first ahead of rxhash (as part of my patch making
macvtap multiqueue), hence different skbs result
in macvtap selecting different vq's.

Reordering macvtap to use rxhash first results in
all packets going to the same VQ. The code snippet
is:

{
	...
	if (!numvtaps)
                goto out;

	rxq = skb_get_rxhash(skb);
	if (rxq) {
		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
		if (tap)
			goto out;
	}

	if (likely(skb_rx_queue_recorded(skb))) {
		rxq = skb_get_rx_queue(skb);

		while (unlikely(rxq >= numvtaps))
			rxq -= numvtaps;
			tap = rcu_dereference(vlan->taps[rxq]);
			if (tap)
				goto out;
	}
}

I will submit a patch for macvtap separately. I am working
towards the other issue pointed out - different vhost
threads handling rx/tx of a single flow.

thanks,

- KK

^ permalink raw reply

* Re: [PATCH] tcp: fixes for DSACK-based undo of cwnd reduction during fast recovery
From: Neal Cardwell @ 2011-11-15  5:00 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Netdev, nanditad, ycheng, therbert
In-Reply-To: <alpine.DEB.2.00.1111150333360.24223@melkinpaasi.cs.helsinki.fi>

On Mon, Nov 14, 2011 at 9:05 PM, Ilpo Järvinen
<ilpo.jarvinen@helsinki.fi> wrote:
> On Mon, 14 Nov 2011, Neal Cardwell wrote:
>
>> On Mon, Nov 14, 2011 at 3:23 PM, David Miller <davem@davemloft.net> wrote:
>> > From: Neal Cardwell <ncardwell@google.com>
>> > Date: Tue,  8 Nov 2011 13:07:11 -0500
>> >
>> >> (2)
>
> When you need to say "fixes 1), 2), and 3)", please stop for a moment
> thinking if the fixes really need to be sent as one fix or 3 fixes (a
> series PATCH x/3, see netdev for examples). It would make review much
> simpler if unrelated fixes are not put together even if all of them
> would be needed to get DSACK working as intented.

I'm happy to split this patch into a few smaller patches. I was on the
fence about how fine-grained to make them; I'll make them smaller.

>> >>  When the ACK field is below snd_una (the "old_ack" goto label),
>> >> process any DSACKs and try to send out more packets based on
>> >> newly-SACKed packets.
>> >
>> > This seems dubious.
>
> Indeed. However, there's nothing wrong in the processing of DSACKs nor
> SACKs in the given case but we lack call to tcp_fastretrans_alert. But the
> commit message does not say that, in fact, I failed to understand what 2)
> tried to say as I knew that DSACK and SACK processing itself is ok and
> done (it also mixes "DSACKs"/"SACKed" illogically). This commit message
> must be fixed so that guys who read it later won't get confused too. BUT
> I'd on the same time split the whole fix to 3 different patches unless
> there's some very good reason why the changes are interlocked so that the
> splitting is not possible.

I will attempt to make the commit messages more clear as I split up
the patch into smaller patches.

>> > An older ACK should not have more uptodate SACK information than a newer
>> > ACK.
>
> This is not true with DSACK that are not cumulative. And also the
> reported SACK state is cumulative up to a limit.
>
>> I grant that it is a rare corner case, but it is possible if there is
>> reordering and packet loss in both directions.
>>
>> In fact, AFAICT the code path to handle this corner case is already
>> there: AFAICT we call tcp_sacktag_write_queue() in the old_ack code
>> path exactly because it is possible for old ACKs to carry SACK ranges
>> that we haven't seen yet.
>
> I agree. ...And also DSACKs that can only be seen in that particular ACK.
>
>> Let me try to lay out a specific example:
>>
>> Suppose a sender sends packets 1-12, and there is lots of loss and
>> reordering in both directions.
>>
>> Suppose the receiver sends the following ACKs with SACK blocks, in the
>> following order (using packet numbers rather than sequence numbers):
>>
>> (1) ack 1, sack <3-4>
>> (2) ack 1, sack <3-4 6>
>> (3) ack 1, sack <3-4 6 8>
>> (4) ack 1, sack <3-4 6 8 10>
>> (5) ack 1, sack <6 8 10 12>
>> (6) ack 2, sack <6 8 10 12>
>>
>> Note that starting at ACK (5) the receiver is limited by the fact that
>> TCP options can only hold 4 SACK ranges, so the SACK for 3-4 "falls
>> off" the SACK block set.
>>
>> Now suppose the ACKs are reordered, and some are dropped, so that the
>> sender receives the ACKs in this order:
>>
>> (6) ack 2, sack <6 8 10 12>
>> (1) ack 1, sack <3-4>
>>
>> The arrival of (6) advances snd_una to 2. Now the question is what to
>> do when (1), with an ACK field below snd_una, arrives at the
>> sender. The existing code tags packets 3-4 as SACKed, but does not use
>> this newly SACK-tagged data to send out new data. This commit proposes
>> to use the newly SACK-tagged 3-4 to send out new data.
>
> Contrary to your commit message, new data sending is already done outside
> of tcp_ack() in tcp_data_snd_check?

If my commit message seemed to be talking about sending new data, then
it was only because of a lack of clarity. Where the commit message
referred to "more packets" I meant either new/untransmitted data or
retransmitted data. I will try to make this more clear.

> ...It's tcp_fastretrans_alert call
> that is lacking so no rexmits could be done currently?

Yes, in terms of missing some opportunities for SACKs to trigger
sends, the tcp_fastretrans_alert call is basically all that's lacking.

> In addition, this given scenarios was not DSACK related!?! I think your
> commit message is just bogus discussing (only) DSACK processing for this
> case.

Case 2)/(2) in the commit message does mention both DSACK and SACK
processing for this case. Or maybe you are referring to one of the
other cases? I don' think SACK processing applies to the other cases.

In any case, I'm hoping this will all be more clear once I split up
the patch into smaller pieces, and do a little wordsmithing on the
commit messages. I'll be sending this out soon.

Thanks to both of you for taking a look at this.

neal

^ permalink raw reply

* Re: [PATCH] net: fsl_pq_mdio: fix non tbi phy access
From: Baruch Siach @ 2011-11-15  5:17 UTC (permalink / raw)
  To: Fleming Andy-AFLEMING; +Cc: netdev@vger.kernel.org, linuxppc-dev
In-Reply-To: <C89A2124-7AB4-45B6-ACD1-8C5936027531@freescale.com>

Hi Andy,

On Mon, Nov 14, 2011 at 09:04:47PM +0000, Fleming Andy-AFLEMING wrote:
> Well, this got applied quickly, so I guess I can't NAK, but this requires discussion.
> 
> On Nov 14, 2011, at 0:22, "Baruch Siach" <baruch@tkos.co.il> wrote:
> 
> > Since 952c5ca1 (fsl_pq_mdio: Clean up tbi address configuration) .probe returns
> > -EBUSY when the "tbi-phy" node is missing. Fix this.
> 
> It returns an error because it finds no tbi node. Because without the tbi 
> node, there is no way for the driver to determine which address to set.
> 
> Your solution is to ignore the error, and hope. That's a broken approach.  
> The real solution for a p1010 should be to have a tbi node in the dts.

Can you elaborate a bit on why this approach is broken? The PHY used to work 
for me until 952c5ca1, and with this applied.

> And looking at the p1010si.dtsi, I see that it's automatically there for 
> you.
> 
> How were you breaking?

Adding linuxppc to Cc.

My board is P1011 based, the single core version of P1020, not P1010. In 
p1020si.dtsi I see no tbi node. In p1020rdb.dts I see a tbi node but only for 
mdio@25000, not mdio@24000, which is what I'm using.

Am I missing something?

baruch

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply

* Re: [PATCH] ipv4: fix for ip_options_rcv_srr() daddr update.
From: Li Wei @ 2011-11-15  5:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111109.155936.1972852290616131285.davem@davemloft.net>

> From: Li Wei <lw@cn.fujitsu.com>
> Date: Wed, 09 Nov 2011 15:39:28 +0800
> 
>> When opt->srr_is_hit is set skb_rtable(skb) has been updated for
>> 'nexthop' and iph->daddr should always equals to skb_rtable->rt_dst
>> holds, We need update iph->daddr either.
>>
>> Signed-off-by: Li Wei <lw@cn.fujitsu.com>
> 
> Applied, thank you.
> 
> 

Hi, David

These days i am doing some ICMP tests and sadly found that wen can't update
iph->daddr in ip_options_rcv_srr(), It's too early. When some exception
ocurred later (eg. in ip_forward() when goto sr_failed) we need the ip header
be identical to the original one as ICMP need it.

It seems we need rt->rt_dst in ip_forward_options() and ip_forward() instead
of iph->daddr.

^ permalink raw reply

* Re: [PATCH] tcp: fixes for DSACK-based undo of cwnd reduction during fast recovery
From: Ilpo Järvinen @ 2011-11-15  6:13 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, Netdev, nanditad, ycheng, therbert
In-Reply-To: <CADVnQykYW1kBopj2qpy8gt4gWejn+2k2YJg-UDHZ8uUxBXw5Zw@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1232 bytes --]

On Tue, 15 Nov 2011, Neal Cardwell wrote:
> On Mon, Nov 14, 2011 at 9:05 PM, Ilpo Järvinen
> <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > When you need to say "fixes 1), 2), and 3)", please stop for a moment
> > thinking if the fixes really need to be sent as one fix or 3 fixes (a
> > series PATCH x/3, see netdev for examples). It would make review much
> > simpler if unrelated fixes are not put together even if all of them
> > would be needed to get DSACK working as intented.
> 
> I'm happy to split this patch into a few smaller patches. I was on the
> fence about how fine-grained to make them; I'll make them smaller.

Thanks. In fixes smaller is almost always better. You can then state per 
fix very exactly what is wrong in the code and in what scenario that 
causes problem to materialize for real. Smallness helps the review 
enourmously and all questions raised against can then also remain more 
focused (possibly the rest could be acked/applied immediately, which 
happens quite often actually). You will be surprised yourself too then, 
once you've split each patch becomes very obvious and short (and then I'll 
be more lax about the commit message too as the diff speaks more 
succesfully for itself ;-)).

-- 
 i.

^ permalink raw reply

* Re: [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink
From: Maciej Żenczykowski @ 2011-11-15  6:13 UTC (permalink / raw)
  To: David Miller
  Cc: Linux NetDev, MuraliRaja Muniraju, Stephen Hemminger,
	Eric Dumazet
In-Reply-To: <20111114.010752.2129864130433732501.davem@davemloft.net>

2011/11/13 David Miller <davem@davemloft.net>:
>>>> Yes, but an ipv6 socket is permitted to setsockopt SOL_IP in addition to SOL_IPV6.
>>>> As such, one can simply setsockopt(ipv6_socket, SOL_IP, IP_TOS, value).
>>>
>>> That's my whole point, this operation isn't currently possible, the
>>> socket ops for ipv4 setsockopt() aren't hooked up to ipv6 mapped
>>> sockets in the kernel right now.
>>
>> http://lxr.linux.no/linux+v3.1/net/ipv6/ipv6_sockglue.c#L822
>
> That definitely destroys the basis for all of my arguments :-)
>
> I'm going to add the TCLASS inet_diag patch, but can you find a cleaner
> way to make sure that all types of sockets have the value initialized
> to something sane?  The one you posted wasn't... optimal :-)
>
> Thanks.

I only realized just now that I had replied only to you and not to all.
I hope you don't mind that I'm adding everyone back into the loop.

(a) if we keep tos/tclass distinct.

We should make sure to get this patch (and the as yet unwritten
followup) into 3.2-final and not 3.3-rc1 (ie. net/master and not
net-next/master).
Since it does slightly change userspace visible API (thankfully, it
was just added so it's not yet too late to change it).
[FYI, I already see this patch in net/master - thanks!]

As for the followup, I can see three ways to proceed.
- The first involves figuring out what type of socket we're dealing
with when we look at it (ie. something like the RFC patch I posted - I
don't think it's going to be much prettier no matter what one does -
there's simply a lot of complexity...)
- The second involves keeping some sort of type-of-ip-socket
field/bitmap (none, ipv4 only, ipv4/ipv6 dual stack, ipv6 only)
directly in the socket.  This is probably much harder to get right
(changes all over the stack), but may have long term benefits???
- The third involves punting on the problem, we handle the simple
cases in kernel and sometimes return excess information to userspace
(ie. we take the RFC patch and strip out the address comparisons and
error on the side of returning 'true') -> it's userspace's problem to
sift through the excess garbage.

(b) personally I would still prefer to just merge tos and tclass and
get rid of the distinction... is this minor-but-userspace-visible-api
change out of the question?

- Maciej

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox