Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net-2.6 PATCH] ixgbe: add IntMode module parameter
From: David Miller @ 2010-04-19  5:09 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, nicholasx.d.nunley, john.ronciak
In-Reply-To: <20100419050315.24758.62479.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Sun, 18 Apr 2010 22:03:59 -0700

> This patch adds the IntMode module parameter to ixgbe to allow
> user selection of interrupt mode (MSI-X, MSI, legacy) on driver
> load.  To work around the errata described above, ixgbe needs to
> be able to disable MSI-X for affected configurations.

This is not how we handle situations like this.

Here is one acceptable way to handle this:

Turn MSI/MSI-X off for every system that uses this PCI-E
complex chipset.  Add a whitelist that allows enabling.

Here is another:

Turn MSI/MSI-X on by default and have blacklists.

Anything that requires unusual user interactions, such as specifying
special module parameters, for correct operations is absolutely
unacceptable.

I don't want to hear how difficult it is to determine whether a
system will hit this bug or not.  If it's hard, just turn MSI
off unconditionally with this chipset until you can detect things
properly.

^ permalink raw reply

* [net-2.6 PATCH] ixgbe: add IntMode module parameter
From: Jeff Kirsher @ 2010-04-19  5:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Nicholas Nunley, John Ronciak, Jeff Kirsher

From: Nick Nunley <nicholasx.d.nunley@intel.com>

The 82598 has an erratum where some system designs can increase the
likelihood of a system hang due to a PCIe resource contention between the
82598 and another device in legacy INTx mode.

This issue can happen in any number of configurations with the 82598
and another device in legacy interrupt mode.  While we know of a
specific case where this is happening, this can happen with other
devices as well if configured in a certain way depending on the
system's slot configuration among other things.  This is alos why we
need to have the flexibility to disable MSI-X interrupts for the 82598
when these configurations are found having this deadlock situation.

The errata being worked - around with this patch - has to do with
any type of device that has a pending legacy INTx to the IOH's
internal IOAPIC which has priority from the shared resources
(round-robin to ensure forward progress / fairness).  The internal
IOAPIC is a PCI-E endpoint hence transactions to it are viewed as
peer-to-peer from a shared resource perspective.  To make forward
progress on the legacy INTx, the outbound completion FIFOs for the
IOU (resource pool1) are checked for available space due to shared
resources.  The IOH is allowed to have in/out dependency as a root
is full and not making forward progress because the 82598 has entered
its "erratum window"  where the 82598 is throttling the release of posted
credits back to the IOH as it has pending MSI-X interrupts to the IOH.
The 82598 will only release the posted credits back to IOH after the
inbound posted writes are serviced and the IOH releases the posted
credits back to the 82598 (i.e., the 82598 in/out dependency
erratum).  Since the priority is on the other device's legacy INTx,
the 82598 inbound posted transaction cannot make forward progress
that results in a link/forward progress deadlock condition.

The 82598 Specification Update has the errata list and can be found
here, it's erratum 38:
http://www.intel.com/design/network/specupdt/321040.pdf

This patch adds the IntMode module parameter to ixgbe to allow
user selection of interrupt mode (MSI-X, MSI, legacy) on driver
load.  To work around the errata described above, ixgbe needs to
be able to disable MSI-X for affected configurations.

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/Makefile      |    2 -
 drivers/net/ixgbe/ixgbe.h       |    1 
 drivers/net/ixgbe/ixgbe_main.c  |   26 ++++---
 drivers/net/ixgbe/ixgbe_param.c |  153 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 172 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_param.c

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index 8f81efb..820491f 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_IXGBE) += ixgbe.o
 
 ixgbe-objs := ixgbe_main.o ixgbe_common.o ixgbe_ethtool.o \
               ixgbe_82599.o ixgbe_82598.o ixgbe_phy.o ixgbe_sriov.o \
-              ixgbe_mbx.o
+              ixgbe_mbx.o ixgbe_param.o
 
 ixgbe-$(CONFIG_IXGBE_DCB) +=  ixgbe_dcb.o ixgbe_dcb_82598.o \
                               ixgbe_dcb_82599.o ixgbe_dcb_nl.o
diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index 79c35ae..106f197 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -436,6 +436,7 @@ extern int ixgbe_copy_dcb_cfg(struct ixgbe_dcb_config *src_dcb_cfg,
 extern char ixgbe_driver_name[];
 extern const char ixgbe_driver_version[];
 
+extern void ixgbe_check_options(struct ixgbe_adapter *adapter);
 extern int ixgbe_up(struct ixgbe_adapter *adapter);
 extern void ixgbe_down(struct ixgbe_adapter *adapter);
 extern void ixgbe_reinit_locked(struct ixgbe_adapter *adapter);
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a98ff0e..d04e095 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -3949,6 +3949,9 @@ static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 	int err = 0;
 	int vector, v_budget;
 
+	if (!(adapter->flags & IXGBE_FLAG_MSIX_CAPABLE))
+		goto try_msi;
+
 	/*
 	 * It's easy to be greedy for MSI-X vectors, but it really
 	 * doesn't do us much good if we have a lot more vectors
@@ -3981,16 +3984,10 @@ static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 			goto out;
 	}
 
-	adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED;
-	adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED;
-	adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
-	adapter->flags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
-	adapter->atr_sample_rate = 0;
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
-		ixgbe_disable_sriov(adapter);
-
-	ixgbe_set_num_queues(adapter);
 
+try_msi:
+	if (!(adapter->flags & IXGBE_FLAG_MSI_CAPABLE))
+		goto legacy;
 	err = pci_enable_msi(adapter->pdev);
 	if (!err) {
 		adapter->flags |= IXGBE_FLAG_MSI_ENABLED;
@@ -4000,7 +3997,16 @@ static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 		/* reset err */
 		err = 0;
 	}
+legacy:
+	adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED;
+	adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED;
+	adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
+	adapter->flags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
+	adapter->atr_sample_rate = 0;
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		ixgbe_disable_sriov(adapter);
 
+	ixgbe_set_num_queues(adapter);
 out:
 	return err;
 }
@@ -6199,6 +6205,8 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 		goto err_sw_init;
 	}
 
+	ixgbe_check_options(adapter);
+
 	ixgbe_probe_vf(adapter, ii);
 
 	netdev->features = NETIF_F_SG |
diff --git a/drivers/net/ixgbe/ixgbe_param.c b/drivers/net/ixgbe/ixgbe_param.c
new file mode 100644
index 0000000..98c5626
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_param.c
@@ -0,0 +1,153 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2010 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#include <linux/types.h>
+#include <linux/module.h>
+
+#include "ixgbe.h"
+
+/* This is the only thing that needs to be changed to adjust the
+ * maximum number of ports that the driver can manage.
+ */
+
+#define IXGBE_MAX_NIC 32
+
+#define OPTION_UNSET    -1
+
+/* All parameters are treated the same, as an integer array of values.
+ * This macro just reduces the need to repeat the same declaration code
+ * over and over (plus this helps to avoid typo bugs).
+ */
+
+#define IXGBE_PARAM_INIT { [0 ... IXGBE_MAX_NIC] = OPTION_UNSET }
+#define IXGBE_PARAM(X, desc) \
+	static int __devinitdata X[IXGBE_MAX_NIC+1] = IXGBE_PARAM_INIT; \
+	static unsigned int num_##X; \
+	module_param_array_named(X, X, int, &num_##X, 0); \
+	MODULE_PARM_DESC(X, desc);
+
+/* IntMode (Interrupt Mode)
+ *
+ * Valid Range: 0-2
+ *  - 0 - Legacy Interrupt
+ *  - 1 - MSI Interrupt
+ *  - 2 - MSI-X Interrupt(s)
+ *
+ * Default Value: 2
+ */
+IXGBE_PARAM(IntMode, "Change Interrupt Mode (0=Legacy, 1=MSI, 2=MSI-X), default 2");
+#define IXGBE_INT_LEGACY		      0
+#define IXGBE_INT_MSI			      1
+#define IXGBE_INT_MSIX			      2
+#define IXGBE_DEFAULT_INT	 IXGBE_INT_MSIX
+
+struct ixgbe_option {
+	const char *name;
+	const char *err;
+	int def;
+	int min;
+	int max;
+};
+
+static int __devinit ixgbe_validate_option(unsigned int *value,
+					   struct ixgbe_option *opt,
+					   struct ixgbe_adapter *adapter)
+{
+	if (*value == OPTION_UNSET) {
+		*value = opt->def;
+		return 0;
+	}
+
+	if (*value >= opt->min && *value <= opt->max) {
+		dev_info(&adapter->pdev->dev,
+			 "%s set to %d\n", opt->name, *value);
+		return 0;
+	} else {
+		dev_info(&adapter->pdev->dev,
+			 "Invalid %s specified (%d),  %s\n",
+			 opt->name, *value, opt->err);
+		*value = opt->def;
+		return -1;
+	}
+}
+
+/**
+ * ixgbe_check_options - Range Checking for Command Line Parameters
+ * @adapter: board private structure
+ *
+ * This routine checks all command line parameters for valid user
+ * input.  If an invalid value is given, or if no user specified
+ * value exists, a default value is used.  The final value is stored
+ * in a variable in the adapter structure.
+ **/
+void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter)
+{
+	int bd = adapter->bd_number;
+	u32 *aflags = &adapter->flags;
+
+	if (bd >= IXGBE_MAX_NIC) {
+		dev_notice(&adapter->pdev->dev,
+			   "Warning: no configuration for board #%d\n", bd);
+		dev_notice(&adapter->pdev->dev,
+			   "Using defaults for all values\n");
+	}
+
+	{ /* Interrupt Mode */
+		unsigned int int_mode;
+		static struct ixgbe_option opt = {
+			.name = "Interrupt Mode",
+			.err =
+			  "using default of "__MODULE_STRING(IXGBE_DEFAULT_INT),
+			.def = IXGBE_DEFAULT_INT,
+			.min = IXGBE_INT_LEGACY,
+			.max = IXGBE_INT_MSIX
+		};
+
+		if (num_IntMode > bd) {
+			int_mode = IntMode[bd];
+			ixgbe_validate_option(&int_mode, &opt, adapter);
+			switch (int_mode) {
+			case IXGBE_INT_MSIX:
+				*aflags |= IXGBE_FLAG_MSIX_CAPABLE;
+				*aflags |= IXGBE_FLAG_MSI_CAPABLE;
+				break;
+			case IXGBE_INT_MSI:
+				*aflags &= ~IXGBE_FLAG_MSIX_CAPABLE;
+				*aflags |= IXGBE_FLAG_MSI_CAPABLE;
+				break;
+			case IXGBE_INT_LEGACY:
+			default:
+				*aflags &= ~IXGBE_FLAG_MSIX_CAPABLE;
+				*aflags &= ~IXGBE_FLAG_MSI_CAPABLE;
+				break;
+			}
+		} else {
+			*aflags |= IXGBE_FLAG_MSIX_CAPABLE;
+			*aflags |= IXGBE_FLAG_MSI_CAPABLE;
+		}
+	}
+}


^ permalink raw reply related

* [PATCH RFC: linux-next 2/2] ixgbe: Example usage of the new IRQ affinity_hint callback
From: Peter P Waskiewicz Jr @ 2010-04-19  4:58 UTC (permalink / raw)
  To: tglx, davem, arjan; +Cc: netdev, linux-kernel
In-Reply-To: <20100419045741.30276.23233.stgit@ppwaskie-hc2.jf.intel.com>

This patch uses the new IRQ affinity_hint callback mechanism.
It serves purely as an example of how a low-level driver can
utilize this new interface.

An official ixgbe patch will be pushed through netdev once the
IRQ patches have been accepted and merged.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---

 drivers/net/ixgbe/ixgbe.h      |    2 ++
 drivers/net/ixgbe/ixgbe_main.c |   51 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index 79c35ae..c220b9f 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -32,6 +32,7 @@
 #include <linux/pci.h>
 #include <linux/netdevice.h>
 #include <linux/aer.h>
+#include <linux/cpumask.h>
 
 #include "ixgbe_type.h"
 #include "ixgbe_common.h"
@@ -236,6 +237,7 @@ struct ixgbe_q_vector {
 	u8 tx_itr;
 	u8 rx_itr;
 	u32 eitr;
+	cpumask_var_t affinity_mask;
 };
 
 /* Helper macros to switch between ints/sec and what the register uses.
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 1b1419c..3e00d41 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1031,6 +1031,36 @@ next_desc:
 	return cleaned;
 }
 
+static unsigned int ixgbe_irq_affinity_callback(cpumask_var_t mask,
+                                                unsigned int irq, void *dev)
+{
+	struct ixgbe_adapter *adapter = (struct ixgbe_adapter *)dev;
+	int i, q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state))
+		return -EINVAL;
+
+	/*
+	 * Loop through the msix_entries array, looking for the vector that
+	 * matches the irq passed to us.  Once we find it, use that index to
+	 * grab the corresponding q_vector (1 to 1 mapping), and grab the
+	 * cpumask from that q_vector.
+	 */
+	for (i = 0; i < q_vectors; i++)
+		if (adapter->msix_entries[i].vector == irq)
+			break;
+
+	if (i == q_vectors)
+		return -EINVAL;
+
+	if (adapter->q_vector[i]->affinity_mask)
+		cpumask_copy(mask, adapter->q_vector[i]->affinity_mask);
+	else
+		return -EINVAL;
+
+	return 0;
+}
+
 static int ixgbe_clean_rxonly(struct napi_struct *, int);
 /**
  * ixgbe_configure_msix - Configure MSI-X hardware
@@ -1083,6 +1113,16 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 			q_vector->eitr = adapter->rx_eitr_param;
 
 		ixgbe_write_eitr(q_vector);
+
+		/*
+		 * Allocate the affinity_hint cpumask, assign the mask for
+		 * this vector, and register our affinity_hint callback.
+		 */
+		alloc_cpumask_var(&q_vector->affinity_mask, GFP_KERNEL);
+		cpumask_set_cpu(v_idx, q_vector->affinity_mask);
+		irq_register_affinity_hint(adapter->msix_entries[v_idx].vector,
+		                           adapter,
+		                           &ixgbe_irq_affinity_callback);
 	}
 
 	if (adapter->hw.mac.type == ixgbe_mac_82598EB)
@@ -3218,7 +3258,7 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 rxctrl;
 	u32 txdctl;
-	int i, j;
+	int i, j, num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
 
 	/* signal that we are down to the interrupt handler */
 	set_bit(__IXGBE_DOWN, &adapter->state);
@@ -3251,6 +3291,14 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 
 	ixgbe_napi_disable_all(adapter);
 
+	for (i = 0; i < num_q_vectors; i++) {
+		struct ixgbe_q_vector *q_vector = adapter->q_vector[i];
+		/* unregister the affinity_hint callback */
+		irq_unregister_affinity_hint(adapter->msix_entries[i].vector);
+		/* release the CPU mask memory */
+		free_cpumask_var(q_vector->affinity_mask);
+	}
+
 	clear_bit(__IXGBE_SFP_MODULE_NOT_FOUND, &adapter->state);
 	del_timer_sync(&adapter->sfp_timer);
 	del_timer_sync(&adapter->watchdog_timer);
@@ -4052,6 +4100,7 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[q_idx];
 		adapter->q_vector[q_idx] = NULL;
 		netif_napi_del(&q_vector->napi);
+		free_cpumask_var(q_vector->affinity_mask);
 		kfree(q_vector);
 	}
 }

^ permalink raw reply related

* [PATCH RFC: linux-next 1/2] irq: Add CPU mask affinity hint callback framework
From: Peter P Waskiewicz Jr @ 2010-04-19  4:57 UTC (permalink / raw)
  To: tglx, davem, arjan; +Cc: netdev, linux-kernel

This patch adds a callback function pointer to the irq_desc
structure, along with a registration function and a read-only
proc entry for each interrupt.

This affinity_hint handle for each interrupt can be used by
underlying drivers that need a better mechanism to control
interrupt affinity.  The underlying driver can register a
callback for the interrupt, which will allow the driver to
provide the CPU mask for the interrupt to anything that
requests it.  The intent is to extend the userspace daemon,
irqbalance, to help hint to it a preferred CPU mask to balance
the interrupt into.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---

 include/linux/interrupt.h |   27 +++++++++++++++++++++++++++
 include/linux/irq.h       |    1 +
 kernel/irq/manage.c       |   39 +++++++++++++++++++++++++++++++++++++++
 kernel/irq/proc.c         |   41 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 108 insertions(+), 0 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 75f3f00..f2a7d0b 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -78,6 +78,8 @@ enum {
 };
 
 typedef irqreturn_t (*irq_handler_t)(int, void *);
+typedef unsigned int (*irq_affinity_hint_t)(cpumask_var_t, unsigned int,
+                                            void *);
 
 /**
  * struct irqaction - per interrupt action descriptor
@@ -105,6 +107,18 @@ struct irqaction {
 	unsigned long thread_flags;
 };
 
+/**
+ * struct irqaffinityhint - per interrupt affinity helper
+ * @callback:	device driver callback function
+ * @dev:	reference for the affected device
+ * @irq:	interrupt number
+ */
+struct irqaffinityhint {
+	irq_affinity_hint_t callback;
+	void *dev;
+	int irq;
+};
+
 extern irqreturn_t no_action(int cpl, void *dev_id);
 
 #ifdef CONFIG_GENERIC_HARDIRQS
@@ -209,6 +223,9 @@ extern int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask);
 extern int irq_can_set_affinity(unsigned int irq);
 extern int irq_select_affinity(unsigned int irq);
 
+extern int irq_register_affinity_hint(unsigned int irq, void *dev,
+                                      irq_affinity_hint_t callback);
+extern int irq_unregister_affinity_hint(unsigned int irq);
 #else /* CONFIG_SMP */
 
 static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
@@ -223,6 +240,16 @@ static inline int irq_can_set_affinity(unsigned int irq)
 
 static inline int irq_select_affinity(unsigned int irq)  { return 0; }
 
+static inline int irq_register_affinity_hint(unsigned int irq, void *dev,
+                                             irq_affinity_hint_t callback)
+{
+	return -EINVAL;
+}
+
+static inline int irq_unregister_affinity_hint(unsigned int irq);
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_SMP && CONFIG_GENERIC_HARDIRQS */
 
 #ifdef CONFIG_GENERIC_HARDIRQS
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 707ab12..bd73e9b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -206,6 +206,7 @@ struct irq_desc {
 	struct proc_dir_entry	*dir;
 #endif
 	const char		*name;
+	struct irqaffinityhint	*hint;
 } ____cacheline_internodealigned_in_smp;
 
 extern void arch_init_copy_chip_data(struct irq_desc *old_desc,
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 704e488..3674b6a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -138,6 +138,42 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
 	return 0;
 }
 
+int irq_register_affinity_hint(unsigned int irq, void *dev,
+                               irq_affinity_hint_t callback)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+
+	if (!desc->hint)
+		desc->hint = kmalloc(sizeof(struct irqaffinityhint),
+		                     GFP_KERNEL);
+	desc->hint->callback = callback;
+	desc->hint->dev = dev;
+	desc->hint->irq = irq;
+
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL(irq_register_affinity_hint);
+
+int irq_unregister_affinity_hint(unsigned int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+
+	kfree(desc->hint);
+	desc->hint = NULL;
+
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL(irq_unregister_affinity_hint);
 #ifndef CONFIG_AUTO_IRQ_AFFINITY
 /*
  * Generic version of the affinity autoselector.
@@ -916,6 +952,9 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id)
 			desc->chip->disable(irq);
 	}
 
+	/* make sure affinity_hint callback is cleaned up */
+	kfree(desc->hint);
+
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 
 	unregister_handler_proc(irq, action);
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 7a6eb04..59110a3 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -32,6 +32,23 @@ static int irq_affinity_proc_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int irq_affinity_hint_proc_show(struct seq_file *m, void *v)
+{
+	struct irq_desc *desc = irq_to_desc((long)m->private);
+	struct cpumask mask;
+	unsigned int ret = 0;
+
+	if (desc->hint && desc->hint->callback) {
+		ret = desc->hint->callback(&mask, (long)m->private,
+		                           desc->hint->dev);
+		if (!ret)
+			seq_cpumask(m, &mask);
+	}
+
+	seq_putc(m, '\n');
+	return ret;
+}
+
 #ifndef is_affinity_mask_valid
 #define is_affinity_mask_valid(val) 1
 #endif
@@ -79,11 +96,23 @@ free_cpumask:
 	return err;
 }
 
+static ssize_t irq_affinity_hint_proc_write(struct file *file,
+		const char __user *buffer, size_t count, loff_t *pos)
+{
+	/* affinity_hint is read-only from proc */
+	return -EOPNOTSUPP;
+}
+
 static int irq_affinity_proc_open(struct inode *inode, struct file *file)
 {
 	return single_open(file, irq_affinity_proc_show, PDE(inode)->data);
 }
 
+static int irq_affinity_hint_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, irq_affinity_hint_proc_show, PDE(inode)->data);
+}
+
 static const struct file_operations irq_affinity_proc_fops = {
 	.open		= irq_affinity_proc_open,
 	.read		= seq_read,
@@ -92,6 +121,14 @@ static const struct file_operations irq_affinity_proc_fops = {
 	.write		= irq_affinity_proc_write,
 };
 
+static const struct file_operations irq_affinity_hint_proc_fops = {
+	.open		= irq_affinity_hint_proc_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+	.write		= irq_affinity_hint_proc_write,
+};
+
 static int default_affinity_show(struct seq_file *m, void *v)
 {
 	seq_cpumask(m, irq_default_affinity);
@@ -231,6 +268,10 @@ void register_irq_proc(unsigned int irq, struct irq_desc *desc)
 	/* create /proc/irq/<irq>/smp_affinity */
 	proc_create_data("smp_affinity", 0600, desc->dir,
 			 &irq_affinity_proc_fops, (void *)(long)irq);
+
+	/* create /proc/irq/<irq>/affinity_hint */
+	proc_create_data("affinity_hint", 0400, desc->dir,
+			 &irq_affinity_hint_proc_fops, (void *)(long)irq);
 #endif
 
 	proc_create_data("spurious", 0444, desc->dir,


^ permalink raw reply related

* Re: [PATCH] ks8842: Add module param for setting mac address
From: David Miller @ 2010-04-19  4:18 UTC (permalink / raw)
  To: richard.rojfors; +Cc: netdev
In-Reply-To: <1271611557.24099.11.camel@debian>

From: Richard Röjfors <richard.rojfors@pelagicore.com>
Date: Sun, 18 Apr 2010 19:25:57 +0200

> This patch adds a module parameter for setting the MAC address.
> 
> To ensure this MAC address is used, the MAC address is written
> after each hardware reset.
> 
> Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>

Your driver isn't unique, or special, and it doesn't need
to do special things to handle this situation.

Either probe the information from somewhere, or use a random
MAC address.

I'm not applying this.

^ permalink raw reply

* Re: [PATCH 10/13] bnx2x: Set default value of num_queues to 1 on 32-bit platforms
From: David Miller @ 2010-04-19  4:10 UTC (permalink / raw)
  To: vladz; +Cc: eilong, netdev, dmitry
In-Reply-To: <1271602224.27235.199.camel@lb-tlvb-vladz>

From: "Vladislav Zolotarov" <vladz@broadcom.com>
Date: Sun, 18 Apr 2010 17:50:24 +0300

> The default was changed to save memory on 32bits systems.
> 
> Author: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

This is absolutely rediculious.

There is no reason in the world to restrict multiqueue to only 64-bit
systems.

If there is a memory allocation issue, fix that!  Is it vmalloc() space
usage?

You don't even describe what the hell the problem is, so nobody can
figure out _WHY_ you're making this change.  That's so incredibly
frustrating because it makes it impossible for anyone to sanely
evaluate the patch.

The more I read of this patch series, the more I am incredibly
disappointed with the poor quality of these changes.

This is the worst patch series submitted by Broadcom engieers,
ever!

^ permalink raw reply

* Re: [PATCH 0/13] bnx2x: Buf fixes and enhancements
From: David Miller @ 2010-04-19  4:07 UTC (permalink / raw)
  To: vladz; +Cc: netdev, eilong, dmitry, yanivr
In-Reply-To: <1271602046.27235.165.camel@lb-tlvb-vladz>

From: "Vladislav Zolotarov" <vladz@broadcom.com>
Date: Sun, 18 Apr 2010 17:47:26 +0300

> Dave, hi.
> Pls., apply the following series of patches.
> It includes a few bug fixes and enhancements.

Several of these patches are inappropriate or need corrections, and
you also have not specified where you want these patches applied
(net-2.6 or net-next-2.6)

You also misspelled "bug" in the subject line here, which to me is yet
another sign that you put very little care into this patch series.  It
feels very "rushed" and "reactionary" rather than a well thought out
set of changes.

You really need to fix all of these issues, do the necessary
research, take your time, and resubmit a higher quality set
of well documented changes.

Thanks.

^ permalink raw reply

* Re: [PATCH 2/13] bnx2x: Use VPD-R V0 entry to display firmware revision
From: David Miller @ 2010-04-19  4:00 UTC (permalink / raw)
  To: bhutchings; +Cc: vladz, netdev, eilong, dmitry, mcarlson
In-Reply-To: <1271603308.3679.208.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Sun, 18 Apr 2010 16:08:28 +0100

> On Sun, 2010-04-18 at 17:48 +0300, Vladislav Zolotarov wrote:
>> Use VPD-R V0 entry to display firmware revision
> [...]
> 
> Matt Carlson already added VPD support functions and definitions to the
> PCI core; you should use them.

Agreed.

Really disappointed that you guy's can't keep track of generic
infrastructure created by co-workers let alone other people
in the networking developer community.

Judging by this and the LRO disabling change, I can only
come to the conclusion that you do development in something
similar to a bubble and are impervious to what's going on
outside of your domain :-)

^ permalink raw reply

* Re: [PATCH 11/13] bnx2x: Forbid LRO on when XEN is enabled
From: David Miller @ 2010-04-19  3:58 UTC (permalink / raw)
  To: bhutchings; +Cc: vladz, eilong, netdev, dmitry
In-Reply-To: <1271603485.3679.210.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Sun, 18 Apr 2010 16:11:25 +0100

> On Sun, 2010-04-18 at 17:50 +0300, Vladislav Zolotarov wrote:
>> LRO cannot be used on virtualized environment, so it is best
>> to disable it on compile time in XEN kernel.
> [...]
> 
> If you want to cripple your performance, I'm not going to stand in your
> way, but be aware that most distribution kernels now enable CONFIG_XEN.

There is also zero reason to do this.

When the XEN code or whatever setups up the bridge, LRO will be
disabled by the bridge code.

It automatically issues ethtool LRO disable commands to the devices
involved in the bridge configuration.  And it has done this for
a _long_ time.  The ipv4 routing code will do the same.

This change is bogus and likely based upon behavior in some ancient
distribution kernel that lacks the ethtool facilities.

I absolutely refuse to apply this.

^ permalink raw reply

* Re: [PATCH BUG-FIX] ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than  IPV6_MIN_MTU
From: Herbert Xu @ 2010-04-19  3:55 UTC (permalink / raw)
  To: Shan Wei
  Cc: David Miller, yoshfuji@linux-ipv6.org >> YOSHIFUJI Hideaki,
	魏勇军, vladislav.yasevich, kuznet, pekkas,
	jmorris, Patrick McHardy, eric.dumazet, sri,
	netdev@vger.kernel.org, linux-sctp
In-Reply-To: <4BCBC6CE.9020302@cn.fujitsu.com>

On Mon, Apr 19, 2010 at 10:58:22AM +0800, Shan Wei wrote:
> 
> According to RFC2460, PMTU is set to the IPv6 Minimum Link
> MTU (1280) and a fragment header should always be included
> after a node receiving Too Big message reporting PMTU is
> less than the IPv6 Minimum Link MTU.
> 
> After receiving a ICMPv6 Too Big message reporting PMTU is
> less than the IPv6 Minimum Link MTU, sctp *can't* send any
> data/control chunk that total length including IPv6 head 
> and IPv6 extend head is less than IPV6_MIN_MTU(1280 bytes).
> 
> The failure occured in p6_fragment(), about reason 
> see following(take SHUTDOWN chunk for example):
> sctp_packet_transmit (SHUTDOWN chunk, len=16 byte)
> |------sctp_v6_xmit (local_df=0)
>    |------ip6_xmit
>        |------ip6_output (dst_allfrag is ture)
>            |------ip6_fragment
> 
> In ip6_fragment(), for local_df=0, drops the the packet
> and returns EMSGSIZE.
> 
> The patch fixes it with adding check length of skb->len.
> In this case, Ipv6 not to fragment upper protocol data,
> just only add a fragment header before it. 
> 
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>

The patch looks good to me.

If we wanted to optimise the allfrags case it may be better
to reserve the space beforehand and generate the fragment header
at the same time as we're doing the IPv6 header.

But it can't be all that important as it's been broken for so
many years.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH BUG-FIX] ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than  IPV6_MIN_MTU
From: Shan Wei @ 2010-04-19  2:58 UTC (permalink / raw)
  To: David Miller, yoshfuji@linux-ipv6.org >> YOSHIFUJI Hideaki,
	魏勇军, vladislav.yasevich
  Cc: kuznet, pekkas, jmorris, Patrick McHardy, eric.dumazet, Shan Wei,
	sri, Herbert Xu, netdev@vger.kernel.org, linux-sctp

According to RFC2460, PMTU is set to the IPv6 Minimum Link
MTU (1280) and a fragment header should always be included
after a node receiving Too Big message reporting PMTU is
less than the IPv6 Minimum Link MTU.

After receiving a ICMPv6 Too Big message reporting PMTU is
less than the IPv6 Minimum Link MTU, sctp *can't* send any
data/control chunk that total length including IPv6 head 
and IPv6 extend head is less than IPV6_MIN_MTU(1280 bytes).

The failure occured in p6_fragment(), about reason 
see following(take SHUTDOWN chunk for example):
sctp_packet_transmit (SHUTDOWN chunk, len=16 byte)
|------sctp_v6_xmit (local_df=0)
   |------ip6_xmit
       |------ip6_output (dst_allfrag is ture)
           |------ip6_fragment

In ip6_fragment(), for local_df=0, drops the the packet
and returns EMSGSIZE.

The patch fixes it with adding check length of skb->len.
In this case, Ipv6 not to fragment upper protocol data,
just only add a fragment header before it. 

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
 net/ipv6/ip6_output.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 5129a16..c3edb6c 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -625,7 +625,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 	/* We must not fragment if the socket is set to force MTU discovery
 	 * or if the skb it not generated by a local socket.
 	 */
-	if (!skb->local_df) {
+	if (!skb->local_df && skb->len > mtu) {
 		skb->dev = skb_dst(skb)->dev;
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
--
1.6.3.3 

^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: jamal @ 2010-04-19  2:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <1271590476.16881.4925.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

Thanks Eric. I tried to visualize your results - attached.
There are 2-3 odd numbers (labelled with *) but other
than that results are as expected...

I did run some experiments with some udp sink server
and i saw the IPIs amortized; unfortunately sky2 h/ware 
proved to be bottleneck (at > 750Kpps incoming, it started 
dropping and wasnt recording the drops, so i had to slow things down). I
need to digest my results a little more - but it seems i was getting
better throughput results with RPS (i.e it was able to sink
more packets)..

cheers,
jamal

[-- Attachment #2: erichw.pdf --]
[-- Type: application/pdf, Size: 187023 bytes --]

^ permalink raw reply

* Re: [PATCH 1/6] X25: Use identifiers for X25 to device interface
From: andrew hendry @ 2010-04-18 23:51 UTC (permalink / raw)
  To: John Hughes; +Cc: netdev
In-Reply-To: <4BCAE84A.2030701@Calva.COM>

Hi John,

Thanks, missed that, ill see if any of the driver areas have comments
and make a V2 patch with explicit values.

Your right, it would be nice if xotd/xoe could pickup the same
definition from user space.
include/net/x25device.h is not intended as a user space header.
include/linux/net/x25.h is the user space header for the x25 socket
layer interface, so I don't think these definitions should go there.

I think the right way to do it is add a new userspace header
include/linux/if_x25.h then include that from net/x25_device.h?

Regards,
Andrew.

On Sun, Apr 18, 2010 at 9:08 PM, John Hughes <john@calva.com> wrote:
> John Hughes wrote:
>>
>> Shouldn't you use explicit values here?
>>
>> enum {
>>    X25_IFACE_DATA = 0x00, /* explicit value for ABI stability */
>>    ...
>
> Oh, and is net/x25device.h suitable for inclusion from user space (xotd for
> example)?
>
>

^ permalink raw reply

* [PATCH] gianfar: Wait for both RX and TX to stop
From: Andy Fleming @ 2010-04-18 23:13 UTC (permalink / raw)
  To: davem; +Cc: netdev

When gracefully stopping the controller, the driver was continuing if
*either* RX or TX had stopped.  We need to wait for both, or the
controller could get into an invalid state.

Signed-off-by: Andy Fleming <afleming@freescale.com>
---
 drivers/net/gianfar.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 032073d..6038397 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1571,8 +1571,9 @@ static void gfar_halt_nodisable(struct net_device *dev)
 		tempval |= (DMACTRL_GRS | DMACTRL_GTS);
 		gfar_write(&regs->dmactrl, tempval);
 
-		while (!(gfar_read(&regs->ievent) &
-			 (IEVENT_GRSC | IEVENT_GTSC)))
+		while ((gfar_read(&regs->ievent) &
+			 (IEVENT_GRSC | IEVENT_GTSC)) !=
+			 (IEVENT_GRSC | IEVENT_GTSC))
 			cpu_relax();
 	}
 }
-- 
1.6.5.2.g6ff9a


^ permalink raw reply related

* [PATCH] ks8842: Add platform data for setting mac address
From: Richard Röjfors @ 2010-04-18 22:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, bhutchings

This patch adds platform data to the ks8842 driver.

Via the platform data a MAC address, to be used by the controller,
can be passed.

To ensure this MAC address is used, the MAC address is written
after each hardware reset.

Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>
---
diff --git a/drivers/net/ks8842.c b/drivers/net/ks8842.c
index 5c45cb5..423dd8d 100644
--- a/drivers/net/ks8842.c
+++ b/drivers/net/ks8842.c
@@ -1,5 +1,5 @@
 /*
- * ks8842_main.c timberdale KS8842 ethernet driver
+ * ks8842.c timberdale KS8842 ethernet driver
  * Copyright (c) 2009 Intel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
@@ -26,6 +26,7 @@
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
+#include <linux/ks8842.h>
 
 #define DRV_NAME "ks8842"
 
@@ -302,6 +303,20 @@ static void ks8842_read_mac_addr(struct ks8842_adapter *adapter, u8 *dest)
 	ks8842_write16(adapter, 39, mac, REG_MACAR3);
 }
 
+static void ks8842_write_mac_addr(struct ks8842_adapter *adapter, u8 *mac)
+{
+	unsigned long flags;
+	unsigned i;
+
+	spin_lock_irqsave(&adapter->lock, flags);
+	for (i = 0; i < ETH_ALEN; i++) {
+		ks8842_write8(adapter, 2, mac[ETH_ALEN - i - 1], REG_MARL + i);
+		ks8842_write8(adapter, 39, mac[ETH_ALEN - i - 1],
+			REG_MACAR1 + i);
+	}
+	spin_unlock_irqrestore(&adapter->lock, flags);
+}
+
 static inline u16 ks8842_tx_fifo_space(struct ks8842_adapter *adapter)
 {
 	return ks8842_read16(adapter, 16, REG_TXMIR) & 0x1fff;
@@ -520,6 +535,8 @@ static int ks8842_open(struct net_device *netdev)
 	/* reset the HW */
 	ks8842_reset_hw(adapter);
 
+	ks8842_write_mac_addr(adapter, netdev->dev_addr);
+
 	ks8842_update_link_status(netdev, adapter);
 
 	err = request_irq(adapter->irq, ks8842_irq, IRQF_SHARED, DRV_NAME,
@@ -567,10 +584,8 @@ static netdev_tx_t ks8842_xmit_frame(struct sk_buff *skb,
 static int ks8842_set_mac(struct net_device *netdev, void *p)
 {
 	struct ks8842_adapter *adapter = netdev_priv(netdev);
-	unsigned long flags;
 	struct sockaddr *addr = p;
 	char *mac = (u8 *)addr->sa_data;
-	int i;
 
 	dev_dbg(&adapter->pdev->dev, "%s: entry\n", __func__);
 
@@ -579,13 +594,7 @@ static int ks8842_set_mac(struct net_device *netdev, void *p)
 
 	memcpy(netdev->dev_addr, mac, netdev->addr_len);
 
-	spin_lock_irqsave(&adapter->lock, flags);
-	for (i = 0; i < ETH_ALEN; i++) {
-		ks8842_write8(adapter, 2, mac[ETH_ALEN - i - 1], REG_MARL + i);
-		ks8842_write8(adapter, 39, mac[ETH_ALEN - i - 1],
-			REG_MACAR1 + i);
-	}
-	spin_unlock_irqrestore(&adapter->lock, flags);
+	ks8842_write_mac_addr(adapter, mac);
 	return 0;
 }
 
@@ -604,6 +613,8 @@ static void ks8842_tx_timeout(struct net_device *netdev)
 
 	ks8842_reset_hw(adapter);
 
+	ks8842_write_mac_addr(adapter, netdev->dev_addr);
+
 	ks8842_update_link_status(netdev, adapter);
 }
 
@@ -626,7 +637,9 @@ static int __devinit ks8842_probe(struct platform_device *pdev)
 	struct resource *iomem;
 	struct net_device *netdev;
 	struct ks8842_adapter *adapter;
+	struct ks8842_platform_data *pdata = pdev->dev.platform_data;
 	u16 id;
+	unsigned i;
 
 	iomem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!request_mem_region(iomem->start, resource_size(iomem), DRV_NAME))
@@ -657,7 +670,25 @@ static int __devinit ks8842_probe(struct platform_device *pdev)
 	netdev->netdev_ops = &ks8842_netdev_ops;
 	netdev->ethtool_ops = &ks8842_ethtool_ops;
 
-	ks8842_read_mac_addr(adapter, netdev->dev_addr);
+	/* Check if a mac address was given */
+	i = netdev->addr_len;
+	if (pdata) {
+		for (i = 0; i < netdev->addr_len; i++)
+			if (pdata->macaddr[i] != 0)
+				break;
+
+		if (i < netdev->addr_len)
+			/* an address was passed, use it */
+			memcpy(netdev->dev_addr, pdata->macaddr,
+				netdev->addr_len);
+	}
+
+	if (i == netdev->addr_len) {
+		ks8842_read_mac_addr(adapter, netdev->dev_addr);
+
+		if (!is_valid_ether_addr(netdev->dev_addr))
+			random_ether_addr(netdev->dev_addr);
+	}
 
 	id = ks8842_read16(adapter, 32, REG_SW_ID_AND_ENABLE);
 
diff --git a/include/linux/ks8842.h b/include/linux/ks8842.h
new file mode 100644
index 0000000..da0341b
--- /dev/null
+++ b/include/linux/ks8842.h
@@ -0,0 +1,34 @@
+/*
+ * ks8842.h KS8842 platform data struct definition
+ * Copyright (c) 2010 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _LINUX_KS8842_H
+#define _LINUX_KS8842_H
+
+#include <linux/if_ether.h>
+
+/**
+ * struct ks8842_platform_data - Platform data of the KS8842 network driver
+ * @macaddr:	The MAC address of the device, set to all 0:s to use the on in
+ *		the chip.
+ *
+ */
+struct ks8842_platform_data {
+	u8 macaddr[ETH_ALEN];
+};
+
+#endif


^ permalink raw reply related

* Re: [PATCH] ks8842: Add module param for setting mac address
From: Richard Röjfors @ 2010-04-18 21:35 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, davem
In-Reply-To: <1271620221.3679.265.camel@localhost>

On 04/18/2010 09:50 PM, Ben Hutchings wrote:
> On Sun, 2010-04-18 at 19:25 +0200, Richard Röjfors wrote:
>> This patch adds a module parameter for setting the MAC address.
>>
>> To ensure this MAC address is used, the MAC address is written
>> after each hardware reset.
> [...]
>
> This is not an accepted way of setting the MAC address.

I agree it's not the cleanest way of doing this, I saw  some of the
other drivers do like this.

>  The accepted ways to initialise a network controller's address are:
>
> 1. a. Controller reads it from dedicated NVRAM.  Driver reads it from
>        controller.
>     b. Driver reads it from dedicated NVRAM.

Not possible with the current hardware.

> 2. Platform firmware or boot loader passes platform data (OpenFirmware,
>     device tree, etc.) to the kernel, which includes the assigned MAC
>     address.  Driver uses kernel functions to read it from platform data.

On our system (X86 based) the ks8842 is connected via a FPGA, the FPGA is
connected via PCI express. The system has a standard BIOS.
In linux we have a MFD driver chunking up the PCI memory space into
platform devices. In that case we would need to feed the data as a param
to the MFD driver which copies into the platform data of the ks8842, would
be doable.

> 3. Platform firmware or boot loader programs it into the controller.
>     Driver reads it from the controller.

We use standard BIOS and boot loaders from the X86 distros -> more or less
not doable.

> 4. Driver generates random address.

That's the current fallback if none is given and the random address
in the chip isn't valid.

>
> In any case, userland can change the MAC address later.

If we don't need to have a known MAC before the root FS is mounted.

--Richard

^ permalink raw reply

* Re: [PATCH v2] TCP: avoid to send keepalive probes if it is receiving data
From: Ilpo Järvinen @ 2010-04-18 20:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Flavio Leitner, Netdev
In-Reply-To: <1271610919.16881.5564.camel@edumazet-laptop>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3631 bytes --]

On Sun, 18 Apr 2010, Eric Dumazet wrote:

> Le dimanche 18 avril 2010 à 11:55 -0300, Flavio Leitner a écrit :
> > RFC 1122 says the following:
> > ...
> >   Keep-alive packets MUST only be sent when no data or
> >   acknowledgement packets have been received for the
> >   connection within an interval.
> > ...
> > 
> > Fix this by storing the timestamp of last received data
> > packet and checking for it when the keepalive timer expires.
> > 
> > -v2 fix do_tcp_setsockopt() as pointed by Eric Dumazet <eric.dumazet@gmail.com>
> > 
> > Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> 
> 
> I find this patch very welcome, and we could easily use this new
> lrcvtime information available in diagnostic tools (ss command)
> 
> But are you sure you update it for all valid packets ?
> 
> If we receive a pure ACK, it seems you do not ...

I fail to see why the addition of this new variable is necessary at all, 
could either of you enlight me why exactly it's necessary and rcv_tstamp 
will not suffice?

> > ---
> >  include/linux/tcp.h  |    1 +
> >  net/ipv4/tcp.c       |    5 ++++-
> >  net/ipv4/tcp_input.c |    3 +++
> >  net/ipv4/tcp_timer.c |    8 ++++++++
> >  4 files changed, 16 insertions(+), 1 deletions(-)
> > 
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index a778ee0..405678f 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -314,6 +314,7 @@ struct tcp_sock {
> >   	u32	snd_sml;	/* Last byte of the most recently transmitted small packet */
> >  	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
> >  	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
> > +	u32	lrcvtime;	/* timestamp of last received data packet (for keepalives) */
> >  
> >  	/* Data for direct copy to user */
> >  	struct {
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 0f8caf6..a4048d7 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -2298,7 +2298,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
> >  			if (sock_flag(sk, SOCK_KEEPOPEN) &&
> >  			    !((1 << sk->sk_state) &
> >  			      (TCPF_CLOSE | TCPF_LISTEN))) {
> > -				__u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
> > +				u32 elapsed = min_t(u32,
> > +						      tcp_time_stamp - tp->rcv_tstamp,
> > +						      tcp_time_stamp - tp->lrcvtime);
> > +
> >  				if (tp->keepalive_time > elapsed)
> >  					elapsed = tp->keepalive_time - elapsed;
> >  				else
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index f240f57..60d2980 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -5391,6 +5391,8 @@ no_ack:
> >  				__kfree_skb(skb);
> >  			else
> >  				sk->sk_data_ready(sk, 0);
> > +
> > +			tp->lrcvtime = tcp_time_stamp;
> >  			return 0;
> >  		}
> >  	}
> > @@ -5421,6 +5423,7 @@ step5:
> >  
> >  	tcp_data_snd_check(sk);
> >  	tcp_ack_snd_check(sk);
> > +	tp->lrcvtime = tcp_time_stamp;
> >  	return 0;
> >  
> >  csum_error:
> > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> > index 8a0ab29..74dd804 100644
> > --- a/net/ipv4/tcp_timer.c
> > +++ b/net/ipv4/tcp_timer.c
> > @@ -554,6 +554,14 @@ static void tcp_keepalive_timer (unsigned long data)
> >  	if (tp->packets_out || tcp_send_head(sk))
> >  		goto resched;
> >  
> > +	elapsed = tcp_time_stamp - tp->lrcvtime;
> > +	
> > +	/* receiving data means alive */
> > +	if (elapsed < keepalive_time_when(tp)) {
> > +		elapsed = keepalive_time_when(tp) - elapsed;
> > +		goto resched;
> > +	}
> > +
> >  	elapsed = tcp_time_stamp - tp->rcv_tstamp;
> >  
> >  	if (elapsed >= keepalive_time_when(tp)) {


-- 
 i.

^ permalink raw reply

* Re: [PATCH] ks8842: Add module param for setting mac address
From: Ben Hutchings @ 2010-04-18 19:50 UTC (permalink / raw)
  To: Richard Röjfors; +Cc: netdev, davem
In-Reply-To: <1271611557.24099.11.camel@debian>

On Sun, 2010-04-18 at 19:25 +0200, Richard Röjfors wrote:
> This patch adds a module parameter for setting the MAC address.
> 
> To ensure this MAC address is used, the MAC address is written
> after each hardware reset.
[...]

This is not an accepted way of setting the MAC address.  The accepted
ways to initialise a network controller's address are:

1. a. Controller reads it from dedicated NVRAM.  Driver reads it from
      controller.
   b. Driver reads it from dedicated NVRAM.
2. Platform firmware or boot loader passes platform data (OpenFirmware,
   device tree, etc.) to the kernel, which includes the assigned MAC
   address.  Driver uses kernel functions to read it from platform data.
3. Platform firmware or boot loader programs it into the controller.
   Driver reads it from the controller.
4. Driver generates random address.

In any case, userland can change the MAC address later.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH] drivers/net/usb: Add new driver ipheth
From: L. Alberto Giménez @ 2010-04-18 18:35 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: dgiagio-Re5JQEeQqe8AvxtiuMwx3w, dborca-/E1597aS9LQAvxtiuMwx3w,
	David S. Miller, James Bottomley, Ralf Baechle,
	Greg Kroah-Hartman, Jonas Sjöquist, Torgny Johansson,
	Steve Glendinning, David Brownell, Omar Laazimani,
	Rémi Denis-Courmont, L. Alberto Giménez,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1269984864-28159-1-git-send-email-agimenez-lqZFv/KUvpAxAGwisGp4zA@public.gmane.org>

From: Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>

Add new driver to use tethering with an iPhone device. After initial submission,
apply fixes to fit the new driver into the kernel standards.

There are still a couple of minor (almost cosmetic-level) issues, but the driver
is fully functional right now.

Signed-off-by: L. Alberto Giménez <agimenez-lqZFv/KUvpAxAGwisGp4zA@public.gmane.org>
---
This time I've cleared the authorship as requested by David. Diego is the
original author of the code, although Daniel Borca has contributed a lot of
changes as well (I did the kernel-tree integration and fixed checkpatch issues
as well as some of the problems pointed out in the list on previous
submissions).

Diego and/or Daniel: please sign-off (specially Diego since he is the one who
will be officially attributed as the driver author) this patch so the authorship
and DCO is clear.

 drivers/net/Makefile     |    1 +
 drivers/net/usb/Kconfig  |   12 +
 drivers/net/usb/Makefile |    1 +
 drivers/net/usb/ipheth.c |  568 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 582 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/usb/ipheth.c

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a583b50..12b280a 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -273,6 +273,7 @@ obj-$(CONFIG_USB_RTL8150)       += usb/
 obj-$(CONFIG_USB_HSO)		+= usb/
 obj-$(CONFIG_USB_USBNET)        += usb/
 obj-$(CONFIG_USB_ZD1201)        += usb/
+obj-$(CONFIG_USB_IPHETH)        += usb/
 
 obj-y += wireless/
 obj-$(CONFIG_NET_TULIP) += tulip/
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index ba56ce4..63be4ca 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -385,4 +385,16 @@ config USB_CDC_PHONET
 	  cellular modem, as found on most Nokia handsets with the
 	  "PC suite" USB profile.
 
+config USB_IPHETH
+	tristate "Apple iPhone USB Ethernet driver"
+	default n
+	---help---
+	  Module used to share Internet connection (tethering) from your
+	  iPhone (Original, 3G and 3GS) to your system.
+	  Note that you need userspace libraries and programs that are needed
+	  to pair your device with your system and that understand the iPhone
+	  protocol.
+
+	  For more information: http://giagio.com/wiki/moin.cgi/iPhoneEthernetDriver
+
 endmenu
diff --git a/drivers/net/usb/Makefile b/drivers/net/usb/Makefile
index 82ea629..edb09c0 100644
--- a/drivers/net/usb/Makefile
+++ b/drivers/net/usb/Makefile
@@ -23,4 +23,5 @@ obj-$(CONFIG_USB_NET_MCS7830)	+= mcs7830.o
 obj-$(CONFIG_USB_USBNET)	+= usbnet.o
 obj-$(CONFIG_USB_NET_INT51X1)	+= int51x1.o
 obj-$(CONFIG_USB_CDC_PHONET)	+= cdc-phonet.o
+obj-$(CONFIG_USB_IPHETH)	+= ipheth.o
 
diff --git a/drivers/net/usb/ipheth.c b/drivers/net/usb/ipheth.c
new file mode 100644
index 0000000..fd10331
--- /dev/null
+++ b/drivers/net/usb/ipheth.c
@@ -0,0 +1,568 @@
+/*
+ * ipheth.c - Apple iPhone USB Ethernet driver
+ *
+ * Copyright (c) 2009 Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of GIAGIO.COM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * The provided data structures and external interfaces from this code
+ * are not restricted to be used by modules with a GPL compatible license.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ *
+ * Attention: iPhone device must be paired, otherwise it won't respond to our
+ * driver. For more info: http://giagio.com/wiki/moin.cgi/iPhoneEthernetDriver
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/usb.h>
+#include <linux/workqueue.h>
+
+#define USB_VENDOR_APPLE        0x05ac
+#define USB_PRODUCT_IPHONE      0x1290
+#define USB_PRODUCT_IPHONE_3G   0x1292
+#define USB_PRODUCT_IPHONE_3GS  0x1294
+
+#define IPHETH_USBINTF_CLASS    255
+#define IPHETH_USBINTF_SUBCLASS 253
+#define IPHETH_USBINTF_PROTO    1
+
+#define IPHETH_BUF_SIZE         1516
+#define IPHETH_TX_TIMEOUT       (5 * HZ)
+
+#define IPHETH_INTFNUM          2
+#define IPHETH_ALT_INTFNUM      1
+
+#define IPHETH_CTRL_ENDP        0x00
+#define IPHETH_CTRL_BUF_SIZE    0x40
+#define IPHETH_CTRL_TIMEOUT     (5 * HZ)
+
+#define IPHETH_CMD_GET_MACADDR   0x00
+#define IPHETH_CMD_CARRIER_CHECK 0x45
+
+#define IPHETH_CARRIER_CHECK_TIMEOUT round_jiffies_relative(1 * HZ)
+#define IPHETH_CARRIER_ON       0x04
+
+static struct usb_device_id ipheth_table[] = {
+	{ USB_DEVICE_AND_INTERFACE_INFO(
+		USB_VENDOR_APPLE, USB_PRODUCT_IPHONE,
+		IPHETH_USBINTF_CLASS, IPHETH_USBINTF_SUBCLASS,
+		IPHETH_USBINTF_PROTO) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(
+		USB_VENDOR_APPLE, USB_PRODUCT_IPHONE_3G,
+		IPHETH_USBINTF_CLASS, IPHETH_USBINTF_SUBCLASS,
+		IPHETH_USBINTF_PROTO) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(
+		USB_VENDOR_APPLE, USB_PRODUCT_IPHONE_3GS,
+		IPHETH_USBINTF_CLASS, IPHETH_USBINTF_SUBCLASS,
+		IPHETH_USBINTF_PROTO) },
+	{ }
+};
+MODULE_DEVICE_TABLE(usb, ipheth_table);
+
+struct ipheth_device {
+	struct usb_device *udev;
+	struct usb_interface *intf;
+	struct net_device *net;
+	struct sk_buff *tx_skb;
+	struct urb *tx_urb;
+	struct urb *rx_urb;
+	unsigned char *tx_buf;
+	unsigned char *rx_buf;
+	unsigned char *ctrl_buf;
+	u8 bulk_in;
+	u8 bulk_out;
+	struct delayed_work carrier_work;
+};
+
+static int ipheth_rx_submit(struct ipheth_device *dev, gfp_t mem_flags);
+
+static int ipheth_alloc_urbs(struct ipheth_device *iphone)
+{
+	struct urb *tx_urb = NULL;
+	struct urb *rx_urb = NULL;
+	u8 *tx_buf = NULL;
+	u8 *rx_buf = NULL;
+
+	tx_urb = usb_alloc_urb(0, GFP_KERNEL);
+	if (tx_urb == NULL)
+		goto error;
+
+	rx_urb = usb_alloc_urb(0, GFP_KERNEL);
+	if (rx_urb == NULL)
+		goto error;
+
+	tx_buf = usb_buffer_alloc(iphone->udev,
+				  IPHETH_BUF_SIZE,
+				  GFP_KERNEL,
+				  &tx_urb->transfer_dma);
+	if (tx_buf == NULL)
+		goto error;
+
+	rx_buf = usb_buffer_alloc(iphone->udev,
+				  IPHETH_BUF_SIZE,
+				  GFP_KERNEL,
+				  &rx_urb->transfer_dma);
+	if (rx_buf == NULL)
+		goto error;
+
+
+	iphone->tx_urb = tx_urb;
+	iphone->rx_urb = rx_urb;
+	iphone->tx_buf = tx_buf;
+	iphone->rx_buf = rx_buf;
+	return 0;
+
+error:
+	usb_buffer_free(iphone->udev, IPHETH_BUF_SIZE, rx_buf,
+			rx_urb->transfer_dma);
+	usb_buffer_free(iphone->udev, IPHETH_BUF_SIZE, tx_buf,
+			tx_urb->transfer_dma);
+	usb_free_urb(rx_urb);
+	usb_free_urb(tx_urb);
+	return -ENOMEM;
+}
+
+static void ipheth_free_urbs(struct ipheth_device *iphone)
+{
+	usb_buffer_free(iphone->udev, IPHETH_BUF_SIZE, iphone->rx_buf,
+			iphone->rx_urb->transfer_dma);
+	usb_buffer_free(iphone->udev, IPHETH_BUF_SIZE, iphone->tx_buf,
+			iphone->tx_urb->transfer_dma);
+	usb_free_urb(iphone->rx_urb);
+	usb_free_urb(iphone->tx_urb);
+}
+
+static void ipheth_kill_urbs(struct ipheth_device *dev)
+{
+	usb_kill_urb(dev->tx_urb);
+	usb_kill_urb(dev->rx_urb);
+}
+
+static void ipheth_rcvbulk_callback(struct urb *urb)
+{
+	struct ipheth_device *dev;
+	struct sk_buff *skb;
+	int status;
+	char *buf;
+	int len;
+
+	dev = urb->context;
+	if (dev == NULL)
+		return;
+
+	status = urb->status;
+	switch (status) {
+	case -ENOENT:
+	case -ECONNRESET:
+	case -ESHUTDOWN:
+		return;
+	case 0:
+		break;
+	default:
+		err("%s: urb status: %d", __func__, urb->status);
+		return;
+	}
+
+	len = urb->actual_length;
+	buf = urb->transfer_buffer;
+
+	skb = dev_alloc_skb(NET_IP_ALIGN + len);
+	if (!skb) {
+		err("%s: dev_alloc_skb: -ENOMEM", __func__);
+		dev->net->stats.rx_dropped++;
+		return;
+	}
+
+	skb_reserve(skb, NET_IP_ALIGN);
+	memcpy(skb_put(skb, len), buf + NET_IP_ALIGN, len - NET_IP_ALIGN);
+	skb->dev = dev->net;
+	skb->protocol = eth_type_trans(skb, dev->net);
+
+	dev->net->stats.rx_packets++;
+	dev->net->stats.rx_bytes += len;
+
+	netif_rx(skb);
+	ipheth_rx_submit(dev, GFP_ATOMIC);
+}
+
+static void ipheth_sndbulk_callback(struct urb *urb)
+{
+	struct ipheth_device *dev;
+
+	dev = urb->context;
+	if (dev == NULL)
+		return;
+
+	if (urb->status != 0 &&
+	    urb->status != -ENOENT &&
+	    urb->status != -ECONNRESET &&
+	    urb->status != -ESHUTDOWN)
+		err("%s: urb status: %d", __func__, urb->status);
+
+	dev_kfree_skb_irq(dev->tx_skb);
+	netif_wake_queue(dev->net);
+}
+
+static int ipheth_carrier_set(struct ipheth_device *dev)
+{
+	struct usb_device *udev = dev->udev;
+	int retval;
+
+	retval = usb_control_msg(udev,
+			usb_rcvctrlpipe(udev, IPHETH_CTRL_ENDP),
+			IPHETH_CMD_CARRIER_CHECK, /* request */
+			0xc0, /* request type */
+			0x00, /* value */
+			0x02, /* index */
+			dev->ctrl_buf, IPHETH_CTRL_BUF_SIZE,
+			IPHETH_CTRL_TIMEOUT);
+	if (retval < 0) {
+		err("%s: usb_control_msg: %d", __func__, retval);
+		return retval;
+	}
+
+	if (dev->ctrl_buf[0] == IPHETH_CARRIER_ON)
+		netif_carrier_on(dev->net);
+	else
+		netif_carrier_off(dev->net);
+
+	return 0;
+}
+
+static void ipheth_carrier_check_work(struct work_struct *work)
+{
+	struct ipheth_device *dev = container_of(work, struct ipheth_device,
+						 carrier_work.work);
+
+	ipheth_carrier_set(dev);
+	schedule_delayed_work(&dev->carrier_work, IPHETH_CARRIER_CHECK_TIMEOUT);
+}
+
+static int ipheth_get_macaddr(struct ipheth_device *dev)
+{
+	struct usb_device *udev = dev->udev;
+	struct net_device *net = dev->net;
+	int retval;
+
+	retval = usb_control_msg(udev,
+				 usb_rcvctrlpipe(udev, IPHETH_CTRL_ENDP),
+				 IPHETH_CMD_GET_MACADDR, /* request */
+				 0xc0, /* request type */
+				 0x00, /* value */
+				 0x02, /* index */
+				 dev->ctrl_buf,
+				 IPHETH_CTRL_BUF_SIZE,
+				 IPHETH_CTRL_TIMEOUT);
+	if (retval < 0) {
+		err("%s: usb_control_msg: %d", __func__, retval);
+	} else if (retval < ETH_ALEN) {
+		err("%s: usb_control_msg: short packet: %d bytes",
+			__func__, retval);
+		retval = -EINVAL;
+	} else {
+		memcpy(net->dev_addr, dev->ctrl_buf, ETH_ALEN);
+		retval = 0;
+	}
+
+	return retval;
+}
+
+static int ipheth_rx_submit(struct ipheth_device *dev, gfp_t mem_flags)
+{
+	struct usb_device *udev = dev->udev;
+	int retval;
+
+	usb_fill_bulk_urb(dev->rx_urb, udev,
+			  usb_rcvbulkpipe(udev, dev->bulk_in),
+			  dev->rx_buf, IPHETH_BUF_SIZE,
+			  ipheth_rcvbulk_callback,
+			  dev);
+	dev->rx_urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
+
+	retval = usb_submit_urb(dev->rx_urb, mem_flags);
+	if (retval)
+		err("%s: usb_submit_urb: %d", __func__, retval);
+	return retval;
+}
+
+static int ipheth_open(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+	struct usb_device *udev = dev->udev;
+	int retval = 0;
+
+	usb_set_interface(udev, IPHETH_INTFNUM, IPHETH_ALT_INTFNUM);
+
+	retval = ipheth_carrier_set(dev);
+	if (retval)
+		return retval;
+
+	retval = ipheth_rx_submit(dev, GFP_KERNEL);
+	if (retval)
+		return retval;
+
+	schedule_delayed_work(&dev->carrier_work, IPHETH_CARRIER_CHECK_TIMEOUT);
+	netif_start_queue(net);
+	return retval;
+}
+
+static int ipheth_close(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+
+	cancel_delayed_work_sync(&dev->carrier_work);
+	netif_stop_queue(net);
+	return 0;
+}
+
+static int ipheth_tx(struct sk_buff *skb, struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+	struct usb_device *udev = dev->udev;
+	int retval;
+
+	/* Paranoid */
+	if (skb->len > IPHETH_BUF_SIZE) {
+		WARN(1, "%s: skb too large: %d bytes", __func__, skb->len);
+		dev->net->stats.tx_dropped++;
+		dev_kfree_skb_irq(skb);
+		return NETDEV_TX_OK;
+	}
+
+	memcpy(dev->tx_buf, skb->data, skb->len);
+	if (skb->len < IPHETH_BUF_SIZE)
+		memset(dev->tx_buf + skb->len, 0, IPHETH_BUF_SIZE - skb->len);
+
+	usb_fill_bulk_urb(dev->tx_urb, udev,
+			  usb_sndbulkpipe(udev, dev->bulk_out),
+			  dev->tx_buf, IPHETH_BUF_SIZE,
+			  ipheth_sndbulk_callback,
+			  dev);
+	dev->tx_urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
+
+	retval = usb_submit_urb(dev->tx_urb, GFP_ATOMIC);
+	if (retval) {
+		err("%s: usb_submit_urb: %d", __func__, retval);
+		dev->net->stats.tx_errors++;
+		dev_kfree_skb_irq(skb);
+	} else {
+		dev->tx_skb = skb;
+
+		dev->net->stats.tx_packets++;
+		dev->net->stats.tx_bytes += skb->len;
+		netif_stop_queue(net);
+	}
+
+	return NETDEV_TX_OK;
+}
+
+static void ipheth_tx_timeout(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+
+	err("%s: TX timeout", __func__);
+	dev->net->stats.tx_errors++;
+	usb_unlink_urb(dev->tx_urb);
+}
+
+static struct net_device_stats *ipheth_stats(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+	return &dev->net->stats;
+}
+
+static u32 ipheth_ethtool_op_get_link(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+	return netif_carrier_ok(dev->net);
+}
+
+static struct ethtool_ops ops = {
+	.get_link = ipheth_ethtool_op_get_link
+};
+
+static const struct net_device_ops ipheth_netdev_ops = {
+	.ndo_open = &ipheth_open,
+	.ndo_stop = &ipheth_close,
+	.ndo_start_xmit = &ipheth_tx,
+	.ndo_tx_timeout = &ipheth_tx_timeout,
+	.ndo_get_stats = &ipheth_stats,
+};
+
+static struct device_type ipheth_type = {
+	.name	= "wwan",
+};
+
+static int ipheth_probe(struct usb_interface *intf,
+			const struct usb_device_id *id)
+{
+	struct usb_device *udev = interface_to_usbdev(intf);
+	struct usb_host_interface *hintf;
+	struct usb_endpoint_descriptor *endp;
+	struct ipheth_device *dev;
+	struct net_device *netdev;
+	int i;
+	int retval;
+
+	netdev = alloc_etherdev(sizeof(struct ipheth_device));
+	if (!netdev)
+		return -ENOMEM;
+
+	netdev->netdev_ops = &ipheth_netdev_ops;
+	netdev->watchdog_timeo = IPHETH_TX_TIMEOUT;
+	strcpy(netdev->name, "wwan%d");
+
+	dev = netdev_priv(netdev);
+	dev->udev = udev;
+	dev->net = netdev;
+	dev->intf = intf;
+
+	/* Set up endpoints */
+	hintf = usb_altnum_to_altsetting(intf, IPHETH_ALT_INTFNUM);
+	if (hintf == NULL) {
+		retval = -ENODEV;
+		err("Unable to find alternate settings interface");
+		goto err_endpoints;
+	}
+
+	for (i = 0; i < hintf->desc.bNumEndpoints; i++) {
+		endp = &hintf->endpoint[i].desc;
+		if (usb_endpoint_is_bulk_in(endp))
+			dev->bulk_in = endp->bEndpointAddress;
+		else if (usb_endpoint_is_bulk_out(endp))
+			dev->bulk_out = endp->bEndpointAddress;
+	}
+	if (!(dev->bulk_in && dev->bulk_out)) {
+		retval = -ENODEV;
+		err("Unable to find endpoints");
+		goto err_endpoints;
+	}
+
+	dev->ctrl_buf = kmalloc(IPHETH_CTRL_BUF_SIZE, GFP_KERNEL);
+	if (dev->ctrl_buf == NULL) {
+		retval = -ENOMEM;
+		goto err_alloc_ctrl_buf;
+	}
+
+	retval = ipheth_get_macaddr(dev);
+	if (retval)
+		goto err_get_macaddr;
+
+	INIT_DELAYED_WORK(&dev->carrier_work, ipheth_carrier_check_work);
+
+	retval = ipheth_alloc_urbs(dev);
+	if (retval) {
+		err("error allocating urbs: %d", retval);
+		goto err_alloc_urbs;
+	}
+
+	usb_set_intfdata(intf, dev);
+
+	SET_NETDEV_DEV(netdev, &intf->dev);
+	SET_ETHTOOL_OPS(netdev, &ops);
+	SET_NETDEV_DEVTYPE(netdev, &ipheth_type);
+
+	retval = register_netdev(netdev);
+	if (retval) {
+		err("error registering netdev: %d", retval);
+		retval = -EIO;
+		goto err_register_netdev;
+	}
+
+	dev_info(&intf->dev, "Apple iPhone USB Ethernet device attached\n");
+	return 0;
+
+err_register_netdev:
+	ipheth_free_urbs(dev);
+err_alloc_urbs:
+err_get_macaddr:
+err_alloc_ctrl_buf:
+	kfree(dev->ctrl_buf);
+err_endpoints:
+	free_netdev(netdev);
+	return retval;
+}
+
+static void ipheth_disconnect(struct usb_interface *intf)
+{
+	struct ipheth_device *dev;
+
+	dev = usb_get_intfdata(intf);
+	if (dev != NULL) {
+		unregister_netdev(dev->net);
+		ipheth_kill_urbs(dev);
+		ipheth_free_urbs(dev);
+		kfree(dev->ctrl_buf);
+		free_netdev(dev->net);
+	}
+	usb_set_intfdata(intf, NULL);
+	dev_info(&intf->dev, "Apple iPhone USB Ethernet now disconnected\n");
+}
+
+static struct usb_driver ipheth_driver = {
+	.name =		"ipheth",
+	.probe =	ipheth_probe,
+	.disconnect =	ipheth_disconnect,
+	.id_table =	ipheth_table,
+};
+
+static int __init ipheth_init(void)
+{
+	int retval;
+
+	retval = usb_register(&ipheth_driver);
+	if (retval) {
+		err("usb_register failed: %d", retval);
+		return retval;
+	}
+	return 0;
+}
+
+static void __exit ipheth_exit(void)
+{
+	usb_deregister(&ipheth_driver);
+}
+
+module_init(ipheth_init);
+module_exit(ipheth_exit);
+
+MODULE_AUTHOR("Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>");
+MODULE_DESCRIPTION("Apple iPhone USB Ethernet driver");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
1.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: Bug#577640: linux-image-2.6.33-2-amd64: Kernel warnings in netns thread
From: Ben Hutchings @ 2010-04-18 17:33 UTC (permalink / raw)
  To: Martin Ferrari, 577640; +Cc: netdev, Eric W. Biederman, Alexey Dobriyan
In-Reply-To: <20100413112017.8505.36380.reportbug@japot.localdomain>

[-- Attachment #1: Type: text/plain, Size: 3369 bytes --]

On Tue, 2010-04-13 at 13:20 +0200, Martin Ferrari wrote:
> Package: linux-2.6
> Version: 2.6.33-1~experimental.4
> Severity: normal
> Tags: experimental
> 
> Firstly, please note that I'm running this inside a Qemu, but I
> imagine that it should not change things much.
> 
> I installed 2.6.33 to try out the new improvements regarding network
> namespaces, and while creating and killing hundreds of them, I got
> many warnings from the kernel that might indicate a bug somewhere.
> Please see the log already included by reportbug.

I'm forwarding this to the upstream maintainers.

Ben.

[...]
> [ 6696.035331] ------------[ cut here ]------------
> [ 6696.035334] WARNING: at /build/mattems-linux-2.6_2.6.33-1~experimental.4-amd64-ieqSsa/linux-2.6-2.6.33-1~experimental.4/debian/build/source_amd64_none/kernel/sysctl.c:1894 unregister_sysctl_table+0xa6/0xd1()
> [ 6696.035336] Hardware name: 
> [ 6696.035337] Modules linked in: veth loop parport_pc parport snd_pcm tpm_tis evdev snd_timer i2c_piix4 tpm tpm_bios button processor serio_raw i2c_core snd soundcore snd_page_alloc pcspkr psmouse ext3 jbd mbcache sg sr_mod cdrom sd_mod crc_t10dif ata_generic ata_piix libata floppy 8139cp thermal thermal_sys 8139too mii scsi_mod [last unloaded: veth]
> [ 6696.035354] Pid: 9, comm: netns Tainted: G        W  2.6.33-2-amd64 #1
> [ 6696.035355] Call Trace:
> [ 6696.035357]  [<ffffffff8104e303>] ? unregister_sysctl_table+0xa6/0xd1
> [ 6696.035360]  [<ffffffff8104e303>] ? unregister_sysctl_table+0xa6/0xd1
> [ 6696.035362]  [<ffffffff81046b81>] ? warn_slowpath_common+0x77/0xa3
> [ 6696.035364]  [<ffffffff8104e303>] ? unregister_sysctl_table+0xa6/0xd1
> [ 6696.035367]  [<ffffffff812b1205>] ? addrconf_ifdown+0x26f/0x2cc
> [ 6696.035369]  [<ffffffff81247edc>] ? neigh_sysctl_unregister+0x1a/0x31
> [ 6696.035371]  [<ffffffff812b1211>] ? addrconf_ifdown+0x27b/0x2cc
> [ 6696.035374]  [<ffffffff812b2b0c>] ? addrconf_notify+0x714/0x7ea
> [ 6696.035376]  [<ffffffff811eb2e7>] ? extract_entropy+0x6a/0x125
> [ 6696.035379]  [<ffffffff81053aaf>] ? lock_timer_base+0x26/0x4b
> [ 6696.035382]  [<ffffffff8123951c>] ? skb_dequeue+0x50/0x58
> [ 6696.035384]  [<ffffffff812482c8>] ? pneigh_queue_purge+0x25/0x2f
> [ 6696.035386]  [<ffffffff81249a91>] ? neigh_ifdown+0xba/0xc9
> [ 6696.035389]  [<ffffffff81062f6d>] ? notifier_call_chain+0x29/0x4c
> [ 6696.035392]  [<ffffffff81243662>] ? rollback_registered_many+0xed/0x19c
> [ 6696.035394]  [<ffffffff8124371f>] ? unregister_netdevice_many+0xe/0x57
> [ 6696.035397]  [<ffffffff812438a3>] ? default_device_exit_batch+0x92/0xa3
> [ 6696.035399]  [<ffffffff8123e97a>] ? cleanup_net+0xfd/0x1af
> [ 6696.035402]  [<ffffffff8105bd0d>] ? worker_thread+0x188/0x21d
> [ 6696.035404]  [<ffffffff8123e87d>] ? cleanup_net+0x0/0x1af
> [ 6696.035406]  [<ffffffff8105f2d2>] ? autoremove_wake_function+0x0/0x2e
> [ 6696.035409]  [<ffffffff8105bb85>] ? worker_thread+0x0/0x21d
> [ 6696.035411]  [<ffffffff8105ee99>] ? kthread+0x79/0x81
> [ 6696.035414]  [<ffffffff810098e4>] ? kernel_thread_helper+0x4/0x10
> [ 6696.035416]  [<ffffffff8105ee20>] ? kthread+0x0/0x81
> [ 6696.035418]  [<ffffffff810098e0>] ? kernel_thread_helper+0x0/0x10
> [ 6696.035419] ---[ end trace ef7b93cb006e989e ]---
[...]

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* [PATCH] ks8842: Add module param for setting mac address
From: Richard Röjfors @ 2010-04-18 17:25 UTC (permalink / raw)
  To: netdev; +Cc: davem

This patch adds a module parameter for setting the MAC address.

To ensure this MAC address is used, the MAC address is written
after each hardware reset.

Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>
---
diff --git a/drivers/net/ks8842.c b/drivers/net/ks8842.c
index 5c45cb5..a013662 100644
--- a/drivers/net/ks8842.c
+++ b/drivers/net/ks8842.c
@@ -1,5 +1,5 @@
 /*
- * ks8842_main.c timberdale KS8842 ethernet driver
+ * ks8842.c timberdale KS8842 ethernet driver
  * Copyright (c) 2009 Intel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
@@ -119,6 +119,8 @@ struct ks8842_adapter {
 	struct platform_device *pdev;
 };
 
+static u8 __devinitdata macaddr[ETH_ALEN];
+
 static inline void ks8842_select_bank(struct ks8842_adapter *adapter, u16 bank)
 {
 	iowrite16(bank, adapter->hw_addr + REG_SELECT_BANK);
@@ -302,6 +304,20 @@ static void ks8842_read_mac_addr(struct ks8842_adapter *adapter, u8 *dest)
 	ks8842_write16(adapter, 39, mac, REG_MACAR3);
 }
 
+static void ks8842_write_mac_addr(struct ks8842_adapter *adapter, u8 *mac)
+{
+	unsigned long flags;
+	unsigned i;
+
+	spin_lock_irqsave(&adapter->lock, flags);
+	for (i = 0; i < ETH_ALEN; i++) {
+		ks8842_write8(adapter, 2, mac[ETH_ALEN - i - 1], REG_MARL + i);
+		ks8842_write8(adapter, 39, mac[ETH_ALEN - i - 1],
+			REG_MACAR1 + i);
+	}
+	spin_unlock_irqrestore(&adapter->lock, flags);
+}
+
 static inline u16 ks8842_tx_fifo_space(struct ks8842_adapter *adapter)
 {
 	return ks8842_read16(adapter, 16, REG_TXMIR) & 0x1fff;
@@ -520,6 +536,8 @@ static int ks8842_open(struct net_device *netdev)
 	/* reset the HW */
 	ks8842_reset_hw(adapter);
 
+	ks8842_write_mac_addr(adapter, netdev->dev_addr);
+
 	ks8842_update_link_status(netdev, adapter);
 
 	err = request_irq(adapter->irq, ks8842_irq, IRQF_SHARED, DRV_NAME,
@@ -567,10 +585,8 @@ static netdev_tx_t ks8842_xmit_frame(struct sk_buff *skb,
 static int ks8842_set_mac(struct net_device *netdev, void *p)
 {
 	struct ks8842_adapter *adapter = netdev_priv(netdev);
-	unsigned long flags;
 	struct sockaddr *addr = p;
 	char *mac = (u8 *)addr->sa_data;
-	int i;
 
 	dev_dbg(&adapter->pdev->dev, "%s: entry\n", __func__);
 
@@ -579,13 +595,7 @@ static int ks8842_set_mac(struct net_device *netdev, void *p)
 
 	memcpy(netdev->dev_addr, mac, netdev->addr_len);
 
-	spin_lock_irqsave(&adapter->lock, flags);
-	for (i = 0; i < ETH_ALEN; i++) {
-		ks8842_write8(adapter, 2, mac[ETH_ALEN - i - 1], REG_MARL + i);
-		ks8842_write8(adapter, 39, mac[ETH_ALEN - i - 1],
-			REG_MACAR1 + i);
-	}
-	spin_unlock_irqrestore(&adapter->lock, flags);
+	ks8842_write_mac_addr(adapter, mac);
 	return 0;
 }
 
@@ -604,6 +614,8 @@ static void ks8842_tx_timeout(struct net_device *netdev)
 
 	ks8842_reset_hw(adapter);
 
+	ks8842_write_mac_addr(adapter, netdev->dev_addr);
+
 	ks8842_update_link_status(netdev, adapter);
 }
 
@@ -627,6 +639,7 @@ static int __devinit ks8842_probe(struct platform_device *pdev)
 	struct net_device *netdev;
 	struct ks8842_adapter *adapter;
 	u16 id;
+	unsigned i;
 
 	iomem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!request_mem_region(iomem->start, resource_size(iomem), DRV_NAME))
@@ -657,7 +670,20 @@ static int __devinit ks8842_probe(struct platform_device *pdev)
 	netdev->netdev_ops = &ks8842_netdev_ops;
 	netdev->ethtool_ops = &ks8842_ethtool_ops;
 
-	ks8842_read_mac_addr(adapter, netdev->dev_addr);
+	/* Check if a mac address was given */
+	for (i = 0; i < netdev->addr_len; i++)
+		if (macaddr[i] != 0)
+			break;
+
+	if (i < netdev->addr_len)
+		/* an address was passed, use it */
+		memcpy(netdev->dev_addr, macaddr, netdev->addr_len);
+	else {
+		ks8842_read_mac_addr(adapter, netdev->dev_addr);
+
+		if (!is_valid_ether_addr(netdev->dev_addr))
+			random_ether_addr(netdev->dev_addr);
+	}
 
 	id = ks8842_read16(adapter, 32, REG_SW_ID_AND_ENABLE);
 
@@ -723,6 +749,10 @@ static void __exit ks8842_exit(void)
 module_init(ks8842_init);
 module_exit(ks8842_exit);
 
+/* accept MAC address of the form macaddr=0x08,0x00,0x20,0x30,0x40,0x50 */
+module_param_array(macaddr, byte, NULL, 0);
+MODULE_PARM_DESC(macaddr, "KS8842 MAC address to set");
+
 MODULE_DESCRIPTION("Timberdale KS8842 ethernet driver");
 MODULE_AUTHOR("Mocean Laboratories <info@mocean-labs.com>");
 MODULE_LICENSE("GPL v2");


^ permalink raw reply related

* Re: [PATCH v2] TCP: avoid to send keepalive probes if it is receiving data
From: Eric Dumazet @ 2010-04-18 17:15 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: netdev
In-Reply-To: <1271602519-6805-1-git-send-email-fleitner@redhat.com>

Le dimanche 18 avril 2010 à 11:55 -0300, Flavio Leitner a écrit :
> RFC 1122 says the following:
> ...
>   Keep-alive packets MUST only be sent when no data or
>   acknowledgement packets have been received for the
>   connection within an interval.
> ...
> 
> Fix this by storing the timestamp of last received data
> packet and checking for it when the keepalive timer expires.
> 
> -v2 fix do_tcp_setsockopt() as pointed by Eric Dumazet <eric.dumazet@gmail.com>
> 
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>


I find this patch very welcome, and we could easily use this new
lrcvtime information available in diagnostic tools (ss command)

But are you sure you update it for all valid packets ?

If we receive a pure ACK, it seems you do not ...



> ---
>  include/linux/tcp.h  |    1 +
>  net/ipv4/tcp.c       |    5 ++++-
>  net/ipv4/tcp_input.c |    3 +++
>  net/ipv4/tcp_timer.c |    8 ++++++++
>  4 files changed, 16 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index a778ee0..405678f 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -314,6 +314,7 @@ struct tcp_sock {
>   	u32	snd_sml;	/* Last byte of the most recently transmitted small packet */
>  	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
>  	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
> +	u32	lrcvtime;	/* timestamp of last received data packet (for keepalives) */
>  
>  	/* Data for direct copy to user */
>  	struct {
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0f8caf6..a4048d7 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2298,7 +2298,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>  			if (sock_flag(sk, SOCK_KEEPOPEN) &&
>  			    !((1 << sk->sk_state) &
>  			      (TCPF_CLOSE | TCPF_LISTEN))) {
> -				__u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
> +				u32 elapsed = min_t(u32,
> +						      tcp_time_stamp - tp->rcv_tstamp,
> +						      tcp_time_stamp - tp->lrcvtime);
> +
>  				if (tp->keepalive_time > elapsed)
>  					elapsed = tp->keepalive_time - elapsed;
>  				else
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index f240f57..60d2980 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5391,6 +5391,8 @@ no_ack:
>  				__kfree_skb(skb);
>  			else
>  				sk->sk_data_ready(sk, 0);
> +
> +			tp->lrcvtime = tcp_time_stamp;
>  			return 0;
>  		}
>  	}
> @@ -5421,6 +5423,7 @@ step5:
>  
>  	tcp_data_snd_check(sk);
>  	tcp_ack_snd_check(sk);
> +	tp->lrcvtime = tcp_time_stamp;
>  	return 0;
>  
>  csum_error:
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 8a0ab29..74dd804 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -554,6 +554,14 @@ static void tcp_keepalive_timer (unsigned long data)
>  	if (tp->packets_out || tcp_send_head(sk))
>  		goto resched;
>  
> +	elapsed = tcp_time_stamp - tp->lrcvtime;
> +	
> +	/* receiving data means alive */
> +	if (elapsed < keepalive_time_when(tp)) {
> +		elapsed = keepalive_time_when(tp) - elapsed;
> +		goto resched;
> +	}
> +
>  	elapsed = tcp_time_stamp - tp->rcv_tstamp;
>  
>  	if (elapsed >= keepalive_time_when(tp)) {



^ permalink raw reply

* Re: [PATCH 11/13] bnx2x: Forbid LRO on when XEN is enabled
From: Ben Hutchings @ 2010-04-18 15:11 UTC (permalink / raw)
  To: Vladislav Zolotarov; +Cc: Dave Miller, Eilon Greenstein, netdev list, dmitry
In-Reply-To: <1271602258.27235.200.camel@lb-tlvb-vladz>

On Sun, 2010-04-18 at 17:50 +0300, Vladislav Zolotarov wrote:
> LRO cannot be used on virtualized environment, so it is best
> to disable it on compile time in XEN kernel.
[...]

If you want to cripple your performance, I'm not going to stand in your
way, but be aware that most distribution kernels now enable CONFIG_XEN.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH 2/13] bnx2x: Use VPD-R V0 entry to display firmware revision
From: Ben Hutchings @ 2010-04-18 15:08 UTC (permalink / raw)
  To: Vladislav Zolotarov; +Cc: davem, netdev, eilong, dmitry, Matt Carlson
In-Reply-To: <1271602093.27235.179.camel@lb-tlvb-vladz>

On Sun, 2010-04-18 at 17:48 +0300, Vladislav Zolotarov wrote:
> Use VPD-R V0 entry to display firmware revision
[...]

Matt Carlson already added VPD support functions and definitions to the
PCI core; you should use them.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH v2] TCP: avoid to send keepalive probes if it is receiving data
From: Flavio Leitner @ 2010-04-18 14:55 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Flavio Leitner
In-Reply-To: <1271581560.16881.4769.camel@edumazet-laptop>

RFC 1122 says the following:
...
  Keep-alive packets MUST only be sent when no data or
  acknowledgement packets have been received for the
  connection within an interval.
...

Fix this by storing the timestamp of last received data
packet and checking for it when the keepalive timer expires.

-v2 fix do_tcp_setsockopt() as pointed by Eric Dumazet <eric.dumazet@gmail.com>

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
 include/linux/tcp.h  |    1 +
 net/ipv4/tcp.c       |    5 ++++-
 net/ipv4/tcp_input.c |    3 +++
 net/ipv4/tcp_timer.c |    8 ++++++++
 4 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a778ee0..405678f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -314,6 +314,7 @@ struct tcp_sock {
  	u32	snd_sml;	/* Last byte of the most recently transmitted small packet */
 	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
 	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
+	u32	lrcvtime;	/* timestamp of last received data packet (for keepalives) */
 
 	/* Data for direct copy to user */
 	struct {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0f8caf6..a4048d7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2298,7 +2298,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 			if (sock_flag(sk, SOCK_KEEPOPEN) &&
 			    !((1 << sk->sk_state) &
 			      (TCPF_CLOSE | TCPF_LISTEN))) {
-				__u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
+				u32 elapsed = min_t(u32,
+						      tcp_time_stamp - tp->rcv_tstamp,
+						      tcp_time_stamp - tp->lrcvtime);
+
 				if (tp->keepalive_time > elapsed)
 					elapsed = tp->keepalive_time - elapsed;
 				else
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f240f57..60d2980 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5391,6 +5391,8 @@ no_ack:
 				__kfree_skb(skb);
 			else
 				sk->sk_data_ready(sk, 0);
+
+			tp->lrcvtime = tcp_time_stamp;
 			return 0;
 		}
 	}
@@ -5421,6 +5423,7 @@ step5:
 
 	tcp_data_snd_check(sk);
 	tcp_ack_snd_check(sk);
+	tp->lrcvtime = tcp_time_stamp;
 	return 0;
 
 csum_error:
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 8a0ab29..74dd804 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -554,6 +554,14 @@ static void tcp_keepalive_timer (unsigned long data)
 	if (tp->packets_out || tcp_send_head(sk))
 		goto resched;
 
+	elapsed = tcp_time_stamp - tp->lrcvtime;
+	
+	/* receiving data means alive */
+	if (elapsed < keepalive_time_when(tp)) {
+		elapsed = keepalive_time_when(tp) - elapsed;
+		goto resched;
+	}
+
 	elapsed = tcp_time_stamp - tp->rcv_tstamp;
 
 	if (elapsed >= keepalive_time_when(tp)) {
-- 
1.5.5.6


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox