Netdev List
 help / color / mirror / Atom feed
* [PATCH 0/5] sparse related patches
From: Stephen Hemminger @ 2011-02-23 19:06 UTC (permalink / raw)
  To: davem; +Cc: netdev



^ permalink raw reply

* [PATCH 5/5] em_meta: fix sparse warning
From: Stephen Hemminger @ 2011-02-23 19:06 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20110223190647.482444598@vyatta.com>

[-- Attachment #1: em-meta-sparse --]
[-- Type: text/plain, Size: 457 bytes --]

gfp_t needs to be cast to integer.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/sched/em_meta.c	2011-02-23 10:26:17.534042646 -0800
+++ b/net/sched/em_meta.c	2011-02-23 10:30:37.533063770 -0800
@@ -401,7 +401,7 @@ META_COLLECTOR(int_sk_sndbuf)
 META_COLLECTOR(int_sk_alloc)
 {
 	SKIP_NONLOCAL(skb);
-	dst->value = skb->sk->sk_allocation;
+	dst->value = (__force int) skb->sk->sk_allocation;
 }
 
 META_COLLECTOR(int_sk_route_caps)



^ permalink raw reply

* [PATCH 1/2] bonding: fix incorrect transmit queue offset
From: Andy Gospodarek @ 2011-02-23 19:42 UTC (permalink / raw)
  To: netdev, Phil Oester; +Cc: Ben Hutchings, Jay Vosburgh

Users noticed the following messages:

kernel: bond0 selects TX queue 16, but real number of TX queues is 16

were filling their logs when frames received from a device that
contained 16 input queues were being forwarded out of a bonded device.
Ben pointed out that the contents of skb->queue_mapping cannot be
directly mapped to a transmit queue, so I went with his suggestion for a
solution to the offset problem.

The function now also warns the user to increase the default value for
tx_queues if needed and returns a valid tx queue to dev_pick_tx.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Phil Oester <kernel@linuxace.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/bonding/bond_main.c |   24 +++++++++++++++++-------
 1 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77e3c6a..1512c83 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4533,15 +4533,25 @@ out:
 	return res;
 }
 
+/*
+ * This helper function exists to help dev_pick_tx get the correct
+ * destination queue.  Using a helper function skips the a call to
+ * skb_tx_hash and will put the skbs in the queue we expect on their
+ * way down to the bonding driver.
+ */
 static u16 bond_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
-	/*
-	 * This helper function exists to help dev_pick_tx get the correct
-	 * destination queue.  Using a helper function skips the a call to
-	 * skb_tx_hash and will put the skbs in the queue we expect on their
-	 * way down to the bonding driver.
-	 */
-	return skb->queue_mapping;
+	u16 txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) : 0;
+
+	while (txq >= dev->real_num_tx_queues) {
+		/* let the user know if we do not have enough tx queues */
+		if (net_ratelimit())
+			pr_warning("%s selects invalid tx queue %d.  Consider"
+				   " setting module option tx_queues > %d.",
+				   dev->name, txq, dev->real_num_tx_queues);
+		txq -= dev->real_num_tx_queues;
+	}
+	return txq;
 }
 
 static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
-- 
1.7.4


^ permalink raw reply related

* [PATCH 2/2] bonding: cleanup transmit queue storage and configuration
From: Andy Gospodarek @ 2011-02-23 19:43 UTC (permalink / raw)
  To: netdev

Current bonding code presumes queues 1-16 were usable and queue 0 was
reserved, but parts of the code didn't account for this correctly.  This
update now allows 16 queues (numbered 0-15) to be used (in the nominal
case) and queue awareness can be completely and clearly disabled rather
than by simply setting the queue for a device to 0.

Configuration syntax has changed a little, but is similar to the
existing method.  To associate a queue with an output device, you can do
this:

# echo "+eth1:2" > /sys/class/net/bond0/bonding/queue_id

and to clear it this:

# echo "-eth1:2" > /sys/class/net/bond0/bonding/queue_id

Contents of /proc/net/bonding/bond0 are also a bit different as the
output queue will only be printed if set.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

---
 Documentation/networking/bonding.txt |   20 ++++++----
 drivers/net/bonding/bond_main.c      |   14 ++++---
 drivers/net/bonding/bond_sysfs.c     |   71 +++++++++++++++++++++------------
 drivers/net/bonding/bonding.h        |   20 +++++++++
 4 files changed, 85 insertions(+), 40 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 25d2f41..8a27f0f 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1404,10 +1404,11 @@ By default the bonding driver is multiqueue aware and 16 queues are created
 when the driver initializes (see Documentation/networking/multiqueue.txt
 for details).  If more or less queues are desired the module parameter
 tx_queues can be used to change this value.  There is no sysfs parameter
-available as the allocation is done at module init time.
+available as the allocation is done at module init time.  Currently only one
+queue per device can be set.
 
 The output of the file /proc/net/bonding/bondX has changed so the output Queue
-ID is now printed for each slave:
+ID is now printed for each slave that has a queue configured:
 
 Bonding Mode: fault-tolerance (active-backup)
 Primary Slave: None
@@ -1421,7 +1422,6 @@ Slave Interface: eth0
 MII Status: up
 Link Failure Count: 0
 Permanent HW addr: 00:1a:a0:12:8f:cb
-Slave queue ID: 0
 
 Slave Interface: eth1
 MII Status: up
@@ -1431,12 +1431,16 @@ Slave queue ID: 2
 
 The queue_id for a slave can be set using the command:
 
-# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
+# echo "+eth1:2" > /sys/class/net/bond0/bonding/queue_id
 
-Any interface that needs a queue_id set should set it with multiple calls
-like the one above until proper priorities are set for all interfaces.  On
-distributions that allow configuration via initscripts, multiple 'queue_id'
-arguments can be added to BONDING_OPTS to set all needed slave queues.
+and removed using the command:
+
+# echo "-eth1:2" > /sys/class/net/bond0/bonding/queue_id
+
+Any interface that needs a queue_id set should set it with multiple calls until
+proper priorities are set or cleared for all interfaces.  On distributions that
+allow configuration via initscripts, multiple 'queue_id' arguments can be added
+to BONDING_OPTS to set all needed slave queues.
 
 These queue id's can be used in conjunction with the tc utility to configure
 a multiqueue qdisc and filters to bias certain traffic to transmit on certain
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1512c83..3e8fbfe 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1560,8 +1560,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	}
 
 	/*
-	 * Set the new_slave's queue_id to be zero.  Queue ID mapping
-	 * is set via sysfs or module option if desired.
+	 * Set the new_slave's queue_id to be zero to disable.  Queue ID
+	 * mapping is set via sysfs or module option if desired.
 	 */
 	new_slave->queue_id = 0;
 
@@ -3355,7 +3355,8 @@ static void bond_info_show_slave(struct seq_file *seq,
 		else
 			seq_puts(seq, "Aggregator ID: N/A\n");
 	}
-	seq_printf(seq, "Slave queue ID: %d\n", slave->queue_id);
+	if (slave_tx_queue_recorded(slave))
+		seq_printf(seq, "Slave queue ID: %d\n", slave_get_tx_queue(slave));
 }
 
 static int bond_info_seq_show(struct seq_file *seq, void *v)
@@ -4516,15 +4517,16 @@ static inline int bond_slave_override(struct bonding *bond,
 
 	/* Find out if any slaves have the same mapping as this skb. */
 	bond_for_each_slave(bond, check_slave, i) {
-		if (check_slave->queue_id == skb->queue_mapping) {
+		if (slave_tx_queue_recorded(check_slave) &&
+		    slave_get_tx_queue(check_slave) == skb->queue_mapping) {
 			slave = check_slave;
 			break;
 		}
 	}
 
 	/* If the slave isn't UP, use default transmit policy. */
-	if (slave && slave->queue_id && IS_UP(slave->dev) &&
-	    (slave->link == BOND_LINK_UP)) {
+	if (slave && slave_tx_queue_recorded(slave) &&
+	    IS_UP(slave->dev) && (slave->link == BOND_LINK_UP)) {
 		res = bond_dev_queue_xmit(bond, skb, slave->dev);
 	}
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 72bb0f6..d09701f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1448,15 +1448,18 @@ static ssize_t bonding_show_queue_id(struct device *d,
 
 	read_lock(&bond->lock);
 	bond_for_each_slave(bond, slave, i) {
-		if (res > (PAGE_SIZE - IFNAMSIZ - 6)) {
-			/* not enough space for another interface_name:queue_id pair */
-			if ((PAGE_SIZE - res) > 10)
-				res = PAGE_SIZE - 10;
-			res += sprintf(buf + res, "++more++ ");
-			break;
+		if (slave_tx_queue_recorded(slave)) {
+			if (res > (PAGE_SIZE - IFNAMSIZ - 6)) {
+				/* Not enough space for another
+				 * interface_name:queue_id pair. */
+				if ((PAGE_SIZE - res) > 10)
+					res = PAGE_SIZE - 10;
+				res += sprintf(buf + res, "++more++ ");
+				break;
+			}
+			res += sprintf(buf + res, "%s:%d ", slave->dev->name,
+				       slave_get_tx_queue(slave));
 		}
-		res += sprintf(buf + res, "%s:%d ",
-			       slave->dev->name, slave->queue_id);
 	}
 	read_unlock(&bond->lock);
 	if (res)
@@ -1479,55 +1482,71 @@ static ssize_t bonding_store_queue_id(struct device *d,
 	int i, ret = count;
 	char *delim;
 	struct net_device *sdev = NULL;
+	char command[IFNAMSIZ + 7] = {0, };
+	char *ifname;
 
 	if (!rtnl_trylock())
 		return restart_syscall();
 
+	sscanf(buffer, "%22s", command); /* IFNAMSIZ*/
+
 	/* delim will point to queue id if successful */
-	delim = strchr(buffer, ':');
+	delim = strchr(command + 1, ':');
 	if (!delim)
 		goto err_no_cmd;
-
 	/*
 	 * Terminate string that points to device name and bump it
 	 * up one, so we can read the queue id there.
 	 */
 	*delim = '\0';
-	if (sscanf(++delim, "%hd\n", &qid) != 1)
+	if (sscanf(++delim, "%hd\n", &qid) != 1) {
 		goto err_no_cmd;
+	}
 
-	/* Check buffer length, valid ifname and queue id */
-	if (strlen(buffer) > IFNAMSIZ ||
-	    !dev_valid_name(buffer) ||
-	    qid > bond->params.tx_queues)
+	/* ifname will now be command + 1 */
+	ifname = command + 1;
+	if ((strlen(command) <= 1) ||
+	    !dev_valid_name(ifname) ||
+	    qid >= bond->params.tx_queues)
 		goto err_no_cmd;
 
 	/* Get the pointer to that interface if it exists */
-	sdev = __dev_get_by_name(dev_net(bond->dev), buffer);
+	sdev = __dev_get_by_name(dev_net(bond->dev), ifname);
 	if (!sdev)
 		goto err_no_cmd;
 
 	read_lock(&bond->lock);
 
-	/* Search for thes slave and check for duplicate qids */
+	/* Search for the slave needing the change */
 	update_slave = NULL;
 	bond_for_each_slave(bond, slave, i) {
 		if (sdev == slave->dev)
-			/*
-			 * We don't need to check the matching
-			 * slave for dups, since we're overwriting it
-			 */
 			update_slave = slave;
-		else if (qid && qid == slave->queue_id) {
+		else if (slave_tx_queue_recorded(slave) &&
+			 qid == slave_get_tx_queue(slave)) {
 			goto err_no_cmd_unlock;
 		}
 	}
-
 	if (!update_slave)
 		goto err_no_cmd_unlock;
 
-	/* Actually set the qids for the slave */
-	update_slave->queue_id = qid;
+	/* at this point we have a valid slave */
+	switch (command[0]) {
+	case '+':
+		if (slave_tx_queue_recorded(update_slave))
+			goto err_no_cmd_unlock;
+		slave_record_tx_queue(update_slave, qid);
+		break;
+	case '-':
+		if (!slave_tx_queue_recorded(update_slave) ||
+		    slave_get_tx_queue(update_slave) != qid)
+			goto err_no_cmd_unlock;
+		slave_clear_tx_queue(update_slave);
+		break;
+
+	default:
+		goto err_no_cmd;
+	}
 
 	read_unlock(&bond->lock);
 out:
@@ -1537,7 +1556,7 @@ out:
 err_no_cmd_unlock:
 	read_unlock(&bond->lock);
 err_no_cmd:
-	pr_info("invalid input for queue_id set for %s.\n",
+	pr_info("cannot modify queue_id for %s.\n",
 		bond->dev->name);
 	ret = -EPERM;
 	goto out;
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 31fe980..c9059f4 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -422,4 +422,24 @@ static inline void bond_unregister_ipv6_notifier(void)
 }
 #endif
 
+static inline void slave_record_tx_queue(struct slave *slave, u16 tx_queue)
+{
+	slave->queue_id = tx_queue + 1;
+}
+
+static inline u16 slave_get_tx_queue(const struct slave *slave)
+{
+	return slave->queue_id - 1;
+}
+
+static inline bool slave_tx_queue_recorded(const struct slave *slave)
+{
+	return slave->queue_id != 0;
+}
+
+static inline void slave_clear_tx_queue(struct slave *slave)
+{
+	slave->queue_id = 0;
+}
+
 #endif /* _LINUX_BONDING_H */
-- 
1.7.4


^ permalink raw reply related

* Re: [PATCH V11 2/4] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Scott Wood @ 2011-02-23 19:24 UTC (permalink / raw)
  To: Grant Likely
  Cc: Richard Cochran, Thomas Gleixner, Rodolfo Giometti, Arnd Bergmann,
	Peter Zijlstra, linux-api-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Russell King, Paul Mackerras,
	John Stultz, Alan Cox, netdev-u79uwXL29TY76Z2rM5mHXA,
	Mike Frysinger, Christoph Lameter,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, David Miller,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Krzysztof Halasa
In-Reply-To: <20110223175459.GH14597-MrY2KI0G/OVr83L8+7iqerDks+cytr/Z@public.gmane.org>

On Wed, 23 Feb 2011 10:54:59 -0700
Grant Likely <grant.likely-s3s/WqlpOiPyB63q8FvJNQ@public.gmane.org> wrote:

> On Wed, Feb 23, 2011 at 11:26:12AM -0600, Scott Wood wrote:
> > eTSEC is versioned, that's more reliable than the chip name since chips
> > have revisions (rev 2.1 of mpc8313 has eTSEC 1.6, not sure about previous
> > revs of mpc8313).  Logic blocks can be and have been uprevved between one
> > revision of a chip to the next.  I think "fsl,mpc8313rev2.1-etsec-ptp"
> > would be taking things a bit too far (and there could be board-level bugs
> > too...).
> > 
> > If you really need to know the exact SoC you're on, look in SVR (which
> > will provide revision info as well).  Isn't the device tree for things that
> > can't be probed?
> 
> This is far more about the binding than it is about the chip revision.
> When documenting a binding it makes far more sense to anchor it to a
> specific implementation than to try and come up with a 'generic'
> catchall.

Whatever string is used should be written into a binding document.

fsl,etsec-v1.6-ptp seems like it would be just as good for that purpose.

Even just fsl,etsec-ptp will identify the binding, though it's lacking in
identifying the hardware (in the absence of access to the eTSEC ID
registers).

If somehow Freescale makes something completely different in the future
called "etsec-ptp", then we'll just have to pick a different name for
*that* compatible (after we smack whoever was responsible for reusing the
name).  The point of the vendor namespacing is to constrain this problem to
a manageable scope.

> > The eTSEC revision is probeable as well, but due the way PTP is described as
> > a separate node, the driver doesn't have straightforward access to those
> > registers.
> 
> Ignorant question: Should the ptp be described as a separate node?

Probably not.

> > Insisting on an explicit chip also encourages people to claim compatibility
> > with that chip without ensuring that it really is fully compatible.
> 
> In practise, I've not seen this to be an issue.

I see it often enough in our BSPs (though the BSP device trees tend to be
problematic in a variety of ways), especially on things like guts and
pmc.

It's a question of how strong a statement people are asked to make -- in a
place where fixing errors is somewhat painful, and we don't really need that
strong statement of compatibility to be made, as long as there's another
way to fully identify the hardware (e.g. SVR, top-level board compatible) if
some strange workaround needs to be made[1].

To turn things around, in practice, I've not seen compatibles that don't
encode a specific chip name to be a problem, as long as it's well
documented what it means.

-Scott

[1] IIRC, this was the original reason for citing a specific chip, but it
doesn't hold up if that chip gets cited by other chips as compatible.  If
compatibliity includes having all the same bugs, then very little will be
able to claim compatibility, and we'll be back to long lists of device
IDs in the driver.  Not to mention errata that are discovered after a
device tree claiming compatibility is released...

^ permalink raw reply

* [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-23 19:05 UTC (permalink / raw)
  To: David Miller
  Cc: kaber, eric.dumazet, netdev, shemminger, fubar, nicolas.2p.debian,
	andy
In-Reply-To: <20110218205832.GE2602@psychotron.redhat.com>

This patch converts bonding to use rx_handler. Results in cleaner
__netif_receive_skb() with much less exceptions needed. Also
bond-specific work is moved into bond code.

Did performance test using pktgen and counting incoming packets by
iptables. No regression noted.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

v1->v2:
        using skb_iif instead of new input_dev to remember original
	device

v2->v3:
	do another loop in case skb->dev is changed. That way orig_dev
	core can be left untouched.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_main.c |   74 ++++++++++++++++++++++++-
 net/core/dev.c                  |  119 ++++++++++-----------------------------
 2 files changed, 104 insertions(+), 89 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77e3c6a..3772b61 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1423,6 +1423,67 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
 	bond->setup_by_slave = 1;
 }
 
+/* On bonding slaves other than the currently active slave, suppress
+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
+ * ARP on active-backup slaves with arp_validate enabled.
+ */
+static bool bond_should_deliver_exact_match(struct sk_buff *skb,
+					    struct net_device *slave_dev,
+					    struct net_device *bond_dev)
+{
+	if (slave_dev->priv_flags & IFF_SLAVE_INACTIVE) {
+		if (slave_dev->priv_flags & IFF_SLAVE_NEEDARP &&
+		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
+			return false;
+
+		if (bond_dev->priv_flags & IFF_MASTER_ALB &&
+		    skb->pkt_type != PACKET_BROADCAST &&
+		    skb->pkt_type != PACKET_MULTICAST)
+				return false;
+
+		if (bond_dev->priv_flags & IFF_MASTER_8023AD &&
+		    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
+			return false;
+
+		return true;
+	}
+	return false;
+}
+
+static struct sk_buff *bond_handle_frame(struct sk_buff *skb)
+{
+	struct net_device *slave_dev;
+	struct net_device *bond_dev;
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (unlikely(!skb))
+		return NULL;
+	slave_dev = skb->dev;
+	bond_dev = ACCESS_ONCE(slave_dev->master);
+	if (unlikely(!bond_dev))
+		return skb;
+
+	if (bond_dev->priv_flags & IFF_MASTER_ARPMON)
+		slave_dev->last_rx = jiffies;
+
+	if (bond_should_deliver_exact_match(skb, slave_dev, bond_dev)) {
+		skb->deliver_no_wcard = 1;
+		return skb;
+	}
+
+	skb->dev = bond_dev;
+
+	if (bond_dev->priv_flags & IFF_MASTER_ALB &&
+	    bond_dev->priv_flags & IFF_BRIDGE_PORT &&
+	    skb->pkt_type == PACKET_HOST) {
+		u16 *dest = (u16 *) eth_hdr(skb)->h_dest;
+
+		memcpy(dest, bond_dev->dev_addr, ETH_ALEN);
+	}
+
+	return skb;
+}
+
 /* enslave device <slave> to bond device <master> */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1599,11 +1660,17 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		pr_debug("Error %d calling netdev_set_bond_master\n", res);
 		goto err_restore_mac;
 	}
+	res = netdev_rx_handler_register(slave_dev, bond_handle_frame, NULL);
+	if (res) {
+		pr_debug("Error %d calling netdev_rx_handler_register\n", res);
+		goto err_unset_master;
+	}
+
 	/* open the slave since the application closed it */
 	res = dev_open(slave_dev);
 	if (res) {
 		pr_debug("Opening slave %s failed\n", slave_dev->name);
-		goto err_unset_master;
+		goto err_unreg_rxhandler;
 	}
 
 	new_slave->dev = slave_dev;
@@ -1811,6 +1878,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 err_close:
 	dev_close(slave_dev);
 
+err_unreg_rxhandler:
+	netdev_rx_handler_unregister(slave_dev);
+
 err_unset_master:
 	netdev_set_bond_master(slave_dev, NULL);
 
@@ -1992,6 +2062,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 		netif_addr_unlock_bh(bond_dev);
 	}
 
+	netdev_rx_handler_unregister(slave_dev);
 	netdev_set_bond_master(slave_dev, NULL);
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -2114,6 +2185,7 @@ static int bond_release_all(struct net_device *bond_dev)
 			netif_addr_unlock_bh(bond_dev);
 		}
 
+		netdev_rx_handler_unregister(slave_dev);
 		netdev_set_bond_master(slave_dev, NULL);
 
 		/* close slave before restoring its mac address */
diff --git a/net/core/dev.c b/net/core/dev.c
index 578415c..e06a834 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3096,63 +3096,31 @@ void netdev_rx_handler_unregister(struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
 
-static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
-					      struct net_device *master)
+static void vlan_on_bond_hook(struct sk_buff *skb)
 {
-	if (skb->pkt_type == PACKET_HOST) {
-		u16 *dest = (u16 *) eth_hdr(skb)->h_dest;
+	/*
+	 * Make sure ARP frames received on VLAN interfaces stacked on
+	 * bonding interfaces still make their way to any base bonding
+	 * device that may have registered for a specific ptype.
+	 */
+	if (skb->dev->priv_flags & IFF_802_1Q_VLAN &&
+	    vlan_dev_real_dev(skb->dev)->priv_flags & IFF_BONDING &&
+	    skb->protocol == htons(ETH_P_ARP)) {
+		struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
 
-		memcpy(dest, master->dev_addr, ETH_ALEN);
+		if (!skb2)
+			return;
+		skb2->dev = vlan_dev_real_dev(skb->dev);
+		netif_rx(skb2);
 	}
 }
 
-/* On bonding slaves other than the currently active slave, suppress
- * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
- * ARP on active-backup slaves with arp_validate enabled.
- */
-static int __skb_bond_should_drop(struct sk_buff *skb,
-				  struct net_device *master)
-{
-	struct net_device *dev = skb->dev;
-
-	if (master->priv_flags & IFF_MASTER_ARPMON)
-		dev->last_rx = jiffies;
-
-	if ((master->priv_flags & IFF_MASTER_ALB) &&
-	    (master->priv_flags & IFF_BRIDGE_PORT)) {
-		/* Do address unmangle. The local destination address
-		 * will be always the one master has. Provides the right
-		 * functionality in a bridge.
-		 */
-		skb_bond_set_mac_by_master(skb, master);
-	}
-
-	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
-		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
-		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
-			return 0;
-
-		if (master->priv_flags & IFF_MASTER_ALB) {
-			if (skb->pkt_type != PACKET_BROADCAST &&
-			    skb->pkt_type != PACKET_MULTICAST)
-				return 0;
-		}
-		if (master->priv_flags & IFF_MASTER_8023AD &&
-		    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
-			return 0;
-
-		return 1;
-	}
-	return 0;
-}
-
 static int __netif_receive_skb(struct sk_buff *skb)
 {
 	struct packet_type *ptype, *pt_prev;
 	rx_handler_func_t *rx_handler;
 	struct net_device *orig_dev;
-	struct net_device *null_or_orig;
-	struct net_device *orig_or_bond;
+	struct net_device *null_or_dev;
 	int ret = NET_RX_DROP;
 	__be16 type;
 
@@ -3167,32 +3135,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
 
 	if (!skb->skb_iif)
 		skb->skb_iif = skb->dev->ifindex;
-
-	/*
-	 * bonding note: skbs received on inactive slaves should only
-	 * be delivered to pkt handlers that are exact matches.  Also
-	 * the deliver_no_wcard flag will be set.  If packet handlers
-	 * are sensitive to duplicate packets these skbs will need to
-	 * be dropped at the handler.
-	 */
-	null_or_orig = NULL;
 	orig_dev = skb->dev;
-	if (skb->deliver_no_wcard)
-		null_or_orig = orig_dev;
-	else if (netif_is_bond_slave(orig_dev)) {
-		struct net_device *bond_master = ACCESS_ONCE(orig_dev->master);
-
-		if (likely(bond_master)) {
-			if (__skb_bond_should_drop(skb, bond_master)) {
-				skb->deliver_no_wcard = 1;
-				/* deliver only exact match */
-				null_or_orig = orig_dev;
-			} else
-				skb->dev = bond_master;
-		}
-	}
 
-	__this_cpu_inc(softnet_data.processed);
 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
 	skb->mac_len = skb->network_header - skb->mac_header;
@@ -3201,6 +3145,10 @@ static int __netif_receive_skb(struct sk_buff *skb)
 
 	rcu_read_lock();
 
+another_round:
+
+	__this_cpu_inc(softnet_data.processed);
+
 #ifdef CONFIG_NET_CLS_ACT
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
@@ -3209,8 +3157,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 #endif
 
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
-		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
-		    ptype->dev == orig_dev) {
+		if (!ptype->dev || ptype->dev == skb->dev) {
 			if (pt_prev)
 				ret = deliver_skb(skb, pt_prev, orig_dev);
 			pt_prev = ptype;
@@ -3224,16 +3171,20 @@ static int __netif_receive_skb(struct sk_buff *skb)
 ncls:
 #endif
 
-	/* Handle special case of bridge or macvlan */
 	rx_handler = rcu_dereference(skb->dev->rx_handler);
 	if (rx_handler) {
+		struct net_device *prev_dev;
+
 		if (pt_prev) {
 			ret = deliver_skb(skb, pt_prev, orig_dev);
 			pt_prev = NULL;
 		}
+		prev_dev = skb->dev;
 		skb = rx_handler(skb);
 		if (!skb)
 			goto out;
+		if (skb->dev != prev_dev)
+			goto another_round;
 	}
 
 	if (vlan_tx_tag_present(skb)) {
@@ -3248,24 +3199,16 @@ ncls:
 			goto out;
 	}
 
-	/*
-	 * Make sure frames received on VLAN interfaces stacked on
-	 * bonding interfaces still make their way to any base bonding
-	 * device that may have registered for a specific ptype.  The
-	 * handler may have to adjust skb->dev and orig_dev.
-	 */
-	orig_or_bond = orig_dev;
-	if ((skb->dev->priv_flags & IFF_802_1Q_VLAN) &&
-	    (vlan_dev_real_dev(skb->dev)->priv_flags & IFF_BONDING)) {
-		orig_or_bond = vlan_dev_real_dev(skb->dev);
-	}
+	vlan_on_bond_hook(skb);
+
+	/* deliver only exact match when indicated */
+	null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
 
 	type = skb->protocol;
 	list_for_each_entry_rcu(ptype,
 			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
-		if (ptype->type == type && (ptype->dev == null_or_orig ||
-		     ptype->dev == skb->dev || ptype->dev == orig_dev ||
-		     ptype->dev == orig_or_bond)) {
+		if (ptype->type == type &&
+		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
 			if (pt_prev)
 				ret = deliver_skb(skb, pt_prev, orig_dev);
 			pt_prev = ptype;
-- 
1.7.3.4


^ permalink raw reply related

* Re: [ISC-Bugs #22806] [PATCH] lfp: fix AF_INET checksum with csum offloading
From: Michael S. Tsirkin @ 2011-02-23 18:46 UTC (permalink / raw)
  To: dhcp-bugs, dhcp-hackers; +Cc: kvm, herbert, davem, netdev
In-Reply-To: <20101227171632.GA14920@redhat.com>

No answer from isc so far.
I think it's a problem that the popular dhclient
has had this bug for so long.
Anyone knows what else can be done?


On Mon, Dec 27, 2010 at 07:16:32PM +0200, Michael S. Tsirkin wrote:
> When an AF_INET socket is used, linux would sometimes
> return a packet without the checksum.  This happens when a packet
> originates on the same machine, which is most common with virtual
> machines but might be possible even without with a software-based dhcp
> server such as dnsmasq.
> This appears to be a performance optimization to avoid calculating
> the checksum and there's no way to disable this. Users are required to
> detect such packets by passing a msg_control parameter to the recvmsg
> call. This was added in linux kernel 2.6.21.
> 
> dhclient from isc.org as of 4.2.0-P2 fails to do this and
> concequently discards such packets, so such setups fail to get an IP.
> This was reported in the past:
> https://lists.isc.org/mailman/htdig/dhcp-hackers/2010-April/001825.html
> 
> The attached patch fixes this by passing msg_control and looking
> at the tp_status field in the result. If the TP_STATUS_CSUMNOTREADY
> bit is set, the packet checksum is missing because the packet
> is local.
> 
> The patch below is currently successfully used at least on Fedora and
> some other Red Hat based distributions. Applying this to the ISC sources
> would help make the problem go away for everyone.
> ---
> 
> Notes:
> - The patch is to be applied with -p1.
> - The patch below applies to dhcp 4.2.0-P2 and 4.1.2 (the last with an offset).
> - Please Cc me on comments directly, I am not subscribed to the dhcp-hackers list.
> 
>  common/bpf.c     |    2 +-
>  common/dlpi.c    |    2 +-
>  common/lpf.c     |   81 ++++++++++++++++++++++++++++++++++++++++++------------
>  common/nit.c     |    2 +-
>  common/packet.c  |    4 +-
>  common/upf.c     |    2 +-
>  includes/dhcpd.h |    2 +-
>  7 files changed, 70 insertions(+), 25 deletions(-)
> 
> diff --git a/common/bpf.c b/common/bpf.c
> index b0ef657..8bd5727 100644
> --- a/common/bpf.c
> +++ b/common/bpf.c
> @@ -485,7 +485,7 @@ ssize_t receive_packet (interface, buf, len, from, hfrom)
>  		offset = decode_udp_ip_header (interface,
>  					       interface -> rbuf,
>  					       interface -> rbuf_offset,
> -  					       from, hdr.bh_caplen, &paylen);
> +  					       from, hdr.bh_caplen, &paylen, 0);
>  
>  		/* If the IP or UDP checksum was bad, skip the packet... */
>  		if (offset < 0) {
> diff --git a/common/dlpi.c b/common/dlpi.c
> index eb64342..d4a8bb9 100644
> --- a/common/dlpi.c
> +++ b/common/dlpi.c
> @@ -694,7 +694,7 @@ ssize_t receive_packet (interface, buf, len, from, hfrom)
>  	length -= offset;
>  #endif
>  	offset = decode_udp_ip_header (interface, dbuf, bufix,
> -				       from, length, &paylen);
> +				       from, length, &paylen, 0);
>  
>  	/*
>  	 * If the IP or UDP checksum was bad, skip the packet...
> diff --git a/common/lpf.c b/common/lpf.c
> index f727b7c..4bdb0f1 100644
> --- a/common/lpf.c
> +++ b/common/lpf.c
> @@ -29,18 +29,33 @@
>  #include "dhcpd.h"
>  #if defined (USE_LPF_SEND) || defined (USE_LPF_RECEIVE)
>  #include <sys/ioctl.h>
> +#include <sys/socket.h>
>  #include <sys/uio.h>
>  #include <errno.h>
>  
>  #include <asm/types.h>
>  #include <linux/filter.h>
>  #include <linux/if_ether.h>
> +#include <linux/if_packet.h>
>  #include <netinet/in_systm.h>
>  #include "includes/netinet/ip.h"
>  #include "includes/netinet/udp.h"
>  #include "includes/netinet/if_ether.h"
>  #include <net/if.h>
>  
> +#ifndef PACKET_AUXDATA
> +#define PACKET_AUXDATA 8
> +
> +struct tpacket_auxdata
> +{
> +	__u32		tp_status;
> +	__u32		tp_len;
> +	__u32		tp_snaplen;
> +	__u16		tp_mac;
> +	__u16		tp_net;
> +};
> +#endif
> +
>  /* Reinitializes the specified interface after an address change.   This
>     is not required for packet-filter APIs. */
>  
> @@ -66,10 +81,14 @@ int if_register_lpf (info)
>  	struct interface_info *info;
>  {
>  	int sock;
> -	struct sockaddr sa;
> +	union {
> +		struct sockaddr_ll ll;
> +		struct sockaddr common;
> +	} sa;
> +	struct ifreq ifr;
>  
>  	/* Make an LPF socket. */
> -	if ((sock = socket(PF_PACKET, SOCK_PACKET,
> +	if ((sock = socket(PF_PACKET, SOCK_RAW,
>  			   htons((short)ETH_P_ALL))) < 0) {
>  		if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
>  		    errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
> @@ -84,11 +103,16 @@ int if_register_lpf (info)
>  		log_fatal ("Open a socket for LPF: %m");
>  	}
>  
> +	memset (&ifr, 0, sizeof ifr);
> +	strncpy (ifr.ifr_name, (const char *)info -> ifp, sizeof ifr.ifr_name);
> +	if (ioctl (sock, SIOCGIFINDEX, &ifr))
> +		log_fatal ("Failed to get interface index: %m");
> +
>  	/* Bind to the interface name */
>  	memset (&sa, 0, sizeof sa);
> -	sa.sa_family = AF_PACKET;
> -	strncpy (sa.sa_data, (const char *)info -> ifp, sizeof sa.sa_data);
> -	if (bind (sock, &sa, sizeof sa)) {
> +	sa.ll.sll_family = AF_PACKET;
> +	sa.ll.sll_ifindex = ifr.ifr_ifindex;
> +	if (bind (sock, &sa.common, sizeof sa)) {
>  		if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
>  		    errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
>  		    errno == EAFNOSUPPORT || errno == EINVAL) {
> @@ -170,9 +194,18 @@ static void lpf_gen_filter_setup (struct interface_info *);
>  void if_register_receive (info)
>  	struct interface_info *info;
>  {
> +	int val;
> +
>  	/* Open a LPF device and hang it on this interface... */
>  	info -> rfdesc = if_register_lpf (info);
>  
> +	val = 1;
> +	if (setsockopt (info -> rfdesc, SOL_PACKET, PACKET_AUXDATA, &val,
> +			sizeof val) < 0) {
> +		if (errno != ENOPROTOOPT)
> +			log_fatal ("Failed to set auxiliary packet data: %m");
> +	}
> +
>  #if defined (HAVE_TR_SUPPORT)
>  	if (info -> hw_address.hbuf [0] == HTYPE_IEEE802)
>  		lpf_tr_filter_setup (info);
> @@ -294,7 +327,6 @@ ssize_t send_packet (interface, packet, raw, len, from, to, hto)
>  	double hh [16];
>  	double ih [1536 / sizeof (double)];
>  	unsigned char *buf = (unsigned char *)ih;
> -	struct sockaddr sa;
>  	int result;
>  	int fudge;
>  
> @@ -315,15 +347,7 @@ ssize_t send_packet (interface, packet, raw, len, from, to, hto)
>  				(unsigned char *)raw, len);
>  	memcpy (buf + ibufp, raw, len);
>  
> -	/* For some reason, SOCK_PACKET sockets can't be connected,
> -	   so we have to do a sentdo every time. */
> -	memset (&sa, 0, sizeof sa);
> -	sa.sa_family = AF_PACKET;
> -	strncpy (sa.sa_data,
> -		 (const char *)interface -> ifp, sizeof sa.sa_data);
> -
> -	result = sendto (interface -> wfdesc,
> -			 buf + fudge, ibufp + len - fudge, 0, &sa, sizeof sa);
> +	result = write (interface -> wfdesc, buf + fudge, ibufp + len - fudge);
>  	if (result < 0)
>  		log_error ("send_packet: %m");
>  	return result;
> @@ -340,14 +364,35 @@ ssize_t receive_packet (interface, buf, len, from, hfrom)
>  {
>  	int length = 0;
>  	int offset = 0;
> +	int nocsum = 0;
>  	unsigned char ibuf [1536];
>  	unsigned bufix = 0;
>  	unsigned paylen;
> -
> -	length = read (interface -> rfdesc, ibuf, sizeof ibuf);
> +	unsigned char cmsgbuf[CMSG_LEN(sizeof(struct tpacket_auxdata))];
> +	struct iovec iov = {
> +		.iov_base = ibuf,
> +		.iov_len = sizeof ibuf,
> +	};
> +	struct msghdr msg = {
> +		.msg_iov = &iov,
> +		.msg_iovlen = 1,
> +		.msg_control = cmsgbuf,
> +		.msg_controllen = sizeof(cmsgbuf),
> +	};
> +	struct cmsghdr *cmsg;
> +
> +	length = recvmsg (interface -> rfdesc, &msg, 0);
>  	if (length <= 0)
>  		return length;
>  
> +	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
> +		if (cmsg->cmsg_level == SOL_PACKET &&
> +		    cmsg->cmsg_type == PACKET_AUXDATA) {
> +			struct tpacket_auxdata *aux = (void *)CMSG_DATA(cmsg);
> +			nocsum = aux->tp_status & TP_STATUS_CSUMNOTREADY;
> +		}
> +	}
> +
>  	bufix = 0;
>  	/* Decode the physical header... */
>  	offset = decode_hw_header (interface, ibuf, bufix, hfrom);
> @@ -364,7 +409,7 @@ ssize_t receive_packet (interface, buf, len, from, hfrom)
>  
>  	/* Decode the IP and UDP headers... */
>  	offset = decode_udp_ip_header (interface, ibuf, bufix, from,
> -				       (unsigned)length, &paylen);
> +				       (unsigned)length, &paylen, nocsum);
>  
>  	/* If the IP or UDP checksum was bad, skip the packet... */
>  	if (offset < 0)
> diff --git a/common/nit.c b/common/nit.c
> index 3822206..0da9c36 100644
> --- a/common/nit.c
> +++ b/common/nit.c
> @@ -369,7 +369,7 @@ ssize_t receive_packet (interface, buf, len, from, hfrom)
>  
>  	/* Decode the IP and UDP headers... */
>  	offset = decode_udp_ip_header (interface, ibuf, bufix,
> -				       from, length, &paylen);
> +				       from, length, &paylen, 0);
>  
>  	/* If the IP or UDP checksum was bad, skip the packet... */
>  	if (offset < 0)
> diff --git a/common/packet.c b/common/packet.c
> index 42bca69..fd2d975 100644
> --- a/common/packet.c
> +++ b/common/packet.c
> @@ -211,7 +211,7 @@ ssize_t
>  decode_udp_ip_header(struct interface_info *interface,
>  		     unsigned char *buf, unsigned bufix,
>  		     struct sockaddr_in *from, unsigned buflen,
> -		     unsigned *rbuflen)
> +		     unsigned *rbuflen, int nocsum)
>  {
>    unsigned char *data;
>    struct ip ip;
> @@ -322,7 +322,7 @@ decode_udp_ip_header(struct interface_info *interface,
>  					   8, IPPROTO_UDP + ulen))));
>  
>    udp_packets_seen++;
> -  if (usum && usum != sum) {
> +  if (!nocsum && usum && usum != sum) {
>  	  udp_packets_bad_checksum++;
>  	  if (udp_packets_seen > 4 &&
>  	      (udp_packets_seen / udp_packets_bad_checksum) < 2) {
> diff --git a/common/upf.c b/common/upf.c
> index feb82a2..fff3949 100644
> --- a/common/upf.c
> +++ b/common/upf.c
> @@ -320,7 +320,7 @@ ssize_t receive_packet (interface, buf, len, from, hfrom)
>  
>  	/* Decode the IP and UDP headers... */
>  	offset = decode_udp_ip_header (interface, ibuf, bufix,
> -				       from, length, &paylen);
> +				       from, length, &paylen, 0);
>  
>  	/* If the IP or UDP checksum was bad, skip the packet... */
>  	if (offset < 0)
> diff --git a/includes/dhcpd.h b/includes/dhcpd.h
> index cd7d962..0835d98 100644
> --- a/includes/dhcpd.h
> +++ b/includes/dhcpd.h
> @@ -2769,7 +2769,7 @@ ssize_t decode_hw_header PROTO ((struct interface_info *, unsigned char *,
>  				 unsigned, struct hardware *));
>  ssize_t decode_udp_ip_header PROTO ((struct interface_info *, unsigned char *,
>  				     unsigned, struct sockaddr_in *,
> -				     unsigned, unsigned *));
> +				     unsigned, unsigned *, int));
>  
>  /* ethernet.c */
>  void assemble_ethernet_header PROTO ((struct interface_info *, unsigned char *,
> -- 
> 1.7.3.2.91.g446ac

^ permalink raw reply

* Re: Mass udp flow reboot linux with RealTek RTL-8169 Gigabit
From: Hans Nieser @ 2011-02-23 18:31 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <1298463704.31256.29.camel@krikkit>

On Wed, 2011-02-23 at 13:21 +0100, Hans Nieser wrote:
> On Wed, 2011-02-23 at 10:55 +0100, Francois Romieu wrote:
> > Hans Nieser <hnsr@xs4all.nl> :
> > [...]
> > > With your patches applied to 2.6.38-rc6, I have gathered some of the
> > > info you requested from Seblu as well, I hope it's helpful:
> > > 
> > > 1: see attachment
> > 
> > Ok.
> > 
> > The chipset requires no trivial last minute regression fix (yet).
> > 
> > > 2: I'm not sure how to check the size of the packets, but I'm just
> > > fetching a (large) file over http/tcp, so I guess they are mostly of the
> > > size of my MTU which is 1500 looking at ifconfig output
> > 
> > Fine.
> > 
> > Your testcases are always based on a real download, whence including some
> > disk activity, as opposed to a pure network test, right ?
> 
> Yeah, I just had a little script that wgetted a file from a webserver in
> my LAN and saved it to separate (non-root) fs, then removed it - in a
> loop. When testing on the 2.6.35 and 2.6.35.9 kernels it did max out at
> about 107MiB/s, sometimes falling down a little presumably when disk was
> being touched.
> 
> > > For the other vmstat/ethtool/interrupts output, I started the following
> > > commands remotely via ssh a second or two before starting the download,
> > > and the machine locked up a few seconds later:
> > 
> > SysRq is enabled (/etc/sysctl.conf::kernel.sysrq = 1), the computer was
> > switched back on a no-X console before the test. Then the keyboard leds
> > ignore keypresses and the sysrq keys don't display anything in the
> > console, right ?
> 
> Yep I had X shutdown and switched to VT1, after lock up the LEDs can't
> be toggled anymore and sysrq key combo was nonresponsive (it works if I
> do it before it locks up)
> 
> > You may enable PCIEASPM_DEBUG, force 'pcie_aspm=off' and switch from
> > SLUB to SLAB but it's a bit cargo-cultish.
> 
> I'll give that a try this evening
> 
> > A bisection could help. Bisecting 2.6.35 .. 2.6.35.9 may be enough if
> > 2.6.35.9 works well.
> 
> Hmm did you mean bisecting 2.6.36 - 2.6.35.9 ? Since with 2.6.36 and
> above I can get the machine to hang within seconds and performance is
> really bad (10-20MiB/s with wget), while with 2.6.35.9 and 2.6.35
> performance was really good (reaching 107MiB/s most of the time) and
> lock up took 5-10 minutes instead of seconds (I guess I didn't mention
> this in my last e-mail but I managed to get both 2.6.35 and 2.6.35.9 to
> lock up eventually) - but I guess something changed between .35 and .36
> that made the issue easier to trigger.
> 
> I can also try even older kernels to see if there is one that doesn't
> lock up at all
> 

Ok, I just tried 2.6.34, and after over 5 hours of running my script,
the system is still up and running, with only 24 'link up' messages on
dmesg, and having transferred 2.1TiB of data (1428042421 rx_packets, 45
rx_missed). So I'm going to assume the problem isn't present with this
kernel and try a bisect between it and 2.6.35

^ permalink raw reply

* Re: [PATCH] e1000: fix sparse warning
From: Brandeburg, Jesse @ 2011-02-23 18:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Kirsher, Jeffrey T, Jesse Brandeburg,
	e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org
In-Reply-To: <20110223101203.0894c474@nehalam>



On Wed, 23 Feb 2011, Stephen Hemminger wrote:

> Sparse complains because the e1000 driver is calling ioread on a pointer
> not tagged as __iomem.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Seems fine, thanks Stephen.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

 

^ permalink raw reply

* Re: [PATCH] cls_u32: fix sparse warnings
From: Stephen Hemminger @ 2011-02-23 18:16 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev
In-Reply-To: <1298464542.2271.1.camel@mojatatu>

On Wed, 23 Feb 2011 07:35:42 -0500
jamal <hadi@cyberus.ca> wrote:

> On Sun, 2011-02-20 at 18:14 -0800, Stephen Hemminger wrote:
> > The variable _data is used in asm-generic to define sections
> > which causes sparse warnings, so just rename the variable.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
> 
> Hey, while you are sparsing away you may wanna fix 
> br_fdb_cleanup() as well ;-> /me runs
> 

With current sparse and net-next it comes up clean.

^ permalink raw reply

* [PATCH] e1000: fix sparse warning
From: Stephen Hemminger @ 2011-02-23 18:12 UTC (permalink / raw)
  To: Jeff Kirsher, Jesse Brandeburg; +Cc: e1000-devel, netdev

Sparse complains because the e1000 driver is calling ioread on a pointer
not tagged as __iomem.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/e1000/e1000_osdep.h	2011-02-23 10:00:31.496097384 -0800
+++ b/drivers/net/e1000/e1000_osdep.h	2011-02-23 10:00:47.740287665 -0800
@@ -42,7 +42,8 @@
 #define GBE_CONFIG_RAM_BASE \
 	((unsigned int)(CONFIG_RAM_BASE + GBE_CONFIG_OFFSET))
 
-#define GBE_CONFIG_BASE_VIRT    phys_to_virt(GBE_CONFIG_RAM_BASE)
+#define GBE_CONFIG_BASE_VIRT \
+	((void __iomem *)phys_to_virt(GBE_CONFIG_RAM_BASE))
 
 #define GBE_CONFIG_FLASH_WRITE(base, offset, count, data) \
 	(iowrite16_rep(base + offset, data, count))

^ permalink raw reply

* Re: [PATCH V11 2/4] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Grant Likely @ 2011-02-23 17:54 UTC (permalink / raw)
  To: Scott Wood
  Cc: Richard Cochran, Thomas Gleixner, Rodolfo Giometti, Arnd Bergmann,
	Peter Zijlstra, linux-api-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Russell King, Paul Mackerras,
	John Stultz, Alan Cox, netdev-u79uwXL29TY76Z2rM5mHXA,
	Mike Frysinger, Christoph Lameter,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, David Miller,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Krzysztof Halasa
In-Reply-To: <20110223112612.30071995@schlenkerla>

On Wed, Feb 23, 2011 at 11:26:12AM -0600, Scott Wood wrote:
> On Wed, 23 Feb 2011 09:50:58 -0700
> Grant Likely <grant.likely-s3s/WqlpOiPyB63q8FvJNQ@public.gmane.org> wrote:
> 
> > On Wed, Feb 23, 2011 at 11:38:17AM +0100, Richard Cochran wrote:
> > > +
> > > +* Gianfar PTP clock nodes
> > > +
> > > +General Properties:
> > > +
> > > +  - compatible   Should be "fsl,etsec-ptp"
> > 
> > Should specify an *exact* part; ie: "fsl,mpc8313-etsec-ptp" instead of
> > trying to define a generic catchall.  The reason is that the same
> > marketing name can end up getting applied to a wide range of parts.
> > 
> > Instead, choose one specific device to stand in as the 'common'
> > implementation and get all parts with the same core to claim
> > compatibility with it.  ie: a p2020 might have:
> > 
> > 	compatible = "fsl,mpc2020-etsec-ptp", "fsl,mpc8313-etsec-ptp";
> 
> eTSEC is versioned, that's more reliable than the chip name since chips
> have revisions (rev 2.1 of mpc8313 has eTSEC 1.6, not sure about previous
> revs of mpc8313).  Logic blocks can be and have been uprevved between one
> revision of a chip to the next.  I think "fsl,mpc8313rev2.1-etsec-ptp"
> would be taking things a bit too far (and there could be board-level bugs
> too...).
> 
> If you really need to know the exact SoC you're on, look in SVR (which
> will provide revision info as well).  Isn't the device tree for things that
> can't be probed?

This is far more about the binding than it is about the chip revision.
When documenting a binding it makes far more sense to anchor it to a
specific implementation than to try and come up with a 'generic'
catchall.  A new binding means new compatible value and dropping any
claims of being compatible with the old.

It is not about the logic block version, particularly when it is
detectable by the driver as you say.

> 
> The eTSEC revision is probeable as well, but due the way PTP is described as
> a separate node, the driver doesn't have straightforward access to those
> registers.

Ignorant question: Should the ptp be described as a separate node?

> 
> Insisting on an explicit chip also encourages people to claim compatibility
> with that chip without ensuring that it really is fully compatible.

In practise, I've not seen this to be an issue.

g.

^ permalink raw reply

* [PATCH] qla3xxx: add missing __iomem annotation
From: Stephen Hemminger @ 2011-02-23 17:54 UTC (permalink / raw)
  To: Ron Mercer, David Miller; +Cc: netdev

Add necessary annotations about pointer to io memory space
that is checked by sparse.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/qla3xxx.c	2011-02-23 09:42:44.363626866 -0800
+++ b/drivers/net/qla3xxx.c	2011-02-23 09:43:32.564194771 -0800
@@ -379,7 +379,7 @@ static void fm93c56a_select(struct ql3_a
 {
 	struct ql3xxx_port_registers __iomem *port_regs =
 			qdev->mem_map_registers;
-	u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
+	__iomem u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
 
 	qdev->eeprom_cmd_data = AUBURN_EEPROM_CS_1;
 	ql_write_nvram_reg(qdev, spir, ISP_NVRAM_MASK | qdev->eeprom_cmd_data);
@@ -398,7 +398,7 @@ static void fm93c56a_cmd(struct ql3_adap
 	u32 previousBit;
 	struct ql3xxx_port_registers __iomem *port_regs =
 			qdev->mem_map_registers;
-	u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
+	__iomem u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
 
 	/* Clock in a zero, then do the start bit */
 	ql_write_nvram_reg(qdev, spir,
@@ -467,7 +467,7 @@ static void fm93c56a_deselect(struct ql3
 {
 	struct ql3xxx_port_registers __iomem *port_regs =
 			qdev->mem_map_registers;
-	u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
+	__iomem u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
 
 	qdev->eeprom_cmd_data = AUBURN_EEPROM_CS_0;
 	ql_write_nvram_reg(qdev, spir, ISP_NVRAM_MASK | qdev->eeprom_cmd_data);
@@ -483,7 +483,7 @@ static void fm93c56a_datain(struct ql3_a
 	u32 dataBit;
 	struct ql3xxx_port_registers __iomem *port_regs =
 			qdev->mem_map_registers;
-	u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
+	__iomem u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
 
 	/* Read the data bits */
 	/* The first bit is a dummy.  Clock right over it. */
@@ -3011,7 +3011,7 @@ static int ql_adapter_initialize(struct
 	u32 value;
 	struct ql3xxx_port_registers __iomem *port_regs =
 		qdev->mem_map_registers;
-	u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
+	__iomem u32 *spir = &port_regs->CommonRegs.serialPortInterfaceReg;
 	struct ql3xxx_host_memory_registers __iomem *hmem_regs =
 		(void __iomem *)port_regs;
 	u32 delay = 10;

^ permalink raw reply

* [PATCH] r8169: incorrect args to oob notify
From: Stephen Hemminger @ 2011-02-23 17:52 UTC (permalink / raw)
  To: Francois Romieu, David Miller; +Cc: netdev


Sparse detected this bug. The function oob_notify was being
passed the private ptr, but expected to get the ioaddr.
Compile checked only; not tested on real hardware.

Patch against net-next-2.6 but should be applied to net-2.6.
Bug not present in 2.6.37 and earlier.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/net/r8169.c	2011-02-23 09:46:30.042289776 -0800
+++ b/drivers/net/r8169.c	2011-02-23 09:48:00.167355773 -0800
@@ -617,8 +617,9 @@ static void ocp_write(struct rtl8169_pri
 	}
 }
 
-static void rtl8168_oob_notify(void __iomem *ioaddr, u8 cmd)
+static void rtl8168_oob_notify(struct rtl8169_private *tp, u8 cmd)
 {
+	void __iomem *ioaddr = tp->mmio_addr;
 	int i;
 
 	RTL_W8(ERIDR, cmd);
@@ -630,7 +631,7 @@ static void rtl8168_oob_notify(void __io
 			break;
 	}
 
-	ocp_write(ioaddr, 0x1, 0x30, 0x00000001);
+	ocp_write(tp, 0x1, 0x30, 0x00000001);
 }
 
 #define OOB_CMD_RESET		0x00

^ permalink raw reply

* Stale entries in RT_TABLE_LOCAL
From: Alex Sidorenko @ 2011-02-23 17:43 UTC (permalink / raw)
  To: netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 6094 bytes --]

Hello,

I have found several scenarios when after deleting IP-address from an 
interface there is a stale entry left in RT_TABLE_LOCAL.

All these scenarios use the fact that it is possible to add the same address 
multiple times to the same interface using different masks.

Let us do the following using dummy0 interface:

ifconfig dummy0 192.168.140.31 netmask 255.255.252.0
ip addr add 192.168.142.109/23 dev dummy0
ip addr add 192.168.142.109/22 dev dummy0
ip addr del 192.168.142.109/22 dev dummy0
ip addr del 192.168.142.109/23 dev dummy0

We add 192.168.142.109/23 and 192.168.142.109/22, then delete them (order is 
important). After that, 192.168.142.109 is not in 'ip addr ls' but there are 
entries using this addr in RT_TABLE_LOCAL.

An attached script demonstrates the problem:

{asid 14:00:57} sudo sh iptest.sh
Tables before the test
13: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 5e:1a:fa:44:90:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.140.31/22 brd 192.168.143.255 scope global dummy0
    inet6 fe80::5c1a:faff:fe44:90f6/64 scope link 
       valid_lft forever preferred_lft forever
 
local 192.168.140.31 dev dummy0  proto kernel  scope host  src 192.168.140.31 
broadcast 192.168.140.0 dev dummy0  proto kernel  scope link  src 
192.168.140.31 
broadcast 192.168.143.255 dev dummy0  proto kernel  scope link  src
192.168.140.31 
----------------------
Tables after the test
13: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 5e:1a:fa:44:90:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.140.31/22 brd 192.168.143.255 scope global dummy0
    inet6 fe80::5c1a:faff:fe44:90f6/64 scope link 
       valid_lft forever preferred_lft forever
 
local 192.168.140.31 dev dummy0  proto kernel  scope host  src 192.168.140.31 
local 192.168.142.109 dev dummy0  proto kernel  scope host  src 192.168.140.31 
broadcast 192.168.143.255 dev dummy0  proto kernel  scope link  src
192.168.140.31 
broadcast 192.168.143.255 dev dummy0  proto kernel  scope link  src
192.168.142.109 


As you see, even though there is no 192.168.142.109 on dummy0 address list, 
the entries referring to this addr are still present in RT_TABLE_LOCAL.

Another scenario (adding/deleting two addresses, each one twice with different 
mask) can lead to stale entries cross-referencing each other, like

local 192.168.5.8  proto kernel  scope host  src 192.168.5.9 
local 192.168.5.9  proto kernel  scope host  src 192.168.5.8 

Analysis
--------

Both scenarios use the fact that we can add the same address multiple times to 
the same interface, using different masks.

1. When we delete an IP addr, we remove it from the interface addr list and 
send a notifier to routing code (fib_del_ifaddr) asking to delete the 
associated routes.

2. When we enter fib_del_ifaddr(struct in_ifaddr *ifa), the address is already
deleted. But if we add the same IP twice (with different masks), the same
address (even though with different prefix) is present two times. So after the
first deletion we still have its 2nd instance on the list.

3. We do the following in fib_del_ifaddr():

         for (ifa1 = in_dev->ifa_list; ifa1; ifa1 = ifa1->ifa_next) {
                 if (ifa->ifa_local == ifa1->ifa_local)
                         ok |= LOCAL_OK;
                 if (ifa->ifa_broadcast == ifa1->ifa_broadcast)
                         ok |= BRD_OK;
                 if (brd == ifa1->ifa_broadcast)
                         ok |= BRD1_OK;
                 if (any == ifa1->ifa_broadcast)
                         ok |= BRD0_OK;
         }

That is, we loop on all addrs of the interface (in_dev->ifa_list) and compare
address we have just deleted (passed in 'ifa') with addresses on the list. 

As we compare them without taking prefix (mask) into account, the following 
will be true:

ifa->ifa_local == ifa1->ifa_local
ifa->ifa_broadcast == ifa1->ifa_broadcast

4. As a result, after deleting the first instance of IP (192.168.142.109/22) 
we still have  192.168.142.109/23 on the list. The routing code will find that 
this addr (and broadcast) are still present on the list and will not delete 
the routes.

5. When we delete the second time (192.168.142.109/23), there will be no
192.168.142.109 on the list anymore and the routing code will delete the route 
- but only one out of two entries.

How this can be fixed
---------------------

I am not sure what is the best way to fix this, I can think of several 
approaches:

  (a) change the sources so that it would be impossible to add the same IP 
      multiple times, even with different masks. I cannot think of any
      situation where adding the same IP (but with different mask) to the same
      interface could be useful. But maybe I am wrong?

  (b) improve the deletion algorithm in fib_del_ifaddr()

  (c) add a periodic cleanup that will purge all entries from 'local' table if
      there are no corresponding IPs on the interface list


Impact
------

Stale entries in RT_TABLE_LOCAL make ARP reply to requests for that IPs, even 
though these IPs do not belong to any interface.

These scenarios might seem a bit pathological, but in reality they are 
possible on clusters with multiple addresses on several interfaces, where 
addresses are added/deleted for service migration. Address migration can be 
done both by software and by system administrators and if by mistake a wrong 
mask is used, we can get this situation.

And yes, one of HP customers met exactly this problem. They saw a 'duplicate 
IP' issue after migrating some services and found that the host replies to 
ARP-request even though 'ip addr ls' did not show this address. It is not 
common knowledge that ARP implementation uses RT_TABLE_LOCAL to decide whether 
IP is local, so they were unable to understand what is wrong.

Regards,
Alex

------------------------------------------------------------------
Alexandre Sidorenko             email: asid@hp.com
WTEC Linux                      Hewlett-Packard (Canada)
------------------------------------------------------------------

[-- Attachment #2: iptest.sh --]
[-- Type: application/x-shellscript, Size: 597 bytes --]

^ permalink raw reply

* [PATCH] bonding: fix sparse warning
From: Stephen Hemminger @ 2011-02-23 17:40 UTC (permalink / raw)
  To: Jay Vosburgh, David Miller; +Cc: netdev

Fix use of zero where NULL expected. And wrap long line.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/bonding/bonding.h	2011-02-23 09:33:39.493248155 -0800
+++ b/drivers/net/bonding/bonding.h	2011-02-23 09:33:48.825356637 -0800
@@ -265,7 +265,8 @@ struct bonding {
  *
  * Caller must hold bond lock for read
  */
-static inline struct slave *bond_get_slave_by_dev(struct bonding *bond, struct net_device *slave_dev)
+static inline struct slave *bond_get_slave_by_dev(struct bonding *bond,
+						  struct net_device *slave_dev)
 {
 	struct slave *slave = NULL;
 	int i;
@@ -276,7 +277,7 @@ static inline struct slave *bond_get_sla
 		}
 	}
 
-	return 0;
+	return NULL;
 }
 
 static inline struct bonding *bond_get_bond_by_slave(struct slave *slave)

^ permalink raw reply

* Re: [PATCH] net_sched: long word align struct qdisc_skb_cb data
From: Stephen Hemminger @ 2011-02-23 17:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Patrick McHardy, David Miller, Juliusz Chroboczek,
	John W. Linville, netdev, Andi Kleen
In-Reply-To: <1298480707.3301.386.camel@edumazet-laptop>

On Wed, 23 Feb 2011 18:05:07 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :
> > Am 23.02.2011 16:14, schrieb Eric Dumazet:
> > > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > > index 16626a0..f40d32e 100644
> > > --- a/include/net/sch_generic.h
> > > +++ b/include/net/sch_generic.h
> > > @@ -218,6 +218,7 @@ struct tcf_proto {
> > >  
> > >  struct qdisc_skb_cb {
> > >  	unsigned int		pkt_len;
> > > +	unsigned int		sfb_classid;
> > >  	char			data[];
> > >  };
> > 
> > This could be moved into a SFB specific cb, similar to what netem
> > does.
> 
> Hmm... well... I want to be sure no other sch will destroy my values.
> 
> netem seems buggy then.
> 
> Probably following patch is needed ?

Yes, it was long word aligned when netem was written but
we seem to have bit creep.



-- 

^ permalink raw reply

* Re: [PATCH V11 2/4] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Scott Wood @ 2011-02-23 17:26 UTC (permalink / raw)
  To: Grant Likely
  Cc: Richard Cochran, Thomas Gleixner, Rodolfo Giometti, Arnd Bergmann,
	Peter Zijlstra, linux-api-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Russell King, Paul Mackerras,
	John Stultz, Alan Cox, netdev-u79uwXL29TY76Z2rM5mHXA,
	Mike Frysinger, Christoph Lameter,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, David Miller,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Krzysztof Halasa
In-Reply-To: <20110223165058.GE14597-MrY2KI0G/OVr83L8+7iqerDks+cytr/Z@public.gmane.org>

On Wed, 23 Feb 2011 09:50:58 -0700
Grant Likely <grant.likely-s3s/WqlpOiPyB63q8FvJNQ@public.gmane.org> wrote:

> On Wed, Feb 23, 2011 at 11:38:17AM +0100, Richard Cochran wrote:
> > +
> > +* Gianfar PTP clock nodes
> > +
> > +General Properties:
> > +
> > +  - compatible   Should be "fsl,etsec-ptp"
> 
> Should specify an *exact* part; ie: "fsl,mpc8313-etsec-ptp" instead of
> trying to define a generic catchall.  The reason is that the same
> marketing name can end up getting applied to a wide range of parts.
> 
> Instead, choose one specific device to stand in as the 'common'
> implementation and get all parts with the same core to claim
> compatibility with it.  ie: a p2020 might have:
> 
> 	compatible = "fsl,mpc2020-etsec-ptp", "fsl,mpc8313-etsec-ptp";

eTSEC is versioned, that's more reliable than the chip name since chips
have revisions (rev 2.1 of mpc8313 has eTSEC 1.6, not sure about previous
revs of mpc8313).  Logic blocks can be and have been uprevved between one
revision of a chip to the next.  I think "fsl,mpc8313rev2.1-etsec-ptp"
would be taking things a bit too far (and there could be board-level bugs
too...).

If you really need to know the exact SoC you're on, look in SVR (which
will provide revision info as well).  Isn't the device tree for things that
can't be probed?

The eTSEC revision is probeable as well, but due the way PTP is described as
a separate node, the driver doesn't have straightforward access to those
registers.

Insisting on an explicit chip also encourages people to claim compatibility
with that chip without ensuring that it really is fully compatible.

-Scott

^ permalink raw reply

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 17:16 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <4D653C9C.8010205@trash.net>

Le mercredi 23 février 2011 à 17:58 +0100, Patrick McHardy a écrit :
> Am 23.02.2011 17:48, schrieb Eric Dumazet:
> > Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :
> > 
> >> This needs to be a per-skb property, otherwise you could have the
> >> situation:
> >>
> >> - enqueue skb, double_buffering=0, increment buffer 0
> >> - enable double buffering
> >> - swap buffers
> >> - dequeue same skb, decrement buffer 1
> >>
> >> after which the qlen values of buffer 1 will be incorrect.
> >>
> > 
> > Normally its OK, because we bzero() the zone, and the "decrement" is
> > 0-bounded.
> 
> Yeah, but we might decrement buckets of different flows which
> are non-zero. Probably not too bad, but still not correct.
> 
> > I had this idea (of storing two bits per skb), but :
> > 
> > - It means that swap_buffer() should not touch (bzero) the 'old' bins
> 
> Yes, it means we have to properly decrement the old buffer
> until all bins reached zero.
> 
> > - Since hash perturbator is changed, we have to store the two hash
> > values per skb (instead of one u32 / classid).
> 
> Indeed.

BTQ, I had this idea of storing the double_buffer per skb reading SFB
paper, because paper says the double buffering is really needed only for
unelastic flows, not for all packets.

paper quote :

As one set of bins is being used
for queue management, a second set of bins using the next set of hash
functions can be warmed up. In this
case, any time a flow is classified as non-responsive, it is hashed
using the second set of hash functions and
the marking probabilities of the corresponding bins in the warmup set
are updated.

So using two 'hash' values per skb is the way to go, with special 0
value meanings : skb was not 'inserted' into virtual queues.




^ permalink raw reply

* Kernel panic nf_nat_setup_info+0x5b3/0x6e0
From: "Oleg A. Arkhangelsky" @ 2011-02-23 17:07 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

Hello,

Got this panic yesterday:
http://www.progtech.ru/~oleg/crash.txt

The offending instruction is:
cmpb 54(%edx), %cl # <variable>.tuple.dst.protonum,

and here is the assembler code of net/ipv4/netfilter/nf_nat_core.c:
http://www.progtech.ru/~oleg/nf_nat_core.s

Quick investigation lead me to conclusion that the problem is in
return of same_src function:

        return (t->dst.protonum == tuple->dst.protonum &&
                t->src.u3.ip == tuple->src.u3.ip &&
                t->src.u.all == tuple->src.u.all);

So either t or tuple pointer is bad, but I don't understand how
this can be.

Looks like the similar situation described here:
https://bugzilla.kernel.org/show_bug.cgi?id=21512

Any thoughts on this?
Thank you!

-- 
wbr, Oleg.

^ permalink raw reply

* Re: [Lxc-users] Bad checksums and lost packets with macvlan on dummy
From: Andrian Nord @ 2011-02-23 17:13 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: lxc-users, Patrick McHardy, Linux Netdev List
In-Reply-To: <4D628DC3.9000400@free.fr>

[-- Attachment #1: Type: text/plain, Size: 346 bytes --]

On Mon, Feb 21, 2011 at 05:07:31PM +0100, Daniel Lezcano wrote:
> I Cc'ed the netdev mailing list and Patrick in case my analysis is wrong 
> or incomplete.

I'm confirming, that this happens only when macvlan's are onto dummy net
device. In case of some physical interface under macvlan there is no lost
packages and no broken checksums.

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* [PATCH] net_sched: long word align struct qdisc_skb_cb data
From: Eric Dumazet @ 2011-02-23 17:05 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <4D6534C3.1080305@trash.net>

Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :
> Am 23.02.2011 16:14, schrieb Eric Dumazet:
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index 16626a0..f40d32e 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -218,6 +218,7 @@ struct tcf_proto {
> >  
> >  struct qdisc_skb_cb {
> >  	unsigned int		pkt_len;
> > +	unsigned int		sfb_classid;
> >  	char			data[];
> >  };
> 
> This could be moved into a SFB specific cb, similar to what netem
> does.

Hmm... well... I want to be sure no other sch will destroy my values.

netem seems buggy then.

Probably following patch is needed ?

Thanks

[PATCH] net_sched: long word align struct qdisc_skb_cb data

netem_skb_cb() does :

return (struct netem_skb_cb *)qdisc_skb_cb(skb)->data;

Unfortunatly struct qdisc_skb_cb data is not long word aligned, so
access to psched_time_t time_to_send uses a non aligned access.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sch_generic.h |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 16626a0..dbdf2b5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -218,7 +218,7 @@ struct tcf_proto {
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
-	char			data[];
+	long			data[];
 };
 
 static inline int qdisc_qlen(struct Qdisc *q)



^ permalink raw reply related

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Patrick McHardy @ 2011-02-23 16:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <1298479711.3301.373.camel@edumazet-laptop>

Am 23.02.2011 17:48, schrieb Eric Dumazet:
> Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :
> 
>> This needs to be a per-skb property, otherwise you could have the
>> situation:
>>
>> - enqueue skb, double_buffering=0, increment buffer 0
>> - enable double buffering
>> - swap buffers
>> - dequeue same skb, decrement buffer 1
>>
>> after which the qlen values of buffer 1 will be incorrect.
>>
> 
> Normally its OK, because we bzero() the zone, and the "decrement" is
> 0-bounded.

Yeah, but we might decrement buckets of different flows which
are non-zero. Probably not too bad, but still not correct.

> I had this idea (of storing two bits per skb), but :
> 
> - It means that swap_buffer() should not touch (bzero) the 'old' bins

Yes, it means we have to properly decrement the old buffer
until all bins reached zero.

> - Since hash perturbator is changed, we have to store the two hash
> values per skb (instead of one u32 / classid).

Indeed.

^ permalink raw reply

* Re: [PATCH V11 2/4] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Grant Likely @ 2011-02-23 16:50 UTC (permalink / raw)
  To: Richard Cochran
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Alan Cox, Arnd Bergmann,
	Christoph Lameter, David Miller, John Stultz, Krzysztof Halasa,
	Peter Zijlstra, Rodolfo Giometti, Thomas Gleixner,
	Benjamin Herrenschmidt, Mike Frysinger, Paul Mackerras,
	Russell King
In-Reply-To: <be1f5e801e8cd0145dd23aadae7c2055bb3c1d47.1298447722.git.richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>

On Wed, Feb 23, 2011 at 11:38:17AM +0100, Richard Cochran wrote:
> The eTSEC includes a PTP clock with quite a few features. This patch adds
> support for the basic clock adjustment functions, plus two external time
> stamps, one alarm, and the PPS callback.
> 
> Signed-off-by: Richard Cochran <richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>
> Acked-by: John Stultz <johnstul-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  .../devicetree/bindings/net/fsl-tsec-phy.txt       |   57 +++
>  arch/powerpc/boot/dts/mpc8313erdb.dts              |   14 +
>  arch/powerpc/boot/dts/mpc8572ds.dts                |   14 +
>  arch/powerpc/boot/dts/p2020ds.dts                  |   14 +
>  arch/powerpc/boot/dts/p2020rdb.dts                 |   14 +
>  drivers/net/Makefile                               |    1 +
>  drivers/net/gianfar_ptp.c                          |  448 ++++++++++++++++++++
>  drivers/net/gianfar_ptp_reg.h                      |  113 +++++
>  drivers/ptp/Kconfig                                |   13 +
>  9 files changed, 688 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/net/gianfar_ptp.c
>  create mode 100644 drivers/net/gianfar_ptp_reg.h
> 
> diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> index edb7ae1..f6edbb8 100644
> --- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> +++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> @@ -74,3 +74,60 @@ Example:
>  		interrupt-parent = <&mpic>;
>  		phy-handle = <&phy0>
>  	};
> +
> +* Gianfar PTP clock nodes
> +
> +General Properties:
> +
> +  - compatible   Should be "fsl,etsec-ptp"

Should specify an *exact* part; ie: "fsl,mpc8313-etsec-ptp" instead of
trying to define a generic catchall.  The reason is that the same
marketing name can end up getting applied to a wide range of parts.

Instead, choose one specific device to stand in as the 'common'
implementation and get all parts with the same core to claim
compatibility with it.  ie: a p2020 might have:

	compatible = "fsl,mpc2020-etsec-ptp", "fsl,mpc8313-etsec-ptp";

> +  - reg          Offset and length of the register set for the device
> +  - interrupts   There should be at least two interrupts. Some devices
> +                 have as many as four PTP related interrupts.
> +
> +Clock Properties:
> +
> +  - tclk-period  Timer reference clock period in nanoseconds.
> +  - tmr-prsc     Prescaler, divides the output clock.
> +  - tmr-add      Frequency compensation value.
> +  - cksel        0= external clock, 1= eTSEC system clock, 3= RTC clock input.
> +                 Currently the driver only supports choice "1".

I'd be hesitant about defining something that isn't actually
implemented yet.  You may find the binding to be insufficient at a
later date.

> +  - tmr-fiper1   Fixed interval period pulse generator.
> +  - tmr-fiper2   Fixed interval period pulse generator.
> +  - max-adj      Maximum frequency adjustment in parts per billion.

These are all custom properties (not part of any shared binding) so
they should probably be prefixed with 'fsl,'.

> +
> +  These properties set the operational parameters for the PTP
> +  clock. You must choose these carefully for the clock to work right.
> +  Here is how to figure good values:
> +
> +  TimerOsc     = system clock               MHz
> +  tclk_period  = desired clock period       nanoseconds
> +  NominalFreq  = 1000 / tclk_period         MHz
> +  FreqDivRatio = TimerOsc / NominalFreq     (must be greater that 1.0)
> +  tmr_add      = ceil(2^32 / FreqDivRatio)
> +  OutputClock  = NominalFreq / tmr_prsc     MHz
> +  PulseWidth   = 1 / OutputClock            microseconds
> +  FiperFreq1   = desired frequency in Hz
> +  FiperDiv1    = 1000000 * OutputClock / FiperFreq1
> +  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
> +  max_adj      = 1000000000 * (FreqDivRatio - 1.0) - 1
> +
> +  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
> +  driver expects that tmr_fiper1 will be correctly set to produce a 1
> +  Pulse Per Second (PPS) signal, since this will be offered to the PPS
> +  subsystem to synchronize the Linux clock.

Good documentation, thanks.  Question though, how many of these values
will the end user (or board builder) be likely to want to change.  It
is risky encoding the calculation results into the device tree when
they aren't the actually parameters that will be manipulated, or at
least very user-unfriendly.

> +
> +Example:
> +
> +	ptp_clock@24E00 {
> +		compatible = "fsl,etsec-ptp";
> +		reg = <0x24E00 0xB0>;
> +		interrupts = <12 0x8 13 0x8>;
> +		interrupt-parent = < &ipic >;
> +		tclk-period = <10>;
> +		tmr-prsc    = <100>;
> +		tmr-add     = <0x999999A4>;
> +		cksel       = <0x1>;
> +		tmr-fiper1  = <0x3B9AC9F6>;
> +		tmr-fiper2  = <0x00018696>;
> +		max-adj     = <659999998>;
> +	};
> diff --git a/arch/powerpc/boot/dts/mpc8313erdb.dts b/arch/powerpc/boot/dts/mpc8313erdb.dts
> index 183f2aa..85a7eaa 100644
> --- a/arch/powerpc/boot/dts/mpc8313erdb.dts
> +++ b/arch/powerpc/boot/dts/mpc8313erdb.dts
> @@ -208,6 +208,20 @@
>  			sleep = <&pmc 0x00300000>;
>  		};
>  
> +		ptp_clock@24E00 {
> +			compatible = "fsl,etsec-ptp";
> +			reg = <0x24E00 0xB0>;
> +			interrupts = <12 0x8 13 0x8>;
> +			interrupt-parent = < &ipic >;
> +			tclk-period = <10>;
> +			tmr-prsc    = <100>;
> +			tmr-add     = <0x999999A4>;
> +			cksel       = <0x1>;
> +			tmr-fiper1  = <0x3B9AC9F6>;
> +			tmr-fiper2  = <0x00018696>;
> +			max-adj     = <659999998>;
> +		};
> +
>  		enet0: ethernet@24000 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/arch/powerpc/boot/dts/mpc8572ds.dts b/arch/powerpc/boot/dts/mpc8572ds.dts
> index cafc128..74208cd 100644
> --- a/arch/powerpc/boot/dts/mpc8572ds.dts
> +++ b/arch/powerpc/boot/dts/mpc8572ds.dts
> @@ -324,6 +324,20 @@
>  			};
>  		};
>  
> +		ptp_clock@24E00 {
> +			compatible = "fsl,etsec-ptp";
> +			reg = <0x24E00 0xB0>;
> +			interrupts = <68 2 69 2 70 2 71 2>;
> +			interrupt-parent = < &mpic >;
> +			tclk-period = <5>;
> +			tmr-prsc = <200>;
> +			tmr-add = <0xAAAAAAAB>;
> +			cksel = <1>;
> +			tmr-fiper1 = <0x3B9AC9FB>;
> +			tmr-fiper2 = <0x3B9AC9FB>;
> +			max-adj = <499999999>;
> +		};
> +
>  		enet0: ethernet@24000 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/arch/powerpc/boot/dts/p2020ds.dts b/arch/powerpc/boot/dts/p2020ds.dts
> index 1101914..39d73bb 100644
> --- a/arch/powerpc/boot/dts/p2020ds.dts
> +++ b/arch/powerpc/boot/dts/p2020ds.dts
> @@ -336,6 +336,20 @@
>  			phy_type = "ulpi";
>  		};
>  
> +		ptp_clock@24E00 {
> +			compatible = "fsl,etsec-ptp";
> +			reg = <0x24E00 0xB0>;
> +			interrupts = <68 2 69 2 70 2>;
> +			interrupt-parent = < &mpic >;
> +			tclk-period = <5>;
> +			tmr-prsc = <200>;
> +			tmr-add = <0xCCCCCCCD>;
> +			cksel = <1>;
> +			tmr-fiper1 = <0x3B9AC9FB>;
> +			tmr-fiper2 = <0x0001869B>;
> +			max-adj = <249999999>;
> +		};
> +
>  		enet0: ethernet@24000 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/arch/powerpc/boot/dts/p2020rdb.dts b/arch/powerpc/boot/dts/p2020rdb.dts
> index da4cb0d..5498fb9 100644
> --- a/arch/powerpc/boot/dts/p2020rdb.dts
> +++ b/arch/powerpc/boot/dts/p2020rdb.dts
> @@ -396,6 +396,20 @@
>  			phy_type = "ulpi";
>  		};
>  
> +		ptp_clock@24E00 {
> +			compatible = "fsl,etsec-ptp";
> +			reg = <0x24E00 0xB0>;
> +			interrupts = <68 2 69 2 70 2>;
> +			interrupt-parent = < &mpic >;
> +			tclk-period = <5>;
> +			tmr-prsc = <200>;
> +			tmr-add = <0xCCCCCCCD>;
> +			cksel = <1>;
> +			tmr-fiper1 = <0x3B9AC9FB>;
> +			tmr-fiper2 = <0x0001869B>;
> +			max-adj = <249999999>;
> +		};
> +
>  		enet0: ethernet@24000 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index b90738d..c303f5f 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -31,6 +31,7 @@ obj-$(CONFIG_ATL2) += atlx/
>  obj-$(CONFIG_ATL1E) += atl1e/
>  obj-$(CONFIG_ATL1C) += atl1c/
>  obj-$(CONFIG_GIANFAR) += gianfar_driver.o
> +obj-$(CONFIG_PTP_1588_CLOCK_GIANFAR) += gianfar_ptp.o
>  obj-$(CONFIG_TEHUTI) += tehuti.o
>  obj-$(CONFIG_ENIC) += enic/
>  obj-$(CONFIG_JME) += jme.o
> diff --git a/drivers/net/gianfar_ptp.c b/drivers/net/gianfar_ptp.c
> new file mode 100644
> index 0000000..84fff15
> --- /dev/null
> +++ b/drivers/net/gianfar_ptp.c
> @@ -0,0 +1,448 @@
> +/*
> + * PTP 1588 clock using the eTSEC
> + *
> + * Copyright (C) 2010 OMICRON electronics GmbH
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, write to the Free Software
> + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + */
> +#include <linux/device.h>
> +#include <linux/hrtimer.h>
> +#include <linux/init.h>
> +#include <linux/interrupt.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/timex.h>
> +#include <linux/io.h>
> +
> +#include <linux/ptp_clock_kernel.h>
> +
> +#include "gianfar_ptp_reg.h"
> +#include "gianfar.h"
> +
> +#define DRIVER		"gianfar_ptp"
> +#define N_ALARM		1 /* first alarm is used internally to reset fipers */
> +#define N_EXT_TS	2
> +#define REG_SIZE	sizeof(struct gianfar_ptp_registers)
> +
> +struct etsects {
> +	struct gianfar_ptp_registers *regs;
> +	struct ptp_clock *clock;
> +	struct ptp_clock_info caps;
> +	int irq;
> +	u64 alarm_interval; /* for periodic alarm */
> +	u64 alarm_value;
> +	u32 tclk_period;  /* nanoseconds */
> +	u32 tmr_prsc;
> +	u32 tmr_add;
> +	u32 cksel;
> +	u32 tmr_fiper1;
> +	u32 tmr_fiper2;
> +};
> +
> +/* Private globals */
> +static struct etsects the_clock;
> +DEFINE_SPINLOCK(register_lock);
> +
> +/*
> + * Register access functions
> + */
> +
> +static u64 tmr_cnt_read(struct etsects *etsects)
> +{
> +	u64 ns;
> +	u32 lo, hi;
> +
> +	lo = gfar_read(&etsects->regs->tmr_cnt_l);
> +	hi = gfar_read(&etsects->regs->tmr_cnt_h);
> +	ns = ((u64) hi) << 32;
> +	ns |= lo;
> +	return ns;
> +}
> +
> +static void tmr_cnt_write(struct etsects *etsects, u64 ns)
> +{
> +	u32 hi = ns >> 32;
> +	u32 lo = ns & 0xffffffff;
> +
> +	gfar_write(&etsects->regs->tmr_cnt_l, lo);
> +	gfar_write(&etsects->regs->tmr_cnt_h, hi);
> +}
> +
> +static void set_alarm(struct etsects *etsects)
> +{
> +	u64 ns;
> +	u32 lo, hi;
> +
> +	ns = tmr_cnt_read(etsects) + 1500000000ULL;
> +	ns = div_u64(ns, 1000000000UL) * 1000000000ULL;
> +	ns -= etsects->tclk_period;
> +	hi = ns >> 32;
> +	lo = ns & 0xffffffff;
> +	gfar_write(&etsects->regs->tmr_alarm1_l, lo);
> +	gfar_write(&etsects->regs->tmr_alarm1_h, hi);
> +}
> +
> +static void set_fipers(struct etsects *etsects)
> +{
> +	u32 tmr_ctrl = gfar_read(&etsects->regs->tmr_ctrl);
> +
> +	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl & (~TE));
> +	gfar_write(&etsects->regs->tmr_prsc,   etsects->tmr_prsc);
> +	gfar_write(&etsects->regs->tmr_fiper1, etsects->tmr_fiper1);
> +	gfar_write(&etsects->regs->tmr_fiper2, etsects->tmr_fiper2);
> +	set_alarm(etsects);
> +	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl|TE);
> +}
> +
> +/*
> + * Interrupt service routine
> + */
> +
> +static irqreturn_t isr(int irq, void *priv)
> +{
> +	struct etsects *etsects = priv;
> +	struct ptp_clock_event event;
> +	u64 ns;
> +	u32 ack = 0, lo, hi, mask, val;
> +
> +	val = gfar_read(&etsects->regs->tmr_tevent);
> +
> +	if (val & ETS1) {
> +		ack |= ETS1;
> +		hi = gfar_read(&etsects->regs->tmr_etts1_h);
> +		lo = gfar_read(&etsects->regs->tmr_etts1_l);
> +		event.type = PTP_CLOCK_EXTTS;
> +		event.index = 0;
> +		event.timestamp = ((u64) hi) << 32;
> +		event.timestamp |= lo;
> +		ptp_clock_event(etsects->clock, &event);
> +	}
> +
> +	if (val & ETS2) {
> +		ack |= ETS2;
> +		hi = gfar_read(&etsects->regs->tmr_etts2_h);
> +		lo = gfar_read(&etsects->regs->tmr_etts2_l);
> +		event.type = PTP_CLOCK_EXTTS;
> +		event.index = 1;
> +		event.timestamp = ((u64) hi) << 32;
> +		event.timestamp |= lo;
> +		ptp_clock_event(etsects->clock, &event);
> +	}
> +
> +	if (val & ALM2) {
> +		ack |= ALM2;
> +		if (etsects->alarm_value) {
> +			event.type = PTP_CLOCK_ALARM;
> +			event.index = 0;
> +			event.timestamp = etsects->alarm_value;
> +			ptp_clock_event(etsects->clock, &event);
> +		}
> +		if (etsects->alarm_interval) {
> +			ns = etsects->alarm_value + etsects->alarm_interval;
> +			hi = ns >> 32;
> +			lo = ns & 0xffffffff;
> +			spin_lock(&register_lock);
> +			gfar_write(&etsects->regs->tmr_alarm2_l, lo);
> +			gfar_write(&etsects->regs->tmr_alarm2_h, hi);
> +			spin_unlock(&register_lock);
> +			etsects->alarm_value = ns;
> +		} else {
> +			gfar_write(&etsects->regs->tmr_tevent, ALM2);
> +			spin_lock(&register_lock);
> +			mask = gfar_read(&etsects->regs->tmr_temask);
> +			mask &= ~ALM2EN;
> +			gfar_write(&etsects->regs->tmr_temask, mask);
> +			spin_unlock(&register_lock);
> +			etsects->alarm_value = 0;
> +			etsects->alarm_interval = 0;
> +		}
> +	}
> +
> +	if (val & PP1) {
> +		ack |= PP1;
> +		event.type = PTP_CLOCK_PPS;
> +		ptp_clock_event(etsects->clock, &event);
> +	}
> +
> +	if (ack) {
> +		gfar_write(&etsects->regs->tmr_tevent, ack);
> +		return IRQ_HANDLED;
> +	} else
> +		return IRQ_NONE;
> +}
> +
> +/*
> + * PTP clock operations
> + */
> +
> +static int ptp_gianfar_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
> +{
> +	u64 adj;
> +	u32 diff, tmr_add;
> +	int neg_adj = 0;
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +
> +	if (ppb < 0) {
> +		neg_adj = 1;
> +		ppb = -ppb;
> +	}
> +	tmr_add = etsects->tmr_add;
> +	adj = tmr_add;
> +	adj *= ppb;
> +	diff = div_u64(adj, 1000000000ULL);
> +
> +	tmr_add = neg_adj ? tmr_add - diff : tmr_add + diff;
> +
> +	gfar_write(&etsects->regs->tmr_add, tmr_add);
> +
> +	return 0;
> +}
> +
> +static int ptp_gianfar_adjtime(struct ptp_clock_info *ptp, s64 delta)
> +{
> +	s64 now;
> +	unsigned long flags;
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +
> +	spin_lock_irqsave(&register_lock, flags);
> +
> +	now = tmr_cnt_read(etsects);
> +	now += delta;
> +	tmr_cnt_write(etsects, now);
> +
> +	spin_unlock_irqrestore(&register_lock, flags);
> +
> +	set_fipers(etsects);
> +
> +	return 0;
> +}
> +
> +static int ptp_gianfar_gettime(struct ptp_clock_info *ptp, struct timespec *ts)
> +{
> +	u64 ns;
> +	u32 remainder;
> +	unsigned long flags;
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +
> +	spin_lock_irqsave(&register_lock, flags);
> +
> +	ns = tmr_cnt_read(etsects);
> +
> +	spin_unlock_irqrestore(&register_lock, flags);
> +
> +	ts->tv_sec = div_u64_rem(ns, 1000000000, &remainder);
> +	ts->tv_nsec = remainder;
> +	return 0;
> +}
> +
> +static int ptp_gianfar_settime(struct ptp_clock_info *ptp,
> +			       const struct timespec *ts)
> +{
> +	u64 ns;
> +	unsigned long flags;
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +
> +	ns = ts->tv_sec * 1000000000ULL;
> +	ns += ts->tv_nsec;
> +
> +	spin_lock_irqsave(&register_lock, flags);
> +
> +	tmr_cnt_write(etsects, ns);
> +	set_fipers(etsects);
> +
> +	spin_unlock_irqrestore(&register_lock, flags);
> +
> +	return 0;
> +}
> +
> +static int ptp_gianfar_enable(struct ptp_clock_info *ptp,
> +			      struct ptp_clock_request *rq, int on)
> +{
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +	unsigned long flags;
> +	u32 bit, mask;
> +
> +	switch (rq->type) {
> +	case PTP_CLK_REQ_EXTTS:
> +		switch (rq->extts.index) {
> +		case 0:
> +			bit = ETS1EN;
> +			break;
> +		case 1:
> +			bit = ETS2EN;
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +		spin_lock_irqsave(&register_lock, flags);
> +		mask = gfar_read(&etsects->regs->tmr_temask);
> +		if (on)
> +			mask |= bit;
> +		else
> +			mask &= ~bit;
> +		gfar_write(&etsects->regs->tmr_temask, mask);
> +		spin_unlock_irqrestore(&register_lock, flags);
> +		return 0;
> +
> +	case PTP_CLK_REQ_PPS:
> +		spin_lock_irqsave(&register_lock, flags);
> +		mask = gfar_read(&etsects->regs->tmr_temask);
> +		if (on)
> +			mask |= PP1EN;
> +		else
> +			mask &= ~PP1EN;
> +		gfar_write(&etsects->regs->tmr_temask, mask);
> +		spin_unlock_irqrestore(&register_lock, flags);
> +		return 0;
> +
> +	default:
> +		break;
> +	}
> +
> +	return -EOPNOTSUPP;
> +}
> +
> +static struct ptp_clock_info ptp_gianfar_caps = {
> +	.owner		= THIS_MODULE,
> +	.name		= "gianfar clock",
> +	.max_adj	= 512000,
> +	.n_alarm	= N_ALARM,
> +	.n_ext_ts	= N_EXT_TS,
> +	.n_per_out	= 0,
> +	.pps		= 1,
> +	.adjfreq	= ptp_gianfar_adjfreq,
> +	.adjtime	= ptp_gianfar_adjtime,
> +	.gettime	= ptp_gianfar_gettime,
> +	.settime	= ptp_gianfar_settime,
> +	.enable		= ptp_gianfar_enable,
> +};
> +
> +/* OF device tree */
> +
> +static int get_of_u32(struct device_node *node, char *str, u32 *val)
> +{
> +	int plen;
> +	const u32 *prop = of_get_property(node, str, &plen);
> +
> +	if (!prop || plen != sizeof(*prop))
> +		return -1;
> +	*val = *prop;
> +	return 0;
> +}
> +
> +static int gianfar_ptp_probe(struct platform_device *dev,
> +			     const struct of_device_id *match)
> +{
> +	struct device_node *node = dev->dev.of_node;
> +	struct etsects *etsects = &the_clock;
> +	struct timespec now;
> +	u32 tmr_ctrl;
> +
> +	etsects->caps = ptp_gianfar_caps;
> +
> +	if (get_of_u32(node, "tclk-period", &etsects->tclk_period) ||
> +	    get_of_u32(node, "tmr-prsc", &etsects->tmr_prsc) ||
> +	    get_of_u32(node, "tmr-add", &etsects->tmr_add) ||
> +	    get_of_u32(node, "cksel", &etsects->cksel) ||
> +	    get_of_u32(node, "tmr-fiper1", &etsects->tmr_fiper1) ||
> +	    get_of_u32(node, "tmr-fiper2", &etsects->tmr_fiper2) ||
> +	    get_of_u32(node, "max-adj", &etsects->caps.max_adj)) {
> +		pr_err("device tree node missing required elements\n");
> +		return -ENODEV;
> +	}
> +
> +	etsects->irq = irq_of_parse_and_map(node, 0);

Use platform_get_irq().

> +
> +	if (etsects->irq == NO_IRQ) {
> +		pr_err("irq not in device tree\n");
> +		return -ENODEV;
> +	}
> +	if (request_irq(etsects->irq, isr, 0, DRIVER, etsects)) {
> +		pr_err("request_irq failed\n");
> +		return -ENODEV;
> +	}
> +	etsects->regs = of_iomap(node, 0);

Use platform_get_resource(), and don't forget to request the
resources.

> +	if (!etsects->regs) {
> +		pr_err("of_iomap ptp registers failed\n");
> +		return -EINVAL;
> +	}
> +	getnstimeofday(&now);
> +	ptp_gianfar_settime(&etsects->caps, &now);
> +
> +	tmr_ctrl =
> +	  (etsects->tclk_period & TCLK_PERIOD_MASK) << TCLK_PERIOD_SHIFT |
> +	  (etsects->cksel & CKSEL_MASK) << CKSEL_SHIFT;
> +
> +	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl);
> +	gfar_write(&etsects->regs->tmr_add,    etsects->tmr_add);
> +	gfar_write(&etsects->regs->tmr_prsc,   etsects->tmr_prsc);
> +	gfar_write(&etsects->regs->tmr_fiper1, etsects->tmr_fiper1);
> +	gfar_write(&etsects->regs->tmr_fiper2, etsects->tmr_fiper2);
> +	set_alarm(etsects);
> +	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl|FS|RTPE|TE);
> +
> +	etsects->clock = ptp_clock_register(&etsects->caps);
> +
> +	return IS_ERR(etsects->clock) ? PTR_ERR(etsects->clock) : 0;
> +}
> +
> +static int gianfar_ptp_remove(struct platform_device *dev)
> +{
> +	gfar_write(&the_clock.regs->tmr_temask, 0);
> +	gfar_write(&the_clock.regs->tmr_ctrl,   0);
> +
> +	ptp_clock_unregister(the_clock.clock);
> +	free_irq(the_clock.irq, &the_clock);
> +	iounmap(the_clock.regs);
> +
> +	return 0;
> +}
> +
> +static struct of_device_id match_table[] = {
> +	{ .compatible = "fsl,etsec-ptp" },
> +	{},
> +};
> +
> +static struct of_platform_driver gianfar_ptp_driver = {

Use a platform_driver instead.  of_platform_driver is deprecated and
being removed.

> +	.driver = {
> +		.name		= "gianfar_ptp",
> +		.of_match_table	= match_table,
> +		.owner		= THIS_MODULE,
> +	},
> +	.probe       = gianfar_ptp_probe,
> +	.remove      = gianfar_ptp_remove,
> +};
> +
> +/* module operations */
> +
> +static int __init ptp_gianfar_init(void)
> +{
> +	return of_register_platform_driver(&gianfar_ptp_driver);
> +}
> +
> +module_init(ptp_gianfar_init);
> +
> +static void __exit ptp_gianfar_exit(void)
> +{
> +	of_unregister_platform_driver(&gianfar_ptp_driver);
> +}
> +
> +module_exit(ptp_gianfar_exit);
> +
> +MODULE_AUTHOR("Richard Cochran <richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>");
> +MODULE_DESCRIPTION("PTP clock using the eTSEC");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/net/gianfar_ptp_reg.h b/drivers/net/gianfar_ptp_reg.h
> new file mode 100644
> index 0000000..95e171f
> --- /dev/null
> +++ b/drivers/net/gianfar_ptp_reg.h

This data is only used by gianfar_ptp.c, so there is no need for a
separate include file.  Move the contents of gianfar_ptp_reg.h into
gianfar_ptp.c

g.

^ permalink raw reply

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 16:48 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <4D6534C3.1080305@trash.net>

Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :

> This needs to be a per-skb property, otherwise you could have the
> situation:
> 
> - enqueue skb, double_buffering=0, increment buffer 0
> - enable double buffering
> - swap buffers
> - dequeue same skb, decrement buffer 1
> 
> after which the qlen values of buffer 1 will be incorrect.
> 

Normally its OK, because we bzero() the zone, and the "decrement" is
0-bounded.

I had this idea (of storing two bits per skb), but :

- It means that swap_buffer() should not touch (bzero) the 'old' bins

- Since hash perturbator is changed, we have to store the two hash
values per skb (instead of one u32 / classid).


> 
> > +		slot ^= 1;
> > +		sfbhash = sfb_hash(skb, slot, q);
> 
> Isn't there room in the cb to store both hash values?

Yes, I am going to implement your idea, its probably OK to use two u32
on skb_cb for this.

Thanks !



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox