Netdev List
 help / color / mirror / Atom feed
* [PATCH 6/7] batman-adv: Allow to use rntl_link for device creation/deletion
From: Sven Eckelmann @ 2012-11-24  0:02 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: netdev, davem, Sven Eckelmann
In-Reply-To: <1353715332-4284-1-git-send-email-sven@narfation.org>

The sysfs configuration interface of batman-adv to add/remove soft-interfaces
is not deadlock free and doesn't follow the currently common way to create new
virtual interfaces.

An additional interface though rtnl_link is introduced which provides easy device
creation/deletion with tools like "ip":

$ ip link add dev bat0 type batadv
$ ip link del dev bat0

Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
 net/batman-adv/main.c           |    2 ++
 net/batman-adv/main.h           |    1 +
 net/batman-adv/soft-interface.c |    9 +++++++++
 net/batman-adv/soft-interface.h |    1 +
 4 files changed, 13 insertions(+)

diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index f65a222..9b180a1 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -70,6 +70,7 @@ static int __init batadv_init(void)
 	batadv_debugfs_init();
 
 	register_netdevice_notifier(&batadv_hard_if_notifier);
+	rtnl_link_register(&batadv_link_ops);
 
 	pr_info("B.A.T.M.A.N. advanced %s (compatibility version %i) loaded\n",
 		BATADV_SOURCE_VERSION, BATADV_COMPAT_VERSION);
@@ -80,6 +81,7 @@ static int __init batadv_init(void)
 static void __exit batadv_exit(void)
 {
 	batadv_debugfs_destroy();
+	rtnl_link_unregister(&batadv_link_ops);
 	unregister_netdevice_notifier(&batadv_hard_if_notifier);
 	batadv_hardif_remove_interfaces();
 
diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
index 2f85577..605b125 100644
--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
@@ -148,6 +148,7 @@ enum batadv_uev_type {
 #include <linux/percpu.h>
 #include <linux/slab.h>
 #include <net/sock.h>		/* struct sock */
+#include <net/rtnetlink.h>
 #include <linux/jiffies.h>
 #include <linux/seq_file.h>
 #include "types.h"
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 68832f5..fcb510e 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -533,6 +533,8 @@ struct net_device *batadv_softif_create(const char *name)
 	if (!soft_iface)
 		return NULL;
 
+	soft_iface->rtnl_link_ops = &batadv_link_ops;
+
 	ret = register_netdevice(soft_iface);
 	if (ret < 0) {
 		pr_err("Unable to register the batman interface '%s': %i\n",
@@ -567,6 +569,13 @@ int batadv_softif_is_valid(const struct net_device *net_dev)
 	return 0;
 }
 
+struct rtnl_link_ops batadv_link_ops __read_mostly = {
+	.kind		= "batadv",
+	.priv_size	= sizeof(struct batadv_priv),
+	.setup		= batadv_interface_setup,
+	.dellink	= batadv_softif_destroy,
+};
+
 /* ethtool */
 static int batadv_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
diff --git a/net/batman-adv/soft-interface.h b/net/batman-adv/soft-interface.h
index fe9b0f7..5eecc6e 100644
--- a/net/batman-adv/soft-interface.h
+++ b/net/batman-adv/soft-interface.h
@@ -28,5 +28,6 @@ struct net_device *batadv_softif_create(const char *name);
 void batadv_softif_destroy(struct net_device *soft_iface,
 			   struct list_head *head);
 int batadv_softif_is_valid(const struct net_device *net_dev);
+extern struct rtnl_link_ops batadv_link_ops;
 
 #endif /* _NET_BATMAN_ADV_SOFT_INTERFACE_H_ */
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 5/7] batman-adv: Allow to postpone softif unregistration
From: Sven Eckelmann @ 2012-11-24  0:02 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: netdev, davem, Sven Eckelmann
In-Reply-To: <1353715332-4284-1-git-send-email-sven@narfation.org>

The caller of batadv_softif_destroy may want to postpone the unregistration of
the soft-interface. This is necessary when the rtnl_link configuration hooks
are used.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
 net/batman-adv/hard-interface.c |    2 +-
 net/batman-adv/soft-interface.c |    5 +++--
 net/batman-adv/soft-interface.h |    3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index daeb632..5e2655d 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -447,7 +447,7 @@ void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface,
 
 	/* nobody uses this interface anymore */
 	if (!bat_priv->num_ifaces && autodel == BATADV_IF_CLEANUP_AUTO)
-		batadv_softif_destroy(hard_iface->soft_iface);
+		batadv_softif_destroy(hard_iface->soft_iface, NULL);
 
 	hard_iface->soft_iface = NULL;
 	batadv_hardif_free_ref(hard_iface);
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 5fd0560..68832f5 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -544,7 +544,8 @@ struct net_device *batadv_softif_create(const char *name)
 	return soft_iface;
 }
 
-void batadv_softif_destroy(struct net_device *soft_iface)
+void batadv_softif_destroy(struct net_device *soft_iface,
+			   struct list_head *head)
 {
 	struct batadv_hard_iface *hard_iface;
 
@@ -555,7 +556,7 @@ void batadv_softif_destroy(struct net_device *soft_iface)
 	}
 
 	batadv_sysfs_del_meshif(soft_iface);
-	unregister_netdevice(soft_iface);
+	unregister_netdevice_queue(soft_iface, head);
 }
 
 int batadv_softif_is_valid(const struct net_device *net_dev)
diff --git a/net/batman-adv/soft-interface.h b/net/batman-adv/soft-interface.h
index 07a08fe..fe9b0f7 100644
--- a/net/batman-adv/soft-interface.h
+++ b/net/batman-adv/soft-interface.h
@@ -25,7 +25,8 @@ void batadv_interface_rx(struct net_device *soft_iface,
 			 struct sk_buff *skb, struct batadv_hard_iface *recv_if,
 			 int hdr_size, struct batadv_orig_node *orig_node);
 struct net_device *batadv_softif_create(const char *name);
-void batadv_softif_destroy(struct net_device *soft_iface);
+void batadv_softif_destroy(struct net_device *soft_iface,
+			   struct list_head *head);
 int batadv_softif_is_valid(const struct net_device *net_dev);
 
 #endif /* _NET_BATMAN_ADV_SOFT_INTERFACE_H_ */
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/7] batman-adv: Move deinitialization of soft-interface to destructor
From: Sven Eckelmann @ 2012-11-24  0:02 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: netdev, davem, Sven Eckelmann
In-Reply-To: <1353715332-4284-1-git-send-email-sven@narfation.org>

The deinitialization of the soft-interface created in ndo_init/constructor
should be done in the destructor and not directly before calling
unregister_netdevice

Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
 net/batman-adv/soft-interface.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index b545595..8467b9f 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -487,6 +487,17 @@ static const struct net_device_ops batadv_netdev_ops = {
 	.ndo_validate_addr = eth_validate_addr
 };
 
+/**
+ * batadv_interface_free - Deconstructor of batadv_soft_interface
+ * @dev: Device to cleanup and remove
+ */
+static void batadv_interface_free(struct net_device *dev)
+{
+	batadv_debugfs_del_meshif(dev);
+	batadv_mesh_free(dev);
+	free_netdev(dev);
+}
+
 static void batadv_interface_setup(struct net_device *dev)
 {
 	struct batadv_priv *priv = netdev_priv(dev);
@@ -494,7 +505,7 @@ static void batadv_interface_setup(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->netdev_ops = &batadv_netdev_ops;
-	dev->destructor = free_netdev;
+	dev->destructor = batadv_interface_free;
 	dev->tx_queue_len = 0;
 
 	/* can't call min_mtu, because the needed variables
@@ -535,9 +546,7 @@ struct net_device *batadv_softif_create(const char *name)
 
 void batadv_softif_destroy(struct net_device *soft_iface)
 {
-	batadv_debugfs_del_meshif(soft_iface);
 	batadv_sysfs_del_meshif(soft_iface);
-	batadv_mesh_free(soft_iface);
 	unregister_netdevice(soft_iface);
 }
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 4/7] batman-adv: Remove slave interfaces before removing master interface
From: Sven Eckelmann @ 2012-11-24  0:02 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: netdev, davem, Sven Eckelmann
In-Reply-To: <1353715332-4284-1-git-send-email-sven@narfation.org>

The master (soft-interface) should remove the references to its soft-interfaces
before it gets destroyed. Otherwise the usage count of the interface will not be
zero and therefore the net_device cannot be removed. This leads to an endless
hang of the unregister call.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
 net/batman-adv/soft-interface.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 8467b9f..5fd0560 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -546,6 +546,14 @@ struct net_device *batadv_softif_create(const char *name)
 
 void batadv_softif_destroy(struct net_device *soft_iface)
 {
+	struct batadv_hard_iface *hard_iface;
+
+	list_for_each_entry(hard_iface, &batadv_hardif_list, list) {
+		if (hard_iface->soft_iface == soft_iface)
+			batadv_hardif_disable_interface(hard_iface,
+							BATADV_IF_CLEANUP_KEEP);
+	}
+
 	batadv_sysfs_del_meshif(soft_iface);
 	unregister_netdevice(soft_iface);
 }
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/7] batman-adv: Don't always delete softif when last slave was removed
From: Sven Eckelmann @ 2012-11-24  0:02 UTC (permalink / raw)
  To: b.a.t.m.a.n; +Cc: netdev, davem, Sven Eckelmann
In-Reply-To: <1353715332-4284-1-git-send-email-sven@narfation.org>

batman-adv has an unusual way to manage softinterfaces. These will be created
automatically when a user writes to the batman-adv/mesh_iface file in sysfs and
removed when no slave device exists anymore.

This behaviour cannot be changed without breaking compatibility with existing
code. Instead other interfaces should be able to slightly reduce this behaviour
and provide a more common reaction to a removal of a slave interface.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
 net/batman-adv/hard-interface.c |    8 +++++---
 net/batman-adv/hard-interface.h |   13 ++++++++++++-
 net/batman-adv/sysfs.c          |    6 ++++--
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 50f4a5c..daeb632 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -407,7 +407,8 @@ err:
 	return ret;
 }
 
-void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface)
+void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface,
+				     enum batadv_hard_if_cleanup autodel)
 {
 	struct batadv_priv *bat_priv = netdev_priv(hard_iface->soft_iface);
 	struct batadv_hard_iface *primary_if = NULL;
@@ -445,7 +446,7 @@ void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface)
 	dev_put(hard_iface->soft_iface);
 
 	/* nobody uses this interface anymore */
-	if (!bat_priv->num_ifaces)
+	if (!bat_priv->num_ifaces && autodel == BATADV_IF_CLEANUP_AUTO)
 		batadv_softif_destroy(hard_iface->soft_iface);
 
 	hard_iface->soft_iface = NULL;
@@ -511,7 +512,8 @@ static void batadv_hardif_remove_interface(struct batadv_hard_iface *hard_iface)
 
 	/* first deactivate interface */
 	if (hard_iface->if_status != BATADV_IF_NOT_IN_USE)
-		batadv_hardif_disable_interface(hard_iface);
+		batadv_hardif_disable_interface(hard_iface,
+						BATADV_IF_CLEANUP_AUTO);
 
 	if (hard_iface->if_status != BATADV_IF_NOT_IN_USE)
 		return;
diff --git a/net/batman-adv/hard-interface.h b/net/batman-adv/hard-interface.h
index 3732366..34cf367 100644
--- a/net/batman-adv/hard-interface.h
+++ b/net/batman-adv/hard-interface.h
@@ -29,13 +29,24 @@ enum batadv_hard_if_state {
 	BATADV_IF_I_WANT_YOU,
 };
 
+/**
+ * enum batadv_hard_if_cleanup - Cleanup modi for soft_iface after slave removal
+ * @BATADV_IF_CLEANUP_KEEP: Don't automatically delete soft-interface
+ * @BATADV_IF_CLEANUP_AUTO: Delete soft-interface after last slave was removed
+ */
+enum batadv_hard_if_cleanup {
+	BATADV_IF_CLEANUP_KEEP,
+	BATADV_IF_CLEANUP_AUTO,
+};
+
 extern struct notifier_block batadv_hard_if_notifier;
 
 struct batadv_hard_iface*
 batadv_hardif_get_by_netdev(const struct net_device *net_dev);
 int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface,
 				   const char *iface_name);
-void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface);
+void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface,
+				     enum batadv_hard_if_cleanup autodel);
 void batadv_hardif_remove_interfaces(void);
 int batadv_hardif_min_mtu(struct net_device *soft_iface);
 void batadv_update_min_mtu(struct net_device *soft_iface);
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c
index 84a55cb..5d868d3 100644
--- a/net/batman-adv/sysfs.c
+++ b/net/batman-adv/sysfs.c
@@ -582,13 +582,15 @@ static ssize_t batadv_store_mesh_iface(struct kobject *kobj,
 	}
 
 	if (status_tmp == BATADV_IF_NOT_IN_USE) {
-		batadv_hardif_disable_interface(hard_iface);
+		batadv_hardif_disable_interface(hard_iface,
+						BATADV_IF_CLEANUP_AUTO);
 		goto unlock;
 	}
 
 	/* if the interface already is in use */
 	if (hard_iface->if_status != BATADV_IF_NOT_IN_USE)
-		batadv_hardif_disable_interface(hard_iface);
+		batadv_hardif_disable_interface(hard_iface,
+						BATADV_IF_CLEANUP_AUTO);
 
 	ret = batadv_hardif_enable_interface(hard_iface, buff);
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/7] batman-adv: Move soft-interface initialization to ndo_init
From: Sven Eckelmann @ 2012-11-24  0:02 UTC (permalink / raw)
  To: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q

The initialization of an net_device object should be done in the
init/constructor function and not from the outside after the register_netdevice
was done to avoid race conditions.

Signed-off-by: Sven Eckelmann <sven-KaDOiPu9UxWEi8DpZVb4nw@public.gmane.org>
---
 net/batman-adv/hard-interface.c |    5 ++
 net/batman-adv/soft-interface.c |  147 ++++++++++++++++++---------------------
 2 files changed, 73 insertions(+), 79 deletions(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 365ed74..50f4a5c 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -542,6 +542,11 @@ static int batadv_hard_if_event(struct notifier_block *this,
 	struct batadv_hard_iface *primary_if = NULL;
 	struct batadv_priv *bat_priv;
 
+	if (batadv_softif_is_valid(net_dev) && event == NETDEV_REGISTER) {
+		batadv_sysfs_add_meshif(net_dev);
+		return NOTIFY_DONE;
+	}
+
 	hard_iface = batadv_hardif_get_by_netdev(net_dev);
 	if (!hard_iface && event == NETDEV_REGISTER)
 		hard_iface = batadv_hardif_add_interface(net_dev);
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 54800c7..b545595 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -407,9 +407,73 @@ static void batadv_set_lockdep_class(struct net_device *dev)
  */
 static int batadv_softif_init(struct net_device *dev)
 {
+	struct batadv_priv *bat_priv;
+	int ret;
+	size_t cnt_len = sizeof(uint64_t) * BATADV_CNT_NUM;
+
 	batadv_set_lockdep_class(dev);
 
+	bat_priv = netdev_priv(dev);
+
+	/* batadv_interface_stats() needs to be available as soon as
+	 * register_netdevice() has been called
+	 */
+	bat_priv->bat_counters = __alloc_percpu(cnt_len, __alignof__(uint64_t));
+	if (!bat_priv->bat_counters)
+		return -ENOMEM;
+
+	atomic_set(&bat_priv->aggregated_ogms, 1);
+	atomic_set(&bat_priv->bonding, 0);
+	atomic_set(&bat_priv->bridge_loop_avoidance, 0);
+#ifdef CONFIG_BATMAN_ADV_DAT
+	atomic_set(&bat_priv->distributed_arp_table, 1);
+#endif
+	atomic_set(&bat_priv->ap_isolation, 0);
+	atomic_set(&bat_priv->vis_mode, BATADV_VIS_TYPE_CLIENT_UPDATE);
+	atomic_set(&bat_priv->gw_mode, BATADV_GW_MODE_OFF);
+	atomic_set(&bat_priv->gw_sel_class, 20);
+	atomic_set(&bat_priv->gw_bandwidth, 41);
+	atomic_set(&bat_priv->orig_interval, 1000);
+	atomic_set(&bat_priv->hop_penalty, 30);
+	atomic_set(&bat_priv->log_level, 0);
+	atomic_set(&bat_priv->fragmentation, 1);
+	atomic_set(&bat_priv->bcast_queue_left, BATADV_BCAST_QUEUE_LEN);
+	atomic_set(&bat_priv->batman_queue_left, BATADV_BATMAN_QUEUE_LEN);
+
+	atomic_set(&bat_priv->mesh_state, BATADV_MESH_INACTIVE);
+	atomic_set(&bat_priv->bcast_seqno, 1);
+	atomic_set(&bat_priv->tt.vn, 0);
+	atomic_set(&bat_priv->tt.local_changes, 0);
+	atomic_set(&bat_priv->tt.ogm_append_cnt, 0);
+#ifdef CONFIG_BATMAN_ADV_BLA
+	atomic_set(&bat_priv->bla.num_requests, 0);
+#endif
+	bat_priv->tt.last_changeset = NULL;
+	bat_priv->tt.last_changeset_len = 0;
+
+	bat_priv->primary_if = NULL;
+	bat_priv->num_ifaces = 0;
+
+	ret = batadv_algo_select(bat_priv, batadv_routing_algo);
+	if (ret < 0)
+		goto free_bat_counters;
+
+	ret = batadv_debugfs_add_meshif(dev);
+	if (ret < 0)
+		goto free_bat_counters;
+
+	ret = batadv_mesh_init(dev);
+	if (ret < 0)
+		goto unreg_debugfs;
+
 	return 0;
+
+unreg_debugfs:
+	batadv_debugfs_del_meshif(dev);
+free_bat_counters:
+	free_percpu(bat_priv->bat_counters);
+
+	return ret;
 }
 
 static const struct net_device_ops batadv_netdev_ops = {
@@ -451,97 +515,22 @@ static void batadv_interface_setup(struct net_device *dev)
 struct net_device *batadv_softif_create(const char *name)
 {
 	struct net_device *soft_iface;
-	struct batadv_priv *bat_priv;
 	int ret;
-	size_t cnt_len = sizeof(uint64_t) * BATADV_CNT_NUM;
 
-	soft_iface = alloc_netdev(sizeof(*bat_priv), name,
+	soft_iface = alloc_netdev(sizeof(struct batadv_priv), name,
 				  batadv_interface_setup);
-
 	if (!soft_iface)
-		goto out;
-
-	bat_priv = netdev_priv(soft_iface);
-
-	/* batadv_interface_stats() needs to be available as soon as
-	 * register_netdevice() has been called
-	 */
-	bat_priv->bat_counters = __alloc_percpu(cnt_len, __alignof__(uint64_t));
-	if (!bat_priv->bat_counters)
-		goto free_soft_iface;
+		return NULL;
 
 	ret = register_netdevice(soft_iface);
 	if (ret < 0) {
 		pr_err("Unable to register the batman interface '%s': %i\n",
 		       name, ret);
-		goto free_bat_counters;
+		free_netdev(soft_iface);
+		return NULL;
 	}
 
-	atomic_set(&bat_priv->aggregated_ogms, 1);
-	atomic_set(&bat_priv->bonding, 0);
-	atomic_set(&bat_priv->bridge_loop_avoidance, 0);
-#ifdef CONFIG_BATMAN_ADV_DAT
-	atomic_set(&bat_priv->distributed_arp_table, 1);
-#endif
-	atomic_set(&bat_priv->ap_isolation, 0);
-	atomic_set(&bat_priv->vis_mode, BATADV_VIS_TYPE_CLIENT_UPDATE);
-	atomic_set(&bat_priv->gw_mode, BATADV_GW_MODE_OFF);
-	atomic_set(&bat_priv->gw_sel_class, 20);
-	atomic_set(&bat_priv->gw_bandwidth, 41);
-	atomic_set(&bat_priv->orig_interval, 1000);
-	atomic_set(&bat_priv->hop_penalty, 30);
-	atomic_set(&bat_priv->log_level, 0);
-	atomic_set(&bat_priv->fragmentation, 1);
-	atomic_set(&bat_priv->bcast_queue_left, BATADV_BCAST_QUEUE_LEN);
-	atomic_set(&bat_priv->batman_queue_left, BATADV_BATMAN_QUEUE_LEN);
-
-	atomic_set(&bat_priv->mesh_state, BATADV_MESH_INACTIVE);
-	atomic_set(&bat_priv->bcast_seqno, 1);
-	atomic_set(&bat_priv->tt.vn, 0);
-	atomic_set(&bat_priv->tt.local_changes, 0);
-	atomic_set(&bat_priv->tt.ogm_append_cnt, 0);
-#ifdef CONFIG_BATMAN_ADV_BLA
-	atomic_set(&bat_priv->bla.num_requests, 0);
-#endif
-	bat_priv->tt.last_changeset = NULL;
-	bat_priv->tt.last_changeset_len = 0;
-
-	bat_priv->primary_if = NULL;
-	bat_priv->num_ifaces = 0;
-
-	ret = batadv_algo_select(bat_priv, batadv_routing_algo);
-	if (ret < 0)
-		goto unreg_soft_iface;
-
-	ret = batadv_sysfs_add_meshif(soft_iface);
-	if (ret < 0)
-		goto unreg_soft_iface;
-
-	ret = batadv_debugfs_add_meshif(soft_iface);
-	if (ret < 0)
-		goto unreg_sysfs;
-
-	ret = batadv_mesh_init(soft_iface);
-	if (ret < 0)
-		goto unreg_debugfs;
-
 	return soft_iface;
-
-unreg_debugfs:
-	batadv_debugfs_del_meshif(soft_iface);
-unreg_sysfs:
-	batadv_sysfs_del_meshif(soft_iface);
-unreg_soft_iface:
-	free_percpu(bat_priv->bat_counters);
-	unregister_netdevice(soft_iface);
-	return NULL;
-
-free_bat_counters:
-	free_percpu(bat_priv->bat_counters);
-free_soft_iface:
-	free_netdev(soft_iface);
-out:
-	return NULL;
 }
 
 void batadv_softif_destroy(struct net_device *soft_iface)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH] atm: br2684: Fix excessive queue bloat
From: David Woodhouse @ 2012-11-24  0:01 UTC (permalink / raw)
  To: netdev
  Cc: John Crispin, Dave Täht, Krzysztof Mazur,
	Chas Williams (CONTRACTOR)

[-- Attachment #1: Type: text/plain, Size: 5089 bytes --]

There's really no excuse for an additional wmem_default of buffering
between the netdev queue and the ATM device. Two packets (one in-flight,
and one ready to send) ought to be fine. It's not as if it should take
long to get another from the netdev queue when we need it.

If necessary we can make the queue space configurable later, but I don't
think it's likely to be necessary.

cf. commit 9d02daf754238adac48fa075ee79e7edd3d79ed3 (pppoatm: Fix
excessive queue bloat) which did something very similar for PPPoATM.

Note that there is a tremendously unlikely race condition which may
result in qspace temporarily going negative. If a CPU running the
br2684_pop() function goes off into the weeds for a long period of time
after incrementing qspace to 1, but before calling netdev_wake_queue()...
and another CPU ends up calling br2684_start_xmit() and *stopping* the
queue again before the first CPU comes back, the netdev queue could
end up being woken when qspace has already reached zero.

An alternative approach to coping with this race would be to check in
br2684_start_xmit() for qspace==0 and return NETDEV_TX_BUSY, but just
using '> 0' and '< 1' for comparison instead of '== 0' and '!= 0' is
simpler. It just warranted a mention of *why* we do it that way...

Move the call to atmvcc->send() to happen *after* the accounting and
potentially stopping the netdev queue, in br2684_xmit_vcc(). This matters
if the ->send() call suffers an immediate failure, because it'll call
br2684_pop() with the offending skb before returning. We want that to
happen *after* we've done the initial accounting for the packet in
question. Also make it return an appropriate success/failure indication
while we're at it.

Tested by running 'ping -l 1000 bottomless.aaisp.net.uk' from within my
network, with only a single PPPoE-over-BR2684 link running. And after
setting txqueuelen on the nas0 interface to something low (5, in fact).
Before the patch, we'd see about 15 packets being queued and a resulting
latency of ~56ms being reached. After the patch, we see only about 8,
which is fairly much what we expect. And a max latency of ~36ms. On this
OpenWRT box, wmem_default is 163840.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 net/atm/br2684.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 4819d315..3440a79 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -74,6 +74,7 @@ struct br2684_vcc {
 	struct br2684_filter filter;
 #endif /* CONFIG_ATM_BR2684_IPFILTER */
 	unsigned int copies_needed, copies_failed;
+	atomic_t qspace;
 };
 
 struct br2684_dev {
@@ -181,18 +182,15 @@ static struct notifier_block atm_dev_notifier = {
 static void br2684_pop(struct atm_vcc *vcc, struct sk_buff *skb)
 {
 	struct br2684_vcc *brvcc = BR2684_VCC(vcc);
-	struct net_device *net_dev = skb->dev;
 
-	pr_debug("(vcc %p ; net_dev %p )\n", vcc, net_dev);
+	pr_debug("(vcc %p ; net_dev %p )\n", vcc, brvcc->device);
 	brvcc->old_pop(vcc, skb);
 
-	if (!net_dev)
-		return;
-
-	if (atm_may_send(vcc, 0))
-		netif_wake_queue(net_dev);
-
+	/* If the queue space just went up from zero, wake */
+	if (atomic_inc_return(&brvcc->qspace) == 1)
+		netif_wake_queue(brvcc->device);
 }
+
 /*
  * Send a packet out a particular vcc.  Not to useful right now, but paves
  * the way for multiple vcc's per itf.  Returns true if we can send,
@@ -256,16 +254,19 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
 	ATM_SKB(skb)->atm_options = atmvcc->atm_options;
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += skb->len;
-	atmvcc->send(atmvcc, skb);
 
-	if (!atm_may_send(atmvcc, 0)) {
+	if (atomic_dec_return(&brvcc->qspace) < 1) {
+		/* No more please! */
 		netif_stop_queue(brvcc->device);
-		/*check for race with br2684_pop*/
-		if (atm_may_send(atmvcc, 0))
-			netif_start_queue(brvcc->device);
+		/* We might have raced with br2684_pop() */
+		if (unlikely(atomic_read(&brvcc->qspace) > 0))
+			netif_wake_queue(brvcc->device);
 	}
 
-	return 1;
+	/* If this fails immediately, the skb will be freed and br2684_pop()
+	   will wake the queue if appropriate. Just return an error so that
+	   the stats are updated correctly */
+	return !atmvcc->send(atmvcc, skb);
 }
 
 static inline struct br2684_vcc *pick_outgoing_vcc(const struct sk_buff *skb,
@@ -504,6 +505,11 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
 	brvcc = kzalloc(sizeof(struct br2684_vcc), GFP_KERNEL);
 	if (!brvcc)
 		return -ENOMEM;
+	/* Allow two packets in the ATM queue. One actually being sent, and one
+	   for the ATM 'TX done' handler to send. It shouldn't take long to get
+	   the next one from the netdev queue, when we need it. More than that
+	   would be bufferbloat. */
+	atomic_set(&brvcc->qspace, 2);
 	write_lock_irq(&devs_lock);
 	net_dev = br2684_find_dev(&be.ifspec);
 	if (net_dev == NULL) {
-- 
1.8.0


-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply related

* [PATCH] pkt_sched: QFQ Plus: fair-queueing service at DRR cost
From: Paolo Valente @ 2012-11-23 21:03 UTC (permalink / raw)
  To: davem, jhs, shemminger
  Cc: linux-kernel, netdev, rizzo, fchecconi, paolo.valente

This patch turns QFQ into QFQ+, a variant of QFQ that provides the
following two benefits: 1) QFQ+ is faster than QFQ, 2) differently
from QFQ, QFQ+ correctly schedules also non-leaves classes in a
hierarchical setting. A detailed description of QFQ+, plus a
performance comparison with DRR and QFQ, can be found in [1].

[1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
---
 net/sched/sch_qfq.c |  830 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 566 insertions(+), 264 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 9687fa1..6ed3765 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1,7 +1,8 @@
 /*
- * net/sched/sch_qfq.c         Quick Fair Queueing Scheduler.
+ * net/sched/sch_qfq.c         Quick Fair Queueing Plus Scheduler.
  *
  * Copyright (c) 2009 Fabio Checconi, Luigi Rizzo, and Paolo Valente.
+ * Copyright (c) 2012 Paolo Valente.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -19,12 +20,18 @@
 #include <net/pkt_cls.h>
 
 
-/*  Quick Fair Queueing
-    ===================
+/*  Quick Fair Queueing Plus
+    ========================
 
     Sources:
 
-    Fabio Checconi, Luigi Rizzo, and Paolo Valente: "QFQ: Efficient
+    [1] Paolo Valente,
+    "Reducing the Execution Time of Fair-Queueing Schedulers."
+    http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf
+
+    Sources for QFQ:
+
+    [2] Fabio Checconi, Luigi Rizzo, and Paolo Valente: "QFQ: Efficient
     Packet Scheduling with Tight Bandwidth Distribution Guarantees."
 
     See also:
@@ -33,6 +40,20 @@
 
 /*
 
+  QFQ+ divides classes into aggregates of at most MAX_AGG_CLASSES
+  classes. Each aggregate is timestamped with a virtual start time S
+  and a virtual finish time F, and scheduled according to its
+  timestamps. S and F are computed as a function of a system virtual
+  time function V. The classes within each aggregate are instead
+  scheduled with DRR.
+
+  To speed up operations, QFQ+ divides also aggregates into a limited
+  number of groups. Which group a class belongs to depends on the
+  ratio between the maximum packet length for the class and the weight
+  of the class. Groups have their own S and F. In the end, QFQ+
+  schedules groups, then aggregates within groups, then classes within
+  aggregates. See [1] and [2] for a full description.
+
   Virtual time computations.
 
   S, F and V are all computed in fixed point arithmetic with
@@ -76,27 +97,28 @@
 #define QFQ_MAX_SLOTS	32
 
 /*
- * Shifts used for class<->group mapping.  We allow class weights that are
- * in the range [1, 2^MAX_WSHIFT], and we try to map each class i to the
+ * Shifts used for aggregate<->group mapping.  We allow class weights that are
+ * in the range [1, 2^MAX_WSHIFT], and we try to map each aggregate i to the
  * group with the smallest index that can support the L_i / r_i configured
- * for the class.
+ * for the classes in the aggregate.
  *
  * grp->index is the index of the group; and grp->slot_shift
  * is the shift for the corresponding (scaled) sigma_i.
  */
 #define QFQ_MAX_INDEX		24
-#define QFQ_MAX_WSHIFT		12
+#define QFQ_MAX_WSHIFT		10
 
-#define	QFQ_MAX_WEIGHT		(1<<QFQ_MAX_WSHIFT)
-#define QFQ_MAX_WSUM		(16*QFQ_MAX_WEIGHT)
+#define	QFQ_MAX_WEIGHT		(1<<QFQ_MAX_WSHIFT) /* see qfq_slot_insert */
+#define QFQ_MAX_WSUM		(64*QFQ_MAX_WEIGHT)
 
 #define FRAC_BITS		30	/* fixed point arithmetic */
 #define ONE_FP			(1UL << FRAC_BITS)
 #define IWSUM			(ONE_FP/QFQ_MAX_WSUM)
 
 #define QFQ_MTU_SHIFT		16	/* to support TSO/GSO */
-#define QFQ_MIN_SLOT_SHIFT	(FRAC_BITS + QFQ_MTU_SHIFT - QFQ_MAX_INDEX)
-#define QFQ_MIN_LMAX		256	/* min possible lmax for a class */
+#define QFQ_MIN_LMAX		512	/* see qfq_slot_insert */
+
+#define QFQ_MAX_AGG_CLASSES	8 /* max num classes per aggregate allowed */
 
 /*
  * Possible group states.  These values are used as indexes for the bitmaps
@@ -106,6 +128,8 @@ enum qfq_state { ER, IR, EB, IB, QFQ_MAX_STATE };
 
 struct qfq_group;
 
+struct qfq_aggregate;
+
 struct qfq_class {
 	struct Qdisc_class_common common;
 
@@ -116,7 +140,12 @@ struct qfq_class {
 	struct gnet_stats_queue qstats;
 	struct gnet_stats_rate_est rate_est;
 	struct Qdisc *qdisc;
+	struct list_head alist;		/* Link for active-classes list. */
+	struct qfq_aggregate *agg;	/* Parent aggregate. */
+	int deficit;			/* DRR deficit counter. */
+};
 
+struct qfq_aggregate {
 	struct hlist_node next;	/* Link for the slot list. */
 	u64 S, F;		/* flow timestamps (exact) */
 
@@ -127,8 +156,18 @@ struct qfq_class {
 	struct qfq_group *grp;
 
 	/* these are copied from the flowset. */
-	u32	inv_w;		/* ONE_FP/weight */
-	u32	lmax;		/* Max packet size for this flow. */
+	u32	class_weight; /* Weight of each class in this aggregate. */
+	/* Max pkt size for the classes in this aggregate, DRR quantum. */
+	int	lmax;
+
+	u32	inv_w;	    /* ONE_FP/(sum of weights of classes in aggr.). */
+	u32	budgetmax;  /* Max budget for this aggregate. */
+	u32	initial_budget, budget;     /* Initial and current budget. */
+
+	int		  num_classes;	/* Number of classes in this aggr. */
+	struct list_head  active;	/* DRR queue of active classes. */
+
+	struct hlist_node nonfull_next;	/* See nonfull_aggs in qfq_sched. */
 };
 
 struct qfq_group {
@@ -138,7 +177,7 @@ struct qfq_group {
 	unsigned int front;		/* Index of the front slot. */
 	unsigned long full_slots;	/* non-empty slots */
 
-	/* Array of RR lists of active classes. */
+	/* Array of RR lists of active aggregates. */
 	struct hlist_head slots[QFQ_MAX_SLOTS];
 };
 
@@ -146,13 +185,28 @@ struct qfq_sched {
 	struct tcf_proto *filter_list;
 	struct Qdisc_class_hash clhash;
 
-	u64		V;		/* Precise virtual time. */
-	u32		wsum;		/* weight sum */
+	u64			oldV, V;	/* Precise virtual times. */
+	struct qfq_aggregate	*in_serv_agg;   /* Aggregate being served. */
+	u32			num_active_agg; /* Num. of active aggregates */
+	u32			wsum;		/* weight sum */
 
 	unsigned long bitmaps[QFQ_MAX_STATE];	    /* Group bitmaps. */
 	struct qfq_group groups[QFQ_MAX_INDEX + 1]; /* The groups. */
+	u32 min_slot_shift;	/* Index of the group-0 bit in the bitmaps. */
+
+	u32 max_agg_classes;		/* Max number of classes per aggr. */
+	struct hlist_head nonfull_aggs; /* Aggs with room for more classes. */
 };
 
+/*
+ * Possible reasons why the timestamps of an aggregate are updated
+ * enqueue: the aggregate switches from idle to active and must scheduled
+ *	    for service
+ * requeue: the aggregate finishes its budget, so it stops being served and
+ *	    must be rescheduled for service
+ */
+enum update_reason {enqueue, requeue};
+
 static struct qfq_class *qfq_find_class(struct Qdisc *sch, u32 classid)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
@@ -182,18 +236,18 @@ static const struct nla_policy qfq_policy[TCA_QFQ_MAX + 1] = {
  * index = log_2(maxlen/weight) but we need to apply the scaling.
  * This is used only once at flow creation.
  */
-static int qfq_calc_index(u32 inv_w, unsigned int maxlen)
+static int qfq_calc_index(u32 inv_w, unsigned int maxlen, u32 min_slot_shift)
 {
 	u64 slot_size = (u64)maxlen * inv_w;
 	unsigned long size_map;
 	int index = 0;
 
-	size_map = slot_size >> QFQ_MIN_SLOT_SHIFT;
+	size_map = slot_size >> min_slot_shift;
 	if (!size_map)
 		goto out;
 
 	index = __fls(size_map) + 1;	/* basically a log_2 */
-	index -= !(slot_size - (1ULL << (index + QFQ_MIN_SLOT_SHIFT - 1)));
+	index -= !(slot_size - (1ULL << (index + min_slot_shift - 1)));
 
 	if (index < 0)
 		index = 0;
@@ -204,66 +258,150 @@ out:
 	return index;
 }
 
-/* Length of the next packet (0 if the queue is empty). */
-static unsigned int qdisc_peek_len(struct Qdisc *sch)
+static void qfq_deactivate_agg(struct qfq_sched *, struct qfq_aggregate *);
+static void qfq_activate_agg(struct qfq_sched *, struct qfq_aggregate *,
+			     enum update_reason);
+
+static void qfq_init_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
+			 u32 lmax, u32 weight)
 {
-	struct sk_buff *skb;
+	INIT_LIST_HEAD(&agg->active);
+	hlist_add_head(&agg->nonfull_next, &q->nonfull_aggs);
+
+	agg->lmax = lmax;
+	agg->class_weight = weight;
+}
+
+static struct qfq_aggregate *qfq_find_agg(struct qfq_sched *q,
+					  u32 lmax, u32 weight)
+{
+	struct qfq_aggregate *agg;
+	struct hlist_node *n;
+
+	hlist_for_each_entry(agg, n, &q->nonfull_aggs, nonfull_next)
+		if (agg->lmax == lmax && agg->class_weight == weight)
+			return agg;
+
+	return NULL;
+}
+
 
-	skb = sch->ops->peek(sch);
-	return skb ? qdisc_pkt_len(skb) : 0;
+/* Update aggregate as a function of the new number of classes. */
+static void qfq_update_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
+			   int new_num_classes)
+{
+	u32 new_agg_weight;
+
+	if (new_num_classes == q->max_agg_classes)
+		hlist_del_init(&agg->nonfull_next);
+
+	if (agg->num_classes > new_num_classes &&
+	    new_num_classes == q->max_agg_classes - 1) /* agg no more full */
+		hlist_add_head(&agg->nonfull_next, &q->nonfull_aggs);
+
+	agg->budgetmax = new_num_classes * agg->lmax;
+	new_agg_weight = agg->class_weight * new_num_classes;
+	agg->inv_w = ONE_FP/new_agg_weight;
+
+	if (agg->grp == NULL) {
+		int i = qfq_calc_index(agg->inv_w, agg->budgetmax,
+				       q->min_slot_shift);
+		agg->grp = &q->groups[i];
+	}
+
+	q->wsum +=
+		(int) agg->class_weight * (new_num_classes - agg->num_classes);
+
+	agg->num_classes = new_num_classes;
+}
+
+/* Add class to aggregate. */
+static void qfq_add_to_agg(struct qfq_sched *q,
+			   struct qfq_aggregate *agg,
+			   struct qfq_class *cl)
+{
+	cl->agg = agg;
+
+	qfq_update_agg(q, agg, agg->num_classes+1);
+	if (cl->qdisc->q.qlen > 0) { /* adding an active class */
+		list_add_tail(&cl->alist, &agg->active);
+		if (list_first_entry(&agg->active, struct qfq_class, alist) ==
+		    cl && q->in_serv_agg != agg) /* agg was inactive */
+			qfq_activate_agg(q, agg, enqueue); /* schedule agg */
+	}
 }
 
-static void qfq_deactivate_class(struct qfq_sched *, struct qfq_class *);
-static void qfq_activate_class(struct qfq_sched *q, struct qfq_class *cl,
-			       unsigned int len);
+static struct qfq_aggregate *qfq_choose_next_agg(struct qfq_sched *);
 
-static void qfq_update_class_params(struct qfq_sched *q, struct qfq_class *cl,
-				    u32 lmax, u32 inv_w, int delta_w)
+static void qfq_destroy_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 {
-	int i;
+	if (!hlist_unhashed(&agg->nonfull_next))
+		hlist_del_init(&agg->nonfull_next);
+	if (q->in_serv_agg == agg)
+		q->in_serv_agg = qfq_choose_next_agg(q);
+	kfree(agg);
+}
 
-	/* update qfq-specific data */
-	cl->lmax = lmax;
-	cl->inv_w = inv_w;
-	i = qfq_calc_index(cl->inv_w, cl->lmax);
+/* Deschedule class from within its parent aggregate. */
+static void qfq_deactivate_class(struct qfq_sched *q, struct qfq_class *cl)
+{
+	struct qfq_aggregate *agg = cl->agg;
 
-	cl->grp = &q->groups[i];
 
-	q->wsum += delta_w;
+	list_del(&cl->alist); /* remove from RR queue of the aggregate */
+	if (list_empty(&agg->active)) /* agg is now inactive */
+		qfq_deactivate_agg(q, agg);
 }
 
-static void qfq_update_reactivate_class(struct qfq_sched *q,
-					struct qfq_class *cl,
-					u32 inv_w, u32 lmax, int delta_w)
+/* Remove class from its parent aggregate. */
+static void qfq_rm_from_agg(struct qfq_sched *q, struct qfq_class *cl)
 {
-	bool need_reactivation = false;
-	int i = qfq_calc_index(inv_w, lmax);
+	struct qfq_aggregate *agg = cl->agg;
 
-	if (&q->groups[i] != cl->grp && cl->qdisc->q.qlen > 0) {
-		/*
-		 * shift cl->F back, to not charge the
-		 * class for the not-yet-served head
-		 * packet
-		 */
-		cl->F = cl->S;
-		/* remove class from its slot in the old group */
-		qfq_deactivate_class(q, cl);
-		need_reactivation = true;
+	cl->agg = NULL;
+	if (agg->num_classes == 1) { /* agg being emptied, destroy it */
+		qfq_destroy_agg(q, agg);
+		return;
 	}
+	qfq_update_agg(q, agg, agg->num_classes-1);
+}
 
-	qfq_update_class_params(q, cl, lmax, inv_w, delta_w);
+/* Deschedule class and remove it from its parent aggregate. */
+static void qfq_deact_rm_from_agg(struct qfq_sched *q, struct qfq_class *cl)
+{
+	if (cl->qdisc->q.qlen > 0) /* class is active */
+		qfq_deactivate_class(q, cl);
 
-	if (need_reactivation) /* activate in new group */
-		qfq_activate_class(q, cl, qdisc_peek_len(cl->qdisc));
+	qfq_rm_from_agg(q, cl);
 }
 
+/* Move class to a new aggregate, matching the new class weight and/or lmax */
+static int qfq_change_agg(struct Qdisc *sch, struct qfq_class *cl, u32 weight,
+			   u32 lmax)
+{
+	struct qfq_sched *q = qdisc_priv(sch);
+	struct qfq_aggregate *new_agg = qfq_find_agg(q, lmax, weight);
+
+	if (new_agg == NULL) { /* create new aggregate */
+		new_agg = kzalloc(sizeof(*new_agg), GFP_ATOMIC);
+		if (new_agg == NULL)
+			return -ENOBUFS;
+		qfq_init_agg(q, new_agg, lmax, weight);
+	}
+	qfq_deact_rm_from_agg(q, cl);
+	qfq_add_to_agg(q, new_agg, cl);
+
+	return 0;
+}
 
 static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 			    struct nlattr **tca, unsigned long *arg)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
 	struct qfq_class *cl = (struct qfq_class *)*arg;
+	bool existing = false;
 	struct nlattr *tb[TCA_QFQ_MAX + 1];
+	struct qfq_aggregate *new_agg = NULL;
 	u32 weight, lmax, inv_w;
 	int err;
 	int delta_w;
@@ -286,15 +424,6 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	} else
 		weight = 1;
 
-	inv_w = ONE_FP / weight;
-	weight = ONE_FP / inv_w;
-	delta_w = weight - (cl ? ONE_FP / cl->inv_w : 0);
-	if (q->wsum + delta_w > QFQ_MAX_WSUM) {
-		pr_notice("qfq: total weight out of range (%u + %u)\n",
-			  delta_w, q->wsum);
-		return -EINVAL;
-	}
-
 	if (tb[TCA_QFQ_LMAX]) {
 		lmax = nla_get_u32(tb[TCA_QFQ_LMAX]);
 		if (lmax < QFQ_MIN_LMAX || lmax > (1UL << QFQ_MTU_SHIFT)) {
@@ -304,7 +433,23 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	} else
 		lmax = psched_mtu(qdisc_dev(sch));
 
-	if (cl != NULL) {
+	inv_w = ONE_FP / weight;
+	weight = ONE_FP / inv_w;
+
+	if (cl != NULL &&
+	    lmax == cl->agg->lmax &&
+	    weight == cl->agg->class_weight)
+		return 0; /* nothing to change */
+
+	delta_w = weight - (cl ? cl->agg->class_weight : 0);
+
+	if (q->wsum + delta_w > QFQ_MAX_WSUM) {
+		pr_notice("qfq: total weight out of range (%d + %u)\n",
+			  delta_w, q->wsum);
+		return -EINVAL;
+	}
+
+	if (cl != NULL) { /* modify existing class */
 		if (tca[TCA_RATE]) {
 			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
 						    qdisc_root_sleeping_lock(sch),
@@ -312,25 +457,18 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 			if (err)
 				return err;
 		}
-
-		if (lmax == cl->lmax && inv_w == cl->inv_w)
-			return 0; /* nothing to update */
-
-		sch_tree_lock(sch);
-		qfq_update_reactivate_class(q, cl, inv_w, lmax, delta_w);
-		sch_tree_unlock(sch);
-
-		return 0;
+		existing = true;
+		goto set_change_agg;
 	}
 
+	/* create and init new class */
 	cl = kzalloc(sizeof(struct qfq_class), GFP_KERNEL);
 	if (cl == NULL)
 		return -ENOBUFS;
 
 	cl->refcnt = 1;
 	cl->common.classid = classid;
-
-	qfq_update_class_params(q, cl, lmax, inv_w, delta_w);
+	cl->deficit = lmax;
 
 	cl->qdisc = qdisc_create_dflt(sch->dev_queue,
 				      &pfifo_qdisc_ops, classid);
@@ -341,11 +479,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		err = gen_new_estimator(&cl->bstats, &cl->rate_est,
 					qdisc_root_sleeping_lock(sch),
 					tca[TCA_RATE]);
-		if (err) {
-			qdisc_destroy(cl->qdisc);
-			kfree(cl);
-			return err;
-		}
+		if (err)
+			goto destroy_class;
 	}
 
 	sch_tree_lock(sch);
@@ -354,19 +489,39 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	qdisc_class_hash_grow(sch, &q->clhash);
 
+set_change_agg:
+	sch_tree_lock(sch);
+	new_agg = qfq_find_agg(q, lmax, weight);
+	if (new_agg == NULL) { /* create new aggregate */
+		sch_tree_unlock(sch);
+		new_agg = kzalloc(sizeof(*new_agg), GFP_KERNEL);
+		if (new_agg == NULL) {
+			err = -ENOBUFS;
+			gen_kill_estimator(&cl->bstats, &cl->rate_est);
+			goto destroy_class;
+		}
+		sch_tree_lock(sch);
+		qfq_init_agg(q, new_agg, lmax, weight);
+	}
+	if (existing)
+		qfq_deact_rm_from_agg(q, cl);
+	qfq_add_to_agg(q, new_agg, cl);
+	sch_tree_unlock(sch);
+
 	*arg = (unsigned long)cl;
 	return 0;
+
+destroy_class:
+	qdisc_destroy(cl->qdisc);
+	kfree(cl);
+	return err;
 }
 
 static void qfq_destroy_class(struct Qdisc *sch, struct qfq_class *cl)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
 
-	if (cl->inv_w) {
-		q->wsum -= ONE_FP / cl->inv_w;
-		cl->inv_w = 0;
-	}
-
+	qfq_rm_from_agg(q, cl);
 	gen_kill_estimator(&cl->bstats, &cl->rate_est);
 	qdisc_destroy(cl->qdisc);
 	kfree(cl);
@@ -481,8 +636,8 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 	nest = nla_nest_start(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
-	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, ONE_FP/cl->inv_w) ||
-	    nla_put_u32(skb, TCA_QFQ_LMAX, cl->lmax))
+	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
+	    nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
 		goto nla_put_failure;
 	return nla_nest_end(skb, nest);
 
@@ -500,8 +655,8 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	memset(&xstats, 0, sizeof(xstats));
 	cl->qdisc->qstats.qlen = cl->qdisc->q.qlen;
 
-	xstats.weight = ONE_FP/cl->inv_w;
-	xstats.lmax = cl->lmax;
+	xstats.weight = cl->agg->class_weight;
+	xstats.lmax = cl->agg->lmax;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
@@ -652,16 +807,16 @@ static void qfq_unblock_groups(struct qfq_sched *q, int index, u64 old_F)
  * perhaps
  *
 	old_V ^= q->V;
-	old_V >>= QFQ_MIN_SLOT_SHIFT;
+	old_V >>= q->min_slot_shift;
 	if (old_V) {
 		...
 	}
  *
  */
-static void qfq_make_eligible(struct qfq_sched *q, u64 old_V)
+static void qfq_make_eligible(struct qfq_sched *q)
 {
-	unsigned long vslot = q->V >> QFQ_MIN_SLOT_SHIFT;
-	unsigned long old_vslot = old_V >> QFQ_MIN_SLOT_SHIFT;
+	unsigned long vslot = q->V >> q->min_slot_shift;
+	unsigned long old_vslot = q->oldV >> q->min_slot_shift;
 
 	if (vslot != old_vslot) {
 		unsigned long mask = (1UL << fls(vslot ^ old_vslot)) - 1;
@@ -672,34 +827,38 @@ static void qfq_make_eligible(struct qfq_sched *q, u64 old_V)
 
 
 /*
- * If the weight and lmax (max_pkt_size) of the classes do not change,
- * then QFQ guarantees that the slot index is never higher than
- * 2 + ((1<<QFQ_MTU_SHIFT)/QFQ_MIN_LMAX) * (QFQ_MAX_WEIGHT/QFQ_MAX_WSUM).
+ * The index of the slot in which the aggregate is to be inserted must
+ * not be higher than QFQ_MAX_SLOTS-2. There is a '-2' and not a '-1'
+ * because the start time of the group may be moved backward by one
+ * slot after the aggregate has been inserted, and this would cause
+ * non-empty slots to be right-shifted by one position.
  *
- * With the current values of the above constants, the index is
- * then guaranteed to never be higher than 2 + 256 * (1 / 16) = 18.
+ * If the weight and lmax (max_pkt_size) of the classes do not change,
+ * then QFQ+ does meet the above contraint according to the current
+ * values of its parameters. In fact, if the weight and lmax of the
+ * classes do not change, then, from the theory, QFQ+ guarantees that
+ * the slot index is never higher than
+ * 2 + QFQ_MAX_AGG_CLASSES * ((1<<QFQ_MTU_SHIFT)/QFQ_MIN_LMAX) *
+ * (QFQ_MAX_WEIGHT/QFQ_MAX_WSUM) = 2 + 8 * 128 * (1 / 64) = 18
  *
  * When the weight of a class is increased or the lmax of the class is
- * decreased, a new class with smaller slot size may happen to be
- * activated. The activation of this class should be properly delayed
- * to when the service of the class has finished in the ideal system
- * tracked by QFQ. If the activation of the class is not delayed to
- * this reference time instant, then this class may be unjustly served
- * before other classes waiting for service. This may cause
- * (unfrequently) the above bound to the slot index to be violated for
- * some of these unlucky classes.
+ * decreased, a new aggregate with smaller slot size than the original
+ * parent aggregate of the class may happen to be activated. The
+ * activation of this aggregate should be properly delayed to when the
+ * service of the class has finished in the ideal system tracked by
+ * QFQ+. If the activation of the aggregate is not delayed to this
+ * reference time instant, then this aggregate may be unjustly served
+ * before other aggregates waiting for service. This may cause the
+ * above bound to the slot index to be violated for some of these
+ * unlucky aggregates.
  *
- * Instead of delaying the activation of the new class, which is quite
- * complex, the following inaccurate but simple solution is used: if
- * the slot index is higher than QFQ_MAX_SLOTS-2, then the timestamps
- * of the class are shifted backward so as to let the slot index
- * become equal to QFQ_MAX_SLOTS-2. This threshold is used because, if
- * the slot index is above it, then the data structure implementing
- * the bucket list either gets immediately corrupted or may get
- * corrupted on a possible next packet arrival that causes the start
- * time of the group to be shifted backward.
+ * Instead of delaying the activation of the new aggregate, which is
+ * quite complex, the following inaccurate but simple solution is used:
+ * if the slot index is higher than QFQ_MAX_SLOTS-2, then the
+ * timestamps of the aggregate are shifted backward so as to let the
+ * slot index become equal to QFQ_MAX_SLOTS-2.
  */
-static void qfq_slot_insert(struct qfq_group *grp, struct qfq_class *cl,
+static void qfq_slot_insert(struct qfq_group *grp, struct qfq_aggregate *agg,
 			    u64 roundedS)
 {
 	u64 slot = (roundedS - grp->S) >> grp->slot_shift;
@@ -708,22 +867,22 @@ static void qfq_slot_insert(struct qfq_group *grp, struct qfq_class *cl,
 	if (unlikely(slot > QFQ_MAX_SLOTS - 2)) {
 		u64 deltaS = roundedS - grp->S -
 			((u64)(QFQ_MAX_SLOTS - 2)<<grp->slot_shift);
-		cl->S -= deltaS;
-		cl->F -= deltaS;
+		agg->S -= deltaS;
+		agg->F -= deltaS;
 		slot = QFQ_MAX_SLOTS - 2;
 	}
 
 	i = (grp->front + slot) % QFQ_MAX_SLOTS;
 
-	hlist_add_head(&cl->next, &grp->slots[i]);
+	hlist_add_head(&agg->next, &grp->slots[i]);
 	__set_bit(slot, &grp->full_slots);
 }
 
 /* Maybe introduce hlist_first_entry?? */
-static struct qfq_class *qfq_slot_head(struct qfq_group *grp)
+static struct qfq_aggregate *qfq_slot_head(struct qfq_group *grp)
 {
 	return hlist_entry(grp->slots[grp->front].first,
-			   struct qfq_class, next);
+			   struct qfq_aggregate, next);
 }
 
 /*
@@ -731,20 +890,20 @@ static struct qfq_class *qfq_slot_head(struct qfq_group *grp)
  */
 static void qfq_front_slot_remove(struct qfq_group *grp)
 {
-	struct qfq_class *cl = qfq_slot_head(grp);
+	struct qfq_aggregate *agg = qfq_slot_head(grp);
 
-	BUG_ON(!cl);
-	hlist_del(&cl->next);
+	BUG_ON(!agg);
+	hlist_del(&agg->next);
 	if (hlist_empty(&grp->slots[grp->front]))
 		__clear_bit(0, &grp->full_slots);
 }
 
 /*
- * Returns the first full queue in a group. As a side effect,
- * adjust the bucket list so the first non-empty bucket is at
- * position 0 in full_slots.
+ * Returns the first aggregate in the first non-empty bucket of the
+ * group. As a side effect, adjusts the bucket list so the first
+ * non-empty bucket is at position 0 in full_slots.
  */
-static struct qfq_class *qfq_slot_scan(struct qfq_group *grp)
+static struct qfq_aggregate *qfq_slot_scan(struct qfq_group *grp)
 {
 	unsigned int i;
 
@@ -780,7 +939,7 @@ static void qfq_slot_rotate(struct qfq_group *grp, u64 roundedS)
 	grp->front = (grp->front - i) % QFQ_MAX_SLOTS;
 }
 
-static void qfq_update_eligible(struct qfq_sched *q, u64 old_V)
+static void qfq_update_eligible(struct qfq_sched *q)
 {
 	struct qfq_group *grp;
 	unsigned long ineligible;
@@ -792,137 +951,226 @@ static void qfq_update_eligible(struct qfq_sched *q, u64 old_V)
 			if (qfq_gt(grp->S, q->V))
 				q->V = grp->S;
 		}
-		qfq_make_eligible(q, old_V);
+		qfq_make_eligible(q);
 	}
 }
 
-/*
- * Updates the class, returns true if also the group needs to be updated.
- */
-static bool qfq_update_class(struct qfq_group *grp, struct qfq_class *cl)
+/* Dequeue head packet of the head class in the DRR queue of the aggregate. */
+static void agg_dequeue(struct qfq_aggregate *agg,
+			struct qfq_class *cl, unsigned int len)
 {
-	unsigned int len = qdisc_peek_len(cl->qdisc);
+	qdisc_dequeue_peeked(cl->qdisc);
 
-	cl->S = cl->F;
-	if (!len)
-		qfq_front_slot_remove(grp);	/* queue is empty */
-	else {
-		u64 roundedS;
+	cl->deficit -= (int) len;
 
-		cl->F = cl->S + (u64)len * cl->inv_w;
-		roundedS = qfq_round_down(cl->S, grp->slot_shift);
-		if (roundedS == grp->S)
-			return false;
-
-		qfq_front_slot_remove(grp);
-		qfq_slot_insert(grp, cl, roundedS);
+	if (cl->qdisc->q.qlen == 0) /* no more packets, remove from list */
+		list_del(&cl->alist);
+	else if (cl->deficit < qdisc_pkt_len(cl->qdisc->ops->peek(cl->qdisc))) {
+		cl->deficit += agg->lmax;
+		list_move_tail(&cl->alist, &agg->active);
 	}
+}
+
+static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg,
+					   struct qfq_class **cl,
+					   unsigned int *len)
+{
+	struct sk_buff *skb;
 
-	return true;
+	*cl = list_first_entry(&agg->active, struct qfq_class, alist);
+	skb = (*cl)->qdisc->ops->peek((*cl)->qdisc);
+	if (skb == NULL)
+		WARN_ONCE(1, "qfq_dequeue: non-workconserving leaf\n");
+	else
+		*len = qdisc_pkt_len(skb);
+
+	return skb;
+}
+
+/* Update F according to the actual service received by the aggregate. */
+static inline void charge_actual_service(struct qfq_aggregate *agg)
+{
+	/* compute the service received by the aggregate */
+	u32 service_received = agg->initial_budget - agg->budget;
+
+	agg->F = agg->S + (u64)service_received * agg->inv_w;
 }
 
 static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
-	struct qfq_group *grp;
+	struct qfq_aggregate *in_serv_agg = q->in_serv_agg;
 	struct qfq_class *cl;
-	struct sk_buff *skb;
-	unsigned int len;
-	u64 old_V;
+	struct sk_buff *skb = NULL;
+	/* next-packet len, 0 means no more active classes in in-service agg */
+	unsigned int len = 0;
 
-	if (!q->bitmaps[ER])
+	if (in_serv_agg == NULL)
 		return NULL;
 
-	grp = qfq_ffs(q, q->bitmaps[ER]);
+	if (!list_empty(&in_serv_agg->active))
+		skb = qfq_peek_skb(in_serv_agg, &cl, &len);
 
-	cl = qfq_slot_head(grp);
-	skb = qdisc_dequeue_peeked(cl->qdisc);
-	if (!skb) {
-		WARN_ONCE(1, "qfq_dequeue: non-workconserving leaf\n");
-		return NULL;
+	/*
+	 * If there are no active classes in the in-service aggregate,
+	 * or if the aggregate has not enough budget to serve its next
+	 * class, then choose the next aggregate to serve.
+	 */
+	if (len == 0 || in_serv_agg->budget < len) {
+		charge_actual_service(in_serv_agg);
+
+		/* recharge the budget of the aggregate */
+		in_serv_agg->initial_budget = in_serv_agg->budget =
+			in_serv_agg->budgetmax;
+
+		if (!list_empty(&in_serv_agg->active))
+			/*
+			 * Still active: reschedule for
+			 * service. Possible optimization: if no other
+			 * aggregate is active, then there is no point
+			 * in rescheduling this aggregate, and we can
+			 * just keep it as the in-service one. This
+			 * should be however a corner case, and to
+			 * handle it, we would need to maintain an
+			 * extra num_active_aggs field.
+			*/
+			qfq_activate_agg(q, in_serv_agg, requeue);
+		else if (sch->q.qlen == 0) { /* no aggregate to serve */
+			q->in_serv_agg = NULL;
+			return NULL;
+		}
+
+		/*
+		 * If we get here, there are other aggregates queued:
+		 * choose the new aggregate to serve.
+		 */
+		in_serv_agg = q->in_serv_agg = qfq_choose_next_agg(q);
+		skb = qfq_peek_skb(in_serv_agg, &cl, &len);
 	}
+	if (!skb)
+		return NULL;
 
 	sch->q.qlen--;
 	qdisc_bstats_update(sch, skb);
 
-	old_V = q->V;
-	len = qdisc_pkt_len(skb);
+	agg_dequeue(in_serv_agg, cl, len);
+	in_serv_agg->budget -= len;
 	q->V += (u64)len * IWSUM;
 	pr_debug("qfq dequeue: len %u F %lld now %lld\n",
-		 len, (unsigned long long) cl->F, (unsigned long long) q->V);
+		 len, (unsigned long long) in_serv_agg->F,
+		 (unsigned long long) q->V);
 
-	if (qfq_update_class(grp, cl)) {
-		u64 old_F = grp->F;
+	return skb;
+}
 
-		cl = qfq_slot_scan(grp);
-		if (!cl)
-			__clear_bit(grp->index, &q->bitmaps[ER]);
-		else {
-			u64 roundedS = qfq_round_down(cl->S, grp->slot_shift);
-			unsigned int s;
+static struct qfq_aggregate *qfq_choose_next_agg(struct qfq_sched *q)
+{
+	struct qfq_group *grp;
+	struct qfq_aggregate *agg, *new_front_agg;
+	u64 old_F;
 
-			if (grp->S == roundedS)
-				goto skip_unblock;
-			grp->S = roundedS;
-			grp->F = roundedS + (2ULL << grp->slot_shift);
-			__clear_bit(grp->index, &q->bitmaps[ER]);
-			s = qfq_calc_state(q, grp);
-			__set_bit(grp->index, &q->bitmaps[s]);
-		}
+	qfq_update_eligible(q);
+	q->oldV = q->V;
+
+	if (!q->bitmaps[ER])
+		return NULL;
+
+	grp = qfq_ffs(q, q->bitmaps[ER]);
+	old_F = grp->F;
+
+	agg = qfq_slot_head(grp);
 
-		qfq_unblock_groups(q, grp->index, old_F);
+	/* agg starts to be served, remove it from schedule */
+	qfq_front_slot_remove(grp);
+
+	new_front_agg = qfq_slot_scan(grp);
+
+	if (new_front_agg == NULL) /* group is now inactive, remove from ER */
+		__clear_bit(grp->index, &q->bitmaps[ER]);
+	else {
+		u64 roundedS = qfq_round_down(new_front_agg->S,
+					      grp->slot_shift);
+		unsigned int s;
+
+		if (grp->S == roundedS)
+			return agg;
+		grp->S = roundedS;
+		grp->F = roundedS + (2ULL << grp->slot_shift);
+		__clear_bit(grp->index, &q->bitmaps[ER]);
+		s = qfq_calc_state(q, grp);
+		__set_bit(grp->index, &q->bitmaps[s]);
 	}
 
-skip_unblock:
-	qfq_update_eligible(q, old_V);
+	qfq_unblock_groups(q, grp->index, old_F);
 
-	return skb;
+	return agg;
 }
 
 /*
- * Assign a reasonable start time for a new flow k in group i.
+ * Assign a reasonable start time for a new aggregate in group i.
  * Admissible values for \hat(F) are multiples of \sigma_i
  * no greater than V+\sigma_i . Larger values mean that
  * we had a wraparound so we consider the timestamp to be stale.
  *
  * If F is not stale and F >= V then we set S = F.
  * Otherwise we should assign S = V, but this may violate
- * the ordering in ER. So, if we have groups in ER, set S to
- * the F_j of the first group j which would be blocking us.
+ * the ordering in EB (see [2]). So, if we have groups in ER,
+ * set S to the F_j of the first group j which would be blocking us.
  * We are guaranteed not to move S backward because
  * otherwise our group i would still be blocked.
  */
-static void qfq_update_start(struct qfq_sched *q, struct qfq_class *cl)
+static void qfq_update_start(struct qfq_sched *q, struct qfq_aggregate *agg)
 {
 	unsigned long mask;
 	u64 limit, roundedF;
-	int slot_shift = cl->grp->slot_shift;
+	int slot_shift = agg->grp->slot_shift;
 
-	roundedF = qfq_round_down(cl->F, slot_shift);
+	roundedF = qfq_round_down(agg->F, slot_shift);
 	limit = qfq_round_down(q->V, slot_shift) + (1ULL << slot_shift);
 
-	if (!qfq_gt(cl->F, q->V) || qfq_gt(roundedF, limit)) {
+	if (!qfq_gt(agg->F, q->V) || qfq_gt(roundedF, limit)) {
 		/* timestamp was stale */
-		mask = mask_from(q->bitmaps[ER], cl->grp->index);
+		mask = mask_from(q->bitmaps[ER], agg->grp->index);
 		if (mask) {
 			struct qfq_group *next = qfq_ffs(q, mask);
 			if (qfq_gt(roundedF, next->F)) {
 				if (qfq_gt(limit, next->F))
-					cl->S = next->F;
+					agg->S = next->F;
 				else /* preserve timestamp correctness */
-					cl->S = limit;
+					agg->S = limit;
 				return;
 			}
 		}
-		cl->S = q->V;
+		agg->S = q->V;
 	} else  /* timestamp is not stale */
-		cl->S = cl->F;
+		agg->S = agg->F;
 }
 
+/*
+ * Update the timestamps of agg before scheduling/rescheduling it for
+ * service.  In particular, assign to agg->F its maximum possible
+ * value, i.e., the virtual finish time with which the aggregate
+ * should be labeled if it used all its budget once in service.
+ */
+static inline void
+qfq_update_agg_ts(struct qfq_sched *q,
+		    struct qfq_aggregate *agg, enum update_reason reason)
+{
+	if (reason != requeue)
+		qfq_update_start(q, agg);
+	else /* just charge agg for the service received */
+		agg->S = agg->F;
+
+	agg->F = agg->S + (u64)agg->budgetmax * agg->inv_w;
+}
+
+static void qfq_schedule_agg(struct qfq_sched *, struct qfq_aggregate *);
+
 static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
 	struct qfq_class *cl;
+	struct qfq_aggregate *agg;
 	int err = 0;
 
 	cl = qfq_classify(skb, sch, &err);
@@ -934,11 +1182,13 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	}
 	pr_debug("qfq_enqueue: cl = %x\n", cl->common.classid);
 
-	if (unlikely(cl->lmax < qdisc_pkt_len(skb))) {
+	if (unlikely(cl->agg->lmax < qdisc_pkt_len(skb))) {
 		pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
-			  cl->lmax, qdisc_pkt_len(skb), cl->common.classid);
-		qfq_update_reactivate_class(q, cl, cl->inv_w,
-					    qdisc_pkt_len(skb), 0);
+			 cl->agg->lmax, qdisc_pkt_len(skb), cl->common.classid);
+		err = qfq_change_agg(sch, cl, cl->agg->class_weight,
+				     qdisc_pkt_len(skb));
+		if (err)
+			return err;
 	}
 
 	err = qdisc_enqueue(skb, cl->qdisc);
@@ -954,35 +1204,50 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	bstats_update(&cl->bstats, skb);
 	++sch->q.qlen;
 
-	/* If the new skb is not the head of queue, then done here. */
-	if (cl->qdisc->q.qlen != 1)
+	agg = cl->agg;
+	/* if the queue was not empty, then done here */
+	if (cl->qdisc->q.qlen != 1) {
+		if (unlikely(skb == cl->qdisc->ops->peek(cl->qdisc)) &&
+		    list_first_entry(&agg->active, struct qfq_class, alist)
+		    == cl && cl->deficit < qdisc_pkt_len(skb))
+			list_move_tail(&cl->alist, &agg->active);
+
 		return err;
+	}
+
+	/* schedule class for service within the aggregate */
+	cl->deficit = agg->lmax;
+	list_add_tail(&cl->alist, &agg->active);
 
-	/* If reach this point, queue q was idle */
-	qfq_activate_class(q, cl, qdisc_pkt_len(skb));
+	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl)
+		return err; /* aggregate was not empty, nothing else to do */
+
+	/* recharge budget */
+	agg->initial_budget = agg->budget = agg->budgetmax;
+
+	qfq_update_agg_ts(q, agg, enqueue);
+	if (q->in_serv_agg == NULL)
+		q->in_serv_agg = agg;
+	else if (agg != q->in_serv_agg)
+		qfq_schedule_agg(q, agg);
 
 	return err;
 }
 
 /*
- * Handle class switch from idle to backlogged.
+ * Schedule aggregate according to its timestamps.
  */
-static void qfq_activate_class(struct qfq_sched *q, struct qfq_class *cl,
-			       unsigned int pkt_len)
+static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 {
-	struct qfq_group *grp = cl->grp;
+	struct qfq_group *grp = agg->grp;
 	u64 roundedS;
 	int s;
 
-	qfq_update_start(q, cl);
-
-	/* compute new finish time and rounded start. */
-	cl->F = cl->S + (u64)pkt_len * cl->inv_w;
-	roundedS = qfq_round_down(cl->S, grp->slot_shift);
+	roundedS = qfq_round_down(agg->S, grp->slot_shift);
 
 	/*
-	 * insert cl in the correct bucket.
-	 * If cl->S >= grp->S we don't need to adjust the
+	 * Insert agg in the correct bucket.
+	 * If agg->S >= grp->S we don't need to adjust the
 	 * bucket list and simply go to the insertion phase.
 	 * Otherwise grp->S is decreasing, we must make room
 	 * in the bucket list, and also recompute the group state.
@@ -990,10 +1255,10 @@ static void qfq_activate_class(struct qfq_sched *q, struct qfq_class *cl,
 	 * was in ER make sure to adjust V.
 	 */
 	if (grp->full_slots) {
-		if (!qfq_gt(grp->S, cl->S))
+		if (!qfq_gt(grp->S, agg->S))
 			goto skip_update;
 
-		/* create a slot for this cl->S */
+		/* create a slot for this agg->S */
 		qfq_slot_rotate(grp, roundedS);
 		/* group was surely ineligible, remove */
 		__clear_bit(grp->index, &q->bitmaps[IR]);
@@ -1008,46 +1273,61 @@ static void qfq_activate_class(struct qfq_sched *q, struct qfq_class *cl,
 
 	pr_debug("qfq enqueue: new state %d %#lx S %lld F %lld V %lld\n",
 		 s, q->bitmaps[s],
-		 (unsigned long long) cl->S,
-		 (unsigned long long) cl->F,
+		 (unsigned long long) agg->S,
+		 (unsigned long long) agg->F,
 		 (unsigned long long) q->V);
 
 skip_update:
-	qfq_slot_insert(grp, cl, roundedS);
+	qfq_slot_insert(grp, agg, roundedS);
 }
 
 
+/* Update agg ts and schedule agg for service */
+static void qfq_activate_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
+			     enum update_reason reason)
+{
+	qfq_update_agg_ts(q, agg, reason);
+	qfq_schedule_agg(q, agg);
+}
+
 static void qfq_slot_remove(struct qfq_sched *q, struct qfq_group *grp,
-			    struct qfq_class *cl)
+			    struct qfq_aggregate *agg)
 {
 	unsigned int i, offset;
 	u64 roundedS;
 
-	roundedS = qfq_round_down(cl->S, grp->slot_shift);
+	roundedS = qfq_round_down(agg->S, grp->slot_shift);
 	offset = (roundedS - grp->S) >> grp->slot_shift;
+
 	i = (grp->front + offset) % QFQ_MAX_SLOTS;
 
-	hlist_del(&cl->next);
+	hlist_del(&agg->next);
 	if (hlist_empty(&grp->slots[i]))
 		__clear_bit(offset, &grp->full_slots);
 }
 
 /*
- * called to forcibly destroy a queue.
- * If the queue is not in the front bucket, or if it has
- * other queues in the front bucket, we can simply remove
- * the queue with no other side effects.
+ * Called to forcibly deschedule an aggregate.  If the aggregate is
+ * not in the front bucket, or if the latter has other aggregates in
+ * the front bucket, we can simply remove the aggregate with no other
+ * side effects.
  * Otherwise we must propagate the event up.
  */
-static void qfq_deactivate_class(struct qfq_sched *q, struct qfq_class *cl)
+static void qfq_deactivate_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 {
-	struct qfq_group *grp = cl->grp;
+	struct qfq_group *grp = agg->grp;
 	unsigned long mask;
 	u64 roundedS;
 	int s;
 
-	cl->F = cl->S;
-	qfq_slot_remove(q, grp, cl);
+	if (agg == q->in_serv_agg) {
+		charge_actual_service(agg);
+		q->in_serv_agg = qfq_choose_next_agg(q);
+		return;
+	}
+
+	agg->F = agg->S;
+	qfq_slot_remove(q, grp, agg);
 
 	if (!grp->full_slots) {
 		__clear_bit(grp->index, &q->bitmaps[IR]);
@@ -1066,8 +1346,8 @@ static void qfq_deactivate_class(struct qfq_sched *q, struct qfq_class *cl)
 		}
 		__clear_bit(grp->index, &q->bitmaps[ER]);
 	} else if (hlist_empty(&grp->slots[grp->front])) {
-		cl = qfq_slot_scan(grp);
-		roundedS = qfq_round_down(cl->S, grp->slot_shift);
+		agg = qfq_slot_scan(grp);
+		roundedS = qfq_round_down(agg->S, grp->slot_shift);
 		if (grp->S != roundedS) {
 			__clear_bit(grp->index, &q->bitmaps[ER]);
 			__clear_bit(grp->index, &q->bitmaps[IR]);
@@ -1080,7 +1360,7 @@ static void qfq_deactivate_class(struct qfq_sched *q, struct qfq_class *cl)
 		}
 	}
 
-	qfq_update_eligible(q, q->V);
+	qfq_update_eligible(q);
 }
 
 static void qfq_qlen_notify(struct Qdisc *sch, unsigned long arg)
@@ -1092,6 +1372,32 @@ static void qfq_qlen_notify(struct Qdisc *sch, unsigned long arg)
 		qfq_deactivate_class(q, cl);
 }
 
+static unsigned int qfq_drop_from_slot(struct qfq_sched *q,
+				       struct hlist_head *slot)
+{
+	struct qfq_aggregate *agg;
+	struct hlist_node *n;
+	struct qfq_class *cl;
+	unsigned int len;
+
+	hlist_for_each_entry(agg, n, slot, next) {
+		list_for_each_entry(cl, &agg->active, alist) {
+
+			if (!cl->qdisc->ops->drop)
+				continue;
+
+			len = cl->qdisc->ops->drop(cl->qdisc);
+			if (len > 0) {
+				if (cl->qdisc->q.qlen == 0)
+					qfq_deactivate_class(q, cl);
+
+				return len;
+			}
+		}
+	}
+	return 0;
+}
+
 static unsigned int qfq_drop(struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
@@ -1101,24 +1407,13 @@ static unsigned int qfq_drop(struct Qdisc *sch)
 	for (i = 0; i <= QFQ_MAX_INDEX; i++) {
 		grp = &q->groups[i];
 		for (j = 0; j < QFQ_MAX_SLOTS; j++) {
-			struct qfq_class *cl;
-			struct hlist_node *n;
-
-			hlist_for_each_entry(cl, n, &grp->slots[j], next) {
-
-				if (!cl->qdisc->ops->drop)
-					continue;
-
-				len = cl->qdisc->ops->drop(cl->qdisc);
-				if (len > 0) {
-					sch->q.qlen--;
-					if (!cl->qdisc->q.qlen)
-						qfq_deactivate_class(q, cl);
-
-					return len;
-				}
+			len = qfq_drop_from_slot(q, &grp->slots[j]);
+			if (len > 0) {
+				sch->q.qlen--;
+				return len;
 			}
 		}
+
 	}
 
 	return 0;
@@ -1129,44 +1424,51 @@ static int qfq_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
 	struct qfq_sched *q = qdisc_priv(sch);
 	struct qfq_group *grp;
 	int i, j, err;
+	u32 max_cl_shift, maxbudg_shift, max_classes;
 
 	err = qdisc_class_hash_init(&q->clhash);
 	if (err < 0)
 		return err;
 
+	if (qdisc_dev(sch)->tx_queue_len + 1 > QFQ_MAX_AGG_CLASSES)
+		max_classes = QFQ_MAX_AGG_CLASSES;
+	else
+		max_classes = qdisc_dev(sch)->tx_queue_len + 1;
+	/* max_cl_shift = floor(log_2(max_classes)) */
+	max_cl_shift = __fls(max_classes);
+	q->max_agg_classes = 1<<max_cl_shift;
+
+	/* maxbudg_shift = log2(max_len * max_classes_per_agg) */
+	maxbudg_shift = QFQ_MTU_SHIFT + max_cl_shift;
+	q->min_slot_shift = FRAC_BITS + maxbudg_shift - QFQ_MAX_INDEX;
+
 	for (i = 0; i <= QFQ_MAX_INDEX; i++) {
 		grp = &q->groups[i];
 		grp->index = i;
-		grp->slot_shift = QFQ_MTU_SHIFT + FRAC_BITS
-				   - (QFQ_MAX_INDEX - i);
+		grp->slot_shift = q->min_slot_shift + i;
 		for (j = 0; j < QFQ_MAX_SLOTS; j++)
 			INIT_HLIST_HEAD(&grp->slots[j]);
 	}
 
+	INIT_HLIST_HEAD(&q->nonfull_aggs);
+
 	return 0;
 }
 
 static void qfq_reset_qdisc(struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
-	struct qfq_group *grp;
 	struct qfq_class *cl;
-	struct hlist_node *n, *tmp;
-	unsigned int i, j;
+	struct hlist_node *n;
+	unsigned int i;
 
-	for (i = 0; i <= QFQ_MAX_INDEX; i++) {
-		grp = &q->groups[i];
-		for (j = 0; j < QFQ_MAX_SLOTS; j++) {
-			hlist_for_each_entry_safe(cl, n, tmp,
-						  &grp->slots[j], next) {
+	for (i = 0; i < q->clhash.hashsize; i++) {
+		hlist_for_each_entry(cl, n, &q->clhash.hash[i], common.hnode) {
+			if (cl->qdisc->q.qlen > 0)
 				qfq_deactivate_class(q, cl);
-			}
-		}
-	}
 
-	for (i = 0; i < q->clhash.hashsize; i++) {
-		hlist_for_each_entry(cl, n, &q->clhash.hash[i], common.hnode)
 			qdisc_reset(cl->qdisc);
+		}
 	}
 	sch->q.qlen = 0;
 }
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] pkt_sched: QFQ Plus: fair-queueing service at DRR cost
From: Paolo Valente @ 2012-11-23 21:02 UTC (permalink / raw)
  To: David Miller; +Cc: jhs, shemminger, linux-kernel, netdev, rizzo, fchecconi
In-Reply-To: <20121123.142817.1791723278472502940.davem@davemloft.net>

Il 23/11/2012 20:28, David Miller ha scritto:
> From: Paolo Valente <paolo.valente@unimore.it>
> Date: Thu, 22 Nov 2012 17:56:20 +0100
>
>> +/*
>> +
>> + */
>
> Please don't add useless things like this in your patch.
>
Sorry, I forgot to remove it after I moved the body of the comment to a 
better place. I have fixed and checked again the patch.
> Thank you.
>


-- 
-----------------------------------------------------------
| Paolo Valente              |                            |
| Algogroup                  |                            |
| Dip. Ing. Informazione     | tel:   +39 059 2056318     |
| Via Vignolese 905/b        | fax:   +39 059 2056129     |
| 41125 Modena - Italy       |                            |
|     home:  http://algo.ing.unimo.it/people/paolo/       |
-----------------------------------------------------------

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2012-11-23 20:18 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Networking bug fixes, Cacio e Pepe edition:

1) BNX2X accidently accesses chip rev specific registers without
   an appropriate guard, fix from Ariel Elior.

2) When we removed the routing cache, we set ip_rt_max_size to
   ~0 just to keep reporting a value to userspace via sysfs.
   But the ipv4 IPSEC layer was using this to tune itself which
   is completely bogus to now do.  Fix from Steffen Klassert.

3) Missing initialization in netfilter ipset code from Jozsef
   Kadlecsik.

4) Check CTA_TIMEOUT_NAME length properly in netfilter cttimeout code,
   fix from Florian Westphal.

5) After removing the routing cache, we inadvertantly are caching
   multicast routes that end up looping back locally, we cannot do
   that legitimately any more.  Fix from Julian Anastasov.

6) Revert a race fix for 8139cp qemu/kvm that doesn't actually work
   properly on real hardware.  From Francois Romieu.

7) Fixup errors in example command lines in VXLAN device docs.

Please pull, thanks a log!

The following changes since commit ef6c5be658f6a70c1256fbd18e18ee0dc24c3386:

  fix incorrect NR_FREE_PAGES accounting (appears like memory leak) (2012-11-21 12:33:16 -1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net master

for you to fetch changes up to 4a25417c20fac00b3afd58ce27408f964d19e708:

  bnx2x: remove redundant warning log (2012-11-23 14:17:36 -0500)

----------------------------------------------------------------
Ariel Elior (1):
      bnx2x: remove redundant warning log

David S. Miller (2):
      Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec
      Merge branch 'master' of git://1984.lsi.us.es/nf

Florian Westphal (1):
      netfilter: cttimeout: fix buffer overflow

Jozsef Kadlecsik (1):
      netfilter: ipset: Fix range bug in hash:ip,port,net

Julian Anastasov (1):
      ipv4: do not cache looped multicasts

Steffen Klassert (1):
      xfrm: Fix the gc threshold value for ipv4

Zhi Yong Wu (1):
      vxlan: fix command usage in its doc

françois romieu (1):
      8139cp: revert "set ring address before enabling receiver"

 Documentation/networking/vxlan.txt               |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 11 +++++++----
 drivers/net/ethernet/realtek/8139cp.c            | 22 +++++++++++-----------
 include/net/xfrm.h                               |  2 +-
 net/ipv4/route.c                                 |  9 ++++++---
 net/ipv4/xfrm4_policy.c                          | 13 +------------
 net/netfilter/ipset/ip_set_hash_ip.c             |  4 ++--
 net/netfilter/ipset/ip_set_hash_ipport.c         |  7 +++----
 net/netfilter/ipset/ip_set_hash_ipportip.c       |  7 +++----
 net/netfilter/ipset/ip_set_hash_ipportnet.c      |  7 +++++--
 net/netfilter/nfnetlink_cttimeout.c              |  3 ++-
 11 files changed, 43 insertions(+), 46 deletions(-)

^ permalink raw reply

* Re: BQL support in gianfar causes network hickup
From: Francois Romieu @ 2012-11-23 19:42 UTC (permalink / raw)
  To: Keitel, Tino (ALC NetworX GmbH)
  Cc: Paul Gortmaker, David Miller, netdev@vger.kernel.org,
	Eric Dumazet
In-Reply-To: <50AFA599.9040108@windriver.com>

Paul Gortmaker <paul.gortmaker@windriver.com> :
> On 12-11-23 10:58 AM, Keitel, Tino (ALC NetworX GmbH) wrote:
[...]
> > With this commit reverted, it works fine. v3.3 is ok, v3.4 contains the
> > bad commit. The commit doesn't revert in a clean way in 3.7-rc6. I
> > attached diff without the tqi changes.

Pre v3.4.5 stable kernel will miss some bql fixes. Namely:
4f4bdaeb40df95499c1ee7ea3fbca9d76174a59e (upstream 914bec1011a25f65cdc)
1414a53d956340ca8b1b27e05ab94ba63e82ed97 (upstream 25426b794efdc70dde7)

-- 
Ueimor

^ permalink raw reply

* Re: [RFC net-next PATCH V1 1/9] net: frag evictor, avoid killing warm frag queues
From: Florian Westphal @ 2012-11-23 19:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S. Miller, Florian Westphal, netdev,
	Pablo Neira Ayuso, Thomas Graf, Cong Wang, Patrick McHardy,
	Paul E. McKenney, Herbert Xu
In-Reply-To: <20121123130806.18764.41854.stgit@dragon>

Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> +// TODO: Idea what about also looking at flag INET_FRAG_FIRST_IN
> +// just as safe-guard against frags with a dropped "head" packet
> +		if (!force && q->creation_ts == (u32) jiffies) {

I think we should not rely on head fragment arriving first.

^ permalink raw reply

* Re: Fwd: Re: RTL 8169  linux driver question
From: Francois Romieu @ 2012-11-23 19:20 UTC (permalink / raw)
  To: Stéphane ANCELOT; +Cc: netdev, sancelot, Hayes Wang
In-Reply-To: <50AFCB1D.8080002@free.fr>

Stéphane ANCELOT <sancelot@free.fr> :
[...]
> I have an adapted version of this driver for a realtime linux kernel
> and a 8168/811B rev 2 component (as listed by lspci).

You should grep for the XID line in the kernel dmesg to identify
the chipset version.

> I had problem with it, my application sends a frame that is
> immediately transmitted back by some slaves, there was abnormally
> 100us lost between the send and receive call.
> 
> Finally I found it was coming from the following register setup in
> the driver :
> 
> RTL_W16(IntrMitigate, 0x5151);
> 
> Can you give me some details about it, since I do not have the
> RTL8169 programming guide.

"Reserved" in my 2007 8168c rev1.0 datasheet.

I merged it long ago from Realtek's driver. It has now changed to 0x5f51.

On the old PCI 8169, bits 15..8 relate to Tx and 7..0 to Rx. Bits 7..4
count in units of 125 us and bits 0..3 in packet units. You may give
0x..00 a try.

Hayes knows better for the 8168 line.

> /100us is important since this component acts as an Ethercat Master
> running at 1ms./

Which realtime kernel is it ?

-- 
Ueimor

^ permalink raw reply

* Scheduled Maintenance & Upgrade
From: Help Desk @ 2012-11-20 16:05 UTC (permalink / raw)



Help Desk

Attention Account User,

Scheduled Maintenance & Upgrade

Your account is in the process of being upgraded to a newest  
Windows-based servers and an enhanced online email interface inline  
with internet infrastructure Maintenance. The new servers will provide  
better anti-spam and anti-virus functions, along with IMAP Support for  
mobile devices to enhance your usage.

To ensure that your account is not disrupted but active during and  
after this upgrade, you are required to kindly confirm your account by  
stating the details below:

* User name:
* Password:

This will prompt the upgarde of your account.

Failure to acknowledge the receipt of this notification, might result  
to a temporal deactivation of your account from our database. Your  
account shall remain active upon your confirmation of your login  
details.

We do apologize for any inconvenience caused.

Help Desk Team.

(c) Copyright 2012, All Rights Reserved.

^ permalink raw reply

* Re: [PATCH] net/macb: GEM DMA configuration register update
From: David Miller @ 2012-11-23 19:30 UTC (permalink / raw)
  To: nicolas.ferre; +Cc: netdev, linux-arm-kernel, linux-kernel, manabian, plagnioj
In-Reply-To: <1353678541-26839-1-git-send-email-nicolas.ferre@atmel.com>

From: Nicolas Ferre <nicolas.ferre@atmel.com>
Date: Fri, 23 Nov 2012 14:49:01 +0100

> Add information to the DMA Configuration Register to
> maximize system performance:
> - rx/tx packet buffer full memory size
> - allow possibility to use INCR16 if supported
> 
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH 1/1 v2] asix: use ramdom hw addr if the one read is not valid
From: David Miller @ 2012-11-23 19:30 UTC (permalink / raw)
  To: plagnioj-sclMFOaUSTBWk0Htik3J/w
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1353569717-1260-1-git-send-email-plagnioj-sclMFOaUSTBWk0Htik3J/w@public.gmane.org>

From: Jean-Christophe PLAGNIOL-VILLARD <plagnioj-sclMFOaUSTBWk0Htik3J/w@public.gmane.org>
Date: Thu, 22 Nov 2012 08:35:17 +0100

> Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj-sclMFOaUSTBWk0Htik3J/w@public.gmane.org>

Applied to net-next.

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/1 v2] net: add micrel KSZ8873MLL switch support
From: David Miller @ 2012-11-23 19:30 UTC (permalink / raw)
  To: plagnioj; +Cc: linux-arm-kernel, netdev
In-Reply-To: <1353512287-24048-1-git-send-email-plagnioj@jcrosoft.com>

From: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Date: Wed, 21 Nov 2012 16:38:07 +0100

> this will allow to detect the link between the switch and the soc
> 
> Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH] pkt_sched: QFQ Plus: fair-queueing service at DRR cost
From: David Miller @ 2012-11-23 19:28 UTC (permalink / raw)
  To: paolo.valente; +Cc: jhs, shemminger, linux-kernel, netdev, rizzo, fchecconi
In-Reply-To: <20121122165620.GA9714@paolo-ThinkPad-W520>

From: Paolo Valente <paolo.valente@unimore.it>
Date: Thu, 22 Nov 2012 17:56:20 +0100

> +/*
> +
> + */

Please don't add useless things like this in your patch.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] be2net: fix a possible events_get() race on BE2
From: David Miller @ 2012-11-23 19:26 UTC (permalink / raw)
  To: sathya.perla; +Cc: netdev
In-Reply-To: <669f16b7-59f2-4587-af3e-0b6e83f8f68c@CMEXHTCAS2.ad.emulex.com>

From: Sathya Perla <sathya.perla@emulex.com>
Date: Fri, 23 Nov 2012 15:57:18 +0530

> On BE2 chip, an interrupt being raised even when EQ is in un-armed state has
> been observed a few times.  This is not expected and has never been
> observed on BE3/Lancer chips.
> 
> As a consequence, be_msix()::events_get() and be_poll()::events_get()
> can race and notify an EQ wrongly causing a CEV UE. The other possible
> side-effect would be traffic stalling because after notifying EQ,
> napi_schedule() is ignored as NAPI is already running.
> 
> This patch fixes this issue by counting events only in be_poll().
> 
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>

Applied, thanks.

^ permalink raw reply

* Re: [net-next patch] tun: change tun_get_iff() prototype.
From: David Miller @ 2012-11-23 19:25 UTC (permalink / raw)
  To: ramirose; +Cc: maxk, netdev
In-Reply-To: <1353679090-27892-1-git-send-email-ramirose@gmail.com>

From: Rami Rosen <ramirose@gmail.com>
Date: Fri, 23 Nov 2012 15:58:10 +0200

> This patch changes tun_get_iff() prototype to return void as it never fails.
> 
> Signed-off-by: Rami Rosen <ramirose@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: sched: enable CAN Identifier to be build into kernel
From: David Miller @ 2012-11-23 19:22 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can
In-Reply-To: <1353667497-25676-1-git-send-email-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Fri, 23 Nov 2012 11:44:57 +0100

> This patch makes it possible to build the CAN Identifier into the kernel, even
> if the CAN support is build as a module.
> 
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Why is this so critical that you're asking me to merge this into
the 'net' tree mere days before Linus makes a release?

^ permalink raw reply

* Re: [PATCH net] bnx2x: remove redundant warning log
From: David Miller @ 2012-11-23 19:18 UTC (permalink / raw)
  To: ariele; +Cc: netdev, eilong
In-Reply-To: <1353604577-1272-1-git-send-email-ariele@broadcom.com>

From: "Ariel Elior" <ariele@broadcom.com>
Date: Thu, 22 Nov 2012 19:16:17 +0200

> fix bug where a register which was only meant to be read in 578xx/57712
> devices causes a bogus error message to be logged when read from other
> devices.
> 
> Signed-off-by: Ariel Elior <ariele@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH 0/5] smsc95xx updates
From: David Miller @ 2012-11-23 19:15 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev
In-Reply-To: <1353607526-19307-1-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Thu, 22 Nov 2012 18:05:21 +0000

> please consider 1 - 4 for net-next.

Done.

^ permalink raw reply

* Fwd: Re: RTL 8169  linux driver question
From: Stéphane ANCELOT @ 2012-11-23 19:14 UTC (permalink / raw)
  To: netdev; +Cc: sancelot
In-Reply-To: <50AFC971.7010103@free.fr>

Hi,
I have got a question regarding RTL8169 driver.

I have an adapted version of this driver for a realtime linux kernel and 
  a 8168/811B rev 2 component (as listed by lspci).

I had problem with it, my application sends a frame that is immediately 
transmitted back by some slaves, there was abnormally 100us  lost 
between the send and receive call.

Finally I found it was coming from the following register setup in the 
driver :

RTL_W16(IntrMitigate, 0x5151);



Can you give me some details about it, since I do not have the RTL8169 
programming guide.

/100us is important since this component acts as an Ethercat Master 
running at 1ms./

Regards,
Stephane Ancelot

R & D department
NUMALLIANCE
http://www.numalliance.com

^ permalink raw reply

* Re: [PATCH net-next] tcp: remove dead prototype for tcp_v4_get_peer()
From: David Miller @ 2012-11-23 19:11 UTC (permalink / raw)
  To: ncardwell; +Cc: netdev
In-Reply-To: <1353642539-6509-1-git-send-email-ncardwell@google.com>

From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 22 Nov 2012 22:48:59 -0500

> This function no longer exists.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>

Applied, thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox