* [PATCH net-next 0/9] bonding: remove bond_next_slave()
From: Veaceslav Falico @ 2013-09-27 14:11 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Jay Vosburgh, Andy Gospodarek,
Veaceslav Falico
(on top of "[PATCH net-next 0/2] bonding: fix 3ad slave (de)init" - the
patchset is essential)
Hi,
As Ben Hutchings and Nikolay Aleksandrov correctly noted -
bond_next_slave() already is not O(1), but rather O(n), so it's only useful
for one-off operations and shouldn't be used widely - for example, for list
traversal, which will take O(n^2) time, which will be disastrous for any
hot path with a large number of slaves.
Currently, bond_next_slave() is used several times in 802.3ad and for
seq_read - bond_info_seq_next().
The 802.3ad part uses it only in constructs like:
for (X = __get_first_X(); X; X = __get_next_X()) {
where __get_next_X() uses bond_next_slave().
This for can (and should) actually be replaced by the standard
bond_for_each_slave(bond, slave, iter) {
X = __get_X_by_slave(slave);
it's faster, easier to read, debug and maintain. Also, removes a lot of
code lines.
After replacing it, the only user of bond_next_slave() is
bond_info_seq_next() - which can, actually, implement it by itself, and not
call another function.
So, that way, we can completely remove the bond_next_slave(), cleanup and
optimize a bit.
p.s. the 802.3ad code is horrible, both style-wise and from the logical
point of view - so I've decided to *not* change anything except that what
this patch is intended to provide. The cleanup and some refactoring should
be done in another patchset, which I've began to work on already.
Thank you!
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_3ad.c | 137 ++++++++++++++------------------------
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_procfs.c | 16 +++--
drivers/net/bonding/bonding.h | 31 ---------
4 files changed, 62 insertions(+), 124 deletions(-)
^ permalink raw reply
* [PATCH net-next 1/9] bonding: remove __get_next_port()
From: Veaceslav Falico @ 2013-09-27 14:11 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
Currently this function is only used in constructs like
for (port = __get_first_port(bond); port; port = __get_next_port(port))
which is basicly the same as
bond_for_each_slave(bond, slave, iter) {
port = &(SLAVE_AD_INFO(slave).port);
but a more time consuming.
Remove the function and convert the users to bond_for_each_slave().
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_3ad.c | 34 +++++++++-------------------------
1 file changed, 9 insertions(+), 25 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 2715ea8..c1535f8 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -149,28 +149,6 @@ static inline struct port *__get_first_port(struct bonding *bond)
}
/**
- * __get_next_port - get the next port in the bond
- * @port: the port we're looking at
- *
- * Return the port of the slave that is next in line of @port's slave in the
- * bond, or %NULL if it can't be found.
- */
-static inline struct port *__get_next_port(struct port *port)
-{
- struct bonding *bond = __get_bond_by_port(port);
- struct slave *slave = port->slave, *slave_next;
-
- // If there's no bond for this port, or this is the last slave
- if (bond == NULL)
- return NULL;
- slave_next = bond_next_slave(bond, slave);
- if (!slave_next || bond_is_first_slave(bond, slave_next))
- return NULL;
-
- return &(SLAVE_AD_INFO(slave_next).port);
-}
-
-/**
* __get_first_agg - get the first aggregator in the bond
* @bond: the bond we're looking at
*
@@ -2113,8 +2091,10 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
{
struct bonding *bond = container_of(work, struct bonding,
ad_work.work);
- struct port *port;
struct aggregator *aggregator;
+ struct list_head *iter;
+ struct slave *slave;
+ struct port *port;
read_lock(&bond->lock);
@@ -2139,7 +2119,8 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
}
// for each port run the state machines
- for (port = __get_first_port(bond); port; port = __get_next_port(port)) {
+ bond_for_each_slave(bond, slave, iter) {
+ port = &(SLAVE_AD_INFO(slave).port);
if (!port->slave) {
pr_warning("%s: Warning: Found an uninitialized port\n",
bond->dev->name);
@@ -2384,9 +2365,12 @@ int __bond_3ad_get_active_agg_info(struct bonding *bond,
struct ad_info *ad_info)
{
struct aggregator *aggregator = NULL;
+ struct list_head *iter;
+ struct slave *slave;
struct port *port;
- for (port = __get_first_port(bond); port; port = __get_next_port(port)) {
+ bond_for_each_slave(bond, slave, iter) {
+ port = &(SLAVE_AD_INFO(slave).port);
if (port->aggregator && port->aggregator->is_active) {
aggregator = port->aggregator;
break;
--
1.8.4
^ permalink raw reply related
* [PATCH net-next 2/9] bonding: remove __get_first_port()
From: Veaceslav Falico @ 2013-09-27 14:11 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
Currently we have only one user of it, so it's kind of useless and just
obfusicates things.
Remove it and move the logic to the only user -
bond_3ad_state_machine_handler().
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_3ad.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c1535f8..0f86d2b 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -136,19 +136,6 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
}
/**
- * __get_first_port - get the first port in the bond
- * @bond: the bond we're looking at
- *
- * Return the port of the first slave in @bond, or %NULL if it can't be found.
- */
-static inline struct port *__get_first_port(struct bonding *bond)
-{
- struct slave *first_slave = bond_first_slave(bond);
-
- return first_slave ? &(SLAVE_AD_INFO(first_slave).port) : NULL;
-}
-
-/**
* __get_first_agg - get the first aggregator in the bond
* @bond: the bond we're looking at
*
@@ -2104,8 +2091,11 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
// check if agg_select_timer timer after initialize is timed out
if (BOND_AD_INFO(bond).agg_select_timer && !(--BOND_AD_INFO(bond).agg_select_timer)) {
+ slave = bond_first_slave(bond);
+ port = slave ? &(SLAVE_AD_INFO(slave).port) : NULL;
+
// select the active aggregator for the bond
- if ((port = __get_first_port(bond))) {
+ if (port) {
if (!port->slave) {
pr_warning("%s: Warning: bond's first port is uninitialized\n",
bond->dev->name);
--
1.8.4
^ permalink raw reply related
* [PATCH net-next 3/9] bonding: make ad_port_selection_logic() use bond_for_each_slave()
From: Veaceslav Falico @ 2013-09-27 14:11 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
Currently, ad_port_selection_logic() uses
for (aggregator = __get_first_agg(port); aggregator;
aggregator = __get_next_agg(aggregator)) {
construct, however it's suboptimal, difficult to read and understand.
Change it to a standard bond_for_each_slave(), so that we won't need
__get_first/next_agg() and have it more readable.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_3ad.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 0f86d2b..8b64ed4 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1239,12 +1239,17 @@ static void ad_port_selection_logic(struct port *port)
{
struct aggregator *aggregator, *free_aggregator = NULL, *temp_aggregator;
struct port *last_port = NULL, *curr_port;
+ struct list_head *iter;
+ struct bonding *bond;
+ struct slave *slave;
int found = 0;
// if the port is already Selected, do nothing
if (port->sm_vars & AD_PORT_SELECTED)
return;
+ bond = __get_bond_by_port(port);
+
// if the port is connected to other aggregator, detach it
if (port->aggregator) {
// detach the port from its former aggregator
@@ -1285,8 +1290,8 @@ static void ad_port_selection_logic(struct port *port)
}
}
// search on all aggregators for a suitable aggregator for this port
- for (aggregator = __get_first_agg(port); aggregator;
- aggregator = __get_next_agg(aggregator)) {
+ bond_for_each_slave(bond, slave, iter) {
+ aggregator = &(SLAVE_AD_INFO(slave).aggregator);
// keep a free aggregator for later use(if needed)
if (!aggregator->lag_ports) {
--
1.8.4
^ permalink raw reply related
* [PATCH net-next 4/9] bonding: make __get_active_agg() use bond_for_each_slave()
From: Veaceslav Falico @ 2013-09-27 14:12 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
Currently we're relying on suboptimal construct
for (; aggregator; aggregator = __get_next_agg(aggregator)) {
where aggregator is an argument of __get_active_agg() which is _always_ the
first slave's aggregator - judging by all the callers, comments in the
ad_agg_selection_logic() and by logic.
Convert it to use the standard bond_for_each_slave().
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_3ad.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 8b64ed4..1109d82 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -720,16 +720,15 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
*/
static struct aggregator *__get_active_agg(struct aggregator *aggregator)
{
- struct aggregator *retval = NULL;
+ struct bonding *bond = aggregator->slave->bond;
+ struct list_head *iter;
+ struct slave *slave;
- for (; aggregator; aggregator = __get_next_agg(aggregator)) {
- if (aggregator->is_active) {
- retval = aggregator;
- break;
- }
- }
+ bond_for_each_slave(bond, slave, iter)
+ if (SLAVE_AD_INFO(slave).aggregator.is_active)
+ return &(SLAVE_AD_INFO(slave).aggregator);
- return retval;
+ return NULL;
}
/**
--
1.8.4
^ permalink raw reply related
* [PATCH net-next 5/9] bonding: make ad_agg_selection_logic() use bond_for_each_slave()
From: Veaceslav Falico @ 2013-09-27 14:12 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
Convert all instances of
for (agg = __get_first_agg(); agg; agg = __get_next_port)
to the standard bond_for_each_slave(). Also, remove the useless checks
before calling bond_3ad_set_carrier() - if we have something NULL - it
would fire long ago, in __get_first/next_port(), per example.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_3ad.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 1109d82..6cd1444 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1484,19 +1484,23 @@ static int agg_device_up(const struct aggregator *agg)
static void ad_agg_selection_logic(struct aggregator *agg)
{
struct aggregator *best, *active, *origin;
+ struct bonding *bond = agg->slave->bond;
+ struct list_head *iter;
+ struct slave *slave;
struct port *port;
origin = agg;
active = __get_active_agg(agg);
best = (active && agg_device_up(active)) ? active : NULL;
- do {
+ bond_for_each_slave(bond, slave, iter) {
+ agg = &(SLAVE_AD_INFO(slave).aggregator);
+
agg->is_active = 0;
if (agg->num_of_ports && agg_device_up(agg))
best = ad_agg_selection_test(best, agg);
-
- } while ((agg = __get_next_agg(agg)));
+ }
if (best &&
__get_agg_selection_mode(best->lag_ports) == BOND_AD_STABLE) {
@@ -1534,8 +1538,8 @@ static void ad_agg_selection_logic(struct aggregator *agg)
best->lag_ports, best->slave,
best->slave ? best->slave->dev->name : "NULL");
- for (agg = __get_first_agg(best->lag_ports); agg;
- agg = __get_next_agg(agg)) {
+ bond_for_each_slave(bond, slave, iter) {
+ agg = &(SLAVE_AD_INFO(slave).aggregator);
pr_debug("Agg=%d; P=%d; a k=%d; p k=%d; Ind=%d; Act=%d\n",
agg->aggregator_identifier, agg->num_of_ports,
@@ -1583,13 +1587,7 @@ static void ad_agg_selection_logic(struct aggregator *agg)
}
}
- if (origin->slave) {
- struct bonding *bond;
-
- bond = bond_get_bond_by_slave(origin->slave);
- if (bond)
- bond_3ad_set_carrier(bond);
- }
+ bond_3ad_set_carrier(bond);
}
/**
--
1.8.4
^ permalink raw reply related
* [PATCH net-next 7/9] bonding: remove unused __get_next_agg()
From: Veaceslav Falico @ 2013-09-27 14:12 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
It has no users, so it's safe to remove it completely.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_3ad.c | 22 ----------------------
1 file changed, 22 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 6dbb84d..c62606a 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -155,28 +155,6 @@ static inline struct aggregator *__get_first_agg(struct port *port)
return first_slave ? &(SLAVE_AD_INFO(first_slave).aggregator) : NULL;
}
-/**
- * __get_next_agg - get the next aggregator in the bond
- * @aggregator: the aggregator we're looking at
- *
- * Return the aggregator of the slave that is next in line of @aggregator's
- * slave in the bond, or %NULL if it can't be found.
- */
-static inline struct aggregator *__get_next_agg(struct aggregator *aggregator)
-{
- struct slave *slave = aggregator->slave, *slave_next;
- struct bonding *bond = bond_get_bond_by_slave(slave);
-
- // If there's no bond for this aggregator, or this is the last slave
- if (bond == NULL)
- return NULL;
- slave_next = bond_next_slave(bond, slave);
- if (!slave_next || bond_is_first_slave(bond, slave_next))
- return NULL;
-
- return &(SLAVE_AD_INFO(slave_next).aggregator);
-}
-
/*
* __agg_has_partner
*
--
1.8.4
^ permalink raw reply related
* [PATCH net-next 6/9] bonding: make bond_3ad_unbind_slave() use bond_for_each_slave()
From: Veaceslav Falico @ 2013-09-27 14:12 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
Convert all instances of
for (agg = __get_first_agg(); agg; agg = __get_next_port)
to the standard bond_for_each_slave().
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_3ad.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 6cd1444..6dbb84d 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1936,6 +1936,9 @@ void bond_3ad_unbind_slave(struct slave *slave)
struct port *port, *prev_port, *temp_port;
struct aggregator *aggregator, *new_aggregator, *temp_aggregator;
int select_new_active_agg = 0;
+ struct bonding *bond = slave->bond;
+ struct slave *slave_iter;
+ struct list_head *iter;
// find the aggregator related to this slave
aggregator = &(SLAVE_AD_INFO(slave).aggregator);
@@ -1965,14 +1968,16 @@ void bond_3ad_unbind_slave(struct slave *slave)
// reason to search for new aggregator, and that we will find one
if ((aggregator->lag_ports != port) || (aggregator->lag_ports->next_port_in_aggregator)) {
// find new aggregator for the related port(s)
- new_aggregator = __get_first_agg(port);
- for (; new_aggregator; new_aggregator = __get_next_agg(new_aggregator)) {
+ bond_for_each_slave(bond, slave_iter, iter) {
+ new_aggregator = &(SLAVE_AD_INFO(slave_iter).aggregator);
// if the new aggregator is empty, or it is connected to our port only
if (!new_aggregator->lag_ports
|| ((new_aggregator->lag_ports == port)
&& !new_aggregator->lag_ports->next_port_in_aggregator))
break;
}
+ if (!slave_iter)
+ new_aggregator = NULL;
// if new aggregator found, copy the aggregator's parameters
// and connect the related lag_ports to the new aggregator
if ((new_aggregator) && ((!new_aggregator->lag_ports) || ((new_aggregator->lag_ports == port) && !new_aggregator->lag_ports->next_port_in_aggregator))) {
@@ -2032,8 +2037,8 @@ void bond_3ad_unbind_slave(struct slave *slave)
pr_debug("Unbinding port %d\n", port->actor_port_number);
// find the aggregator that this port is connected to
- temp_aggregator = __get_first_agg(port);
- for (; temp_aggregator; temp_aggregator = __get_next_agg(temp_aggregator)) {
+ bond_for_each_slave(bond, slave_iter, iter) {
+ temp_aggregator = &(SLAVE_AD_INFO(slave_iter).aggregator);
prev_port = NULL;
// search the port in the aggregator's related ports
for (temp_port = temp_aggregator->lag_ports; temp_port;
--
1.8.4
^ permalink raw reply related
* [PATCH net-next 8/9] bonding: don't use bond_next_slave() in bond_info_seq_next()
From: Veaceslav Falico @ 2013-09-27 14:12 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
We don't need the circular loop there and it's the only current user of
bond_next_slave() - so just use the standard bond_for_each_slave().
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_procfs.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index 7af5646..fb868d6 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -31,17 +31,25 @@ static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
static void *bond_info_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
struct bonding *bond = seq->private;
- struct slave *slave = v;
+ struct list_head *iter;
+ struct slave *slave;
+ bool found = false;
++*pos;
if (v == SEQ_START_TOKEN)
return bond_first_slave(bond);
- if (bond_is_last_slave(bond, slave))
+ if (bond_is_last_slave(bond, v))
return NULL;
- slave = bond_next_slave(bond, slave);
- return slave;
+ bond_for_each_slave(bond, slave, iter) {
+ if (found)
+ return slave;
+ if (slave == v)
+ found = true;
+ }
+
+ return NULL;
}
static void bond_info_seq_stop(struct seq_file *seq, void *v)
--
1.8.4
^ permalink raw reply related
* [PATCH net-next 9/9] bonding: remove bond_next_slave()
From: Veaceslav Falico @ 2013-09-27 14:12 UTC (permalink / raw)
To: netdev; +Cc: nikolay, bhutchings, Veaceslav Falico, Jay Vosburgh,
Andy Gospodarek
In-Reply-To: <1380291125-5671-1-git-send-email-vfalico@redhat.com>
There are no users left, so it's safe to remove.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bonding.h | 31 -------------------------------
1 file changed, 31 deletions(-)
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 5b71601..713b6af 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -89,9 +89,6 @@
#define bond_is_first_slave(bond, pos) (pos == bond_first_slave(bond))
#define bond_is_last_slave(bond, pos) (pos == bond_last_slave(bond))
-/* Since bond_first/last_slave can return NULL, these can return NULL too */
-#define bond_next_slave(bond, pos) __bond_next_slave(bond, pos)
-
/**
* bond_for_each_slave - iterate over all slaves
* @bond: the bond holding this list
@@ -244,34 +241,6 @@ struct bonding {
((struct slave *) rtnl_dereference(dev->rx_handler_data))
/**
- * __bond_next_slave - get the next slave after the one provided
- * @bond - bonding struct
- * @slave - the slave provided
- *
- * Returns the next slave after the slave provided, first slave if the
- * slave provided is the last slave and NULL if slave is not found
- */
-static inline struct slave *__bond_next_slave(struct bonding *bond,
- struct slave *slave)
-{
- struct slave *slave_iter;
- struct list_head *iter;
- bool found = false;
-
- netdev_for_each_lower_private(bond->dev, slave_iter, iter) {
- if (found)
- return slave_iter;
- if (slave_iter == slave)
- found = true;
- }
-
- if (found)
- return bond_first_slave(bond);
-
- return NULL;
-}
-
-/**
* Returns NULL if the net_device does not belong to any of the bond's slaves
*
* Caller must hold bond lock for read
--
1.8.4
^ permalink raw reply related
* Re: [PATCH net-next] virtio-net: switch to use XPS to choose txq
From: Michael S. Tsirkin @ 2013-09-27 14:35 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <1380261444-40918-1-git-send-email-jasowang@redhat.com>
On Fri, Sep 27, 2013 at 01:57:24PM +0800, Jason Wang wrote:
> We used to use a percpu structure vq_index to record the cpu to queue
> mapping, this is suboptimal since it duplicates the work of XPS and
> loses all other XPS functionality such as allowing use to configure
> their own transmission steering strategy.
>
> So this patch switches to use XPS and suggest a default mapping when
> the number of cpus is equal to the number of queues. With XPS support,
> there's no need for keeping per-cpu vq_index and .ndo_select_queue(),
> so they were removed also.
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
More lines deleted that added is good :)
But how does the result perform?
About the same?
> ---
> drivers/net/virtio_net.c | 55 +++++++--------------------------------------
> 1 files changed, 9 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index defec2b..4102c1b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -127,9 +127,6 @@ struct virtnet_info {
> /* Does the affinity hint is set for virtqueues? */
> bool affinity_hint_set;
>
> - /* Per-cpu variable to show the mapping from CPU to virtqueue */
> - int __percpu *vq_index;
> -
> /* CPU hot plug notifier */
> struct notifier_block nb;
> };
> @@ -1063,7 +1060,6 @@ static int virtnet_vlan_rx_kill_vid(struct net_device *dev,
> static void virtnet_clean_affinity(struct virtnet_info *vi, long hcpu)
> {
> int i;
> - int cpu;
>
> if (vi->affinity_hint_set) {
> for (i = 0; i < vi->max_queue_pairs; i++) {
> @@ -1073,20 +1069,11 @@ static void virtnet_clean_affinity(struct virtnet_info *vi, long hcpu)
>
> vi->affinity_hint_set = false;
> }
> -
> - i = 0;
> - for_each_online_cpu(cpu) {
> - if (cpu == hcpu) {
> - *per_cpu_ptr(vi->vq_index, cpu) = -1;
> - } else {
> - *per_cpu_ptr(vi->vq_index, cpu) =
> - ++i % vi->curr_queue_pairs;
> - }
> - }
> }
>
> static void virtnet_set_affinity(struct virtnet_info *vi)
> {
> + cpumask_var_t cpumask;
> int i;
> int cpu;
>
> @@ -1100,15 +1087,21 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
> return;
> }
>
> + if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
> + return;
> +
> i = 0;
> for_each_online_cpu(cpu) {
> virtqueue_set_affinity(vi->rq[i].vq, cpu);
> virtqueue_set_affinity(vi->sq[i].vq, cpu);
> - *per_cpu_ptr(vi->vq_index, cpu) = i;
> + cpumask_clear(cpumask);
> + cpumask_set_cpu(cpu, cpumask);
> + netif_set_xps_queue(vi->dev, cpumask, i);
> i++;
> }
>
> vi->affinity_hint_set = true;
> + free_cpumask_var(cpumask);
> }
>
> static int virtnet_cpu_callback(struct notifier_block *nfb,
> @@ -1217,28 +1210,6 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> return 0;
> }
>
> -/* To avoid contending a lock hold by a vcpu who would exit to host, select the
> - * txq based on the processor id.
> - */
> -static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> -{
> - int txq;
> - struct virtnet_info *vi = netdev_priv(dev);
> -
> - if (skb_rx_queue_recorded(skb)) {
> - txq = skb_get_rx_queue(skb);
> - } else {
> - txq = *__this_cpu_ptr(vi->vq_index);
> - if (txq == -1)
> - txq = 0;
> - }
> -
> - while (unlikely(txq >= dev->real_num_tx_queues))
> - txq -= dev->real_num_tx_queues;
> -
> - return txq;
> -}
> -
> static const struct net_device_ops virtnet_netdev = {
> .ndo_open = virtnet_open,
> .ndo_stop = virtnet_close,
> @@ -1250,7 +1221,6 @@ static const struct net_device_ops virtnet_netdev = {
> .ndo_get_stats64 = virtnet_stats,
> .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
> .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> - .ndo_select_queue = virtnet_select_queue,
> #ifdef CONFIG_NET_POLL_CONTROLLER
> .ndo_poll_controller = virtnet_netpoll,
> #endif
> @@ -1559,10 +1529,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (vi->stats == NULL)
> goto free;
>
> - vi->vq_index = alloc_percpu(int);
> - if (vi->vq_index == NULL)
> - goto free_stats;
> -
> mutex_init(&vi->config_lock);
> vi->config_enable = true;
> INIT_WORK(&vi->config_work, virtnet_config_changed_work);
> @@ -1589,7 +1555,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> /* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> err = init_vqs(vi);
> if (err)
> - goto free_index;
> + goto free_stats;
>
> netif_set_real_num_tx_queues(dev, 1);
> netif_set_real_num_rx_queues(dev, 1);
> @@ -1640,8 +1606,6 @@ free_recv_bufs:
> free_vqs:
> cancel_delayed_work_sync(&vi->refill);
> virtnet_del_vqs(vi);
> -free_index:
> - free_percpu(vi->vq_index);
> free_stats:
> free_percpu(vi->stats);
> free:
> @@ -1678,7 +1642,6 @@ static void virtnet_remove(struct virtio_device *vdev)
>
> flush_work(&vi->config_work);
>
> - free_percpu(vi->vq_index);
> free_percpu(vi->stats);
> free_netdev(vi->dev);
> }
> --
> 1.7.1
^ permalink raw reply
* RE: [PATCH net-next 2/9] bonding: remove __get_first_port()
From: David Laight @ 2013-09-27 14:50 UTC (permalink / raw)
To: Veaceslav Falico, netdev
Cc: nikolay, bhutchings, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1380291125-5671-3-git-send-email-vfalico@redhat.com>
> @@ -2104,8 +2091,11 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>
> // check if agg_select_timer timer after initialize is timed out
> if (BOND_AD_INFO(bond).agg_select_timer && !(--BOND_AD_INFO(bond).agg_select_timer)) {
> + slave = bond_first_slave(bond);
> + port = slave ? &(SLAVE_AD_INFO(slave).port) : NULL;
> +
> // select the active aggregator for the bond
> - if ((port = __get_first_port(bond))) {
> + if (port) {
> if (!port->slave) {
> pr_warning("%s: Warning: bond's first port is uninitialized\n",
> bond->dev->name);
> --
Looks like that could be:
slave = bond_first_slave(bond);
if (slave) {
port = SLAVE_AD_INFO(slave).port;
and I assume 'slave == port->slave' so there is no need for the latter check?
David
^ permalink raw reply
* Re: [PATCH net-next 2/9] bonding: remove __get_first_port()
From: Veaceslav Falico @ 2013-09-27 14:58 UTC (permalink / raw)
To: David Laight; +Cc: netdev, nikolay, bhutchings, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B7365@saturn3.aculab.com>
On Fri, Sep 27, 2013 at 03:50:12PM +0100, David Laight wrote:
>> @@ -2104,8 +2091,11 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>>
>> // check if agg_select_timer timer after initialize is timed out
>> if (BOND_AD_INFO(bond).agg_select_timer && !(--BOND_AD_INFO(bond).agg_select_timer)) {
>> + slave = bond_first_slave(bond);
>> + port = slave ? &(SLAVE_AD_INFO(slave).port) : NULL;
>> +
>> // select the active aggregator for the bond
>> - if ((port = __get_first_port(bond))) {
>> + if (port) {
>> if (!port->slave) {
>> pr_warning("%s: Warning: bond's first port is uninitialized\n",
>> bond->dev->name);
>> --
>
>Looks like that could be:
> slave = bond_first_slave(bond);
> if (slave) {
> port = SLAVE_AD_INFO(slave).port;
>and I assume 'slave == port->slave' so there is no need for the latter check?
I've also fallen to this trap at first - slave->port can (virtually) be
NULL, and this way we'll panic on "if (!port->slave)".
>
> David
>
>
>
^ permalink raw reply
* Re: [PATCH net-next 2/9] bonding: remove __get_first_port()
From: Veaceslav Falico @ 2013-09-27 15:05 UTC (permalink / raw)
To: David Laight; +Cc: netdev, nikolay, bhutchings, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <20130927145825.GA14139@redhat.com>
On Fri, Sep 27, 2013 at 04:58:25PM +0200, Veaceslav Falico wrote:
>On Fri, Sep 27, 2013 at 03:50:12PM +0100, David Laight wrote:
>>>@@ -2104,8 +2091,11 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>>>
>>> // check if agg_select_timer timer after initialize is timed out
>>> if (BOND_AD_INFO(bond).agg_select_timer && !(--BOND_AD_INFO(bond).agg_select_timer)) {
>>>+ slave = bond_first_slave(bond);
>>>+ port = slave ? &(SLAVE_AD_INFO(slave).port) : NULL;
>>>+
>>> // select the active aggregator for the bond
>>>- if ((port = __get_first_port(bond))) {
>>>+ if (port) {
>>> if (!port->slave) {
>>> pr_warning("%s: Warning: bond's first port is uninitialized\n",
>>> bond->dev->name);
>>>--
>>
>>Looks like that could be:
>> slave = bond_first_slave(bond);
>> if (slave) {
>> port = SLAVE_AD_INFO(slave).port;
>>and I assume 'slave == port->slave' so there is no need for the latter check?
>
>I've also fallen to this trap at first - slave->port can (virtually) be
>NULL, and this way we'll panic on "if (!port->slave)".
Err, forgot to address the 'slave == port->slave' - yes, virtually it
should be - if it's initialized (and, it should be) - however, as I've
stated in the cover letter - there are *tons* of cleanups/optimizations of
these kind that might be done here - and not to mix cleanups/optimizations
with the thing that these patches should do (remove bond_next_slave) - I've
decided to leave it as close to the original as possible.
>
>>
>> David
>>
>>
>>
^ permalink raw reply
* Re: [PATCH] tcp: TSQ can use a dynamic limit
From: Neal Cardwell @ 2013-09-27 15:08 UTC (permalink / raw)
To: Eric Dumazet
Cc: Cong Wang, David Miller, Wei Liu, Linux Kernel Network Developers,
Yuchung Cheng
In-Reply-To: <1380277734.30872.25.camel@edumazet-glaptop.roam.corp.google.com>
On Fri, Sep 27, 2013 at 6:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> When TCP Small Queues was added, we used a sysctl to limit amount of
> packets queues on Qdisc/device queues for a given TCP flow.
>
> Problem is this limit is either too big for low rates, or too small
> for high rates.
>
> Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
> auto sizing, it can better control number of packets in Qdisc/device
> queues.
>
> New limit is two packets or at least 1 to 2 ms worth of packets.
>
> Low rates flows benefit from this patch by having even smaller
> number of packets in queues, allowing for faster recovery,
> better RTT estimations.
>
> High rates flows benefit from this patch by allowing more than 2 packets
> in flight as we had reports this was a limiting factor to reach line
> rate. [ In particular if TX completion is delayed because of coalescing
> parameters ]
>
> Example for a single flow on 10Gbp link controlled by FQ/pacing
>
> 14 packets in flight instead of 2
>
> $ tc -s -d qd
> qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
> buckets 1024 quantum 3028 initial_quantum 15140
> Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
> requeues 6822476)
> rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
> 2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
> 2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit
>
> Note that sk_pacing_rate is currently set to twice the actual rate, but
> this might be refined in the future when a flow is in congestion
> avoidance.
>
> Additional change : skb->destructor should be set to tcp_wfree().
>
> A future patch (for linux 3.13+) might remove tcp_limit_output_bytes
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---
Acked-by: Neal Cardwell <ncardwell@google.com>
neal
^ permalink raw reply
* Re: question about map_read_chunks()
From: Tom Tucker @ 2013-09-27 15:23 UTC (permalink / raw)
To: Dan Carpenter
Cc: Tom Tucker, J. Bruce Fields, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20130927122108.GF6247@mwanda>
Hi Dan,
On 9/27/13 7:21 AM, Dan Carpenter wrote:
> I have looked at this again, and I still worry that it looks like a bug.
> (remote security related blah blah blah).
>
> regards,
> dan carpenter
>
> On Mon, Feb 20, 2012 at 12:50:19PM +0300, Dan Carpenter wrote:
>> I had a couple questions about some map_read_chunks().
>>
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>
>> 150 ch_bytes = ntohl(ch->rc_target.rs_length);
>> ^^^^^^^^
>> It look like this is 32 bits from the network?
>>
>> 151 head->arg.head[0] = rqstp->rq_arg.head[0];
>> 152 head->arg.tail[0] = rqstp->rq_arg.tail[0];
>> 153 head->arg.pages = &head->pages[head->count];
>> 154 head->hdr_count = head->count; /* save count of hdr pages */
>> 155 head->arg.page_base = 0;
>> 156 head->arg.page_len = ch_bytes;
>> 157 head->arg.len = rqstp->rq_arg.len + ch_bytes;
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> Can overflow.
>> 158 head->arg.buflen = rqstp->rq_arg.buflen + ch_bytes;
agreed.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> Same. I didn't follow it through to see if an overflow matters. Does
>> it?
>>
>> 159 head->count++;
>> 160 chl_map->ch[0].start = 0;
>> 161 while (byte_count) {
>> 162 rpl_map->sge[sge_no].iov_base =
>> 163 page_address(rqstp->rq_arg.pages[page_no]) + page_off;
>> 164 sge_bytes = min_t(int, PAGE_SIZE-page_off, ch_bytes);
>> ^^^
>> This is the wrong cast to use. A large ch_bytes would be counted as a
>> negative value and get around the cap here.
True, but if we validate the wire data like we should, that's probably
not an issue.
>> 165 rpl_map->sge[sge_no].iov_len = sge_bytes;
>>
>> regards,
>> dan carpenter
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH v3] powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
From: Kumar Gala @ 2013-09-27 15:28 UTC (permalink / raw)
To: Aida Mynzhasova
Cc: linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
devicetree-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
richardcochran-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1380289227-12029-1-git-send-email-aida.mynzhasova-Fmhy8gsqeTEvJsYlp49lxw@public.gmane.org>
On Sep 27, 2013, at 8:40 AM, Aida Mynzhasova wrote:
> Currently IEEE 1588 timer reference clock source is determined through
> hard-coded value in gianfar_ptp driver. This patch allows to select ptp
> clock source by means of device tree file node.
>
> For instance:
>
> fsl,cksel = <0>;
>
> for using external (TSEC_TMR_CLK input) high precision timer
> reference clock.
>
> Other acceptable values:
>
> <1> : eTSEC system clock
> <2> : eTSEC1 transmit clock
> <3> : RTC clock input
>
> When this attribute isn't used, eTSEC system clock will serve as
> IEEE 1588 timer reference clock.
>
> Signed-off-by: Aida Mynzhasova <aida.mynzhasova-Fmhy8gsqeTEvJsYlp49lxw@public.gmane.org>
> ---
> Documentation/devicetree/bindings/net/fsl-tsec-phy.txt | 18 +++++++++++++++++-
> drivers/net/ethernet/freescale/gianfar_ptp.c | 4 +++-
> 2 files changed, 20 insertions(+), 2 deletions(-)
Acked-by: Kumar Gala <galak-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
- k--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH] bonding: Fix broken promiscuity reference counting issue
From: Neil Horman @ 2013-09-27 16:22 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Jay Vosburgh, Andy Gospodarek, Mark Wu,
David S. Miller
Recently grabbed this report:
https://bugzilla.redhat.com/show_bug.cgi?id=1005567
Of an issue in which the bonding driver, with an attached vlan encountered the
following errors when bond0 was taken down and back up:
dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of
device might be broken.
The error occurs because, during __bond_release_one, if we release our last
slave, we take on a random mac address and issue a NETDEV_CHANGEADDR
notification. With an attached vlan, the vlan may see that the vlan and bond
mac address were in sync, but no longer are. This triggers a call to dev_uc_add
and dev_set_rx_mode, which enables IFF_PROMISC on the bond device. Then, when
we complete __bond_release_one, we use the current state of the bond flags to
determine if we should decrement the promiscuity of the releasing slave. But
since the bond changed promiscuity state during the release operation, we
incorrectly decrement the slave promisc count when it wasn't in promiscuous mode
to begin with, causing the above error
Fix is pretty simple, just cache the bonding flags at the start of the function
and use those when determining the need to set promiscuity.
This is also needed for the ALLMULTI flag
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Mark Wu <wudxw@linux.vnet.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com>
---
drivers/net/bonding/bond_main.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d5c3153..5373a8d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1710,6 +1710,7 @@ static int __bond_release_one(struct net_device *bond_dev,
struct bonding *bond = netdev_priv(bond_dev);
struct slave *slave, *oldcurrent;
struct sockaddr addr;
+ int old_flags = bond_dev->flags;
netdev_features_t old_features = bond_dev->features;
/* slave is not a slave or master is not master of this slave */
@@ -1841,12 +1842,18 @@ static int __bond_release_one(struct net_device *bond_dev,
* bond_change_active_slave(..., NULL)
*/
if (!USES_PRIMARY(bond->params.mode)) {
- /* unset promiscuity level from slave */
- if (bond_dev->flags & IFF_PROMISC)
+ /* unset promiscuity level from slave
+ * NOTE: The NETDEV_CHANGEADDR call above may change the value
+ * of the IFF_PROMISC flag in the bond_dev, but we need the
+ * value of that flag before that change, as that was the value
+ * when this slave was attached, so we cache at the start of the
+ * function and use it here. Same goes for ALLMULTI below
+ */
+ if (old_flags & IFF_PROMISC)
dev_set_promiscuity(slave_dev, -1);
/* unset allmulti level from slave */
- if (bond_dev->flags & IFF_ALLMULTI)
+ if (old_flags & IFF_ALLMULTI)
dev_set_allmulti(slave_dev, -1);
bond_hw_addr_flush(bond_dev, slave_dev);
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH] IPv6: Allow the MTU of ipip6 tunnel to be set below 1280
From: Oussama Ghorbel @ 2013-09-27 16:36 UTC (permalink / raw)
To: Oussama Ghorbel, David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <20130927105856.GF28287@order.stressinduktion.org>
Please see my comments below
Regards,
Oussama
On Fri, Sep 27, 2013 at 11:58 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Fri, Sep 27, 2013 at 11:45:48AM +0100, Oussama Ghorbel wrote:
>> The ip6_tunnel.c module would be then dependent on ip_tunnel.c and may
>> be it would not be good thing?
>
> It could just be a static inline in some shared header. So there would
> be no compile-time dependency.
>
The higher limit of mtu in ip_tunnel_change_mtu() is a calculated value.
This high limit is calculated using the netdev priv struct ip_tunnel
(field hlen).
This netdev priv struct is different in ipv6, it's a ip6_tnl struct.
Therefore implementing a beautiful function like
ip_tunnel_valid_mtu(struct net_device *dev, int mtu) won't be
feasible, unless we implement it in macro or something like like
ip_tunnel_valid_mtu(struct net_device *dev, int hlen, int mtu) which
seems not very nice ...
What do yo think?
>> As I have check in v3.10 there is no call from ip6_tunnel to ip_tunnel...
>>
>> For information, there is no check for the maximum MTU for ipv4 in the
>> patch as this is not done for ipv6.
>
> I understand, but it would be better to limit the MTU here. There is a
> (non-jumo) IPV6_MAXPLEN constant.
>
> Looking through the source it seems grev6 does actually check this,
> so it would not hurt adding them here, too.
what if jumbograms is used, in that case, we can't use IPV6_MAXPLEN.
the limit would be the the full unsigned int.
However checking the higher limit for ipv4 would be useful.
Please also note in case the tunnel mode is any (ipv4 or ipv6 means
ip6_tnl.parms.proto = 0), then we will be required to take the most
restrict limit from both ipv4 and ipv6 which means the lower limit
will be 1280 and the higher limit will be about 65535
Do you agree on this?
>
> Otherwise, I think your patch is fine.
>
> Greetings,
>
> Hannes
>
^ permalink raw reply
* Re: [PATCH v2.40 1/7] odp: Only pass vlan_tci to commit_vlan_action()
From: Ben Pfaff @ 2013-09-27 16:56 UTC (permalink / raw)
To: Simon Horman
Cc: dev, netdev, Jesse Gross, Pravin B Shelar, Ravi K, Isaku Yamahata,
Joe Stringer
In-Reply-To: <1380241116-7661-2-git-send-email-horms@verge.net.au>
On Fri, Sep 27, 2013 at 09:18:30AM +0900, Simon Horman wrote:
> From: Joe Stringer <joe@wand.net.nz>
>
> This allows for future patches to pass different tci values to
> commit_vlan_action() without passing an entire flow structure.
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Simon Horman <horms@verge.net.au>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] IPv6: Allow the MTU of ipip6 tunnel to be set below 1280
From: Hannes Frederic Sowa @ 2013-09-27 17:03 UTC (permalink / raw)
To: Oussama Ghorbel
Cc: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <CA+ev272+Gp6Kt7S8P-wXxPBh0D8Di9WVH9NDi41HpNakLKC4Dw@mail.gmail.com>
On Fri, Sep 27, 2013 at 05:36:45PM +0100, Oussama Ghorbel wrote:
> Please see my comments below
>
> Regards,
> Oussama
>
> On Fri, Sep 27, 2013 at 11:58 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Fri, Sep 27, 2013 at 11:45:48AM +0100, Oussama Ghorbel wrote:
> >> The ip6_tunnel.c module would be then dependent on ip_tunnel.c and may
> >> be it would not be good thing?
> >
> > It could just be a static inline in some shared header. So there would
> > be no compile-time dependency.
> >
>
> The higher limit of mtu in ip_tunnel_change_mtu() is a calculated value.
> This high limit is calculated using the netdev priv struct ip_tunnel
> (field hlen).
> This netdev priv struct is different in ipv6, it's a ip6_tnl struct.
> Therefore implementing a beautiful function like
> ip_tunnel_valid_mtu(struct net_device *dev, int mtu) won't be
> feasible, unless we implement it in macro or something like like
> ip_tunnel_valid_mtu(struct net_device *dev, int hlen, int mtu) which
> seems not very nice ...
> What do yo think?
Ok, let's go with one function per protocol type. Seems easier.
It seems to get more hairy, because it depends on the tunnel driver if the
prepended ip header is accounted in hard_header_len. :/
I don't know if it works out cleanly. Otherwise I would be ok if the checks
just get repeated in ip6_tunnel and leave the rest as-is.
>
> >> As I have check in v3.10 there is no call from ip6_tunnel to ip_tunnel...
> >>
> >> For information, there is no check for the maximum MTU for ipv4 in the
> >> patch as this is not done for ipv6.
> >
> > I understand, but it would be better to limit the MTU here. There is a
> > (non-jumo) IPV6_MAXPLEN constant.
> >
> > Looking through the source it seems grev6 does actually check this,
> > so it would not hurt adding them here, too.
>
> what if jumbograms is used, in that case, we can't use IPV6_MAXPLEN.
> the limit would be the the full unsigned int.
> However checking the higher limit for ipv4 would be useful.
Linux currently cannot create "jumbograms" (only the receiving side
is supported).
> Please also note in case the tunnel mode is any (ipv4 or ipv6 means
> ip6_tnl.parms.proto = 0), then we will be required to take the most
> restrict limit from both ipv4 and ipv6 which means the lower limit
> will be 1280 and the higher limit will be about 65535
> Do you agree on this?
Agreed, this would be needed for e.g. the sit driver, which now can
handle both protocols.
Thanks,
Hannes
^ permalink raw reply
* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Toshiaki Makita @ 2013-09-27 17:11 UTC (permalink / raw)
To: vyasevic
Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
Patrick McHardy
In-Reply-To: <52444322.40408@redhat.com>
On Thu, 2013-09-26 at 10:22 -0400, Vlad Yasevich wrote:
> On 09/26/2013 06:38 AM, Toshiaki Makita wrote:
> > On Tue, 2013-09-24 at 13:55 -0400, Vlad Yasevich wrote:
> >> On 09/24/2013 01:30 PM, Toshiaki Makita wrote:
> >>> On Tue, 2013-09-24 at 09:35 -0400, Vlad Yasevich wrote:
> >>>> On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
> >>>>> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich wrote:
> >>>>>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
> >>>>>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich wrote:
> >>>>>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
> >>>>>>>>> On Thu, 2013-09-12 at 16:00 -0400, David Miller wrote:
> >>>>>>>>>> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> >>>>>>>>>> Date: Tue, 10 Sep 2013 19:27:54 +0900
> >>>>>>>>>>
> >>>>>>>>>>> There seem to be some undesirable behaviors related with PVID.
> >>>>>>>>>>> 1. It has no effect assigning PVID to a port. PVID cannot be applied
> >>>>>>>>>>> to any frame regardless of whether we set it or not.
> >>>>>>>>>>> 2. FDB entries learned via frames applied PVID are registered with
> >>>>>>>>>>> VID 0 rather than VID value of PVID.
> >>>>>>>>>>> 3. We can set 0 or 4095 as a PVID that are not allowed in IEEE 802.1Q.
> >>>>>>>>>>> This leads interoperational problems such as sending frames with VID
> >>>>>>>>>>> 4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
> >>>>>>>>>>> 0 as they belong to VLAN 0, which is expected to be handled as they have
> >>>>>>>>>>> no VID according to IEEE 802.1Q.
> >>>>>>>>>>>
> >>>>>>>>>>> Note: 2nd and 3rd problems are potential and not exposed unless 1st problem
> >>>>>>>>>>> is fixed, because we cannot activate PVID due to it.
> >>>>>>>>>>
> >>>>>>>>>> Please work out the issues in patch #2 with Vlad and resubmit this
> >>>>>>>>>> series.
> >>>>>>>>>>
> >>>>>>>>>> Thank you.
> >>>>>>>>>
> >>>>>>>>> I'm hovering between whether we should fix the issue by changing vlan 0
> >>>>>>>>> interface behavior in 8021q module or enabling a bridge port to sending
> >>>>>>>>> priority-tagged frames, or another better way.
> >>>>>>>>>
> >>>>>>>>> If you could comment it, I'd appreciate it :)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> BTW, I think what is discussed in patch #2 is another problem about
> >>>>>>>>> handling priority-tags, and it exists without this patch set applied.
> >>>>>>>>> It looks like that we should prepare another patch set than this to fix
> >>>>>>>>> that problem.
> >>>>>>>>>
> >>>>>>>>> Should I include patches that fix the priority-tags problem in this
> >>>>>>>>> patch set and resubmit them all together?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I am thinking that we might need to do it in bridge and it looks like
> >>>>>>>> the simplest way to do it is to have default priority regeneration table
> >>>>>>>> (table 6-5 from 802.1Q doc).
> >>>>>>>>
> >>>>>>>> That way I think we would conform to the spec.
> >>>>>>>>
> >>>>>>>> -vlad
> >>>>>>>
> >>>>>>> Unfortunately I don't think the default priority regeneration table
> >>>>>>> resolves the problem because IEEE 802.1Q says that a VLAN-aware bridge
> >>>>>>> can transmit untagged or VLAN-tagged frames only (the end of section 7.5
> >>>>>>> and 8.1.7).
> >>>>>>>
> >>>>>>> No mechanism to send priority-tagged frames is found as far as I can see
> >>>>>>> the standard. I think the regenerated priority is used for outgoing PCP
> >>>>>>> field only if egress policy is not untagged (i.e. transmitting as
> >>>>>>> VLAN-tagged), and unused if untagged (Section 6.9.2 3rd/4th Paragraph).
> >>>>>>>
> >>>>>>> If we want to transmit priority-tagged frames from a bridge port, I
> >>>>>>> think we need to implement a new (optional) feature that is above the
> >>>>>>> standard, as I stated previously.
> >>>>>>>
> >>>>>>> How do you feel about adding a per-port policy that enables a bridge to
> >>>>>>> send priority-tagged frames instead of untagged frames when egress
> >>>>>>> policy for the port is untagged?
> >>>>>>> With this change, we can transmit frames for a given vlan as either all
> >>>>>>> untagged, all priority-tagged or all VLAN-tagged.
> >>>>>>
> >>>>>> That would work. What I am thinking is that we do it by special casing
> >>>>>> the vid 0 egress policy specification. Let it be untagged by default
> >>>>>> and if it is tagged, then we preserve the priority field and forward
> >>>>>> it on.
> >>>>>>
> >>>>>> This keeps the API stable and doesn't require user/admin from knowing
> >>>>>> exactly what happens. Default operation conforms to the spec and allows
> >>>>>> simple change to make it backward-compatible.
> >>>>>>
> >>>>>> What do you think. I've done a simple prototype of this an it seems to
> >>>>>> work with the VMs I am testing with.
> >>>>>
> >>>>> Are you saying that
> >>>>> - by default, set the 0th bit of untagged_bitmap; and
> >>>>> - if we unset the 0th bit and set the "vid"th bit, we transmit frames
> >>>>> classified as belonging to VLAN "vid" as priority-tagged?
> >>>>>
> >>>>> If so, though it's attractive to keep current API, I'm worried about if
> >>>>> it could be a bit confusing and not intuitive for kernel/iproute2
> >>>>> developers that VID 0 has a special meaning only in the egress policy.
> >>>>> Wouldn't it be better to adding a new member to struct net_port_vlans
> >>>>> instead of using VID 0 of untagged_bitmap?
> >>>>>
> >>>>> Or are you saying that we use a new flag in struct net_port_vlans but
> >>>>> use the BRIDGE_VLAN_INFO_UNTAGGED bit with VID 0 in netlink to set the
> >>>>> flag?
> >>>>>
> >>>>> Even in that case, I'm afraid that it might be confusing for developers
> >>>>> for the same reason. We are going to prohibit to specify VID with 0 (and
> >>>>> 4095) in adding/deleting a FDB entry or a vlan filtering entry, but it
> >>>>> would allow us to use VID 0 only when a vlan filtering entry is
> >>>>> configured.
> >>>>> I am thinking a new nlattr is a straightforward approach to configure
> >>>>> it.
> >>>>
> >>>> By making this an explicit attribute it makes vid 0 a special case for
> >>>> any automatic tool that would provision such filtering. Seeing vid 0
> >>>> would mean that these tools would have to know that this would have to
> >>>> be translated to a different attribute instead of setting the policy
> >>>> values.
> >>>
> >>> Yes, I agree with you that we can do it by the way you explained.
> >>> What I don't understand is the advantage of using vid 0 over another way
> >>> such as adding a new nlattr.
> >>> I think we can indicate transmitting priority-tags explicitly by such a
> >>> nlattr. Using vid 0 seems to be easier to implement than a new nlattr,
> >>> but, for me, it looks less intuitive and more difficult to maintain
> >>> because we have to care about vid 0 instead of simply ignoring it.
> >>>
> >>
> >> The point I am trying to make is that regardless of the approach someone
> >> has to know what to do when enabling priority tagged frames. You
> >> proposal would require the administrator or config tool to have that
> >> knowledge. Example is:
> >> Admin does: bridge vlan set priority on dev eth0
> >> Automated app:
> >> if (vid == 0)
> >> /* Turn on priority tagged frame support */
> >>
> >> My proposal would require the bridge filtering implementation to have it.
> >> user tool: bridge vlan add vid 0 tagged
> >> Automated app: No special case.
> >>
> >> IMO its better to have 1 piece code handling the special case then
> >> putting it multiple places.
> >
> > Thank you for the detailed explanation.
> > Now I understand your intention.
> >
> > I have one question about your proposal.
> > I guess the way to enable priority-tagged is something like
> > bridge vlan add vid 10 dev eth0
> > bridge vlan add vid 10 dev vnet0 pvid untagged
> > bridge vlan add vid 0 dev vnet0 tagged
> > where vnet0 has sub interface vnet0.0.
> >
> > Here the admin have to know the egress policy is applied to a frame
> > twice in a certain order when it is transmitted from the port vnet0
> > attached, that is, first, a frame with vid 10 get untagged, and then, an
> > untagged frame get priority-tagged.
> >
> > This behavior looks difficult to know without previous knowledge.
> > Any good idea to avoid such a need for the admin's additional knowledge?
>
> To me, the fact that there is vnet0.0 (or typically, there is eth0.0 in
> the guest or on the remote host) already tells the admin vlan 0 has to
> be tagged. The fact that we codify this in the policy makes it explicit.
My worry is that the admin might not be able to guess how to use bridge
commands to enable priority-tag without any additional hint in "man
bridge", "bridge vlan help", etc.
I actually couldn't hit upon such a usage before seeing example commands
you gave, because I had never think the egress policy could be applied
twice.
>
> However, I can see strong argument to be made for an addition egress
> policy attribute that could be for instance:
>
> bridge vlan add vid 10 dev eth0 pvid
> bridge vlan add vid 10 dev vnet0 pvid untagged prio_tag
>
> But this has the same connotations as wrt to egress policy. The 2
> policies are applied:
> (1) untag the frame.
> (2) add priority_tag.
>
> (2) only happens if initial fame received on eth0 was priority tagged.
If we do so, we will not be able to communicate using vlan 0 interface
under a certain circumstance.
Eth0 can be receive mixed untagged and priority-tagged frames according
to the network element it is connected to: for example, Open vSwitch can
send such two kinds of frames from the same port even if original
incoming frames belong to the same vlan.
In this situation, we can only receive frames that is priority-tagged
when received on eth0.
IMO, if prio_tag is configured, the bridge should send any untagged
frame as priority-tagged regardless of whatever it is on eth0.
>
> I think I am ok with either approach. Explicit vid 0 policy is easier
> for automatic provisioning. The flag based one is easier for admin/
> manual provisioning.
Supposing we have to add something to help or man in any case, I think
flag based is better.
The format below seems to suitable for a per-port policy.
bridge vlan set prio_tag on dev vnet0
Thanks,
Toshiaki Makita
>
> -vlad.
>
> -vlad
>
>
>
>
> >
> >>
> >> Thanks
> >> -vlad
> >>
> >>> Thanks,
> >>>
> >>> Toshiaki Makita
> >>>
> >>>>
> >>>> How it is implemented internally in the kernel isn't as big of an issue.
> >>>> We can do it as a separate flag or as part of existing policy.
> >>>>
> >>>> -vlad
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Toshiaki Makita
> >>>>>
> >>>>>>
> >>>>>> -vlad
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Toshiaki Makita
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Toshiaki Makita
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >
> >
>
^ permalink raw reply
* Re: [net v2 0/6][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2013-09-27 17:29 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1380285358-15685-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 27 Sep 2013 05:35:52 -0700
> This series contains updates to igb and i40e.
>
> Todd provides a fix for 82580 devices in igb, where the ethtool
> loopback test was missing 82580 copper devices.
>
> Jesse provides five fixes/cleanups to i40e based on feedback from
> Joe Perches and the community.
>
> v2: fixed up patch 5 in the series based on feedback from Joe Perches
> and David Miller
>
> The following are changes since commit f875691640cd3fa67f7ad1d3130ff9fa7fdca042:
> Merge branch 'fixes-for-3.12' of git://gitorious.org/linux-can/linux-can
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master
Pulled, thanks a lot Jeff.
^ permalink raw reply
* Re: [PATCH net-next 0/2] netxen_nic: Minor enhancement
From: David Miller @ 2013-09-27 17:31 UTC (permalink / raw)
To: shahed.shaikh; +Cc: netdev, Dept-HSGLinuxNICDev
In-Reply-To: <1380260547-25903-1-git-send-email-shahed.shaikh@qlogic.com>
From: Shahed Shaikh <shahed.shaikh@qlogic.com>
Date: Fri, 27 Sep 2013 01:42:25 -0400
> From: Shahed Shaikh <shahed.shaikh@qlogic.com>
>
> This patch series contains following changes
> * Log a message about ULA adapter type.
> * Update the driver version to 4.0.82.
All applied, thank you.
^ permalink raw reply
* [PATCH v3 net-next 0/2] qlge: feature update
From: Jitendra Kalsaria @ 2013-09-27 17:17 UTC (permalink / raw)
To: davem; +Cc: netdev, ron.mercer, Dept-HSGLinuxNICDev, Jitendra Kalsaria
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
This patch series enhance the handling of nested vlan tags in Rx path.
V2 changes:
* removed module parameter.
V3 changes:
* Users can enable or disable hardware VLAN acceleration using ethtool
Please apply it to net-next.
Jitendra Kalsaria (2):
qlge: Enhance nested VLAN (Q-in-Q) handling.
qlge: Update version to 1.00.00.33
drivers/net/ethernet/qlogic/qlge/qlge.h | 2 +-
drivers/net/ethernet/qlogic/qlge/qlge_main.c | 126 +++++++++++++++++++++-----
2 files changed, 102 insertions(+), 26 deletions(-)
--
1.7.6.rc1.1.g2c162b
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox