Netdev List


* Setting mac address of loopback device
From: Cong Wang @ 2014-01-23 19:53 UTC (permalink / raw)
  To: David Miller; +Cc: stephen, netdev

Hi,

I am wondering whether it makes sense to allow setting the MAC
address of the loopback device.

We are trying to mirror local traffic from lo to eth0, but the MAC
addresses in the L2 header are all zero. One solution is to use the
pedit action to rewrite the MAC addresses, but if we could set the
MAC address of lo, we would not need to modify the MAC header twice.

The patch itself is simple, probably just:

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c5011e0..a0ee030 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -160,6 +160,7 @@ static const struct net_device_ops loopback_ops = {
        .ndo_init      = loopback_dev_init,
        .ndo_start_xmit= loopback_xmit,
        .ndo_get_stats64 = loopback_get_stats64,
+       .ndo_set_mac_address = eth_mac_addr,
 };

 /*

BTW, how can I change the routing to redirect _local_ traffic (that is,
packets a host sends to itself) to a non-loopback device? I tried
modifying the local route table, but had no luck getting it to work.

Thanks!

^ permalink raw reply related

* Re: [PATCH v4 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Kay Sievers @ 2014-01-23 19:31 UTC (permalink / raw)
  To: Jan Kaluža
  Cc: Tejun Heo, Eric Paris, David Miller, LKML,
	netdev-u79uwXL29TY76Z2rM5mHXA, rgb-H+wXaHxf7aLQT0dZR+AlfA,
	lizefan-hv44wF8Li93QT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <52D7A68F.5030700-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Jan 16, 2014 at 10:29 AM, Jan Kaluža <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 01/16/2014 12:23 AM, Tejun Heo wrote:
>> On Wed, Jan 15, 2014 at 06:21:43PM -0500, Eric Paris wrote:
>>>
>>> Reliably being able to audit what process requested an action is
>>> extremely useful.  And I like the audit patch, as it is a couple of ints
>>> we are storing.
>>>
>>> procinfo and cgroup can both be up to 4k of data.
>>>
>>> Is there an alternative he should consider?  Some way to grab a
>>> reference on task_struct and just attach that to the message?
>>
>> Or maybe it can be made separately optional instead of tagging along
>> on an existing option so that it doesn't tax use cases which don't
>> care about the new stuff?
>
> Right, I could add new option next to SOCK_PASSCRED which could be used to
> send newly added stuff. Would this be acceptable?
>
> I would still vote for SCM_AUDIT to be part of SOCK_PASSCRED and move
> SCM_CGROUP and SCM_PROCINFO into new option.

Maybe we could just add a new SOCK_PASS_TASKINFO bit to set in
socket->flags. Set that bit with a new SO_PASS_TASKINFO sockoption.

The SOCK_PASS_TASKINFO can carry all sorts of "struct task" related
stuff, also include the audit data. It is all fully conditional, so
users which do not explicitly subscribe to TASKINFO will never see the
data or needlessly pay for the overhead.

A TASKINFO sounds generic enough to be possibly extended with new data
in the future, without wasting extra bits in the socket flags.

Users which subscribe with SO_PASS_TASKINFO expect some overhead
anyway. In the end it's still a lot cheaper than parsing /proc for the
data; which is also racy and does therefore not work for any
short-living program.

Thanks,
Kay

^ permalink raw reply

* Re: [PATCH v2 net-next 2/2] bonding: restructure locking of bond_ab_arp_probe()
From: Jay Vosburgh @ 2014-01-23 19:25 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev, Andy Gospodarek
In-Reply-To: <1390482511-14884-3-git-send-email-vfalico@redhat.com>

Veaceslav Falico <vfalico@redhat.com> wrote:

>Currently we're calling it from under RCU context, however we're using some
>functions that require rtnl to be held.
>
>Fix this by restructuring the locking - don't call it under any locks,
>acquire rcu_read_lock() if we're sending _only_ (i.e. we have the active
>slave present), and use rtnl locking otherwise - if we need to modify
>(in)active flags of a slave.
>
>CC: Jay Vosburgh <fubar@us.ibm.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>---
> drivers/net/bonding/bond_main.c | 38 ++++++++++++++++++++++++--------------
> 1 file changed, 24 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 22d8b69..f879e9e 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2605,11 +2605,14 @@ do_failover:
> static void bond_ab_arp_probe(struct bonding *bond)
> {
> 	struct slave *slave, *before = NULL, *new_slave = NULL,
>-		     *curr_arp_slave = rcu_dereference(bond->current_arp_slave),
>-		     *curr_active_slave = rcu_dereference(bond->curr_active_slave);
>+		     *curr_arp_slave, *curr_active_slave;
> 	struct list_head *iter;
> 	bool found = false;
>
>+	rcu_read_lock();
>+	curr_arp_slave = rcu_dereference(bond->current_arp_slave);
>+	curr_active_slave = rcu_dereference(bond->curr_active_slave);
>+
> 	if (curr_arp_slave && curr_active_slave)
> 		pr_info("PROBE: c_arp %s && cas %s BAD\n",
> 			curr_arp_slave->dev->name,
>@@ -2617,23 +2620,31 @@ static void bond_ab_arp_probe(struct bonding *bond)
>
> 	if (curr_active_slave) {
> 		bond_arp_send_all(bond, curr_active_slave);
>+		rcu_read_unlock();
> 		return;
> 	}
>+	rcu_read_unlock();
>
> 	/* if we don't have a curr_active_slave, search for the next available
> 	 * backup slave from the current_arp_slave and make it the candidate
> 	 * for becoming the curr_active_slave
> 	 */
>
>+	rtnl_lock();

	I don't believe we can unconditionally acquire RTNL here, as it
may deadlock with bond_close's cancel_delayed_work_sync() calls (which
occur under RTNL).

	I think I'd make this a rtnl_trylock, and if that fails, have
the bond_ab_arp_probe function return non-zero as an indication for
activebackup_arp_mon to change delta_in_ticks to 1.  Changing the delta
is just an optimization; without it, things will still work, but it will
take longer to run the curr_arp_slave probe cycle.

	-J
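
The trylock-and-retry pattern suggested above can be sketched in user space. This is only an illustration under stated assumptions: the `demo_*` names are hypothetical, an `atomic_flag` stands in for RTNL, and the returned delay stands in for `delta_in_ticks`. None of it is the bonding driver's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for RTNL: one global try-lock. */
static atomic_flag demo_rtnl = ATOMIC_FLAG_INIT;

static bool demo_trylock(void)
{
	/* test_and_set returns the previous value: false means we got the lock */
	return !atomic_flag_test_and_set(&demo_rtnl);
}

static void demo_unlock(void)
{
	atomic_flag_clear(&demo_rtnl);
}

/* Returns 0 if the probe ran, non-zero if the lock was busy. */
static int demo_probe(int *probes_run)
{
	if (!demo_trylock())
		return 1;	/* never block: the lock holder may be waiting for us */
	(*probes_run)++;	/* work that needs the lock */
	demo_unlock();
	return 0;
}

/* Returns the delay, in ticks, before the monitor should run again. */
static unsigned long demo_monitor(unsigned long interval_ticks, int *probes_run)
{
	unsigned long delta = interval_ticks;

	if (demo_probe(probes_run))
		delta = 1;	/* contended: re-arm almost immediately */
	return delta;
}
```

Shortening the delay is the optimization described above; with the normal interval the probe would still run, only later.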

>+	/* curr_arp_slave might have gone away */
>+	curr_arp_slave = rcu_dereference(bond->current_arp_slave);
>+
> 	if (!curr_arp_slave) {
>-		curr_arp_slave = bond_first_slave_rcu(bond);
>-		if (!curr_arp_slave)
>+		curr_arp_slave = bond_first_slave(bond);
>+		if (!curr_arp_slave) {
>+			rtnl_unlock();
> 			return;
>+		}
> 	}
>
> 	bond_set_slave_inactive_flags(curr_arp_slave);
>
>-	bond_for_each_slave_rcu(bond, slave, iter) {
>+	bond_for_each_slave(bond, slave, iter) {
> 		if (!found && !before && IS_UP(slave->dev))
> 			before = slave;
>
>@@ -2663,21 +2674,24 @@ static void bond_ab_arp_probe(struct bonding *bond)
> 	if (!new_slave && before)
> 		new_slave = before;
>
>-	if (!new_slave)
>+	if (!new_slave) {
>+		rtnl_unlock();
> 		return;
>+	}
>
> 	new_slave->link = BOND_LINK_BACK;
> 	bond_set_slave_active_flags(new_slave);
> 	bond_arp_send_all(bond, new_slave);
> 	new_slave->jiffies = jiffies;
> 	rcu_assign_pointer(bond->current_arp_slave, new_slave);
>+	rtnl_unlock();
> }
>
> static void bond_activebackup_arp_mon(struct work_struct *work)
> {
> 	struct bonding *bond = container_of(work, struct bonding,
> 					    arp_work.work);
>-	bool should_notify_peers = false;
>+	bool should_notify_peers = false, should_commit = false;
> 	int delta_in_ticks;
>
> 	delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);
>@@ -2686,12 +2700,11 @@ static void bond_activebackup_arp_mon(struct work_struct *work)
> 		goto re_arm;
>
> 	rcu_read_lock();
>-
> 	should_notify_peers = bond_should_notify_peers(bond);
>+	should_commit = bond_ab_arp_inspect(bond);
>+	rcu_read_unlock();
>
>-	if (bond_ab_arp_inspect(bond)) {
>-		rcu_read_unlock();
>-
>+	if (should_commit) {
> 		/* Race avoidance with bond_close flush of workqueue */
> 		if (!rtnl_trylock()) {
> 			delta_in_ticks = 1;
>@@ -2700,13 +2713,10 @@ static void bond_activebackup_arp_mon(struct work_struct *work)
> 		}
>
> 		bond_ab_arp_commit(bond);
>-
> 		rtnl_unlock();
>-		rcu_read_lock();
> 	}
>
> 	bond_ab_arp_probe(bond);
>-	rcu_read_unlock();
>
> re_arm:
> 	if (bond->params.arp_interval)
>-- 
>1.8.4
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* [PATCH net-next] bonding: Use do_div to divide 64 bit numbers
From: Zoltan Kiss @ 2014-01-23 18:47 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, netdev,
	linux-kernel, Nikolay Aleksandrov
  Cc: Zoltan Kiss

Nikolay Aleksandrov's recent bonding option API changes (25a9b54a and e4994612)
introduced u64 as the type of downdelay and updelay. On 32 bit the division and
modulo operations cause compile errors:

ERROR: "__udivdi3" [drivers/net/bonding/bonding.ko] undefined!
ERROR: "__umoddi3" [drivers/net/bonding/bonding.ko] undefined!

This patch uses the do_div macro, which is guaranteed to do the right thing.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/bonding/bond_options.c |   19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 4cee04a..4f94907 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -18,6 +18,7 @@
 #include <linux/rcupdate.h>
 #include <linux/ctype.h>
 #include <linux/inet.h>
+#include <asm/div64.h>
 #include "bonding.h"
 
 static struct bond_opt_value bond_mode_tbl[] = {
@@ -727,19 +728,20 @@ int bond_option_miimon_set(struct bonding *bond, struct bond_opt_value *newval)
 
 int bond_option_updelay_set(struct bonding *bond, struct bond_opt_value *newval)
 {
+	u64 quotient = newval->value;
+	u64 remainder = do_div(quotient, bond->params.miimon);
 	if (!bond->params.miimon) {
 		pr_err("%s: Unable to set up delay as MII monitoring is disabled\n",
 		       bond->dev->name);
 		return -EPERM;
 	}
-	if ((newval->value % bond->params.miimon) != 0) {
+	if (remainder != 0) {
 		pr_warn("%s: Warning: up delay (%llu) is not a multiple of miimon (%d), updelay rounded to %llu ms\n",
 			bond->dev->name, newval->value,
 			bond->params.miimon,
-			(newval->value / bond->params.miimon) *
-			bond->params.miimon);
+			quotient * bond->params.miimon);
 	}
-	bond->params.updelay = newval->value / bond->params.miimon;
+	bond->params.updelay = quotient;
 	pr_info("%s: Setting up delay to %d.\n",
 		bond->dev->name,
 		bond->params.updelay * bond->params.miimon);
@@ -750,19 +752,20 @@ int bond_option_updelay_set(struct bonding *bond, struct bond_opt_value *newval)
 int bond_option_downdelay_set(struct bonding *bond,
 			      struct bond_opt_value *newval)
 {
+	u64 quotient = newval->value;
+	u64 remainder = do_div(quotient, bond->params.miimon);
 	if (!bond->params.miimon) {
 		pr_err("%s: Unable to set down delay as MII monitoring is disabled\n",
 		       bond->dev->name);
 		return -EPERM;
 	}
-	if ((newval->value % bond->params.miimon) != 0) {
+	if (remainder != 0) {
 		pr_warn("%s: Warning: down delay (%llu) is not a multiple of miimon (%d), delay rounded to %llu ms\n",
 			bond->dev->name, newval->value,
 			bond->params.miimon,
-			(newval->value / bond->params.miimon) *
-			bond->params.miimon);
+			quotient * bond->params.miimon);
 	}
-	bond->params.downdelay = newval->value / bond->params.miimon;
+	bond->params.downdelay = quotient;
 	pr_info("%s: Setting down delay to %d.\n",
 		bond->dev->name,
 		bond->params.downdelay * bond->params.miimon);
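
For readers unfamiliar with do_div: it divides a 64-bit value in place by a 32-bit divisor and returns the remainder, which avoids the libgcc helpers (`__udivdi3`, `__umoddi3`) that the kernel does not link against. The following user-space sketch mimics that contract. The `demo_*` names are hypothetical, and checking the divisor before dividing is a choice made in this sketch, not something the patch above states.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's do_div(): *n becomes the
 * quotient and the remainder is returned. base must be non-zero.
 */
static uint32_t demo_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/*
 * Converts a delay in milliseconds to a whole number of monitoring
 * intervals, rounding down. Returns -EPERM if monitoring is disabled.
 */
static int demo_delay_to_intervals(uint64_t delay_ms, uint32_t miimon,
				   uint64_t *intervals, uint32_t *rem)
{
	if (!miimon)
		return -EPERM;	/* validate the divisor before dividing */

	*intervals = delay_ms;
	*rem = demo_do_div(intervals, miimon);
	return 0;
}
```

With a 250 ms delay and a 100 ms interval this yields 2 intervals and a remainder of 50, so the effective delay is rounded down to 200 ms.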

^ permalink raw reply related

* [patch net-next] rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
From: Jiri Pirko @ 2014-01-23 18:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, sfeldma, nicolas.dichtel, john.r.fastabend, vyasevich

This check is not needed because the same check is done in
rtnl_link_slave_info_fill before fill_slave_info is used.
Also, with this check removed, the kernel fills in IFLA_INFO_SLAVE_KIND
even for slaves of masters which do not implement fill_slave_info.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/core/rtnetlink.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index db6a239..393b1bc 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -482,8 +482,7 @@ static bool rtnl_have_link_slave_info(const struct net_device *dev)
 	struct net_device *master_dev;
 
 	master_dev = netdev_master_upper_dev_get((struct net_device *) dev);
-	if (master_dev && master_dev->rtnl_link_ops &&
-	    master_dev->rtnl_link_ops->fill_slave_info)
+	if (master_dev && master_dev->rtnl_link_ops)
 		return true;
 	return false;
 }
-- 
1.8.3.1

^ permalink raw reply related

* Re: How to identify different ip tunnels
From: Cong Wang @ 2014-01-23 18:16 UTC (permalink / raw)
  To: zhuyj
  Cc: David S. Miller, netdev, kuznet, jmorris, yoshfuji, kaber,
	linux-kernel
In-Reply-To: <52E0C5B7.7040908@gmail.com>

On Wed, Jan 22, 2014 at 11:33 PM, zhuyj <zyjzyj2000@gmail.com> wrote:
> Here are the results that we got:
>
> Actual tunnel type    ifi->ifi_type
> 4IN4                  768
> GRE4                  778
> 6IN4/6TO4/ISATAP      776
> 4IN6/6IN6/IPIN6       769
>
> So we cannot distinguish the actual tunnel type via the ifi_type attribute,
> except for GRE4 and 4IN4 tunnels. However, we need the actual type. That is
> our question.

Try searching for IFLA_IPTUN_PROTO. The tunnels are distinguished by protocol.
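
As a rough illustration of "distinguished by protocol": once the one-byte IFLA_IPTUN_PROTO value has been read from the link's nested info data, it can be mapped to the inner (encapsulated) address family. The helper below is a hypothetical sketch, not iproute2 code. It assumes the attribute carries an IP protocol number, with 0 meaning "any", and it does not attempt to tell 6to4 from ISATAP.

```c
#include <netinet/in.h>	/* IPPROTO_IPIP, IPPROTO_IPV6 */

/* Inner address family carried by a tunnel (hypothetical enum). */
enum demo_inner {
	DEMO_INNER_ANY,
	DEMO_INNER_IPV4,
	DEMO_INNER_IPV6,
	DEMO_INNER_OTHER,
};

/* Maps an IP protocol number, as assumed for IFLA_IPTUN_PROTO, to a family. */
static enum demo_inner demo_inner_family(unsigned char proto)
{
	switch (proto) {
	case 0:
		return DEMO_INNER_ANY;
	case IPPROTO_IPIP:	/* IPv4 packets are encapsulated */
		return DEMO_INNER_IPV4;
	case IPPROTO_IPV6:	/* IPv6 packets are encapsulated */
		return DEMO_INNER_IPV6;
	default:
		return DEMO_INNER_OTHER;
	}
}
```

Combined with ifi_type for the outer family, this would separate, for example, 4IN6 from 6IN6 on an interface that reports 769.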

^ permalink raw reply

* Re: [PATCH net-next 1/3] sch_netem: return errcode before setting params
From: Stephen Hemminger @ 2014-01-23 17:25 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem
In-Reply-To: <1390469499-31952-2-git-send-email-yangyingliang@huawei.com>

I am not sure how important it is that netem updates to multiple
parameters act as a single transaction. The only failures possible
are out of memory and user configuration error.

I prefer the simplicity of the original code.

> +			/* recover clg and loss_model, in case of
> +			 * q->clg and q->loss_model were modified
> +			 * in get_loss_clg() */

This comment needs to be formatted as:
	/* multiline comment
	 * example
	 */

^ permalink raw reply

* [patch iproute2 net-next-for-3.13 2/2] iplink: add support for bonding slave
From: Jiri Pirko @ 2014-01-23 16:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, vfalico, andy, sfeldma, stephen, john.r.fastabend
In-Reply-To: <1390495974-11234-1-git-send-email-jiri@resnulli.us>

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/if_link.h | 17 +++++-----
 ip/Makefile             |  2 +-
 ip/iplink_bond_slave.c  | 90 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 100 insertions(+), 9 deletions(-)
 create mode 100644 ip/iplink_bond_slave.c

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index d59f05e..3b6e1c9 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -370,16 +370,17 @@ enum {
 #define IFLA_BOND_AD_INFO_MAX	(__IFLA_BOND_AD_INFO_MAX - 1)
 
 enum {
-	IFLA_SLAVE_STATE,
-	IFLA_SLAVE_MII_STATUS,
-	IFLA_SLAVE_LINK_FAILURE_COUNT,
-	IFLA_SLAVE_PERM_HWADDR,
-	IFLA_SLAVE_QUEUE_ID,
-	IFLA_SLAVE_AD_AGGREGATOR_ID,
-	__IFLA_SLAVE_MAX,
+	IFLA_BOND_SLAVE_UNSPEC,
+	IFLA_BOND_SLAVE_STATE,
+	IFLA_BOND_SLAVE_MII_STATUS,
+	IFLA_BOND_SLAVE_LINK_FAILURE_COUNT,
+	IFLA_BOND_SLAVE_PERM_HWADDR,
+	IFLA_BOND_SLAVE_QUEUE_ID,
+	IFLA_BOND_SLAVE_AD_AGGREGATOR_ID,
+	__IFLA_BOND_SLAVE_MAX,
 };
 
-#define IFLA_SLAVE_MAX	(__IFLA_SLAVE_MAX - 1)
+#define IFLA_BOND_SLAVE_MAX	(__IFLA_BOND_SLAVE_MAX - 1)
 
 /* SR-IOV virtual function management section */
 
diff --git a/ip/Makefile b/ip/Makefile
index 6b08cb5..713adf5 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -5,7 +5,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
     iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
     iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o \
     iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
-    link_iptnl.o link_gre6.o iplink_bond.o iplink_hsr.o
+    link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/iplink_bond_slave.c b/ip/iplink_bond_slave.c
new file mode 100644
index 0000000..bb4e1d6
--- /dev/null
+++ b/ip/iplink_bond_slave.c
@@ -0,0 +1,90 @@
+/*
+ * iplink_bond_slave.c	Bonding slave device support
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Jiri Pirko <jiri@resnulli.us>
+ */
+
+#include <stdio.h>
+#include <sys/socket.h>
+#include <linux/if_bonding.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static const char *slave_states[] = {
+	[BOND_STATE_ACTIVE] = "ACTIVE",
+	[BOND_STATE_BACKUP] = "BACKUP",
+};
+
+static void print_slave_state(FILE *f, struct rtattr *tb)
+{
+	unsigned int state = rta_getattr_u8(tb);
+
+	if (state >= sizeof(slave_states) / sizeof(slave_states[0]))
+		fprintf(f, "state %d ", state);
+	else
+		fprintf(f, "state %s ", slave_states[state]);
+}
+
+static const char *slave_mii_status[] = {
+	[BOND_LINK_UP] = "UP",
+	[BOND_LINK_FAIL] = "GOING_DOWN",
+	[BOND_LINK_DOWN] = "DOWN",
+	[BOND_LINK_BACK] = "GOING_BACK",
+};
+
+static void print_slave_mii_status(FILE *f, struct rtattr *tb)
+{
+	unsigned int status = rta_getattr_u8(tb);
+
+	fprintf(f, "mii_status %d ", status);
+	if (status >= sizeof(slave_mii_status) / sizeof(slave_mii_status[0]))
+		fprintf(f, "mii_status %d ", status);
+	else
+		fprintf(f, "mii_status %s ", slave_mii_status[status]);
+}
+
+static void bond_slave_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+{
+	SPRINT_BUF(b1);
+	if (!tb)
+		return;
+
+	if (tb[IFLA_BOND_SLAVE_STATE])
+		print_slave_state(f, tb[IFLA_BOND_SLAVE_STATE]);
+
+	if (tb[IFLA_BOND_SLAVE_MII_STATUS])
+		print_slave_mii_status(f, tb[IFLA_BOND_SLAVE_MII_STATUS]);
+
+	if (tb[IFLA_BOND_SLAVE_LINK_FAILURE_COUNT])
+		fprintf(f, "link_failure_count %d ",
+			rta_getattr_u32(tb[IFLA_BOND_SLAVE_LINK_FAILURE_COUNT]));
+
+	if (tb[IFLA_BOND_SLAVE_PERM_HWADDR])
+		fprintf(f, "perm_hwaddr %s ",
+			ll_addr_n2a(RTA_DATA(tb[IFLA_BOND_SLAVE_PERM_HWADDR]),
+				    RTA_PAYLOAD(tb[IFLA_BOND_SLAVE_PERM_HWADDR]),
+				    0, b1, sizeof(b1)));
+
+	if (tb[IFLA_BOND_SLAVE_QUEUE_ID])
+		fprintf(f, "queue_id %d ",
+			rta_getattr_u16(tb[IFLA_BOND_SLAVE_QUEUE_ID]));
+
+	if (tb[IFLA_BOND_SLAVE_AD_AGGREGATOR_ID])
+		fprintf(f, "ad_aggregator_id %d ",
+			rta_getattr_u16(tb[IFLA_BOND_SLAVE_AD_AGGREGATOR_ID]));
+}
+
+struct link_util bond_slave_link_util = {
+	.id		= "bond",
+	.maxattr	= IFLA_BOND_SLAVE_MAX,
+	.print_opt	= bond_slave_print_opt,
+	.slave		= true,
+};
+
-- 
1.8.3.1

^ permalink raw reply related

* [patch iproute2 net-next-for-3.13 1/2] introduce support for slave info data
From: Jiri Pirko @ 2014-01-23 16:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, vfalico, andy, sfeldma, stephen, john.r.fastabend
In-Reply-To: <1390495974-11234-1-git-send-email-jiri@resnulli.us>

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/if_link.h |  2 ++
 ip/ip_common.h          |  4 ++++
 ip/ipaddress.c          | 54 ++++++++++++++++++++++++++++++++-----------------
 ip/iplink.c             | 21 ++++++++++++++++---
 4 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 9cb5909..d59f05e 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -239,6 +239,8 @@ enum {
 	IFLA_INFO_KIND,
 	IFLA_INFO_DATA,
 	IFLA_INFO_XSTATS,
+	IFLA_INFO_SLAVE_KIND,
+	IFLA_INFO_SLAVE_DATA,
 	__IFLA_INFO_MAX,
 };
 
diff --git a/ip/ip_common.h b/ip/ip_common.h
index f9b4734..698dc7a 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -61,6 +61,8 @@ static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
 
 extern struct rtnl_handle rth;
 
+#include <stdbool.h>
+
 struct link_util
 {
 	struct link_util	*next;
@@ -72,9 +74,11 @@ struct link_util
 					     struct rtattr *[]);
 	void			(*print_xstats)(struct link_util *, FILE *,
 					     struct rtattr *);
+	bool			slave;
 };
 
 struct link_util *get_link_kind(const char *kind);
+struct link_util *get_link_slave_kind(const char *slave_kind);
 int get_netns_fd(const char *name);
 
 #ifndef	INFINITY_LIFE_TIME
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index f794fa1..33698ae 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -192,34 +192,52 @@ static void print_linktype(FILE *fp, struct rtattr *tb)
 {
 	struct rtattr *linkinfo[IFLA_INFO_MAX+1];
 	struct link_util *lu;
+	struct link_util *slave_lu;
 	char *kind;
+	char *slave_kind;
 
 	parse_rtattr_nested(linkinfo, IFLA_INFO_MAX, tb);
 
-	if (!linkinfo[IFLA_INFO_KIND])
-		return;
-	kind = RTA_DATA(linkinfo[IFLA_INFO_KIND]);
+	if (linkinfo[IFLA_INFO_KIND]) {
+		kind = RTA_DATA(linkinfo[IFLA_INFO_KIND]);
 
-	fprintf(fp, "%s", _SL_);
-	fprintf(fp, "    %s ", kind);
+		fprintf(fp, "%s", _SL_);
+		fprintf(fp, "    %s ", kind);
 
-	lu = get_link_kind(kind);
-	if (!lu || !lu->print_opt)
-		return;
+		lu = get_link_kind(kind);
+		if (lu && lu->print_opt) {
+			struct rtattr *attr[lu->maxattr+1], **data = NULL;
 
-	if (1) {
-		struct rtattr *attr[lu->maxattr+1], **data = NULL;
+			if (linkinfo[IFLA_INFO_DATA]) {
+				parse_rtattr_nested(attr, lu->maxattr,
+						    linkinfo[IFLA_INFO_DATA]);
+				data = attr;
+			}
+			lu->print_opt(lu, fp, data);
 
-		if (linkinfo[IFLA_INFO_DATA]) {
-			parse_rtattr_nested(attr, lu->maxattr,
-					    linkinfo[IFLA_INFO_DATA]);
-			data = attr;
+			if (linkinfo[IFLA_INFO_XSTATS] && show_stats &&
+			    lu->print_xstats)
+				lu->print_xstats(lu, fp, linkinfo[IFLA_INFO_XSTATS]);
 		}
-		lu->print_opt(lu, fp, data);
+	}
 
-		if (linkinfo[IFLA_INFO_XSTATS] && show_stats &&
-		    lu->print_xstats)
-			lu->print_xstats(lu, fp, linkinfo[IFLA_INFO_XSTATS]);
+	if (linkinfo[IFLA_INFO_SLAVE_KIND]) {
+		slave_kind = RTA_DATA(linkinfo[IFLA_INFO_SLAVE_KIND]);
+
+		fprintf(fp, "%s", _SL_);
+		fprintf(fp, "    %s_slave ", slave_kind);
+
+		slave_lu = get_link_slave_kind(slave_kind);
+		if (slave_lu && slave_lu->print_opt) {
+			struct rtattr *attr[slave_lu->maxattr+1], **data = NULL;
+
+			if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
+				parse_rtattr_nested(attr, slave_lu->maxattr,
+						    linkinfo[IFLA_INFO_SLAVE_DATA]);
+				data = attr;
+			}
+			slave_lu->print_opt(slave_lu, fp, data);
+		}
 	}
 }
 
diff --git a/ip/iplink.c b/ip/iplink.c
index 343b29f..6a62cc0 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -27,6 +27,7 @@
 #include <string.h>
 #include <sys/ioctl.h>
 #include <linux/sockios.h>
+#include <stdbool.h>
 
 #include "rt_names.h"
 #include "utils.h"
@@ -105,14 +106,15 @@ static int on_off(const char *msg, const char *realval)
 static void *BODY;		/* cached dlopen(NULL) handle */
 static struct link_util *linkutil_list;
 
-struct link_util *get_link_kind(const char *id)
+static struct link_util *__get_link_kind(const char *id, bool slave)
 {
 	void *dlh;
 	char buf[256];
 	struct link_util *l;
 
 	for (l = linkutil_list; l; l = l->next)
-		if (strcmp(l->id, id) == 0)
+		if (strcmp(l->id, id) == 0 &&
+		    l->slave == slave)
 			return l;
 
 	snprintf(buf, sizeof(buf), LIBDIR "/ip/link_%s.so", id);
@@ -127,7 +129,10 @@ struct link_util *get_link_kind(const char *id)
 		}
 	}
 
-	snprintf(buf, sizeof(buf), "%s_link_util", id);
+	if (slave)
+		snprintf(buf, sizeof(buf), "%s_slave_link_util", id);
+	else
+		snprintf(buf, sizeof(buf), "%s_link_util", id);
 	l = dlsym(dlh, buf);
 	if (l == NULL)
 		return NULL;
@@ -137,6 +142,16 @@ struct link_util *get_link_kind(const char *id)
 	return l;
 }
 
+struct link_util *get_link_kind(const char *id)
+{
+	return __get_link_kind(id, false);
+}
+
+struct link_util *get_link_slave_kind(const char *id)
+{
+	return __get_link_kind(id, true);
+}
+
 static int get_link_mode(const char *mode)
 {
 	if (strcasecmp(mode, "default") == 0)
-- 
1.8.3.1

^ permalink raw reply related

* [patch iproute2 net-next-for-3.13 0/2] iplink: add support for bonding slave
From: Jiri Pirko @ 2014-01-23 16:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, vfalico, andy, sfeldma, stephen, john.r.fastabend

Jiri Pirko (2):
  introduce support for slave info data
  iplink: add support for bonding slave

 include/linux/if_link.h | 19 ++++++-----
 ip/Makefile             |  2 +-
 ip/ip_common.h          |  4 +++
 ip/ipaddress.c          | 54 +++++++++++++++++++----------
 ip/iplink.c             | 21 ++++++++++--
 ip/iplink_bond_slave.c  | 90 +++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 160 insertions(+), 30 deletions(-)
 create mode 100644 ip/iplink_bond_slave.c

-- 
1.8.3.1

^ permalink raw reply

* Re: [patch] tulip: cleanup by using ARRAY_SIZE()
From: Grant Grundler @ 2014-01-23 16:46 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Grant Grundler, open list:TULIP NETWORK DRI..., David S. Miller,
	kernel-janitors
In-Reply-To: <20140123082637.GC28688@elgon.mountain>

On Thu, Jan 23, 2014 at 12:26 AM, Dan Carpenter
<dan.carpenter@oracle.com> wrote:
> In this situation then ARRAY_SIZE() and sizeof() are the same, but we're
> really dealing with array indexes and not byte offsets so ARRAY_SIZE()
> is cleaner.

Yup.

> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Acked-by: Grant Grundler <grundler@parisc-linux.org>

cheers
grant
>
> diff --git a/drivers/net/ethernet/dec/tulip/media.c b/drivers/net/ethernet/dec/tulip/media.c
> index 0d0ba725341a..dcf21a36a9cf 100644
> --- a/drivers/net/ethernet/dec/tulip/media.c
> +++ b/drivers/net/ethernet/dec/tulip/media.c
> @@ -457,7 +457,7 @@ void tulip_find_mii(struct net_device *dev, int board_idx)
>         /* Find the connected MII xcvrs.
>            Doing this in open() would allow detecting external xcvrs later,
>            but takes much time. */
> -       for (phyn = 1; phyn <= 32 && phy_idx < sizeof (tp->phys); phyn++) {
> +       for (phyn = 1; phyn <= 32 && phy_idx < ARRAY_SIZE(tp->phys); phyn++) {
>                 int phy = phyn & 0x1f;
>                 int mii_status = tulip_mdio_read (dev, phy, MII_BMSR);
>                 if ((mii_status & 0x8301) == 0x8001 ||
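
The difference between the two forms only shows up when elements are wider than one byte. The sketch below uses a hypothetical structure (not the tulip driver's) with a one-byte array, where the two agree as the message above notes, and a two-byte array, where sizeof() would overstate the number of slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's ARRAY_SIZE(), without its type checking. */
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical private data with two small tables. */
struct demo_priv {
	signed char phys[4];	/* 1-byte elements: sizeof == element count */
	uint16_t ids[4];	/* 2-byte elements: sizeof == 2 * element count */
};

/* Fills every slot of ids[] and returns how many slots were written. */
static size_t demo_fill_ids(struct demo_priv *p)
{
	size_t i;

	/* Using sizeof(p->ids) as the bound here would write past the array. */
	for (i = 0; i < DEMO_ARRAY_SIZE(p->ids); i++)
		p->ids[i] = (uint16_t)i;
	return i;
}
```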

^ permalink raw reply

* [patch net-next] rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
From: Jiri Pirko @ 2014-01-23 15:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, sfeldma, stephen, vyasevic

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/uapi/linux/if_link.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b8fb352..206f651 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -361,6 +361,7 @@ enum {
 #define IFLA_BOND_MAX	(__IFLA_BOND_MAX - 1)
 
 enum {
+	IFLA_BOND_AD_INFO_UNSPEC,
 	IFLA_BOND_AD_INFO_AGGREGATOR,
 	IFLA_BOND_AD_INFO_NUM_PORTS,
 	IFLA_BOND_AD_INFO_ACTOR_KEY,
-- 
1.8.3.1

^ permalink raw reply related

* Re: [Xen-devel] [PATCH v4] xen/grant-table: Avoid m2p_override during mapping
From: Stefano Stabellini @ 2014-01-23 15:14 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Stefano Stabellini, ian.campbell, wei.liu2, xen-devel, netdev,
	linux-kernel, jonathan.davies
In-Reply-To: <52E13149.1070705@citrix.com>

On Thu, 23 Jan 2014, Zoltan Kiss wrote:
> On 23/01/14 13:59, Stefano Stabellini wrote:
> > On Wed, 22 Jan 2014, Zoltan Kiss wrote:
> > > > > > > diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> > > > > > > index 2ae8699..0060178 100644
> > > > > > > --- a/arch/x86/xen/p2m.c
> > > > > > > +++ b/arch/x86/xen/p2m.c
> > > > > > > @@ -872,15 +872,13 @@ static unsigned long mfn_hash(unsigned long
> > > > > > > mfn)
> > > > > > > 
> > > > > > >     /* Add an MFN override for a particular page */
> > > > > > >     int m2p_add_override(unsigned long mfn, struct page *page,
> > > > > > > -		struct gnttab_map_grant_ref *kmap_op)
> > > > > > > +		struct gnttab_map_grant_ref *kmap_op, unsigned long
> > > > > > > pfn)
> > > > > > 
> > > > > > Do we really need to add another additional parameter to
> > > > > > m2p_add_override?
> > > > > > I would just let m2p_add_override and m2p_remove_override call
> > > > > > page_to_pfn again. It is not that expensive.
> > > > > > Yes, because that page_to_pfn can return something different. That's
> > > > > > why the v2 patches failed.
> > > > 
> > > > I am really curious: how can page_to_pfn return something different?
> > > > > I don't think that is supposed to happen.
> > > You call set_phys_to_machine before calling m2p* functions.
> > 
> > set_phys_to_machine changes the physical to machine mapping, that would
> > be the mfn corresponding to a given pfn. It shouldn't affect the output
> > of page_to_pfn that returns the pfn corresponding to a given struct
> > page. The calculation of which is based on address offsets and should be
> > static and unaffected by things like set_phys_to_machine.
> 
> Indeed, my mistake. The mfn is the only thing that changes, so it still has
> to be passed to m2p_remove_override. I'll send a new version.

Passing the mfn to m2p_remove_override is OK.
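
The distinction discussed in this thread can be modelled in a few lines of user-space C. This is a sketch under simplifying assumptions: a flat array stands in for the kernel's page array, a second array stands in for the p2m table, and every `demo_*` name is hypothetical. It shows that updating the pfn-to-mfn mapping leaves the page-to-pfn calculation unchanged, because the latter is pure pointer arithmetic.

```c
#include <stddef.h>

#define DEMO_NPAGES 8

struct demo_page {
	unsigned long flags;
};

/* Flat array of page descriptors, one per page frame. */
static struct demo_page demo_mem_map[DEMO_NPAGES];

/* Separate pfn -> mfn table, a stand-in for the p2m mapping. */
static unsigned long demo_p2m[DEMO_NPAGES];

/* pfn is the descriptor's index in the array: an address offset only. */
static unsigned long demo_page_to_pfn(const struct demo_page *page)
{
	return (unsigned long)(page - demo_mem_map);
}

/* Changes which machine frame backs a pfn; touches only the p2m table. */
static void demo_set_phys_to_machine(unsigned long pfn, unsigned long mfn)
{
	demo_p2m[pfn] = mfn;
}

static unsigned long demo_pfn_to_mfn(unsigned long pfn)
{
	return demo_p2m[pfn];
}
```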

^ permalink raw reply

* Re: [Xen-devel] [PATCH v4] xen/grant-table: Avoid m2p_override during mapping
From: Zoltan Kiss @ 2014-01-23 15:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
	jonathan.davies
In-Reply-To: <alpine.DEB.2.02.1401231355340.15917@kaball.uk.xensource.com>

On 23/01/14 13:59, Stefano Stabellini wrote:
> On Wed, 22 Jan 2014, Zoltan Kiss wrote:
>>>>>> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
>>>>>> index 2ae8699..0060178 100644
>>>>>> --- a/arch/x86/xen/p2m.c
>>>>>> +++ b/arch/x86/xen/p2m.c
>>>>>> @@ -872,15 +872,13 @@ static unsigned long mfn_hash(unsigned long mfn)
>>>>>>
>>>>>>     /* Add an MFN override for a particular page */
>>>>>>     int m2p_add_override(unsigned long mfn, struct page *page,
>>>>>> -		struct gnttab_map_grant_ref *kmap_op)
>>>>>> +		struct gnttab_map_grant_ref *kmap_op, unsigned long pfn)
>>>>>
>>>>> Do we really need to add another additional parameter to
>>>>> m2p_add_override?
>>>>> I would just let m2p_add_override and m2p_remove_override call
>>>>> page_to_pfn again. It is not that expensive.
>>>> Yes, because that page_to_pfn can return something different. That's why
>>>> the v2 patches failed.
>>>
>>> I am really curious: how can page_to_pfn return something different?
>>> I don't think that is supposed to happen.
>> You call set_phys_to_machine before calling m2p* functions.
>
> set_phys_to_machine changes the physical to machine mapping, that would
> be the mfn corresponding to a given pfn. It shouldn't affect the output
> of page_to_pfn that returns the pfn corresponding to a given struct
> page. The calculation of which is based on address offsets and should be
> static and unaffected by things like set_phys_to_machine.

Indeed, my mistake. The mfn is the only thing that changes, so it still
has to be passed to m2p_remove_override. I'll send a new version.

Zoli

^ permalink raw reply

* Re: [PATCH net-next 3/4] ethtool: Support for configurable RSS hash key.
From: Ben Hutchings @ 2014-01-23 15:09 UTC (permalink / raw)
  To: Venkata Duvvuru; +Cc: netdev@vger.kernel.org
In-Reply-To: <BF3270C86E8B1349A26C34E4EC1C44CB2C84D71E@CMEXMB1.ad.emulex.com>


On Thu, 2014-01-23 at 13:47 +0000, Venkata Duvvuru wrote:
> > -----Original Message-----
> > From: Ben Hutchings [mailto:ben@decadent.org.uk]
> > Sent: Thursday, January 23, 2014 11:09 AM
> > To: Venkata Duvvuru
> > Cc: netdev@vger.kernel.org
> > Subject: Re: [PATCH net-next 3/4] ethtool: Support for configurable RSS hash
> > key.
> > 
> > On Wed, 2014-01-22 at 12:12 +0000, Venkata Duvvuru wrote:
> > [...]
> > > > No, what I mean is:
> > > >
> > > > 1. An RX flow steering filter can specify use of RSS, in which case
> > > > the value looked up in the indirection is added to the queue number
> > > > specified in the filter.  This is not yet controllable through RX
> > > > NFC though there is room for extension there.
> > > >
> > > > 2. Multi-function controllers need multiple RSS contexts (key +
> > > > indirection
> > > > table) to support independent use of RSS on each function.
> > > > But it may also be possible to allocate multiple contexts to a single
> > function.
> > > > This could be useful in conjunction with 1.  But there would need to
> > > > be a way to allocate and configure extra contexts first.
> > > The proposed changes will be incremental so I think this can be done
> > > in a separate patch. Thoughts?
> > 
> > The ethtool ABI (to userland) has to remain backward-compatible, and it is
> > preferable if we don't add lots of different structures for this.
> > 
> > So please define the new command structure to include both the key and
> > indirection table, and some reserved space (documented as 'userland must
> > set to 0') for future extensions.
> 
> I think it’s better to keep key and indirection table settings as
> different ethtool commands. We can probably add rss contexts (reserved
> space) to both the command structures.
> If we mix key and indirection table into one command structure then it
> will hamper the compatibility.
[...]

Right, there is no compatible way to extend struct ethtool_rxfh_indir.
I should have thought ahead when defining it!  But the new structure
doesn't need to have that problem.
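
As a rough illustration of the "reserved space, userland must set to 0" idea, here is a minimal sketch. The structure name, its fields and the check function are hypothetical and are not the layout that was or will be merged; the point is only that rejecting non-zero reserved words today leaves room to give them a meaning later without breaking old userland.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical command header (illustration only): one command carries
 * both the indirection table and the key, plus reserved words.
 */
struct toy_rxfh_hdr {
	uint32_t cmd;
	uint32_t rss_context; /* assumed: 0 means the default context */
	uint32_t indir_size;  /* entries in the indirection table */
	uint32_t key_size;    /* bytes of hash key */
	uint32_t rsvd[2];     /* userland must set to 0 */
};

/* Reject any request that sets reserved space, so it can be reused later. */
static int toy_rxfh_check(const struct toy_rxfh_hdr *hdr)
{
	size_t i;

	assert(hdr != NULL);
	for (i = 0; i < sizeof(hdr->rsvd) / sizeof(hdr->rsvd[0]); i++)
		if (hdr->rsvd[i] != 0)
			return -EINVAL;
	return 0;
}
```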

Ben.

-- 
Ben Hutchings
compatible: Gracefully accepts erroneous data from any source

^ permalink raw reply

* Re: [PATCH 4/4] ipv4: mark nexthop as dead when it's subnet becomes unreachable
From: Sergey Popovich @ 2014-01-23 15:05 UTC (permalink / raw)
  To: netdev
In-Reply-To: <alpine.LFD.2.11.1401231158510.2682@ja.home.ssi.bg>

On 23 January 2014 at 12:06:30, Julian Anastasov wrote:
> 	Hello,
> 
> On Tue, 21 Jan 2014, Sergey Popovich wrote:
> > +			if (nexthop_nh->nh_dev != dev ||
> > +			    nexthop_nh->nh_scope == scope ||
> > +			    (ifa && !inet_ifa_match(nexthop_nh->nh_gw, ifa)))
> 
> 	What if nh_gw is part from another smaller/larger subnet?
> For example, what if we still have 10.0.0.200/8 ? 10.0.10.5 is
> still reachable, i.e. fib_check_nh() would create such NH.

Please correct me if I have misunderstood something:

1. fib_sync_down_dev() is used when an interface is going down,
to remove entries with stray nexthops (including multipath routes).

2. Its arguments are the device on which the event (DOWN for short) was
received and a force argument that forces fib info entry deletion (force is
true when fib_sync_down_dev() is called from fib_disable_ip() with 2 on the
UNREGISTER event).

The case that this patch tries to address happens when we have two
or more addresses on an interface and the NH lies in one of those subnets.

With two or more addresses on the interface, fib_disable_ip() is not called
when a single address is removed, so fib_sync_down_dev() is not called either,
and we end up with routes that have a stray nexthop.

There is no problem with a single address and an NH in its subnet, because
fib_sync_down_dev() is called from fib_disable_ip().

When deleting an IP address, we have the net_device from which the address
was deleted and the deleted ifa entry.

The only thing I missed is the RTNH_F_ONLINK NH flag, which should be
consulted before marking a nexthop as dead. I will fix this in v2.
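
To make the overlapping-subnet concern concrete, here is a small userspace sketch. It assumes the subnet test has the XOR-and-mask form shown below; the toy_* names are hypothetical, addresses are kept in host byte order for simplicity, and 10.0.10.1/24 is an illustrative deleted address rather than one taken from the thread. With 10.0.0.200/8 still configured, the gateway 10.0.10.5 matches the removed /24 but is also still covered by the remaining /8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an interface address: address and netmask, host order. */
struct toy_ifa {
	uint32_t address;
	uint32_t mask;
};

static uint32_t toy_ip(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

static uint32_t toy_prefix_mask(unsigned int len)
{
	assert(len <= 32);
	return len ? (uint32_t)(0xffffffffu << (32 - len)) : 0;
}

/* True when addr lies inside the subnet described by ifa. */
static int toy_ifa_match(uint32_t addr, const struct toy_ifa *ifa)
{
	return !((addr ^ ifa->address) & ifa->mask);
}

/* True when some remaining address on the device still covers the gateway. */
static int toy_gw_still_covered(uint32_t gw, const struct toy_ifa *left, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (toy_ifa_match(gw, &left[i]))
			return 1;
	return 0;
}
```

A check that only tests the gateway against the deleted address would mark this nexthop dead even though the remaining /8 still reaches it, which is the situation Julian's question points at.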

> IMHO, marking NH by exact nh_gw looks more acceptable because
> the exact GW becomes unreachable. Otherwise, you will need
> fib_lookup() as in fib_check_nh() to check that NH becomes
> unreachable.

I am not sure that I fully understand you.

Do you mean that when deleting an address and removing its subnet, instead of
removing the route from the FIB, we should resolve a new NH with fib_lookup()
if possible, as is done in fib_check_nh(), and leave the route in place with a
modified NH?

Well, that sounds good, but what should be done with multipath routes?
Is this correct at all?

Thanks for reviewing my patches.

> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>

-- 
SP5474-RIPE
Sergey Popovich

^ permalink raw reply

* Re: [PATCH net-next] Add Shradha Shah as the sfc driver maintainer.
From: Ben Hutchings @ 2014-01-23 14:59 UTC (permalink / raw)
  To: David Miller; +Cc: Shradha Shah, netdev, linux-net-drivers
In-Reply-To: <52E116FB.4080800@solarflare.com>

On Thu, 2014-01-23 at 13:19 +0000, Shradha Shah wrote:
> I will be taking over the work from Ben Hutchings.
> 
> Signed-off-by: Shradha Shah <sshah@solarflare.com>

Acked-by: Ben Hutchings <bhutchings@solarflare.com>

> ---
>  MAINTAINERS |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e11d495..fce0c9c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7631,7 +7631,7 @@ F:	drivers/net/ethernet/emulex/benet/
>  
>  SFC NETWORK DRIVER
>  M:	Solarflare linux maintainers <linux-net-drivers@solarflare.com>
> -M:	Ben Hutchings <bhutchings@solarflare.com>
> +M:	Shradha Shah <sshah@solarflare.com>
>  L:	netdev@vger.kernel.org
>  S:	Supported
>  F:	drivers/net/ethernet/sfc/

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH v2] bridge: Fix crash with vlan filtering and tcpdump
From: Vlad Yasevich @ 2014-01-23 14:57 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: netdev
In-Reply-To: <1390480711.5347.8.camel@ubuntu-vm-makita>

On 01/23/2014 07:38 AM, Toshiaki Makita wrote:
> On Wed, 2014-01-22 at 13:24 -0500, Vlad Yasevich wrote:
>> When the vlan filtering is enabled on the bridge, but
>> the filter is not configured on the bridge device itself,
>> running tcpdump on the bridge device will result in
>> an Oops with a NULL pointer dereference.  The reason
>> is that br_pass_frame_up() will bypass the vlan
>> check because promisc flag is set.  It will then try
>> to get the table pointer and process the packet based
>> on the table.  Since the table pointer is NULL, we oops.
>> Catch this special condition in br_handle_vlan().
>>
>> Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>> CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>> ---
>> Changes since v1:
>>   - Do not use a BUG or BUG_ON as it is possible to trigger the
>>     false BUG condition when thrashing the vlan_enable toggle.
>>     Instead just drop the skb.
> 
> Maybe that frame went through should_deliver() while vlan_filtering was
> disabled?

Yes.  That's what happened.

> I hope it doesn't affect other codes in an adverse way...

I didn't see any adverse conditions while running the while(1) loop
changing vlan_filtering, but I didn't run it for all that long (about a
minute).

I am working on a series to reduce the possible races that involves
marking the skb (in the cb) and using that marking in later invocations.

> 
>>
>>  net/bridge/br_input.c | 11 ++++++-----
>>  net/bridge/br_vlan.c  | 14 ++++++++++++++
>>  2 files changed, 20 insertions(+), 5 deletions(-)
>>
> ...
>> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
>> index af5ebd1..abc841c 100644
>> --- a/net/bridge/br_vlan.c
>> +++ b/net/bridge/br_vlan.c
>> @@ -144,6 +144,20 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
>>  	if (!br->vlan_enabled)
>>  		goto out;
>>  
>> +	/* Vlan filter table must be configured at this point.  The
>> +	 * only exception is the bridge is set in promisc mode and the
>> +	 * packet is destined for the bridge device.  In this case
>> +	 * pass the packet as is.
>> +	 */
>> +	if (!pv) {
>> +		if ((br->dev->flags & IFF_PROMISC) && skb->dev == br->dev)
>> +			goto out;
>> +		else {
>> +			kfree_skb_list(skb);
> 
> Why is this not kfree_skb() but kfree_skb_list()?
> I haven't seen kfree_skb_list() in the bridge code.

I thought it is possible to get a list here, but upon further inspection
I don't see that happening.   I'll change that to kfree_skb().
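
For readers following the logic rather than the diff, the decision the patch adds to br_handle_vlan() can be modelled in userspace roughly as follows. This is a sketch, not the bridge code: the struct, enum and function names are hypothetical, and the real function operates on an skb and the vlan table pointer rather than on booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_verdict {
	TOY_PASS_AS_IS,  /* corresponds to "goto out" in the patch */
	TOY_DROP,        /* corresponds to freeing the skb and returning NULL */
	TOY_APPLY_TABLE  /* continue with the normal tagged/untagged handling */
};

/* Hypothetical summary of the state the patch inspects. */
struct toy_frame_ctx {
	bool vlan_enabled;    /* br->vlan_enabled */
	bool have_vlan_table; /* pv != NULL */
	bool bridge_promisc;  /* br->dev->flags & IFF_PROMISC */
	bool to_bridge_dev;   /* skb->dev == br->dev */
};

static enum toy_verdict toy_handle_vlan(const struct toy_frame_ctx *c)
{
	assert(c != NULL);

	if (!c->vlan_enabled)
		return TOY_PASS_AS_IS;

	if (!c->have_vlan_table) {
		/* Only a promiscuous bridge device may see the frame unfiltered. */
		if (c->bridge_promisc && c->to_bridge_dev)
			return TOY_PASS_AS_IS;
		return TOY_DROP;
	}

	return TOY_APPLY_TABLE;
}
```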

Thanks
-vlad

> 
> Thanks,
> Toshiaki Makita
> 
>> +			return NULL;
>> +		}
>> +	}
>> +
>>  	/* At this point, we know that the frame was filtered and contains
>>  	 * a valid vlan id.  If the vlan id is set in the untagged bitmap,
>>  	 * send untagged; otherwise, send taged.
> 
> 

^ permalink raw reply

* [PATCH net-next] sfc: Use the correct maximum TX DMA ring size for SFC9100
From: Shradha Shah @ 2014-01-23 14:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

From: Ben Hutchings <bhutchings@solarflare.com>

As part of a workaround for a hardware erratum in the SFC9100 family
(SF bug 35388), the TX_DESC_UPD_DWORD register address is also used
for communicating with the event block, and only descriptor pointer
values < 2048 are valid.

If the TX DMA ring size is increased to 4096 descriptors (which the
firmware still allows) then we may write a descriptor pointer 
value >= 2048, which has entirely different and undesirable effects!

Limit the TX DMA ring size correctly when this workaround is in
effect.

Fixes: 8127d661e77f ('sfc: Add support for Solarflare SFC9100 family')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
---
This patch limits the TX DMA ring size to <= 2048.
This bug fix is needed for all versions supporting EF10
i.e all versions including and after 3.12.
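
The arithmetic behind the limit can be checked with a small userspace sketch. It is an illustration under stated assumptions: the toy_* names are hypothetical, and TOY_MAX_DMAQ_SIZE is set to 4096 because the description above says the firmware still allows 4096 descriptors. A ring of N descriptors uses pointer values 0..N-1, so halving the maximum keeps every pointer below 2048.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_DMAQ_SIZE 4096u /* assumed from the 4096-descriptor figure above */

/* Mirrors the shape of the new macro: halve the limit under the workaround. */
static unsigned int toy_txq_max_ent(bool workaround_35388)
{
	return workaround_35388 ? TOY_MAX_DMAQ_SIZE / 2 : TOY_MAX_DMAQ_SIZE;
}

/* Largest descriptor pointer value a ring of the given size can produce. */
static unsigned int toy_max_desc_ptr(unsigned int ring_size)
{
	assert(ring_size > 0);
	return ring_size - 1;
}

/* Models the bound check applied when the ring size is set. */
static int toy_check_tx_ring(unsigned int tx_pending, bool workaround_35388)
{
	if (tx_pending > toy_txq_max_ent(workaround_35388))
		return -EINVAL;
	return 0;
}
```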

 drivers/net/ethernet/sfc/efx.h     |    3 +++
 drivers/net/ethernet/sfc/ethtool.c |    4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 6012247..dbd7b78 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -66,6 +66,9 @@ void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
 #define EFX_RXQ_MIN_ENT		128U
 #define EFX_TXQ_MIN_ENT(efx)	(2 * efx_tx_max_skb_descs(efx))
 
+#define EFX_TXQ_MAX_ENT(efx)	(EFX_WORKAROUND_35388(efx) ? \
+				 EFX_MAX_DMAQ_SIZE / 2 : EFX_MAX_DMAQ_SIZE)
+
 /* Filters */
 
 /**
diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index f181522..2294289 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -591,7 +591,7 @@ static void efx_ethtool_get_ringparam(struct net_device *net_dev,
 	struct efx_nic *efx = netdev_priv(net_dev);
 
 	ring->rx_max_pending = EFX_MAX_DMAQ_SIZE;
-	ring->tx_max_pending = EFX_MAX_DMAQ_SIZE;
+	ring->tx_max_pending = EFX_TXQ_MAX_ENT(efx);
 	ring->rx_pending = efx->rxq_entries;
 	ring->tx_pending = efx->txq_entries;
 }
@@ -604,7 +604,7 @@ static int efx_ethtool_set_ringparam(struct net_device *net_dev,
 
 	if (ring->rx_mini_pending || ring->rx_jumbo_pending ||
 	    ring->rx_pending > EFX_MAX_DMAQ_SIZE ||
-	    ring->tx_pending > EFX_MAX_DMAQ_SIZE)
+	    ring->tx_pending > EFX_TXQ_MAX_ENT(efx))
 		return -EINVAL;
 
 	if (ring->rx_pending < EFX_RXQ_MIN_ENT) {

^ permalink raw reply related

* Re: Freescale FEC packet loss
From: Marek Vasut @ 2014-01-23 14:20 UTC (permalink / raw)
  To: fugang.duan@freescale.com
  Cc: netdev@vger.kernel.org, Frank.Li@freescale.com,
	Fabio.Estevam@freescale.com, Hector Palacios,
	linux-arm-kernel@lists.infradead.org, Detlev Zundel, Eric Nelson
In-Reply-To: <1288e1f778f74c669ad15b62d32e8859@BLUPR03MB373.namprd03.prod.outlook.com>

On Thursday, January 23, 2014 at 02:49:48 AM, fugang.duan@freescale.com wrote:
[...]

> >[  3] 71.0-72.0 sec  23.4 MBytes   196 Mbits/sec
> >[  3] 72.0-73.0 sec  12.2 MBytes   103 Mbits/sec
> >[  3] 73.0-74.0 sec  0.00 Bytes  0.00 bits/sec
> >[  3] 74.0-75.0 sec  0.00 Bytes  0.00 bits/sec
> >[  3] 75.0-76.0 sec  10.9 MBytes  91.2 Mbits/sec
> >[  3] 76.0-77.0 sec  22.4 MBytes   188 Mbits/sec
> >[  3] 77.0-78.0 sec  23.0 MBytes   193 Mbits/sec
> 
> I will debug the issue when I am free, and then report the result to you.
> Thanks for your reporting the issue.

Hi Andy,

Thanks for looking into this. Is there any way I can help you figure out
the issue? Do you need any more feedback or anything else?

Thank you!

Best regards,
Marek Vasut

^ permalink raw reply

* Re: [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Michael S. Tsirkin @ 2014-01-23 14:26 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Jason Wang, Linux Netdev List, Tom Herbert, Eric Dumazet,
	David S. Miller, Zhi Yong Wu, Stefan Hajnoczi, Rusty Russell
In-Reply-To: <CAEH94Lg8aL2UcrdgSW6ahtiSYTE0=z1o8Q0hHf0t9RpPVvGiDQ@mail.gmail.com>

On Thu, Jan 16, 2014 at 04:48:43PM +0800, Zhi Yong Wu wrote:
> On Thu, Jan 16, 2014 at 12:23 PM, Jason Wang <jasowang@redhat.com> wrote:
> > On 01/15/2014 10:20 PM, Zhi Yong Wu wrote:
> >>
> >> From: Zhi Yong Wu<wuzhy@linux.vnet.ibm.com>
> >>
> >> HI, folks
> >>
> >> The patchset tries to integrate aRFS support into virtio_net. In this
> >> case, aRFS will be used to select the RX queue. To make sure that it is
> >> heading in the correct direction, it is posted as early as possible,
> >> although it is still an RFC and isn't tested. Any comments are
> >> appreciated, thanks.
> >>
> >> If anyone is interested in playing with it, you can get this patchset
> >> from my dev git on github:
> >>    git://github.com/wuzhy/kernel.git virtnet_rfs
> >>
> >> Zhi Yong Wu (3):
> >>    virtio_pci: Introduce one new config api vp_get_vq_irq()
> >>    virtio_net: Introduce one dummy function virtnet_filter_rfs()
> >>    virtio-net: Add accelerated RFS support
> >>
> >>   drivers/net/virtio_net.c      |   67 ++++++++++++++++++++++++++++++++++++++++-
> >>   drivers/virtio/virtio_pci.c   |   11 +++++++
> >>   include/linux/virtio_config.h |   12 +++++++
> >>   3 files changed, 89 insertions(+), 1 deletions(-)
> >>
> >
> > Please run get_maintainer.pl before sending the patch. You'd better at
> > least cc the virtio maintainer/list for this.
> Is this a must for virtio stuff?

See Documentation/SubmittingPatches

> >
> > The core aRFS method is a no-op in this RFC, which makes this series not
> > worth discussing much. You should at least describe the big picture in the
> > cover letter. I suggest you either post an RFC that can run and gives the
> > expected result, or just start a thread for the design discussion.
> Yes, it currently misses some important stuff, as I said in another mail
> of this series.
> 
> >
> > And this method has been discussed before, you can search "[net-next RFC
> > PATCH 5/5] virtio-net: flow director support" in netdev archive for a very
> > old prototype implemented by me. It can work and looks like most of this RFC
> > have already done there.
> Ah? Can you let me know the result of your discussion? I will check
> it, thanks for the pointer.
> >
> > A basic question is whether or not we need this; not all the mq cards use
> > aRFS (see ixgbe ATR). And could it bring extra overhead? For
> > virtio, we want to reduce the vmexits as much as possible, but this aRFS
> Good question, I also have concerns about this, and don't know if Tom
> has a good explanation.
> > seems to introduce a lot more of them. Making a complex interface just for
> > a virtual device may not be good; a simple method may work for most of the
> > cases.
> >
> > We really should consider to offload this to real nic. VMDq and L2
> > forwarding offload may help in this case.
> 
> By the way, Stefan, can you let us know your concerns here, as we
> discussed on the IRC channel? :)
> 
> 
> -- 
> Regards,
> 
> Zhi Yong Wu

^ permalink raw reply

* Re: Linux kernel patch: elide fib_validate_source() completely when possible - bad side effect?
From: Hannes Frederic Sowa @ 2014-01-23 14:20 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, gregory.hoggarth, netdev
In-Reply-To: <alpine.LFD.2.11.1401231056530.2682@ja.home.ssi.bg>

Hi Julian,

On Thu, Jan 23, 2014 at 11:47:33AM +0200, Julian Anastasov wrote:
> On Thu, 23 Jan 2014, Hannes Frederic Sowa wrote:
> 
> > E.g. we register all local registered broadcast addresses in a structure
> > like inet_addr_lst so we only need to check if the packet would leave
> > this host with a broadcast hardware address. If the packet is forwarded
> > the router must do the same check as only it knows the local broadcast
> > addresses. I hope this is correct. ;)
> 
> 	Now when we can override the local table with ip
> rules or to prepend route in local table, we can not be
> sure that the broadcast routes are returned in all cases,
> users can add unicast routes for such addresses.
> 
> 	I don't remember for useful case where one may need to
> override broadcast routes but adding such exceptions looks
> risky.

The check I envisioned would actually only block the ifa_broadcast of the
incoming interface as a source address. So we wouldn't depend on the view of
the routing tables at all.
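
A minimal userspace sketch of that idea, assuming each interface keeps a list of its addresses with their broadcast addresses. The toy_* types and the function name are hypothetical and this is not kernel code; it only shows that the test needs the incoming interface's own address list and no routing lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy per-address record: local address and its broadcast address. */
struct toy_ifaddr {
	uint32_t local;
	uint32_t broadcast;
};

/* Toy interface: just the list of addresses configured on it. */
struct toy_iface {
	const struct toy_ifaddr *addrs;
	size_t n_addrs;
};

/*
 * True when saddr equals a broadcast address configured on the interface
 * the packet arrived on. No routing table is consulted.
 */
static bool toy_saddr_is_iface_broadcast(const struct toy_iface *in_dev,
					 uint32_t saddr)
{
	size_t i;

	assert(in_dev != NULL);
	for (i = 0; i < in_dev->n_addrs; i++)
		if (in_dev->addrs[i].broadcast == saddr)
			return true;
	return false;
}
```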

But I have no strong opinion on that.

Greetings,

  Hannes

^ permalink raw reply

* Re: Fwd: [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Michael S. Tsirkin @ 2014-01-23 14:23 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Zhi Yong Wu, Ben Hutchings, Stefan Hajnoczi, Linux Netdev List,
	Eric Dumazet, David S. Miller, Zhi Yong Wu, Rusty Russell,
	Jason Wang
In-Reply-To: <CA+mtBx-Xv6xp3VCq=PKMyboAmDjUWzVrbUhr4WdKQRi1PCansg@mail.gmail.com>

On Wed, Jan 22, 2014 at 10:00:37AM -0800, Tom Herbert wrote:
> >> 1. The aRFS interface for the guest to specify which virtual queue to
> >> receive a packet on is fairly straight forward.
> >> 2. To hook into RFS, we need to match the virtual queue to the real
> >> CPU it will be processed on, and then program the RFS table for that flow
> >> and CPU.
> >> 3. NIC aRFS keys off the RFS tables so it can program the HW with the
> >> correct queue for the CPU.
> > Does anyone have time to draw a conclusion from this discussion? In
> > particular, how will an rx packet be steered up the stack from the guest
> > virtio_net driver, virtio_net NIC, vhost_net, tun driver and host network
> > stack to the physical NIC, in more detail?
> > What is the role of each unit on the path? Otherwise this discussion won't
> > get any meaningful result, which is not what we expect.
> >
> Working code outweighs theoretical discussion :-).

So far all that was posted was an untested patchset.
Zhi Yong Wu did this intentionally to get early feedback, so
it's not surprising we got a theoretical discussion
as opposed to working code out of this.
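
To give the theoretical discussion something concrete to point at, the three steps Tom lists can be sketched as a chain of table lookups. This is a toy model only: every name, table size and queue number below is a made-up assumption, and nothing here reflects the posted patches or the kernel's RFS data structures.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_VQS       4u
#define TOY_NR_CPUS      4u
#define TOY_FLOW_ENTRIES 16u

/* Assumed: host CPU that services each guest virtual queue (step 2). */
static const unsigned int toy_vq_cpu[TOY_NR_VQS] = { 2, 0, 3, 1 };

/* Assumed: hardware RX queue whose interrupt targets each host CPU (step 3). */
static const unsigned int toy_cpu_hwq[TOY_NR_CPUS] = { 5, 6, 7, 4 };

/* Toy RFS table: flow hash bucket -> desired CPU. */
static unsigned int toy_rfs_cpu[TOY_FLOW_ENTRIES];

/* Steps 1 and 2: the guest names a virtual queue; record the CPU behind it. */
static void toy_guest_steer(uint32_t flow_hash, unsigned int vq)
{
	assert(vq < TOY_NR_VQS);
	toy_rfs_cpu[flow_hash % TOY_FLOW_ENTRIES] = toy_vq_cpu[vq];
}

/* Step 3: the NIC side keys off the RFS table to choose a hardware queue. */
static unsigned int toy_nic_queue(uint32_t flow_hash)
{
	return toy_cpu_hwq[toy_rfs_cpu[flow_hash % TOY_FLOW_ENTRIES]];
}
```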

> I think you started
> on a good track with original patches, and I believe the tun path
> should work pretty well (some performance numbers would still be good
> to validate).

Yes, merging performance-oriented patches without any
performance numbers to show for it is strange.

> Seems like there's enough hooks in the virtio_net path
> to implement something meaningful and maybe get some numbers (maybe
> ignore NIC aRFS in the first pass).
> 
> Tom
> 
> >>
> >>> Stefan
> >
> >
> >
> > --
> > Regards,
> >
> > Zhi Yong Wu

^ permalink raw reply

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
From: Hannes Frederic Sowa @ 2014-01-23 14:11 UTC (permalink / raw)
  To: Gao feng; +Cc: Sabrina Dubroca, netdev
In-Reply-To: <52E0B55F.4070501@cn.fujitsu.com>

On Thu, Jan 23, 2014 at 02:23:27PM +0800, Gao feng wrote:
> >  
> > -static int fib6_ifdown(struct rt6_info *rt, void *arg)
> > +static int __fib6_match_or_update_if(struct rt6_info *rt, void *arg)
> >  {
> >  	const struct arg_dev_net *adn = arg;
> >  	const struct net_device *dev = adn->dev;
> >  
> >  	if ((rt->dst.dev == dev || !dev) &&
> > -	    rt != adn->net->ipv6.ip6_null_entry)
> > -		return -1;
> > +	    rt != adn->net->ipv6.ip6_null_entry) {
> > +		switch (adn->action) {
> > +		case ARG_DEV_NET_REMOVE:
> > +			/* remove rt */
> > +			return -1;
> > +		case ARG_DEV_NET_DISABLE:
> > +			WARN_ON(rt->rt6i_flags & RTF_DEAD);
> > +			rt->rt6i_flags |= RTF_DEAD;
> > +			return 0;
> > +		case ARG_DEV_NET_ENABLE:
> > +			WARN_ON(!(rt->rt6i_flags & RTF_DEAD));
> 
> I think this may happen: suppose a new route is cloned from this rt but has
> not yet been inserted into the routing tree on another CPU at the same time.

Yes, there are still some problems with this patch. I actually shuffled
the code around yesterday to purge all those RTF_CACHE RTF_DEAD routes
while holding the same write_lock on the table.

Regarding the problem you raised, I tried to validate the route
using dst.from on insert while holding write_lock, somewhat like we do
for checking the expiry time. But I really dislike this.

Thanks,

  Hannes

^ permalink raw reply
