Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v4 0/3] path mtu hardening patches
From: David Miller @ 2014-01-13 19:25 UTC (permalink / raw)
  To: hannes; +Cc: netdev, eric.dumazet, johnwheffner, steffen.klassert, fweimer
In-Reply-To: <1389258077-23282-1-git-send-email-hannes@stressinduktion.org>

From: hannes@stressinduktion.org
Date: Thu,  9 Jan 2014 10:01:14 +0100

> After a lot of back and forth I want to propose these changes regarding
> path mtu hardening and give an outline why I think this is the best way
> how to proceed:

I'm not going to fight this any more even though I still disagree with
these changes.  John Heffner has not provided a coherent strong
argument for not doing this, in fact the counter arguments were
extremely vague.

I am pretty sure that now my worst fears will be realized and every
single distribution will not use the kernel's default, and everyone
will get this behavior rather than adminstrators making well informed
decisions about how to defend against these kind of situations when
enabling routing, or whether they'd even be exposed to the issue at
all in a particular setup.

Such is life.

^ permalink raw reply

* Re: [PATCH] HHF qdisc: fix jiffies-time conversion.
From: David Miller @ 2014-01-13 19:20 UTC (permalink / raw)
  To: vtlam; +Cc: netdev, nanditad, stephen
In-Reply-To: <1389256800-18809-1-git-send-email-vtlam@google.com>

From: Terry Lam <vtlam@google.com>
Date: Thu,  9 Jan 2014 00:40:00 -0800

> This is to be compatible with the use of "get_time" (i.e. default
> time unit in us) in iproute2 patch for HHF as requested by Stephen.
> 
> Signed-off-by: Terry Lam <vtlam@google.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next 3/3] r8152: add supporting the vendor mode only
From: David Miller @ 2014-01-13 19:19 UTC (permalink / raw)
  To: hayeswang; +Cc: oliver, netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1389250232-8663-4-git-send-email-hayeswang@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Thu, 9 Jan 2014 14:50:32 +0800

> Remove the limitation that the ecm and r8152 drivers couldn't coexist.
> Besides, add the feature to support the vendor mode only. This let
> someone who doesn't want to use ecm driver easy to use the vendor
> driver without creating the udev rule.
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>

This is a very bad precedence and I do not want drivers to start doing
things like this.

I think it would be wiser to use the existing facilities which exist
to control this situation, and yes udev is one of them.

I'm sorry, I'm not applying this series as long as it has this patch
in it.

^ permalink raw reply

* Re: [PATCH net-next v2 4/4] virtio-net: initial debugfs support, export mergeable rx buffer size
From: Michael Dalton @ 2014-01-13 19:19 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Michael S. Tsirkin, netdev, lf-virt, Eric Dumazet,
	David S. Miller
In-Reply-To: <CANJ5vPLamPonELhzEEs_e+BQUDrY0w=KpwpzSwGpG551rnbfbQ@mail.gmail.com>

Sorry I missed this important piece of information, it appears that
netdev_queue (the TX equivalent of netdev_rx_queue) already has
decoupled itself from CONFIG_XPS due to an attribute,
queue_trans_timeout, that does not depend on XPS functionality. So it
seems that something somewhat equivalent has already happened on the
TX side.

Best,

Mike

^ permalink raw reply

* Re: [PATCH net-next] qlcnic: Convert vmalloc/memset to kcalloc
From: David Miller @ 2014-01-13 19:17 UTC (permalink / raw)
  To: joe; +Cc: jitendra.kalsaria, linux-driver, netdev, linux-kernel
In-Reply-To: <1389249745.24222.20.camel@joe-AO722>

From: Joe Perches <joe@perches.com>
Date: Wed, 08 Jan 2014 22:42:25 -0800

> vmalloc is a limited resource.  Don't use it unnecessarily.
> 
> It seems this allocation should work with kcalloc.
> 
> Remove unnecessary memset(,0,) of buf as it's completely
> overwritten as the previously only unset field in
> struct qlcnic_pci_func_cfg is now set to 0.
> 
> Use kfree instead of vfree.
> Use ETH_ALEN instead of 6.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied, thanks Joe.

^ permalink raw reply

* Re: [patch] cxgb4: silence shift wrapping static checker warning
From: David Miller @ 2014-01-13 19:16 UTC (permalink / raw)
  To: dan.carpenter; +Cc: dm, kumaras, netdev, kernel-janitors
In-Reply-To: <20140109053400.GF1265@elgon.mountain>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Thu, 9 Jan 2014 08:34:00 +0300

> I don't know how large "tp->vlan_shift" is but static checkers worry
> about shift wrapping bugs here.
> 
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied to 'net'.

I don't think I'll be able to get this into 3.13, but on the off chance
Linus has to do another -rc I'll try to send it to him.

^ permalink raw reply

* Re: [PATCH net 0/2] bonding: ensure that the TSO being set on bond master
From: David Miller @ 2014-01-13 19:15 UTC (permalink / raw)
  To: dingtianhong; +Cc: fubar, vfalico, edumazet, netdev
In-Reply-To: <52CCFE15.2020605@huawei.com>

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Wed, 8 Jan 2014 15:28:21 +0800

> The commit b0ce3508(bonding: allow TSO being set on bonding master)
> has make the TSO being set for bond dev, but in some situation, if
> the slave did not have the NETIF_F_SG features, the bond master will
> miss the TSO features in netdev_fix_features because the TSO is
> depended on SG. So I have to add SG and TSO features on bond master
> together.
> 
> The function netdev_add_tso_features() was only be used for bonding,
> so no need to export it in netdevice.h, remove it and add it to bonding.

As far as I can tell from the discussion, there is some issue wrt. TSO
about what happens if SG is not supported by some of the slaves.

>From my perspective it appears that some changes to these patches are
necessary to handle that correctly.

So I am going to mark them as "Changes Requested" in patchwork.

If this is not the case, please resubmit these changes with appropriate
explanations added to the commit message(s).

Thanks.

^ permalink raw reply

* Re: [PATCH 0/2] net/mlx4_core: clean up two functions
From: David Miller @ 2014-01-13 19:13 UTC (permalink / raw)
  To: pebolle; +Cc: ogerlitz, jackm, ronye, hadarh, netdev, linux-kernel
In-Reply-To: <1389099622.15032.18.camel@x41>

From: Paul Bolle <pebolle@tiscali.nl>
Date: Tue, 07 Jan 2014 14:00:22 +0100

> 0) These two patches are very similar. They both clean up a function to
> help GCC understand the codeflow. Both help silence a warning.
> 
> 1) Compile tested only (on 32 bits x86). I do not have this hardware.
> 
> 2) Please note that there's no MAINTAINERS entry for mlx4_core. (Neither
> is there an entry for the MLX4 IB driver.) Shouldn't it be added? 

These patches have been rotting for a week.  I know the mlx4 folks
said the SRIOV guy inside Mellanox will look at it, but this is taking
way too long.

This is absolutely unreasonable from Paul's perspective to have to wait
this long for a review of these relatively simple patches.

^ permalink raw reply

* Re: [PATCH net-next v2 4/4] virtio-net: initial debugfs support, export mergeable rx buffer size
From: Michael Dalton @ 2014-01-13 19:07 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Michael S. Tsirkin, netdev, lf-virt, Eric Dumazet,
	David S. Miller
In-Reply-To: <1389627488.2025.134.camel@bwh-desktop.uk.level5networks.com>

On Mon, Jan 13, 2014 at 7:38 AM, Ben Hutchings <bhutchings@solarflare.com>
wrote:
> I don't think RPS should own this structure.  It's just that there are
> currently no per-RX-queue attributes other than those defined by RPS.

Agreed, there is useful attribute-independent functionality already
built around netdev_rx_queue - e.g., dynamically resizing the rx queue
kobjs as the number of RX queues enabled for the netdev is changed. While
the current attributes happen to be used only by RPS, AFAICT it seems
RPS should not own netdev_rx_queue but rather should own the RPS-specific
fields themselves within netdev_rx_queue.

If there are no objections, it seems like I could modify
netdev_rx_queue and related functionality so that their existence does
not depend on CONFIG_RPS, and instead just have CONFIG_RPS control
whether or not the RPS-specific attributes/fields are present.

Best,

Mike

^ permalink raw reply

* Re: [PATCHv2 net-next 1/5] {IPv4,xfrm} Add ESN support for AH egress part
From: Sergei Shtylyov @ 2014-01-13 18:37 UTC (permalink / raw)
  To: Fan Du, steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389599324-5174-2-git-send-email-fan.du@windriver.com>

Hello.

On 13-01-2014 11:48, Fan Du wrote:

> This patch add esn support for AH output stage by attaching upper 32bits
> sequence number right after packet payload as specified by RFC 4302.

> Then the ICV value will guard upper 32bits sequence number as well when
> packet going out.

> Signed-off-by: Fan Du <fan.du@windriver.com>
> ---
>   net/ipv4/ah4.c |   25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)

> diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
> index 7179026..a7fac03 100644
> --- a/net/ipv4/ah4.c
> +++ b/net/ipv4/ah4.c
[...]
> @@ -213,7 +223,14 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
>   	sg_init_table(sg, nfrags);
>   	skb_to_sgvec(skb, sg, 0, skb->len);
>
> -	ahash_request_set_crypt(req, sg, icv, skb->len);
> +	if ((x->props.flags & XFRM_STATE_ESN)) {

    What's the point in double parens here? The same question about the other 
patches....

WBR, Sergei

^ permalink raw reply

* [PATCH-next] netfilter: don't use module_init/exit in core IPV4 code
From: Paul Gortmaker @ 2014-01-13 18:35 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Patrick McHardy, Jozsef Kadlecsik,
	David S. Miller
  Cc: netfilter, netdev, Paul Gortmaker

The file net/ipv4/netfilter.o is created based on whether
CONFIG_NETFILTER is set.  However that is defined as a bool, and
hence this file with the core netfilter hooks will never be
modular.  So using module_init as an alias for __initcall can be
somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.  Also add an inclusion of init.h, as
that was previously implicit here in the netfilter.c file.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
seems to make sense for netfilter code) will thus change this
registration from level 6-device to level 4-subsys (i.e. slightly
earlier).  However no observable impact of that small difference
has been observed during testing, or is expected. (i.e. the
location of the netfilter messages in dmesg remains unchanged
with respect to all the other surrounding messages.)

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index c3e0adea9c27..e1237a64b14e 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -7,6 +7,7 @@
 #include <linux/kernel.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv4.h>
+#include <linux/init.h>
 #include <linux/ip.h>
 #include <linux/skbuff.h>
 #include <linux/gfp.h>
@@ -203,5 +204,5 @@ static void __exit ipv4_netfilter_fini(void)
 	nf_unregister_afinfo(&nf_ip_afinfo);
 }
 
-module_init(ipv4_netfilter_init);
-module_exit(ipv4_netfilter_fini);
+subsys_initcall(ipv4_netfilter_init);
+__exitcall(ipv4_netfilter_fini);
-- 
1.8.5.2

^ permalink raw reply related

* Re: randconfig build error with next-20140113, in net/netfilter/nft_reject.c
From: Paul Gortmaker @ 2014-01-13 18:12 UTC (permalink / raw)
  To: Jim Davis
  Cc: Stephen Rothwell, linux-next@vger.kernel.org, LKML, pablo,
	Patrick McHardy, kadlec, David S. Miller, netfilter-devel,
	netfilter, coreteam, netdev
In-Reply-To: <CA+r1Zhjqrmm8a9fVsjfXamCE1qQq=YBexpeU95ODi=BiSie7aA@mail.gmail.com>

On Mon, Jan 13, 2014 at 8:51 AM, Jim Davis <jim.epost@gmail.com> wrote:
> Building with the attached random configuration file,
>
> warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet
> direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
> warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet
> direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
>
>   LD      init/built-in.o
> net/built-in.o: In function `nft_reject_eval':
> nft_reject.c:(.text+0x67bc1): undefined reference to `nf_ip6_checksum'
> nft_reject.c:(.text+0x67c28): undefined reference to `ip6_route_output'
> nft_reject.c:(.text+0x67d09): undefined reference to `ip6_dst_hoplimit'
> make: *** [vmlinux] Error 1

I came across the same thing independently while working on an
unrelated netfilter patch.  Bisected and fix sent.

http://patchwork.ozlabs.org/patch/310298/

^ permalink raw reply

* Re: [ovs-dev] [GIT net-next] Open vSwitch
From: Zoltan Kiss @ 2014-01-13 18:04 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, dev@openvswitch.org, netdev
In-Reply-To: <CAEP_g=_pQpH--V+7GSrjGP2z=0TBD_37yXjzTdeKG+2OF3aRaw@mail.gmail.com>

On 08/01/14 15:10, Jesse Gross wrote:
> On Wed, Jan 8, 2014 at 9:49 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>> Hi,
>>
>> I've tried the latest net-next on a Xenserver install with 1.9.3 userspace,
>> and it seems this patch series broke it (at least after reverting that
>> locally it works now). I haven't went too far yet checking what's the
>> problem, but it seems the xenbrX device doesn't really receive too much of
>> the traffic coming through the NIC. Is it expected?
>
> What do you mean by doesn't receive too much traffic? What does it get?
>

Sorry for the vague error description, now I had more time to look into 
this. I think the problem boils down to this:

Jan 13 17:55:07 localhost ovs-vswitchd: 
07890|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:274, 
type=29(ovs_packet), flags=0, seq=0, pid=0,genl(cmd=1,version=1)
Jan 13 17:55:07 localhost ovs-vswitchd: 07891|netlink|DBG|attributes 
followed by garbage
Jan 13 17:55:07 localhost ovs-vswitchd: 07892|dpif|WARN|system@xenbr0: 
recv failed (Invalid argument)

That's just keep repeating. I'm keep looking.

Zoli

^ permalink raw reply

* [PATCH] netfilter: Add dependency on IPV6 for NF_TABLES_INET
From: Paul Gortmaker @ 2014-01-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Pablo Neira Ayuso, Patrick McHardy,
	Jozsef Kadlecsik
  Cc: netfilter, netdev, Paul Gortmaker

Commit 1d49144c0aaa61be4e3ccbef9cc5c40b0ec5f2fe ("netfilter: nf_tables:
add "inet" table for IPv4/IPv6") allows creation of non-IPV6 enabled
.config files that will fail to configure/link as follows:

warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
net/built-in.o: In function `nft_reject_eval':
nft_reject.c:(.text+0x651e8): undefined reference to `nf_ip6_checksum'
nft_reject.c:(.text+0x65270): undefined reference to `ip6_route_output'
nft_reject.c:(.text+0x656c4): undefined reference to `ip6_dst_hoplimit'
make: *** [vmlinux] Error 1

Since the feature is to allow for a mixed IPV4 and IPV6 table, it
seems sensible to make it depend on IPV6.

Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index afe50c0f526f..c37467562fd0 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -429,7 +429,7 @@ config NF_TABLES
 	  To compile it as a module, choose M here.
 
 config NF_TABLES_INET
-	depends on NF_TABLES
+	depends on NF_TABLES && IPV6
 	select NF_TABLES_IPV4
 	select NF_TABLES_IPV6
 	tristate "Netfilter nf_tables mixed IPv4/IPv6 tables support"
-- 
1.8.5.2

^ permalink raw reply related

* Re: [PATCH net 2/2] bonding: rename the dev upper link if the master's, name changed
From: Sergei Shtylyov @ 2014-01-13 17:54 UTC (permalink / raw)
  To: Ding Tianhong, Jay Vosburgh, Veaceslav Falico, Eric Dumazet,
	David S. Miller, Netdev
In-Reply-To: <52D3E55B.8080401@huawei.com>

On 13-01-2014 17:08, Ding Tianhong wrote:

> The bond_maste_rename() will rename the links for slave dev's upper dev link,

    s/maste/master/.

> if faild, it will rollback and rename the new name to old name for slave dev.

    s/faild/failed/.

> Add a new parameter called name to save the old bonding name in struct bonding.

> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>   drivers/net/bonding/bond_main.c | 35 +++++++++++++++++++++++++++++++++++
>   drivers/net/bonding/bonding.h   |  1 +
>   2 files changed, 36 insertions(+)

> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 4b8c58b..8c044c0 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2799,11 +2799,41 @@ re_arm:
>
>   /*-------------------------- netdev event handling --------------------------*/
>
> +static int bond_master_rename(struct bonding *bond)
> +{
> +	struct slave *slave;
> +	struct list_head *iter;
> +	char ori_linkname[IFNAMSIZ + 7], new_linkname[IFNAMSIZ + 7];

    Perhaps s/ori/old/ is better?

> +	int err = 0;
> +
> +	sprintf(ori_linkname, "upper_%s", bond->name);
> +	sprintf(new_linkname, "upper_%s", bond->dev->name);
> +
> +	bond_for_each_slave(bond, slave, iter) {
> +

    No need for this empty line, I think.

> +		err = netdev_upper_dev_rename(slave->dev, bond->dev, ori_linkname,
> +					new_linkname);

    The continuation line should start right under 'slave' on the broken up line.

WBR, Sergei

^ permalink raw reply

* [PATCH net-next] IPv6: add option to use anycast addresses as source addresses in icmp error messages
From: Francois-Xavier Le Bail @ 2014-01-13 17:22 UTC (permalink / raw)
  To: netdev
  Cc: Bill Fink, Hannes Frederic Sowa, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki Yoshifuji,
	Patrick McHardy, Francois-Xavier Le Bail

- Add "anycast_src_icmp_error" sysctl to control the use of anycast addresses
  as source addresses for ICMPv6 error messages. This sysctl is false by
  default to preserve existing behavior.
- Use it in icmp6_send().

Suggested-by: Bill Fink <billfink@mindspring.com>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
---
This change allows to follow a recommandation of RFC4942.

RFC4942 - IPv6 Transition/Coexistence Security Considerations
   (http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security
   [...]
   To avoid exposing knowledge about the internal structure of the
   network, it is recommended that anycast servers now take advantage of
   the ability to return responses with the anycast address as the
   source address if possible.

 include/net/netns/ipv6.h   |    1 +
 net/ipv6/icmp.c            |    4 +++-
 net/ipv6/sysctl_net_ipv6.c |    8 ++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 592fecd..75abbbe 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -29,6 +29,7 @@ struct netns_sysctl_ipv6 {
 	int ip6_rt_min_advmss;
 	int icmpv6_time;
 	int anycast_src_echo_reply;
+	int anycast_src_icmp_error;
 };
 
 struct netns_ipv6 {
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 13640f2..410aa32 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -412,7 +412,9 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 	 */
 	addr_type = ipv6_addr_type(&hdr->daddr);
 
-	if (ipv6_chk_addr(net, &hdr->daddr, skb->dev, 0))
+	if (ipv6_chk_addr(net, &hdr->daddr, skb->dev, 0) ||
+	    (net->ipv6.sysctl.anycast_src_icmp_error &&
+	     ipv6_chk_acast_addr_src(net, skb->dev, &hdr->daddr)))
 		saddr = &hdr->daddr;
 
 	/*
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index b51b268..5f6d5bb 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -31,6 +31,13 @@ static struct ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "anycast_src_icmp_error",
+		.data		= &init_net.ipv6.sysctl.anycast_src_icmp_error,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
 
@@ -59,6 +66,7 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 		goto out;
 	ipv6_table[0].data = &net->ipv6.sysctl.bindv6only;
 	ipv6_table[1].data = &net->ipv6.sysctl.anycast_src_echo_reply;
+	ipv6_table[2].data = &net->ipv6.sysctl.anycast_src_icmp_error;
 
 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)

^ permalink raw reply related

* [PATCH net-next 2/2] net: vxlan: properly cleanup devs on module unload
From: Daniel Borkmann @ 2014-01-13 17:41 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1389634880-4138-1-git-send-email-dborkman@redhat.com>

We should use vxlan_dellink() handler in vxlan_exit_net(), since
i) we're not in fast-path and we should be consistent in dismantle
just as we would remove a device through rtnl ops, and more
importantly, ii) in case future code will kfree() memory in
vxlan_dellink(), we would leak it right here unnoticed. Therefore,
do not only half of the cleanup work, but make it properly.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 drivers/net/vxlan.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 81c553f..d2acb6b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2711,13 +2711,13 @@ static __net_init int vxlan_init_net(struct net *net)
 static __net_exit void vxlan_exit_net(struct net *net)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
-	struct vxlan_dev *vxlan;
-	LIST_HEAD(list);
+	struct vxlan_dev *vxlan, *next;
+	LIST_HEAD(list_kill);
 
 	rtnl_lock();
-	list_for_each_entry(vxlan, &vn->vxlan_list, next)
-		unregister_netdevice_queue(vxlan->dev, &list);
-	unregister_netdevice_many(&list);
+	list_for_each_entry_safe(vxlan, next, &vn->vxlan_list, next)
+		vxlan_dellink(vxlan->dev, &list_kill);
+	unregister_netdevice_many(&list_kill);
 	rtnl_unlock();
 }
 
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 1/2] net: vxlan: when lower dev unregisters remove vxlan dev as well
From: Daniel Borkmann @ 2014-01-13 17:41 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1389634880-4138-1-git-send-email-dborkman@redhat.com>

We can create a vxlan device with an explicit underlying carrier.
In that case, when the carrier link is being deleted from the
system (e.g. due to module unload) we should also clean up all
created vxlan devices on top of it since otherwise we're in an
inconsistent state in vxlan device. In that case, the user needs
to remove all such devices, while in case of other virtual devs
that sit on top of physical ones, it is usually the case that
these devices do unregister automatically as well and do not
leave the burden on the user.

This work is not necessary when vxlan device was not created with
a real underlying device, as connections can resume in that case
when driver is plugged again. But at least for the other cases,
we should go ahead and do the cleanup on removal.

We don't register the notifier during vxlan_newlink() here since
I consider this event rather rare, and therefore we should not
bloat vxlan's core structure unecessary. Also, we can simply make
use of unregister_netdevice_many() to batch that. fdb is flushed
upon ndo_stop().

E.g. `ip -d link show vxlan13` after carrier removal before
this patch:

5: vxlan13: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default
    link/ether 1e:47:da:6d:4d:99 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 13 group 239.0.0.10 dev 2 port 32768 61000 ageing 300
                                 ^^^^^
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 drivers/net/vxlan.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 481f85d..81c553f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2656,6 +2656,44 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.fill_info	= vxlan_fill_info,
 };
 
+static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
+					     struct net_device *dev)
+{
+	struct vxlan_dev *vxlan, *next;
+	LIST_HEAD(list_kill);
+
+	list_for_each_entry_safe(vxlan, next, &vn->vxlan_list, next) {
+		struct vxlan_rdst *dst = &vxlan->default_dst;
+
+		/* In case we created vxlan device with carrier
+		 * and we loose the carrier due to module unload
+		 * we also need to remove vxlan device. In other
+		 * cases, it's not necessary and remote_ifindex
+		 * is 0 here, so no matches.
+		 */
+		if (dst->remote_ifindex == dev->ifindex)
+			vxlan_dellink(vxlan->dev, &list_kill);
+	}
+
+	unregister_netdevice_many(&list_kill);
+}
+
+static int vxlan_lowerdev_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
+
+	if (event == NETDEV_UNREGISTER)
+		vxlan_handle_lowerdev_unregister(vn, dev);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block vxlan_notifier_block __read_mostly = {
+	.notifier_call = vxlan_lowerdev_event,
+};
+
 static __net_init int vxlan_init_net(struct net *net)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -2704,12 +2742,17 @@ static int __init vxlan_init_module(void)
 	if (rc)
 		goto out1;
 
-	rc = rtnl_link_register(&vxlan_link_ops);
+	rc = register_netdevice_notifier(&vxlan_notifier_block);
 	if (rc)
 		goto out2;
 
-	return 0;
+	rc = rtnl_link_register(&vxlan_link_ops);
+	if (rc)
+		goto out3;
 
+	return 0;
+out3:
+	unregister_netdevice_notifier(&vxlan_notifier_block);
 out2:
 	unregister_pernet_device(&vxlan_net_ops);
 out1:
@@ -2721,6 +2764,7 @@ late_initcall(vxlan_init_module);
 static void __exit vxlan_cleanup_module(void)
 {
 	rtnl_link_unregister(&vxlan_link_ops);
+	unregister_netdevice_notifier(&vxlan_notifier_block);
 	destroy_workqueue(vxlan_wq);
 	unregister_pernet_device(&vxlan_net_ops);
 	rcu_barrier();
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 0/2] vxlan updates
From: Daniel Borkmann @ 2014-01-13 17:41 UTC (permalink / raw)
  To: davem; +Cc: netdev

Did the split into two patches upon request from Cong Wang.

Changelog:

 v1->v2:
  - Removed BUG_ON as it's not needed.
 v2->v3:
  - Removed dev->reg_state check for netns.
 v3->v4:
  - Removed list_del(), we seem to do it in some places and
    in some others not; we agreed it's not really necessary.
  - Split patch into 2 patches, notifier part and module
    unload cleanup part.

Daniel Borkmann (2):
  net: vxlan: when lower dev unregisters remove vxlan dev as well
  net: vxlan: properly cleanup devs on module unload

 drivers/net/vxlan.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 51 insertions(+), 7 deletions(-)

-- 
1.7.11.7

^ permalink raw reply

* Re: [PATCH net 0/2] bonding: fix the sysfs warning when change the master's name
From: Veaceslav Falico @ 2014-01-13 17:24 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: Jay Vosburgh, David S. Miller, Netdev, Eric Dumazet
In-Reply-To: <52D3E536.8080404@huawei.com>

On Mon, Jan 13, 2014 at 09:08:06PM +0800, Ding Tianhong wrote:
>When I change the master's name, and then rebuild the master and ensalve a nic again,
>than I got the calltrace:
>
>[329215.749344] WARNING: CPU: 0 PID: 4778 at fs/sysfs/dir.c:486 sysfs_warn_dup+0x87/0xa0()
>[329215.749347] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:02:00.0/net/eth100/upper_bond0'
...snip...
>[329215.749494]  [<ffffffff81205b27>] sysfs_warn_dup+0x87/0xa0
>[329215.749500]  [<ffffffff81205eed>] sysfs_add_one+0x4d/0x50
>[329215.749505]  [<ffffffff81206f9e>] sysfs_do_create_link_sd+0xbe/0x210
>[329215.749511]  [<ffffffff812951a0>] ? sprintf+0x40/0x50
>[329215.749516]  [<ffffffff8120714b>] sysfs_create_link+0x2b/0x30
>[329215.749523]  [<ffffffff8140a708>] __netdev_adjacent_dev_insert+0x1b8/0x270
>[329215.749528]  [<ffffffff8140a7f8>] __netdev_adjacent_dev_link_lists+0x38/0x90
>[329215.749533]  [<ffffffff8140a98b>] __netdev_upper_dev_link+0x13b/0x470
>[329215.749538]  [<ffffffff8141319c>] ? __ethtool_get_settings+0x5c/0x90
>[329215.749547]  [<ffffffffa0722179>] ? bond_update_speed_duplex+0x29/0x70 [bonding]
>[329215.749552]  [<ffffffff8140acd1>] netdev_master_upper_dev_link_private+0x11/0x20
>[329215.749561]  [<ffffffffa0729246>] bond_enslave+0x806/0xe40 [bonding]
>[329215.749570]  [<ffffffffa073241f>] bonding_store_slaves+0x18f/0x1c0 [bonding]
>[329215.749576]  [<ffffffff813757ab>] dev_attr_store+0x1b/0x20
>[329215.749581]  [<ffffffff812049cc>] sysfs_write_file+0x15c/0x1f0
>[329215.749587]  [<ffffffff81188897>] vfs_write+0xc7/0x1e0

It's unrelated to bonding, as it touches any device that uses netdev_adjacent
logic.

This case (renaming stale sysfs links) should be properly handled in
dev_change_name().

^ permalink raw reply

* [PATCH net-next v2] IPv6: enable bind() to assign an anycast address
From: Francois-Xavier Le Bail @ 2014-01-13 16:59 UTC (permalink / raw)
  To: netdev
  Cc: Hannes Frederic Sowa, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki Yoshifuji, Patrick McHardy,
	Francois-Xavier Le Bail

- Use ipv6_chk_acast_addr_src() in inet6_bind().

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
---
Tested with SOCK_DGRAM and SOCK_STREAM sockets.
Tested with link-local and global anycast addresses.

 net/ipv6/af_inet6.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index c921d5d..68b81e9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -347,7 +347,9 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			if (!(addr_type & IPV6_ADDR_MULTICAST))	{
 				if (!(inet->freebind || inet->transparent) &&
 				    !ipv6_chk_addr(net, &addr->sin6_addr,
-						   dev, 0)) {
+						   dev, 0) &&
+				    !ipv6_chk_acast_addr_src(net, dev,
+							     &addr->sin6_addr)) {
 					err = -EADDRNOTAVAIL;
 					goto out_unlock;
 				}

^ permalink raw reply related

* [PATCH net-next] IPv6: add a function to check for a valid anycast source address
From: Francois-Xavier Le Bail @ 2014-01-13 16:44 UTC (permalink / raw)
  To: netdev
  Cc: Hannes Frederic Sowa, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki Yoshifuji, Patrick McHardy,
	Francois-Xavier Le Bail

- Add ipv6_chk_acast_addr_src() to check if an anycast address is link-local
  on given interface or is global (on any interface).

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
---
 include/net/addrconf.h |    5 +++--
 net/ipv6/anycast.c     |   11 +++++++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 66c4a44..50e39a8 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -205,8 +205,9 @@ void ipv6_sock_ac_close(struct sock *sk);
 int ipv6_dev_ac_inc(struct net_device *dev, const struct in6_addr *addr);
 int __ipv6_dev_ac_dec(struct inet6_dev *idev, const struct in6_addr *addr);
 bool ipv6_chk_acast_addr(struct net *net, struct net_device *dev,
-				const struct in6_addr *addr);
-
+			 const struct in6_addr *addr);
+bool ipv6_chk_acast_addr_src(struct net *net, struct net_device *dev,
+			     const struct in6_addr *addr);
 
 /* Device notifier */
 int register_inet6addr_notifier(struct notifier_block *nb);
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 5a80f15..4f2c62a 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -383,6 +383,17 @@ bool ipv6_chk_acast_addr(struct net *net, struct net_device *dev,
 	return found;
 }
 
+/*	check if address is link-local on given interface
+ *	or is global (on any interface)
+ */
+bool ipv6_chk_acast_addr_src(struct net *net, struct net_device *dev,
+			     const struct in6_addr *addr)
+{
+	if (ipv6_addr_type(addr) & IPV6_ADDR_LINKLOCAL)
+		return ipv6_chk_acast_dev(dev, addr);
+	else
+		return ipv6_chk_acast_addr(net, NULL, addr);
+}
 
 #ifdef CONFIG_PROC_FS
 struct ac6_iter_state {

^ permalink raw reply related

* Re: [PATCH v4 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Tejun Heo @ 2014-01-13 16:55 UTC (permalink / raw)
  To: Jan Kaluza
  Cc: rgb-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	eparis-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1389600109-30739-1-git-send-email-jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Hello,

On Mon, Jan 13, 2014 at 09:01:46AM +0100, Jan Kaluza wrote:
> this patchset against net-next (applies also to linux-next) adds 3 new types
> of "Socket"-level control message (SCM_AUDIT, SCM_PROCINFO and SCM_CGROUP).
> 
> Server-like processes in many cases need credentials and other
> metadata of the peer, to decide if the calling process is allowed to
> request a specific action, or the server just wants to log away this
> type of information for auditing tasks.
> 
> The current practice to retrieve such process metadata is to look that
> information up in procfs with the $PID received over SCM_CREDENTIALS.
> This is sufficient for long-running tasks, but introduces a race which
> cannot be worked around for short-living processes; the calling
> process and all the information in /proc/$PID/ is gone before the
> receiver of the socket message can look it up.
> 
> Changes introduced in this patchset can also increase performance
> of such server-like processes, because current way of opening and
> parsing /proc/$PID/* files is much more expensive than receiving these
> metadata using SCM.

Closing the race sounds like a good idea to me.  What do net people
think?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH v4 3/3] Send cgroup_path in SCM_CGROUP
From: Tejun Heo @ 2014-01-13 16:52 UTC (permalink / raw)
  To: Jan Kaluza
  Cc: rgb-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	eparis-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1389600109-30739-4-git-send-email-jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Mon, Jan 13, 2014 at 09:01:49AM +0100, Jan Kaluza wrote:
> Server-like processes in many cases need credentials and other
> metadata of the peer, to decide if the calling process is allowed to
> request a specific action, or the server just wants to log away this
> type of information for auditing tasks.
> 
> The current practice to retrieve such process metadata is to look that
> information up in procfs with the $PID received over SCM_CREDENTIALS.
> This is sufficient for long-running tasks, but introduces a race which
> cannot be worked around for short-living processes; the calling
> process and all the information in /proc/$PID/ is gone before the
> receiver of the socket message can look it up.
> 
> This introduces a new SCM type called SCM_CGROUP to allow the direct
> attaching of "cgroup_path" to SCM, which is significantly more
> efficient and will reliably avoid the race with the round-trip over
> procfs.
> 
> Signed-off-by: Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

For cgroup related part:

 Acked-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH] sh_eth: fix garbled TX error message
From: Joe Perches @ 2014-01-13 16:51 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Simon Horman, netdev, linux-sh
In-Reply-To: <52D41117.3040809@cogentembedded.com>

On Mon, 2014-01-13 at 20:15 +0400, Sergei Shtylyov wrote:
> On 13-012014 5:24, Joe Perches wrote:
> >>> sh_eth_error() in case of a TX error tries to print a message using 2 dev_err()
> >>> calls with the first string not finished by '\n', so that the resulting message
> >>> would inevitably come out garbled, with something like "3net eth0: " inserted
> >>> in the middle.  Avoid that by merging 2 calls into one.
> 
> > I believe this interleaving should not happen since
> > commit e28d713704117bca0820c732210df6075b09f13b
> > (2.6.31 days)
> 
>     I believe you have given me the wrong commit, which has nothing to do the 
> the newline problem per se. It just adds KERN_DEFAULT. I was able to find the 
> correct commit though: it's the parent of the commit you cited, 

Yeah, I was just scanning the printk commit logs
and looked for the first one that said something
like force newline.

> I should have tested my assumption beforehand... 

Always a good thing.

> (I'd like to merge these 
> dev_err() calls still though).

Yes, the dev_err calls should be merged.

About the other stuff, what/when ever...

cheers, Joe

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox