Netdev List


* Re: [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices
From: Florian Fainelli @ 2017-05-08 17:35 UTC
  To: David Ahern, netdev; +Cc: roopa, nicolas.dichtel
In-Reply-To: <20170506160734.47084-1-dsahern@gmail.com>

On 05/06/2017 09:07 AM, David Ahern wrote:
> As I have mentioned many times[1], at ~43+kB per instance the use of
> net_devices does not scale for deployments needing 10,000+ devices. At
> netconf 1.2 there was a discussion about using a net_device_common for
> the minimal set of common attributes with other structs built on top of
> that one for "full" devices. It provided a means for the code to know
> "non-standard" net_devices. Conceptually, that approach has its merits
> but it is not practical given the sweeping changes required to the code
> base. More importantly though struct net_device is not the problem; it
> weighs in at less than 2kB so reorganizing the code base around a
> refactored net_device is not going to solve the problem. The primary
> issue is all of the initializations done *because* it is a struct
> net_device -- kobject and sysfs and the protocols (e.g., ipv4, ipv6,
> mpls, neighbors).
> 
> So, how do you keep the desired attributes of a net device -- network
> addresses, xmit function, qdisc, netfilter rules, tcpdump -- while
> lowering the overhead of a net_device instance and without sweeping
> changes across net/ and drivers/net/?
> 
> This patch set introduces the concept of labeling net_devices as
> "lightweight", first mentioned at netdev 1.1 [1]. Users have to opt
> in to lightweight devices by passing a new attribute, IFLA_LWT_NETDEV,
> in the new link request. This lightweight tag is meant for virtual
> devices such as vlan, vrf, vti, and dummy where the user expects to
> create a lot of them and does not want the duplication of resources.
> Each device type can always opt out of a lightweight label if necessary
> by failing device creates.
> 
> Labeling a virtual device as "lightweight" reduces the footprint for
> device creation from ~43kB to ~6kB. That reduction in memory is obtained
> by:
> 1. no entry in sysfs
>    - kobject in net_device.device is not initialized
> 
> 2. no entry in procfs
>    - no sysctl option for these devices
> 
> 3. deferred ipv4, ipv6, mpls initialization
>    - network layer must be enabled before an address can be assigned
>      or mpls labels can be processed
>    - enables what Florian called L2 only devices [2]
> 
> Once the core premise of a lightweight device is accepted, follow on
> patches can reduce the overhead of network initializations. e.g.,
> 
> 1. remove devconf per device (ipv4 and ipv6)
>    - lightweight devices use the default settings rather than replicate
>      the same data for each device
> 
> 2. reduce / remove / opt out of snmp mibs
>    - snmp6_alloc_dev and icmpv6msg_mib_device specifically is a heavy
>      hitter
> 
> Patches can also be found here:
>     https://github.com/dsahern/linux lwt-dev-rfc
> 
> And iproute2 here:
>     https://github.com/dsahern/iproute2 lwt-dev
> 
> Example:
>     ip li add foo lwd type vrf table 123
> 
> - creates VRF device 'foo' as a lightweight netdevice.

This is really looking nice, thanks for posting this patch series! The
only submission-wide comment I have is that the flag is named
IFF_LWT_NETDEV whereas the helper that checks for it is named
netif_is_lwd(), so we should reconcile the two. Since "LWT" already
refers to the existing lightweight tunnel infrastructure, maybe using
IFF_LWD_NETDEV (or just IFF_LWD) would be good enough here?

> 
> 
> [1] http://www.netdevconf.org/1.1/proceedings/slides/ahern-aleksandrov-prabhu-scaling-network-cumulus.pdf
> [2] https://www.spinics.net/lists/netdev/msg340808.html
> David Ahern (6):
>   net: Add accessor for kboject in a net_device
>   net: Add flags argument to alloc_netdev_mqs
>   net: Introduce IFF_LWT_NETDEV flag
>   net: Do not intialize kobject for lightweight netdevs
>   net: Delay initializations for lightweight devices
>   net: add uapi for creating lightweight devices
> 
>  drivers/net/ethernet/mellanox/mlx5/core/ipoib.c |  2 +-
>  drivers/net/ethernet/tile/tilegx.c              |  2 +-
>  drivers/net/tun.c                               |  2 +-
>  drivers/net/wireless/marvell/mwifiex/cfg80211.c |  2 +-
>  include/linux/netdevice.h                       | 27 ++++++++--
>  include/uapi/linux/if_link.h                    |  1 +
>  net/batman-adv/sysfs.c                          | 13 ++++-
>  net/bridge/br_if.c                              | 12 +++--
>  net/bridge/br_sysfs_br.c                        | 17 +++---
>  net/bridge/br_sysfs_if.c                        |  8 ++-
>  net/core/dev.c                                  | 71 ++++++++++++++++++-------
>  net/core/neighbour.c                            |  3 ++
>  net/core/net-sysfs.c                            | 25 ++++++---
>  net/core/rtnetlink.c                            | 10 +++-
>  net/ethernet/eth.c                              |  2 +-
>  net/ipv4/devinet.c                              | 18 ++++++-
>  net/ipv6/addrconf.c                             |  9 ++++
>  net/mac80211/iface.c                            |  2 +-
>  net/mpls/af_mpls.c                              |  6 +++
>  net/wireless/core.c                             | 15 ++++--
>  20 files changed, 190 insertions(+), 57 deletions(-)
> 


-- 
Florian


* Re: [PATCH v1 0/4] stmmac: pci: Fix crash on Intel Galileo Gen2
From: Jan Kiszka @ 2017-05-08 17:34 UTC
  To: Andy Shevchenko, Joao Pinto, netdev, Giuseppe CAVALLARO,
	David S. Miller
In-Reply-To: <20170508141422.39612-1-andriy.shevchenko@linux.intel.com>

On 2017-05-08 16:14, Andy Shevchenko wrote:
> Due to misconfiguration of PCI driver for Intel Quark the user will get
> a kernel crash:
> 
> # udhcpc -i eth0
> udhcpc: started, v1.26.2
> stmmaceth 0000:00:14.6 eth0: device MAC address 98:4f:ee:05:ac:47
> Generic PHY stmmac-a6:01: attached PHY driver [Generic PHY] (mii_bus:phy_addr=stmmac-a6:01, irq=-1)
> stmmaceth 0000:00:14.6 eth0: IEEE 1588-2008 Advanced Timestamp supported
> stmmaceth 0000:00:14.6 eth0: registered PTP clock
> IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> 
> udhcpc: sending discover
> 
> stmmaceth 0000:00:14.6 eth0: Link is Up - 100Mbps/Full - flow control off
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: stmmac_xmit+0xf1/0x1080
> 
> Fix this by adding necessary settings.
> 
> P.S. I split fix to three patches according to what each of them adds.
> 
> Andy Shevchenko (4):
>   stmmac: pci: set default number of rx and tx queues
>   stmmac: pci: TX and RX queue priority configuration
>   stmmac: pci: RX queue routing configuration
>   stmmac: pci: split out common_default_data() helper
> 
>  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 41 +++++++++++-------------
>  1 file changed, 18 insertions(+), 23 deletions(-)
> 

Tested-by: Jan Kiszka <jan.kiszka@siemens.com>

All fine again, thanks!

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH RFC net-next 5/6] net: Delay initializations for lightweight devices
From: Florian Fainelli @ 2017-05-08 17:31 UTC
  To: David Ahern, netdev; +Cc: roopa, nicolas.dichtel
In-Reply-To: <20170506160734.47084-6-dsahern@gmail.com>

On 05/06/2017 09:07 AM, David Ahern wrote:
> Delay ipv4 and ipv6 initializations on lightweight netdevices until an
> address is added to the device.
> 
> Skip sysctl initialization for neighbor path as well.

Yeah, thanks for including the sysctl initialization. One thing that my
earlier "L2 only" attempt also tried to solve was making the
IFF_NOIPV4 and IFF_NOIPV6 flags volatile. In case you change your mind
and end up needing the IP stacks to be initialized, this ought to be
possible at some point. I did not get to test that part though.

AFAIR, some peculiar devices like 6lowpan (and to some extent the larger
802.15.4 family) may want to be IPv6 exclusively. This means we may have
a bit of overlap between flags like IFF_NOARP, the previously proposed
IFF_NOIPV6, and IFF_LWT_NETDEV.

Thanks!
-- 
Florian
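
The deferred initialization discussed above can be illustrated with a
minimal userspace sketch. It is not the code from the patch: the
toy_* names, the TOY_IFF_LWD bit and the struct layout are all
assumptions made for illustration. It only shows the idea that a
lightweight device skips per-device IPv4 state at registration and
allocates it when the first address is added.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_IFF_LWD 0x1u	/* hypothetical flag bit, not a kernel value */

struct toy_in_device {
	int addr_count;
};

struct toy_netdev {
	unsigned int flags;
	struct toy_in_device *ip_ptr;	/* NULL until IPv4 state exists */
};

static int toy_netif_is_lwd(const struct toy_netdev *dev)
{
	return !!(dev->flags & TOY_IFF_LWD);
}

/* Registration: ordinary devices get IPv4 state now, lightweight ones
 * skip the allocation. */
static int toy_register(struct toy_netdev *dev)
{
	dev->ip_ptr = NULL;
	if (toy_netif_is_lwd(dev))
		return 0;
	dev->ip_ptr = calloc(1, sizeof(*dev->ip_ptr));
	return dev->ip_ptr ? 0 : -1;
}

/* Address add: allocate the IPv4 state on first use if it is missing. */
static int toy_add_addr(struct toy_netdev *dev)
{
	if (!dev->ip_ptr) {
		dev->ip_ptr = calloc(1, sizeof(*dev->ip_ptr));
		if (!dev->ip_ptr)
			return -1;
	}
	dev->ip_ptr->addr_count++;
	return 0;
}

static void toy_lwd_demo(void)
{
	struct toy_netdev lwd = { .flags = TOY_IFF_LWD };

	assert(toy_register(&lwd) == 0);
	assert(lwd.ip_ptr == NULL);		/* nothing allocated yet */
	assert(toy_add_addr(&lwd) == 0);
	assert(lwd.ip_ptr->addr_count == 1);	/* allocated on first address */
	free(lwd.ip_ptr);
}
```

Calling toy_lwd_demo() from a main() exercises the lightweight path;
a device without the flag gets its state in toy_register() instead.
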


* Re: bpf pointer alignment validation
From: Alexei Starovoitov @ 2017-05-08 17:30 UTC
  To: David Miller; +Cc: ast, daniel, netdev
In-Reply-To: <20170505.224709.1156323937148435706.davem@davemloft.net>

On Fri, May 05, 2017 at 10:47:09PM -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Fri, 05 May 2017 16:20:44 -0400 (EDT)
> 
> > Anyways, I'll play with this design and see what happens...
> > Feedback is of course welcome.
> 
> Here is a prototype that works for me with test_pkt_access.c,
> which otherwise won't load on sparc.

the approach looks good.

> +static u32 calc_align(u32 imm)
> +{
> +	u32 align = 1;
> +
> +	if (!imm)
> +		return 1U << 31;
> +
> +	while (!(imm & 1)) {
> +		imm >>= 1;
> +		align <<= 1;
> +	}
> +	return align;
> +}

how about
static u32 calc_align(u32 n)
{
	if (!n)
		return 1U << 31;
	return n - ((n - 1) & n);
}
instead?
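
For reference, a small userspace sketch comparing the two forms. The
function names and the uint32_t types are stand-ins chosen here, not
part of either proposal. The bit trick relies on n & (n - 1) clearing
the lowest set bit, so subtracting that from n leaves only the lowest
set bit, which is the same value the shift loop computes.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form: shift until the lowest set bit is found. */
static uint32_t calc_align_loop(uint32_t imm)
{
	uint32_t align = 1;

	if (!imm)
		return UINT32_C(1) << 31;

	while (!(imm & 1)) {
		imm >>= 1;
		align <<= 1;
	}
	return align;
}

/* Bit-trick form: n & (n - 1) clears the lowest set bit of n. */
static uint32_t calc_align_bits(uint32_t n)
{
	if (!n)
		return UINT32_C(1) << 31;
	return n - ((n - 1) & n);
}

/* Returns 1 when both forms agree on every value in [0, limit). */
static int calc_align_forms_agree(uint32_t limit)
{
	for (uint32_t n = 0; n < limit; n++)
		if (calc_align_loop(n) != calc_align_bits(n))
			return 0;
	return 1;
}

static void calc_align_demo(void)
{
	assert(calc_align_bits(12) == 4);	/* 0b1100 -> 0b0100 */
	assert(calc_align_bits(1) == 1);
	assert(calc_align_forms_agree(UINT32_C(1) << 16));
}
```
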


* [PATCH] qca_debug: Reduce function calls for sequence output in qcaspi_info_show()
From: SF Markus Elfring @ 2017-05-08 17:29 UTC
  To: netdev, David S. Miller, Philippe Reynes, Stefan Wahren
  Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Mon, 8 May 2017 19:21:27 +0200

A bit of data was put into a sequence by separate function calls.
Print the same data together with adjusted seq_printf() calls instead.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/qualcomm/qca_debug.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_debug.c b/drivers/net/ethernet/qualcomm/qca_debug.c
index d145df98feff..27b5c1dd27d0 100644
--- a/drivers/net/ethernet/qualcomm/qca_debug.c
+++ b/drivers/net/ethernet/qualcomm/qca_debug.c
@@ -81,11 +81,7 @@ qcaspi_info_show(struct seq_file *s, void *what)
 	else
 		seq_puts(s, "in use");
 
-	seq_puts(s, "\n");
-
-	seq_printf(s, "TX ring size     : %u\n",
-		   qca->txr.size);
-
+	seq_printf(s, "\nTX ring size     : %u\n", qca->txr.size);
 	seq_printf(s, "Sync state       : %u (",
 		   (unsigned int)qca->sync);
 	switch (qca->sync) {
@@ -102,10 +98,8 @@ qcaspi_info_show(struct seq_file *s, void *what)
 		seq_puts(s, "INVALID");
 		break;
 	}
-	seq_puts(s, ")\n");
 
-	seq_printf(s, "IRQ              : %d\n",
-		   qca->spi_dev->irq);
+	seq_printf(s, ")\nIRQ              : %d\n", qca->spi_dev->irq);
 	seq_printf(s, "INTR REQ         : %u\n",
 		   qca->intr_req);
 	seq_printf(s, "INTR SVC         : %u\n",
-- 
2.12.2


* Re: [PATCH RFC net-next 4/6] net: Do not intialize kobject for lightweight netdevs
From: Florian Fainelli @ 2017-05-08 17:26 UTC
  To: David Ahern, netdev; +Cc: roopa, nicolas.dichtel
In-Reply-To: <20170506160734.47084-5-dsahern@gmail.com>

On 05/06/2017 09:07 AM, David Ahern wrote:
> Lightweight netdevices are not added to sysfs; bypass kobject
> initialization.

I was wondering if we actually needed a flag to say "this is a
lightweight device, but still let it show up in /sys". All use cases that
I have in mind (getting the physical port name, etc.) can be done via
netlink, which is not restricted even with LWT devices, so this sounds
reasonable. In case we need to revisit that, we can always add more
flags to control lightweight device creation and how this
percolates through the networking stack.

Thanks!
-- 
Florian


* Re: [PATCH net] bpf: don't let ldimm64 leak map addresses on unprivileged
From: Alexei Starovoitov @ 2017-05-08 17:18 UTC
  To: Daniel Borkmann; +Cc: davem, jannh, kafai, netdev
In-Reply-To: <793c517a7d163c613ab886eb02d32efea9f902fd.1494194233.git.daniel@iogearbox.net>

On Mon, May 08, 2017 at 12:04:09AM +0200, Daniel Borkmann wrote:
> The patch fixes two things at once:
> 
> 1) It checks the env->allow_ptr_leaks and only prints the map address to
>    the log if we have the privileges to do so, otherwise it just dumps 0
>    as we would when kptr_restrict is enabled on %pK. Given the latter is
>    off by default and not every distro sets it, I don't want to rely on
>    this, hence the 0 by default for unprivileged.
> 
> 2) Printing of ldimm64 in the verifier log is currently broken in that
>    we don't print the full immediate, but only the 32 bit part of the
>    first insn part for ldimm64. Thus, fix this up as well; it's okay to
>    access, since we verified all ldimm64 earlier already (including just
>    constants) through replace_map_fd_with_map_ptr().
> 
> Fixes: cbd357008604 ("bpf: verifier (add ability to receive verification log)")
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

thanks for the fix!

Acked-by: Alexei Starovoitov <ast@kernel.org>
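
To illustrate point 2 of the quoted description: an ldimm64 occupies
two instruction slots, with the low 32 bits of the immediate in the
first slot's imm field and the high 32 bits in the second. The sketch
below is a userspace illustration only; struct toy_insn and
toy_ldimm64_value() are hypothetical names, not the verifier's code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for one instruction slot; only the imm field is modeled. */
struct toy_insn {
	int32_t imm;
};

/* Combine both slots. The cast to uint32_t before widening keeps a
 * negative imm from sign-extending into the other half. */
static uint64_t toy_ldimm64_value(const struct toy_insn *insn)
{
	return (uint64_t)(uint32_t)insn[0].imm |
	       ((uint64_t)(uint32_t)insn[1].imm << 32);
}

static void toy_ldimm64_demo(void)
{
	struct toy_insn insns[2] = { { .imm = -1 }, { .imm = 1 } };

	/* Looking at the first slot alone loses the upper half. */
	assert((uint32_t)insns[0].imm == UINT32_C(0xffffffff));
	assert(toy_ldimm64_value(insns) == UINT64_C(0x1ffffffff));
}
```
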


* [Patch net v3] ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
From: Cong Wang @ 2017-05-08 17:12 UTC
  To: netdev; +Cc: andreyknvl, dsahern, Cong Wang

For each netns (except init_net), we initialize its null entry
in 3 places:

1) The template itself, as we use kmemdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after
   loopback registers

Unfortunately the last one still happens in the wrong order because
we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
net->loopback_dev's idev, thus we have to do that after we add
idev to loopback. However, this notifier has priority == 0, same as
ipv6_dev_notf, and ipv6_dev_notf is registered after
ip6_route_dev_notifier, so it is actually called after
ip6_route_dev_notifier. This is similar to commit 2f460933f58e
("ipv6: initialize route null entry in addrconf_init()") which
fixes init_net.

Fix it by picking a smaller priority for ip6_route_dev_notifier.
Also, we have to release the refcnt accordingly when unregistering
loopback_dev because device exit functions are called before subsys
exit functions.

Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/addrconf.h |  2 ++
 net/ipv6/addrconf.c    |  1 +
 net/ipv6/route.c       | 13 +++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 2452e64..b43a4ee 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -20,6 +20,8 @@
 #define ADDRCONF_TIMER_FUZZ		(HZ / 4)
 #define ADDRCONF_TIMER_FUZZ_MAX		(HZ)
 
+#define ADDRCONF_NOTIFY_PRIORITY	0
+
 #include <linux/in.h>
 #include <linux/in6.h>
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 77a4bd5..8d297a7 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3548,6 +3548,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
  */
 static struct notifier_block ipv6_dev_notf = {
 	.notifier_call = addrconf_notify,
+	.priority = ADDRCONF_NOTIFY_PRIORITY,
 };
 
 static void addrconf_type_change(struct net_device *dev, unsigned long event)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2f11366..dc61b0b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3709,7 +3709,10 @@ static int ip6_route_dev_notify(struct notifier_block *this,
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	struct net *net = dev_net(dev);
 
-	if (event == NETDEV_REGISTER && (dev->flags & IFF_LOOPBACK)) {
+	if (!(dev->flags & IFF_LOOPBACK))
+		return NOTIFY_OK;
+
+	if (event == NETDEV_REGISTER) {
 		net->ipv6.ip6_null_entry->dst.dev = dev;
 		net->ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(dev);
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
@@ -3718,6 +3721,12 @@ static int ip6_route_dev_notify(struct notifier_block *this,
 		net->ipv6.ip6_blk_hole_entry->dst.dev = dev;
 		net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev);
 #endif
+	} else if (event == NETDEV_UNREGISTER) {
+		in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev);
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
+		in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev);
+		in6_dev_put(net->ipv6.ip6_blk_hole_entry->rt6i_idev);
+#endif
 	}
 
 	return NOTIFY_OK;
@@ -4024,7 +4033,7 @@ static struct pernet_operations ip6_route_net_late_ops = {
 
 static struct notifier_block ip6_route_dev_notifier = {
 	.notifier_call = ip6_route_dev_notify,
-	.priority = 0,
+	.priority = ADDRCONF_NOTIFY_PRIORITY - 10,
 };
 
 void __init ip6_route_init_special_entries(void)
-- 
2.5.5
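
The ordering argument in the description can be sketched in userspace
as follows. This is a toy model under stated assumptions: the toy_*
names are hypothetical, and the list only mimics a chain that is kept
sorted by descending priority with ties broken by registration order.

```c
#include <assert.h>
#include <stddef.h>

struct toy_notifier {
	int priority;
	const char *name;
	struct toy_notifier *next;
};

/* Insert sorted by descending priority. An entry with equal priority
 * goes after those already registered, so earlier registration wins. */
static void toy_chain_register(struct toy_notifier **head,
			       struct toy_notifier *n)
{
	while (*head && n->priority <= (*head)->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Call position of n in the chain (0 is called first), or -1. */
static int toy_chain_position(const struct toy_notifier *head,
			      const struct toy_notifier *n)
{
	int pos = 0;

	for (; head; head = head->next, pos++)
		if (head == n)
			return pos;
	return -1;
}

static void toy_chain_demo(void)
{
	struct toy_notifier route = { .priority = -10, .name = "route" };
	struct toy_notifier addrconf = { .priority = 0, .name = "addrconf" };
	struct toy_notifier *chain = NULL;

	/* Route registers first, yet the lower priority runs it last. */
	toy_chain_register(&chain, &route);
	toy_chain_register(&chain, &addrconf);
	assert(toy_chain_position(chain, &addrconf) == 0);
	assert(toy_chain_position(chain, &route) == 1);
}
```

With both priorities left at 0 the same registration order puts the
route entry first, which is the situation the patch description
identifies as the problem.
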


* Re: [PATCH] sky2: Use seq_putc() in sky2_debug_show()
From: Lino Sanfilippo @ 2017-05-08 17:08 UTC
  To: SF Markus Elfring, netdev, Mirko Lindner, Stephen Hemminger
  Cc: LKML, kernel-janitors
In-Reply-To: <1f36d45e-3ca9-bfb8-0634-901c42390e2c@users.sourceforge.net>

Hi,

On 08.05.2017 18:42, SF Markus Elfring wrote:
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Mon, 8 May 2017 18:38:17 +0200
> 
> A single character (line break) should be put into a sequence.

Why?

> Thus use the corresponding function "seq_putc".
> 
> This issue was detected by using the Coccinelle software.

Which issue do you mean? I don't see any issue that you fix here.

> 
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
> ---
>  drivers/net/ethernet/marvell/sky2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
> index 1145cde2274a..73575101cd72 100644
> --- a/drivers/net/ethernet/marvell/sky2.c
> +++ b/drivers/net/ethernet/marvell/sky2.c
> @@ -4562,7 +4562,7 @@ static int sky2_debug_show(struct seq_file *seq, void *v)
>  			seq_printf(seq, "[%d] %#x %d %#x\n",
>  				   idx, le->opcode, le->length, le->status);
>  		}
> -		seq_puts(seq, "\n");
> +		seq_putc(seq, '\n');
>  	}
>  
>  	seq_printf(seq, "Tx ring pending=%u...%u report=%d done=%d\n",
> 

Regards,
Lino


* [PATCH] drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison
From: Karim Eshapa @ 2017-05-08 16:58 UTC
  To: inaky.perez-gonzalez; +Cc: linux-wimax, netdev, linux-kernel, Karim Eshapa

Cast the timeframe variable to (unsigned long), then use the
time_after() kernel macro for the time comparison.

Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
---
 drivers/net/wimax/i2400m/i2400m-usb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wimax/i2400m/i2400m-usb.h b/drivers/net/wimax/i2400m/i2400m-usb.h
index 649ecad..6fc941c 100644
--- a/drivers/net/wimax/i2400m/i2400m-usb.h
+++ b/drivers/net/wimax/i2400m/i2400m-usb.h
@@ -131,7 +131,7 @@ static inline int edc_inc(struct edc *edc, u16 max_err, u16 timeframe)
 	unsigned long now;
 
 	now = jiffies;
-	if (now - edc->timestart > timeframe) {
+	if (time_after(now - edc->timestart, (unsigned long)timeframe)) {
 		edc->errorcount = 1;
 		edc->timestart = now;
 	} else if (++edc->errorcount > max_err) {
-- 
2.7.4
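
For context, time_after() is usually applied to two absolute jiffies
values rather than to an elapsed interval. A minimal userspace sketch
of that form follows; toy_time_after() is a stand-in written for this
note (it omits the kernel macro's type checks) and window_expired()
is a hypothetical helper. The cast of the unsigned difference to long
assumes the usual two's-complement behaviour.

```c
#include <assert.h>
#include <limits.h>

/* True when a is later than b, judged on the signed difference so the
 * result stays correct across counter wraparound. */
#define toy_time_after(a, b) ((long)((b) - (a)) < 0)

/* Has more than `timeframe` ticks elapsed since `start`? */
static int window_expired(unsigned long now, unsigned long start,
			  unsigned short timeframe)
{
	return toy_time_after(now, start + (unsigned long)timeframe);
}

static void window_demo(void)
{
	unsigned long start = ULONG_MAX - 5;	/* deadline wraps to 4 */

	assert(!window_expired(ULONG_MAX - 3, start, 10)); /* 2 ticks in */
	assert(!window_expired(3, start, 10));	/* 9 ticks in, wrapped */
	assert(window_expired(5, start, 10));	/* 11 ticks in */
}
```

The unsigned subtraction in the original line, now - edc->timestart,
is also safe across wraparound, since it yields the elapsed tick
count directly.
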


* Re: [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: David Miller @ 2017-05-08 16:53 UTC
  To: Pavel.Belous; +Cc: netdev, darcari, LinoSanfilippo, simon.edelhaus
In-Reply-To: <9ea5a2ba21b4f8b22a63acf39135a41e2f1da23c.1493928470.git.pavel.belous@aquantia.com>

From: Pavel Belous <Pavel.Belous@aquantia.com>
Date: Thu,  4 May 2017 23:10:56 +0300

> From: Pavel Belous <pavel.belous@aquantia.com>
> 
> This patch fixes the crash that happens when driver tries to collect statistics
> from already released "aq_vec" object.
> If adapter is in "down" state we still allow user to see statistics from HW.
> 
> V2: fixed braces around "aq_vec_free".
> 
> Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>

Applied, thanks.


* [PATCH RFC] net: Fix inconsistent teardown and release of private netdev state.
From: David Miller @ 2017-05-08 16:52 UTC
  To: netdev


Network devices can allocate resources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally also does a free_netdev().

This creates a problem for the logic in register_netdevice(),
because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller <davem@davemloft.net>
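
A toy userspace model of the split described above, for illustration
only. The toy_* functions, the allocation counter and the fail_late
switch are assumptions made for this sketch, and ndo_uninit() is left
out for brevity. It shows that once private teardown and free_netdev()
are separate, a late registration failure can release the private
state while the caller still frees the device exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_netdev {
	bool needs_free_netdev;
	void (*priv_destructor)(struct toy_netdev *dev);
	void *priv;			/* allocated by toy_init() */
};

static int toy_live_allocs;		/* outstanding allocations */

static struct toy_netdev *toy_alloc_netdev(void)
{
	struct toy_netdev *dev = calloc(1, sizeof(*dev));

	if (dev)
		toy_live_allocs++;
	return dev;
}

static void toy_free_netdev(struct toy_netdev *dev)
{
	free(dev);
	toy_live_allocs--;
}

static int toy_init(struct toy_netdev *dev)
{
	dev->priv = malloc(16);
	if (!dev->priv)
		return -1;
	toy_live_allocs++;
	return 0;
}

/* Frees private state only; never frees the device itself. */
static void toy_priv_destructor(struct toy_netdev *dev)
{
	if (dev->priv) {
		free(dev->priv);
		dev->priv = NULL;
		toy_live_allocs--;
	}
}

/* fail_late simulates a failure after toy_init() has succeeded. */
static int toy_register(struct toy_netdev *dev, bool fail_late)
{
	if (toy_init(dev))
		return -1;
	if (fail_late) {
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		return -1;		/* caller still owns and frees dev */
	}
	return 0;
}

static void toy_unregister(struct toy_netdev *dev)
{
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev)
		toy_free_netdev(dev);
}

static void toy_teardown_demo(void)
{
	struct toy_netdev *dev = toy_alloc_netdev();

	assert(dev);
	dev->needs_free_netdev = true;
	dev->priv_destructor = toy_priv_destructor;
	assert(toy_register(dev, true) == -1);	/* late failure */
	toy_free_netdev(dev);			/* caller's single free */
	assert(toy_live_allocs == 0);		/* no leak, no double free */
}
```
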

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2be7880..d507b68 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4185,7 +4185,6 @@ static void bond_destructor(struct net_device *bond_dev)
 	struct bonding *bond = netdev_priv(bond_dev);
 	if (bond->wq)
 		destroy_workqueue(bond->wq);
-	free_netdev(bond_dev);
 }
 
 void bond_setup(struct net_device *bond_dev)
@@ -4205,7 +4204,8 @@ void bond_setup(struct net_device *bond_dev)
 	bond_dev->netdev_ops = &bond_netdev_ops;
 	bond_dev->ethtool_ops = &bond_ethtool_ops;
 
-	bond_dev->destructor = bond_destructor;
+	bond_dev->needs_free_netdev = true;
+	bond_dev->priv_destructor = bond_destructor;
 
 	SET_NETDEV_DEVTYPE(bond_dev, &bond_type);
 
@@ -4730,7 +4730,7 @@ int bond_create(struct net *net, const char *name)
 
 	rtnl_unlock();
 	if (res < 0)
-		bond_destructor(bond_dev);
+		free_netdev(bond_dev);
 	return res;
 }
 
diff --git a/drivers/net/caif/caif_hsi.c b/drivers/net/caif/caif_hsi.c
index ddabce7..71a7c3b 100644
--- a/drivers/net/caif/caif_hsi.c
+++ b/drivers/net/caif/caif_hsi.c
@@ -1121,7 +1121,7 @@ static void cfhsi_setup(struct net_device *dev)
 	dev->flags = IFF_POINTOPOINT | IFF_NOARP;
 	dev->mtu = CFHSI_MAX_CAIF_FRAME_SZ;
 	dev->priv_flags |= IFF_NO_QUEUE;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	dev->netdev_ops = &cfhsi_netdevops;
 	for (i = 0; i < CFHSI_PRIO_LAST; ++i)
 		skb_queue_head_init(&cfhsi->qhead[i]);
diff --git a/drivers/net/caif/caif_serial.c b/drivers/net/caif/caif_serial.c
index c2dea49..76e1d35 100644
--- a/drivers/net/caif/caif_serial.c
+++ b/drivers/net/caif/caif_serial.c
@@ -428,7 +428,7 @@ static void caifdev_setup(struct net_device *dev)
 	dev->flags = IFF_POINTOPOINT | IFF_NOARP;
 	dev->mtu = CAIF_MAX_MTU;
 	dev->priv_flags |= IFF_NO_QUEUE;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	skb_queue_head_init(&serdev->head);
 	serdev->common.link_select = CAIF_LINK_LOW_LATENCY;
 	serdev->common.use_frag = true;
diff --git a/drivers/net/caif/caif_spi.c b/drivers/net/caif/caif_spi.c
index 3a529fb..fc21afe 100644
--- a/drivers/net/caif/caif_spi.c
+++ b/drivers/net/caif/caif_spi.c
@@ -712,7 +712,7 @@ static void cfspi_setup(struct net_device *dev)
 	dev->flags = IFF_NOARP | IFF_POINTOPOINT;
 	dev->priv_flags |= IFF_NO_QUEUE;
 	dev->mtu = SPI_MAX_PAYLOAD_SIZE;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	skb_queue_head_init(&cfspi->qhead);
 	skb_queue_head_init(&cfspi->chead);
 	cfspi->cfdev.link_select = CAIF_LINK_HIGH_BANDW;
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index bc0eb47..8bffd25 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -617,7 +617,7 @@ static void cfv_netdev_setup(struct net_device *netdev)
 	netdev->tx_queue_len = 100;
 	netdev->flags = IFF_POINTOPOINT | IFF_NOARP;
 	netdev->mtu = CFV_DEF_MTU_SIZE;
-	netdev->destructor = free_netdev;
+	netdev->needs_free_netdev = true;
 }
 
 /* Create debugfs counters for the device */
diff --git a/drivers/net/can/slcan.c b/drivers/net/can/slcan.c
index eb71737..6a6e896 100644
--- a/drivers/net/can/slcan.c
+++ b/drivers/net/can/slcan.c
@@ -417,7 +417,7 @@ static int slc_open(struct net_device *dev)
 static void slc_free_netdev(struct net_device *dev)
 {
 	int i = dev->base_addr;
-	free_netdev(dev);
+
 	slcan_devs[i] = NULL;
 }
 
@@ -436,7 +436,8 @@ static const struct net_device_ops slc_netdev_ops = {
 static void slc_setup(struct net_device *dev)
 {
 	dev->netdev_ops		= &slc_netdev_ops;
-	dev->destructor		= slc_free_netdev;
+	dev->needs_free_netdev	= true;
+	dev->priv_destructor	= slc_free_netdev;
 
 	dev->hard_header_len	= 0;
 	dev->addr_len		= 0;
@@ -761,8 +762,6 @@ static void __exit slcan_exit(void)
 		if (sl->tty) {
 			printk(KERN_ERR "%s: tty discipline still running\n",
 			       dev->name);
-			/* Intentionally leak the control block. */
-			dev->destructor = NULL;
 		}
 
 		unregister_netdev(dev);
diff --git a/drivers/net/can/vcan.c b/drivers/net/can/vcan.c
index facca33..0eda1b3 100644
--- a/drivers/net/can/vcan.c
+++ b/drivers/net/can/vcan.c
@@ -163,7 +163,7 @@ static void vcan_setup(struct net_device *dev)
 		dev->flags |= IFF_ECHO;
 
 	dev->netdev_ops		= &vcan_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 }
 
 static struct rtnl_link_ops vcan_link_ops __read_mostly = {
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 7fbb247..30cf236 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -156,7 +156,7 @@ static void vxcan_setup(struct net_device *dev)
 	dev->tx_queue_len	= 0;
 	dev->flags		= (IFF_NOARP|IFF_ECHO);
 	dev->netdev_ops		= &vxcan_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 }
 
 /* forward declaration for rtnl_create_link() */
diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index 149244a..9905b52 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -328,7 +328,6 @@ static void dummy_free_netdev(struct net_device *dev)
 	struct dummy_priv *priv = netdev_priv(dev);
 
 	kfree(priv->vfinfo);
-	free_netdev(dev);
 }
 
 static void dummy_setup(struct net_device *dev)
@@ -338,7 +337,8 @@ static void dummy_setup(struct net_device *dev)
 	/* Initialize the device structure. */
 	dev->netdev_ops = &dummy_netdev_ops;
 	dev->ethtool_ops = &dummy_ethtool_ops;
-	dev->destructor = dummy_free_netdev;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = dummy_free_netdev;
 
 	/* Fill in device structure with ethernet-generic values. */
 	dev->flags |= IFF_NOARP;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index c12c4a3..2649a06e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4537,7 +4537,7 @@ static void dummy_setup(struct net_device *dev)
 	/* Initialize the device structure. */
 	dev->netdev_ops = &cxgb4_mgmt_netdev_ops;
 	dev->ethtool_ops = &cxgb4_mgmt_ethtool_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 }
 
 static int config_mgmt_dev(struct pci_dev *pdev)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index dec5d56..4d44c02 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1007,7 +1007,7 @@ static void geneve_setup(struct net_device *dev)
 
 	dev->netdev_ops = &geneve_netdev_ops;
 	dev->ethtool_ops = &geneve_ethtool_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 
 	SET_NETDEV_DEVTYPE(dev, &geneve_type);
 
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 4fea1b3..abb82ff 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -611,7 +611,7 @@ static const struct net_device_ops gtp_netdev_ops = {
 static void gtp_link_setup(struct net_device *dev)
 {
 	dev->netdev_ops		= &gtp_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 
 	dev->hard_header_len = 0;
 	dev->addr_len = 0;
diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c
index 922bf44..021a8ec 100644
--- a/drivers/net/hamradio/6pack.c
+++ b/drivers/net/hamradio/6pack.c
@@ -311,7 +311,7 @@ static void sp_setup(struct net_device *dev)
 {
 	/* Finish setting up the DEVICE info. */
 	dev->netdev_ops		= &sp_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 	dev->mtu		= SIXP_MTU;
 	dev->hard_header_len	= AX25_MAX_HEADER_LEN;
 	dev->header_ops 	= &ax25_header_ops;
diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index f62e7f3..78a6414 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -476,7 +476,7 @@ static const struct net_device_ops bpq_netdev_ops = {
 static void bpq_setup(struct net_device *dev)
 {
 	dev->netdev_ops	     = &bpq_netdev_ops;
-	dev->destructor	     = free_netdev;
+	dev->needs_free_netdev = true;
 
 	memcpy(dev->broadcast, &ax25_bcast, AX25_ADDR_LEN);
 	memcpy(dev->dev_addr,  &ax25_defaddr, AX25_ADDR_LEN);
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 312fce7..144ea5a 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -207,7 +207,6 @@ static void ifb_dev_free(struct net_device *dev)
 		__skb_queue_purge(&txp->tq);
 	}
 	kfree(dp->tx_private);
-	free_netdev(dev);
 }
 
 static void ifb_setup(struct net_device *dev)
@@ -230,7 +229,8 @@ static void ifb_setup(struct net_device *dev)
 	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	netif_keep_dst(dev);
 	eth_hw_addr_random(dev);
-	dev->destructor = ifb_dev_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = ifb_dev_free;
 }
 
 static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 618ed88..7c7680c 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -632,7 +632,7 @@ void ipvlan_link_setup(struct net_device *dev)
 	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 	dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE;
 	dev->netdev_ops = &ipvlan_netdev_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	dev->header_ops = &ipvlan_header_ops;
 	dev->ethtool_ops = &ipvlan_ethtool_ops;
 }
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 224f65c..3061249 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -159,7 +159,6 @@ static void loopback_dev_free(struct net_device *dev)
 {
 	dev_net(dev)->loopback_dev = NULL;
 	free_percpu(dev->lstats);
-	free_netdev(dev);
 }
 
 static const struct net_device_ops loopback_ops = {
@@ -196,7 +195,8 @@ static void loopback_setup(struct net_device *dev)
 	dev->ethtool_ops	= &loopback_ethtool_ops;
 	dev->header_ops		= &eth_header_ops;
 	dev->netdev_ops		= &loopback_ops;
-	dev->destructor		= loopback_dev_free;
+	dev->needs_free_netdev	= true;
+	dev->priv_destructor	= loopback_dev_free;
 }
 
 /* Setup and register the loopback device. */
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index cdc347b..7941167 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -2996,7 +2996,6 @@ static void macsec_free_netdev(struct net_device *dev)
 	free_percpu(macsec->secy.tx_sc.stats);
 
 	dev_put(real_dev);
-	free_netdev(dev);
 }
 
 static void macsec_setup(struct net_device *dev)
@@ -3006,7 +3005,8 @@ static void macsec_setup(struct net_device *dev)
 	dev->max_mtu = ETH_MAX_MTU;
 	dev->priv_flags |= IFF_NO_QUEUE;
 	dev->netdev_ops = &macsec_netdev_ops;
-	dev->destructor = macsec_free_netdev;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = macsec_free_netdev;
 	SET_NETDEV_DEVTYPE(dev, &macsec_type);
 
 	eth_zero_addr(dev->broadcast);
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index b34eaaa..b8cec52 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1089,7 +1089,7 @@ void macvlan_common_setup(struct net_device *dev)
 	netif_keep_dst(dev);
 	dev->priv_flags	       |= IFF_UNICAST_FLT;
 	dev->netdev_ops		= &macvlan_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 	dev->header_ops		= &macvlan_hard_header_ops;
 	dev->ethtool_ops	= &macvlan_ethtool_ops;
 }
diff --git a/drivers/net/nlmon.c b/drivers/net/nlmon.c
index b916038..c4b3362 100644
--- a/drivers/net/nlmon.c
+++ b/drivers/net/nlmon.c
@@ -113,7 +113,7 @@ static void nlmon_setup(struct net_device *dev)
 
 	dev->netdev_ops	= &nlmon_ops;
 	dev->ethtool_ops = &nlmon_ethtool_ops;
-	dev->destructor	= free_netdev;
+	dev->needs_free_netdev = true;
 
 	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST |
 			NETIF_F_HIGHDMA | NETIF_F_LLTX;
diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index 1da31dc..74b9072 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -629,7 +629,7 @@ static void sl_uninit(struct net_device *dev)
 static void sl_free_netdev(struct net_device *dev)
 {
 	int i = dev->base_addr;
-	free_netdev(dev);
+
 	slip_devs[i] = NULL;
 }
 
@@ -651,7 +651,8 @@ static const struct net_device_ops sl_netdev_ops = {
 static void sl_setup(struct net_device *dev)
 {
 	dev->netdev_ops		= &sl_netdev_ops;
-	dev->destructor		= sl_free_netdev;
+	dev->needs_free_netdev	= true;
+	dev->priv_destructor	= sl_free_netdev;
 
 	dev->hard_header_len	= 0;
 	dev->addr_len		= 0;
@@ -1369,8 +1370,6 @@ static void __exit slip_exit(void)
 		if (sl->tty) {
 			printk(KERN_ERR "%s: tty discipline still running\n",
 			       dev->name);
-			/* Intentionally leak the control block. */
-			dev->destructor = NULL;
 		}
 
 		unregister_netdev(dev);
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 6c5d5ef..fba8c13 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1643,7 +1643,6 @@ static void team_destructor(struct net_device *dev)
 	struct team *team = netdev_priv(dev);
 
 	free_percpu(team->pcpu_stats);
-	free_netdev(dev);
 }
 
 static int team_open(struct net_device *dev)
@@ -2079,7 +2078,8 @@ static void team_setup(struct net_device *dev)
 
 	dev->netdev_ops = &team_netdev_ops;
 	dev->ethtool_ops = &team_ethtool_ops;
-	dev->destructor	= team_destructor;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = team_destructor;
 	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 	dev->priv_flags |= IFF_NO_QUEUE;
 	dev->priv_flags |= IFF_TEAM;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bbd707b..9ee7d42 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1560,7 +1560,6 @@ static void tun_free_netdev(struct net_device *dev)
 	free_percpu(tun->pcpu_stats);
 	tun_flow_uninit(tun);
 	security_tun_dev_free_security(tun->security);
-	free_netdev(dev);
 }
 
 static void tun_setup(struct net_device *dev)
@@ -1571,7 +1570,8 @@ static void tun_setup(struct net_device *dev)
 	tun->group = INVALID_GID;
 
 	dev->ethtool_ops = &tun_ethtool_ops;
-	dev->destructor = tun_free_netdev;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = tun_free_netdev;
 	/* We prefer our own queue length */
 	dev->tx_queue_len = TUN_READQ_SIZE;
 }
diff --git a/drivers/net/usb/cdc-phonet.c b/drivers/net/usb/cdc-phonet.c
index eb52de8..c7a350b 100644
--- a/drivers/net/usb/cdc-phonet.c
+++ b/drivers/net/usb/cdc-phonet.c
@@ -298,7 +298,7 @@ static void usbpn_setup(struct net_device *dev)
 	dev->addr_len		= 1;
 	dev->tx_queue_len	= 3;
 
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 }
 
 /*
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index d716576..e76f988 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -123,7 +123,7 @@ static void qmimux_setup(struct net_device *dev)
 	dev->addr_len        = 0;
 	dev->flags           = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
 	dev->netdev_ops      = &qmimux_netdev_ops;
-	dev->destructor      = free_netdev;
+	dev->needs_free_netdev = true;
 }
 
 static struct net_device *qmimux_find_dev(struct usbnet *dev, u8 mux_id)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 38f0f03..0156fe8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -222,7 +222,6 @@ static int veth_dev_init(struct net_device *dev)
 static void veth_dev_free(struct net_device *dev)
 {
 	free_percpu(dev->vstats);
-	free_netdev(dev);
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -317,7 +316,8 @@ static void veth_setup(struct net_device *dev)
 			       NETIF_F_HW_VLAN_STAG_TX |
 			       NETIF_F_HW_VLAN_CTAG_RX |
 			       NETIF_F_HW_VLAN_STAG_RX);
-	dev->destructor = veth_dev_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = veth_dev_free;
 	dev->max_mtu = ETH_MAX_MTU;
 
 	dev->hw_features = VETH_FEATURES;
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index ceda586..de3dcbb 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1347,7 +1347,7 @@ static void vrf_setup(struct net_device *dev)
 	dev->netdev_ops = &vrf_netdev_ops;
 	dev->l3mdev_ops = &vrf_l3mdev_ops;
 	dev->ethtool_ops = &vrf_ethtool_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 
 	/* Fill in device structure with ethernet-generic values. */
 	eth_hw_addr_random(dev);
diff --git a/drivers/net/vsockmon.c b/drivers/net/vsockmon.c
index 7f0136f..c28bdce 100644
--- a/drivers/net/vsockmon.c
+++ b/drivers/net/vsockmon.c
@@ -135,7 +135,7 @@ static void vsockmon_setup(struct net_device *dev)
 
 	dev->netdev_ops	= &vsockmon_ops;
 	dev->ethtool_ops = &vsockmon_ethtool_ops;
-	dev->destructor	= free_netdev;
+	dev->needs_free_netdev = true;
 
 	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST |
 			NETIF_F_HIGHDMA | NETIF_F_LLTX;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 328b471..927c6a9 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2584,7 +2584,7 @@ static void vxlan_setup(struct net_device *dev)
 	eth_hw_addr_random(dev);
 	ether_setup(dev);
 
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	SET_NETDEV_DEVTYPE(dev, &vxlan_type);
 
 	dev->features	|= NETIF_F_LLTX;
diff --git a/drivers/net/wan/dlci.c b/drivers/net/wan/dlci.c
index 65ee2a6..a0d76f7 100644
--- a/drivers/net/wan/dlci.c
+++ b/drivers/net/wan/dlci.c
@@ -475,7 +475,7 @@ static void dlci_setup(struct net_device *dev)
 	dev->flags		= 0;
 	dev->header_ops		= &dlci_header_ops;
 	dev->netdev_ops		= &dlci_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 
 	dlp->receive		= dlci_receive;
 
diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index eb91528..78596e4 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -1106,7 +1106,7 @@ static int fr_add_pvc(struct net_device *frad, unsigned int dlci, int type)
 		return -EIO;
 	}
 
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	*get_dev_p(pvc, type) = dev;
 	if (!used) {
 		state(hdlc)->dce_changed = 1;
diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index 9df9ed6..63f7490 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -306,7 +306,7 @@ static const struct net_device_ops lapbeth_netdev_ops = {
 static void lapbeth_setup(struct net_device *dev)
 {
 	dev->netdev_ops	     = &lapbeth_netdev_ops;
-	dev->destructor	     = free_netdev;
+	dev->needs_free_netdev = true;
 	dev->type            = ARPHRD_X25;
 	dev->hard_header_len = 3;
 	dev->mtu             = 1000;
diff --git a/drivers/net/wireless/ath/ath6kl/main.c b/drivers/net/wireless/ath/ath6kl/main.c
index 91ee542..b90c77e 100644
--- a/drivers/net/wireless/ath/ath6kl/main.c
+++ b/drivers/net/wireless/ath/ath6kl/main.c
@@ -1287,7 +1287,7 @@ void init_netdev(struct net_device *dev)
 	struct ath6kl *ar = ath6kl_priv(dev);
 
 	dev->netdev_ops = &ath6kl_netdev_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	dev->watchdog_timeo = ATH6KL_TX_TIMEOUT;
 
 	dev->needed_headroom = ETH_HLEN;
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index cd1d673..617199c 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -5225,7 +5225,6 @@ void brcmf_cfg80211_free_netdev(struct net_device *ndev)
 
 	if (vif)
 		brcmf_free_vif(vif);
-	free_netdev(ndev);
 }
 
 static bool brcmf_is_linkup(const struct brcmf_event_msg *e)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index a3d8236..511d190 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -624,7 +624,8 @@ struct brcmf_if *brcmf_add_if(struct brcmf_pub *drvr, s32 bsscfgidx, s32 ifidx,
 		if (!ndev)
 			return ERR_PTR(-ENOMEM);
 
-		ndev->destructor = brcmf_cfg80211_free_netdev;
+		ndev->needs_free_netdev = true;
+		ndev->priv_destructor = brcmf_cfg80211_free_netdev;
 		ifp = netdev_priv(ndev);
 		ifp->ndev = ndev;
 		/* store mapping ifidx to bsscfgidx */
diff --git a/drivers/net/wireless/intersil/hostap/hostap_main.c b/drivers/net/wireless/intersil/hostap/hostap_main.c
index 544fc09..1372b20 100644
--- a/drivers/net/wireless/intersil/hostap/hostap_main.c
+++ b/drivers/net/wireless/intersil/hostap/hostap_main.c
@@ -73,7 +73,7 @@ struct net_device * hostap_add_interface(struct local_info *local,
 	dev->mem_end = mdev->mem_end;
 
 	hostap_setup_dev(dev, local, type);
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 
 	sprintf(dev->name, "%s%s", prefix, name);
 	if (!rtnl_locked)
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 87444af..a659104 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -2855,7 +2855,7 @@ static const struct net_device_ops hwsim_netdev_ops = {
 static void hwsim_mon_setup(struct net_device *dev)
 {
 	dev->netdev_ops = &hwsim_netdev_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	ether_setup(dev);
 	dev->priv_flags |= IFF_NO_QUEUE;
 	dev->type = ARPHRD_IEEE80211_RADIOTAP;
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index dd87b9f..39b6b5e 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1280,7 +1280,7 @@ void mwifiex_init_priv_params(struct mwifiex_private *priv,
 			      struct net_device *dev)
 {
 	dev->netdev_ops = &mwifiex_netdev_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	/* Initialize private structure */
 	priv->current_key_index = 0;
 	priv->media_connected = false;
diff --git a/drivers/staging/rtl8188eu/os_dep/mon.c b/drivers/staging/rtl8188eu/os_dep/mon.c
index cfe37eb..859d0d6 100644
--- a/drivers/staging/rtl8188eu/os_dep/mon.c
+++ b/drivers/staging/rtl8188eu/os_dep/mon.c
@@ -152,7 +152,7 @@ static const struct net_device_ops mon_netdev_ops = {
 static void mon_setup(struct net_device *dev)
 {
 	dev->netdev_ops = &mon_netdev_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	ether_setup(dev);
 	dev->priv_flags |= IFF_NO_QUEUE;
 	dev->type = ARPHRD_IEEE80211;
diff --git a/drivers/usb/gadget/function/f_phonet.c b/drivers/usb/gadget/function/f_phonet.c
index b4058f0..6a1ce6a 100644
--- a/drivers/usb/gadget/function/f_phonet.c
+++ b/drivers/usb/gadget/function/f_phonet.c
@@ -281,7 +281,7 @@ static void pn_net_setup(struct net_device *dev)
 	dev->tx_queue_len	= 1;
 
 	dev->netdev_ops		= &pn_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 	dev->header_ops		= &phonet_header_ops;
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9c23bd2..2ead32a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1596,8 +1596,8 @@ enum netdev_priv_flags {
  *	@rtnl_link_state:	This enum represents the phases of creating
  *				a new link
  *
- *	@destructor:		Called from unregister,
- *				can be used to call free_netdev
+ *	@needs_free_netdev:	Should unregister perform free_netdev?
+ *	@priv_destructor:	Called from unregister
  *	@npinfo:		XXX: need comments on this one
  * 	@nd_net:		Network namespace this network device is inside
  *
@@ -1858,7 +1858,8 @@ struct net_device {
 		RTNL_LINK_INITIALIZING,
 	} rtnl_link_state:16;
 
-	void (*destructor)(struct net_device *dev);
+	bool needs_free_netdev;
+	void (*priv_destructor)(struct net_device *dev);
 
 #ifdef CONFIG_NETPOLL
 	struct netpoll_info __rcu	*npinfo;
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 9ee5787..33885dc 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -806,7 +806,6 @@ static void vlan_dev_free(struct net_device *dev)
 
 	free_percpu(vlan->vlan_pcpu_stats);
 	vlan->vlan_pcpu_stats = NULL;
-	free_netdev(dev);
 }
 
 void vlan_setup(struct net_device *dev)
@@ -819,7 +818,8 @@ void vlan_setup(struct net_device *dev)
 	netif_keep_dst(dev);
 
 	dev->netdev_ops		= &vlan_netdev_ops;
-	dev->destructor		= vlan_dev_free;
+	dev->needs_free_netdev	= true;
+	dev->priv_destructor	= vlan_dev_free;
 	dev->ethtool_ops	= &vlan_ethtool_ops;
 
 	dev->min_mtu		= 0;
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index b25789a..10f7edf 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -1034,8 +1034,6 @@ static void batadv_softif_free(struct net_device *dev)
 	 * netdev and its private data (bat_priv)
 	 */
 	rcu_barrier();
-
-	free_netdev(dev);
 }
 
 /**
@@ -1047,7 +1045,8 @@ static void batadv_softif_init_early(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->netdev_ops = &batadv_netdev_ops;
-	dev->destructor = batadv_softif_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = batadv_softif_free;
 	dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_NETNS_LOCAL;
 	dev->priv_flags |= IFF_NO_QUEUE;
 
diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c
index 6089599..ab3b654 100644
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -598,7 +598,7 @@ static void netdev_setup(struct net_device *dev)
 
 	dev->netdev_ops		= &netdev_ops;
 	dev->header_ops		= &header_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 }
 
 static struct device_type bt_type = {
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 430b53e..f0f3447 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -379,7 +379,7 @@ void br_dev_setup(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->netdev_ops = &br_netdev_ops;
-	dev->destructor = free_netdev;
+	dev->needs_free_netdev = true;
 	dev->ethtool_ops = &br_ethtool_ops;
 	SET_NETDEV_DEVTYPE(dev, &br_type);
 	dev->priv_flags = IFF_EBRIDGE | IFF_NO_QUEUE;
diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 1816fc9..fe3c53e 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -392,14 +392,14 @@ static void chnl_net_destructor(struct net_device *dev)
 {
 	struct chnl_net *priv = netdev_priv(dev);
 	caif_free_client(&priv->chnl);
-	free_netdev(dev);
 }
 
 static void ipcaif_net_setup(struct net_device *dev)
 {
 	struct chnl_net *priv;
 	dev->netdev_ops = &netdev_ops;
-	dev->destructor = chnl_net_destructor;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = chnl_net_destructor;
 	dev->flags |= IFF_NOARP;
 	dev->flags |= IFF_POINTOPOINT;
 	dev->mtu = GPRS_PDP_MTU;
diff --git a/net/core/dev.c b/net/core/dev.c
index d07aa5f..7507fea 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7485,6 +7485,8 @@ int register_netdevice(struct net_device *dev)
 err_uninit:
 	if (dev->netdev_ops->ndo_uninit)
 		dev->netdev_ops->ndo_uninit(dev);
+	if (dev->priv_destructor)
+		dev->priv_destructor(dev);
 	goto out;
 }
 EXPORT_SYMBOL(register_netdevice);
@@ -7692,8 +7694,10 @@ void netdev_run_todo(void)
 		WARN_ON(rcu_access_pointer(dev->ip6_ptr));
 		WARN_ON(dev->dn_ptr);
 
-		if (dev->destructor)
-			dev->destructor(dev);
+		if (dev->priv_destructor)
+			dev->priv_destructor(dev);
+		if (dev->needs_free_netdev)
+			free_netdev(dev);
 
 		/* Report a network device has been unregistered */
 		rtnl_lock();
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index c73160f..0a0a392 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -378,7 +378,6 @@ static void hsr_dev_destroy(struct net_device *hsr_dev)
 	del_timer_sync(&hsr->announce_timer);
 
 	synchronize_rcu();
-	free_netdev(hsr_dev);
 }
 
 static const struct net_device_ops hsr_device_ops = {
@@ -404,7 +403,8 @@ void hsr_dev_setup(struct net_device *dev)
 	SET_NETDEV_DEVTYPE(dev, &hsr_type);
 	dev->priv_flags |= IFF_NO_QUEUE;
 
-	dev->destructor = hsr_dev_destroy;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = hsr_dev_destroy;
 
 	dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA |
 			   NETIF_F_GSO_MASK | NETIF_F_HW_CSUM |
diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c
index d7efbf0..0a866f33 100644
--- a/net/ieee802154/6lowpan/core.c
+++ b/net/ieee802154/6lowpan/core.c
@@ -107,7 +107,7 @@ static void lowpan_setup(struct net_device *ldev)
 
 	ldev->netdev_ops	= &lowpan_netdev_ops;
 	ldev->header_ops	= &lowpan_header_ops;
-	ldev->destructor	= free_netdev;
+	ldev->needs_free_netdev	= true;
 	ldev->features		|= NETIF_F_NETNS_LOCAL;
 }
 
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index b878ecb..b436d07 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -967,7 +967,6 @@ static void ip_tunnel_dev_free(struct net_device *dev)
 	gro_cells_destroy(&tunnel->gro_cells);
 	dst_cache_destroy(&tunnel->dst_cache);
 	free_percpu(dev->tstats);
-	free_netdev(dev);
 }
 
 void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
@@ -1155,7 +1154,8 @@ int ip_tunnel_init(struct net_device *dev)
 	struct iphdr *iph = &tunnel->parms.iph;
 	int err;
 
-	dev->destructor	= ip_tunnel_dev_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = ip_tunnel_dev_free;
 	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
 	if (!dev->tstats)
 		return -ENOMEM;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 3a02d52..200b65a 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -501,7 +501,7 @@ static void reg_vif_setup(struct net_device *dev)
 	dev->mtu		= ETH_DATA_LEN - sizeof(struct iphdr) - 8;
 	dev->flags		= IFF_NOARP;
 	dev->netdev_ops		= &reg_vif_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 	dev->features		|= NETIF_F_NETNS_LOCAL;
 }
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 8d128ba..c8c386a 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -990,13 +990,13 @@ static void ip6gre_dev_free(struct net_device *dev)
 
 	dst_cache_destroy(&t->dst_cache);
 	free_percpu(dev->tstats);
-	free_netdev(dev);
 }
 
 static void ip6gre_tunnel_setup(struct net_device *dev)
 {
 	dev->netdev_ops = &ip6gre_netdev_ops;
-	dev->destructor = ip6gre_dev_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = ip6gre_dev_free;
 
 	dev->type = ARPHRD_IP6GRE;
 
@@ -1147,7 +1147,7 @@ static int __net_init ip6gre_init_net(struct net *net)
 	return 0;
 
 err_reg_dev:
-	ip6gre_dev_free(ign->fb_tunnel_dev);
+	free_netdev(ign->fb_tunnel_dev);
 err_alloc_dev:
 	return err;
 }
@@ -1299,7 +1299,8 @@ static void ip6gre_tap_setup(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->netdev_ops = &ip6gre_tap_netdev_ops;
-	dev->destructor = ip6gre_dev_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = ip6gre_dev_free;
 
 	dev->features |= NETIF_F_NETNS_LOCAL;
 	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 6eb2ae5..883b652 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -254,7 +254,6 @@ static void ip6_dev_free(struct net_device *dev)
 	gro_cells_destroy(&t->gro_cells);
 	dst_cache_destroy(&t->dst_cache);
 	free_percpu(dev->tstats);
-	free_netdev(dev);
 }
 
 static int ip6_tnl_create2(struct net_device *dev)
@@ -322,7 +321,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
 	return t;
 
 failed_free:
-	ip6_dev_free(dev);
+	free_netdev(dev);
 failed:
 	return ERR_PTR(err);
 }
@@ -1769,7 +1768,8 @@ static const struct net_device_ops ip6_tnl_netdev_ops = {
 static void ip6_tnl_dev_setup(struct net_device *dev)
 {
 	dev->netdev_ops = &ip6_tnl_netdev_ops;
-	dev->destructor = ip6_dev_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = ip6_dev_free;
 
 	dev->type = ARPHRD_TUNNEL6;
 	dev->flags |= IFF_NOARP;
@@ -2216,7 +2216,7 @@ static int __net_init ip6_tnl_init_net(struct net *net)
 	return 0;
 
 err_register:
-	ip6_dev_free(ip6n->fb_tnl_dev);
+	free_netdev(ip6n->fb_tnl_dev);
 err_alloc_dev:
 	return err;
 }
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d67ef56..837ea1e 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -180,7 +180,6 @@ vti6_tnl_unlink(struct vti6_net *ip6n, struct ip6_tnl *t)
 static void vti6_dev_free(struct net_device *dev)
 {
 	free_percpu(dev->tstats);
-	free_netdev(dev);
 }
 
 static int vti6_tnl_create2(struct net_device *dev)
@@ -235,7 +234,7 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p
 	return t;
 
 failed_free:
-	vti6_dev_free(dev);
+	free_netdev(dev);
 failed:
 	return NULL;
 }
@@ -842,7 +841,8 @@ static const struct net_device_ops vti6_netdev_ops = {
 static void vti6_dev_setup(struct net_device *dev)
 {
 	dev->netdev_ops = &vti6_netdev_ops;
-	dev->destructor = vti6_dev_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = vti6_dev_free;
 
 	dev->type = ARPHRD_TUNNEL6;
 	dev->hard_header_len = LL_MAX_HEADER + sizeof(struct ipv6hdr);
@@ -1100,7 +1100,7 @@ static int __net_init vti6_init_net(struct net *net)
 	return 0;
 
 err_register:
-	vti6_dev_free(ip6n->fb_tnl_dev);
+	free_netdev(ip6n->fb_tnl_dev);
 err_alloc_dev:
 	return err;
 }
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 374997d..2ecb39b 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -733,7 +733,7 @@ static void reg_vif_setup(struct net_device *dev)
 	dev->mtu		= 1500 - sizeof(struct ipv6hdr) - 8;
 	dev->flags		= IFF_NOARP;
 	dev->netdev_ops		= &reg_vif_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 	dev->features		|= NETIF_F_NETNS_LOCAL;
 }
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 61e5902..2378503 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -265,7 +265,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
 	return nt;
 
 failed_free:
-	ipip6_dev_free(dev);
+	free_netdev(dev);
 failed:
 	return NULL;
 }
@@ -1336,7 +1336,6 @@ static void ipip6_dev_free(struct net_device *dev)
 
 	dst_cache_destroy(&tunnel->dst_cache);
 	free_percpu(dev->tstats);
-	free_netdev(dev);
 }
 
 #define SIT_FEATURES (NETIF_F_SG	   | \
@@ -1351,7 +1350,8 @@ static void ipip6_tunnel_setup(struct net_device *dev)
 	int t_hlen = tunnel->hlen + sizeof(struct iphdr);
 
 	dev->netdev_ops		= &ipip6_netdev_ops;
-	dev->destructor		= ipip6_dev_free;
+	dev->needs_free_netdev	= true;
+	dev->priv_destructor	= ipip6_dev_free;
 
 	dev->type		= ARPHRD_SIT;
 	dev->hard_header_len	= LL_MAX_HEADER + t_hlen;
diff --git a/net/irda/irlan/irlan_eth.c b/net/irda/irlan/irlan_eth.c
index 74d09f9..3be8528 100644
--- a/net/irda/irlan/irlan_eth.c
+++ b/net/irda/irlan/irlan_eth.c
@@ -65,7 +65,7 @@ static void irlan_eth_setup(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->netdev_ops		= &irlan_eth_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 	dev->min_mtu		= 0;
 	dev->max_mtu		= ETH_MAX_MTU;
 
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 8b21af7..f7c54ec 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -141,7 +141,7 @@ static void l2tp_eth_dev_setup(struct net_device *dev)
 	dev->priv_flags		&= ~IFF_TX_SKB_SHARING;
 	dev->features		|= NETIF_F_LLTX;
 	dev->netdev_ops		= &l2tp_eth_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 }
 
 static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb, int data_len)
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 3bd5b81..267e965 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -1213,7 +1213,6 @@ static const struct net_device_ops ieee80211_monitorif_ops = {
 static void ieee80211_if_free(struct net_device *dev)
 {
 	free_percpu(dev->tstats);
-	free_netdev(dev);
 }
 
 static void ieee80211_if_setup(struct net_device *dev)
@@ -1221,7 +1220,8 @@ static void ieee80211_if_setup(struct net_device *dev)
 	ether_setup(dev);
 	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->netdev_ops = &ieee80211_dataif_ops;
-	dev->destructor = ieee80211_if_free;
+	dev->needs_free_netdev = true;
+	dev->priv_destructor = ieee80211_if_free;
 }
 
 static void ieee80211_if_setup_no_queue(struct net_device *dev)
@@ -1914,7 +1914,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
 
 		ret = register_netdevice(ndev);
 		if (ret) {
-			ieee80211_if_free(ndev);
+			free_netdev(ndev);
 			return ret;
 		}
 	}
diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c
index 06019db..bd88a9b 100644
--- a/net/mac802154/iface.c
+++ b/net/mac802154/iface.c
@@ -526,8 +526,6 @@ static void mac802154_wpan_free(struct net_device *dev)
 	struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev);
 
 	mac802154_llsec_destroy(&sdata->sec);
-
-	free_netdev(dev);
 }
 
 static void ieee802154_if_setup(struct net_device *dev)
@@ -593,7 +591,8 @@ ieee802154_setup_sdata(struct ieee802154_sub_if_data *sdata,
 					sdata->dev->dev_addr);
 
 		sdata->dev->header_ops = &mac802154_header_ops;
-		sdata->dev->destructor = mac802154_wpan_free;
+		sdata->dev->needs_free_netdev = true;
+		sdata->dev->priv_destructor = mac802154_wpan_free;
 		sdata->dev->netdev_ops = &mac802154_wpan_ops;
 		sdata->dev->ml_priv = &mac802154_mlme_wpan;
 		wpan_dev->promiscuous_mode = false;
@@ -608,7 +607,7 @@ ieee802154_setup_sdata(struct ieee802154_sub_if_data *sdata,
 
 		break;
 	case NL802154_IFTYPE_MONITOR:
-		sdata->dev->destructor = free_netdev;
+		sdata->dev->needs_free_netdev = true;
 		sdata->dev->netdev_ops = &mac802154_monitor_ops;
 		wpan_dev->promiscuous_mode = true;
 		break;
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 89193a6..04a3128 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -94,7 +94,6 @@ static void internal_dev_destructor(struct net_device *dev)
 	struct vport *vport = ovs_internal_dev_get_vport(dev);
 
 	ovs_vport_free(vport);
-	free_netdev(dev);
 }
 
 static void
@@ -156,7 +155,8 @@ static void do_setup(struct net_device *netdev)
 	netdev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_OPENVSWITCH |
 			      IFF_PHONY_HEADROOM | IFF_NO_QUEUE;
-	netdev->destructor = internal_dev_destructor;
+	netdev->needs_free_netdev = true;
+	netdev->priv_destructor = internal_dev_destructor;
 	netdev->ethtool_ops = &internal_dev_ethtool_ops;
 	netdev->rtnl_link_ops = &internal_dev_link_ops;
 
diff --git a/net/phonet/pep-gprs.c b/net/phonet/pep-gprs.c
index 21c28b5..2c93379 100644
--- a/net/phonet/pep-gprs.c
+++ b/net/phonet/pep-gprs.c
@@ -236,7 +236,7 @@ static void gprs_setup(struct net_device *dev)
 	dev->tx_queue_len	= 10;
 
 	dev->netdev_ops		= &gprs_netdev_ops;
-	dev->destructor		= free_netdev;
+	dev->needs_free_netdev	= true;
 }
 
 /*
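
For readers following the conversion, the ownership split in the net/core/dev.c hunk of this patch can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct model_netdev, model_run_todo(), model_free_netdev(), model_priv_destructor() and model_alloc() are hypothetical stand-ins invented for illustration, not kernel symbols. Only the two field names and the call order are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two fields the patch adds to struct net_device. */
struct model_netdev {
	bool needs_free_netdev;
	void (*priv_destructor)(struct model_netdev *dev);
	int *stats;		/* models driver-private state, e.g. per-cpu stats */
};

static int model_frees;		/* number of model_free_netdev() calls */
static int model_priv_calls;	/* number of model_priv_destructor() calls */

/* Stand-in for free_netdev(): releases the device structure itself. */
static void model_free_netdev(struct model_netdev *dev)
{
	model_frees++;
	free(dev);
}

/* Driver callback: releases private state only, never the device. */
static void model_priv_destructor(struct model_netdev *dev)
{
	free(dev->stats);
	dev->stats = NULL;
	model_priv_calls++;
}

/* Same order as the netdev_run_todo() hunk: private cleanup first,
 * then the core frees the device if the driver asked for that.
 */
static void model_run_todo(struct model_netdev *dev)
{
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev)
		model_free_netdev(dev);
}

/* Allocates a model device, optionally with private state to clean up. */
static struct model_netdev *model_alloc(bool needs_free, bool with_priv)
{
	struct model_netdev *dev = calloc(1, sizeof(*dev));

	assert(dev);
	dev->needs_free_netdev = needs_free;
	if (with_priv) {
		dev->stats = calloc(4, sizeof(*dev->stats));
		assert(dev->stats);
		dev->priv_destructor = model_priv_destructor;
	}
	return dev;
}
```

In this model a device created with needs_free_netdev unset keeps its structure alive after model_run_todo(), so the caller still has to free it. That appears to be the situation the converted error paths in the patch rely on when they call free_netdev() directly instead of the old destructor.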


* [PATCH] sky2: Use seq_putc() in sky2_debug_show()
From: SF Markus Elfring @ 2017-05-08 16:42 UTC (permalink / raw)
  To: netdev, Mirko Lindner, Stephen Hemminger; +Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Mon, 8 May 2017 18:38:17 +0200

A single character (a line break) is written to the seq_file here.
Use seq_putc() for it instead of seq_puts().

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/marvell/sky2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index 1145cde2274a..73575101cd72 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -4562,7 +4562,7 @@ static int sky2_debug_show(struct seq_file *seq, void *v)
 			seq_printf(seq, "[%d] %#x %d %#x\n",
 				   idx, le->opcode, le->length, le->status);
 		}
-		seq_puts(seq, "\n");
+		seq_putc(seq, '\n');
 	}
 
 	seq_printf(seq, "Tx ring pending=%u...%u report=%d done=%d\n",
-- 
2.12.2


* Re: [RFC iproute2 0/8] RDMA tool
From: Andrew Lunn @ 2017-05-08 16:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jiri@resnulli.us, leonro@mellanox.com, jiri@mellanox.com,
	linux-rdma@vger.kernel.org, ram.amrani@cavium.com,
	sagi@grimberg.me, ogerlitz@mellanox.com, hch@lst.de,
	dennis.dalessandro@intel.com, netdev@vger.kernel.org,
	leon@kernel.org, stephen@networkplumber.org, dledford@redhat.com,
	jgunthorpe@obsidianresearch.com, ariela@mellanox.com
In-Reply-To: <1494256767.2591.3.camel@sandisk.com>

> Several companies maintain embedded Linux
> distributions and tools to build software images. These tools provide a user
> interface that allows selecting which packages go into such an image.

The tools allow you to select what binary packages are placed into the
image. You can build multiple binary packages from one source package.
Desktop distributions are not likely to do this for something as small
as iproute2. But embedded distributions can easily break up iproute2
into a number of smaller packages, as you suggested: tipc, devlink,
tc, bridge, ss, etc.

OpenWrt does exactly this:

https://github.com/openwrt-mirror/openwrt/blob/master/package/network/utils/iproute2/Makefile

    Andrew


* [PATCH] fm10k: Use seq_putc() in fm10k_dbg_desc_break()
From: SF Markus Elfring @ 2017-05-08 16:18 UTC (permalink / raw)
  To: intel-wired-lan, netdev, Jeff Kirsher; +Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Mon, 8 May 2017 18:10:39 +0200

Each of these two calls writes a single character to the sequence file.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
index 5116fd043630..14df09e2d964 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
@@ -52,9 +52,9 @@ static void fm10k_dbg_desc_seq_stop(struct seq_file __always_unused *s,
 static void fm10k_dbg_desc_break(struct seq_file *s, int i)
 {
 	while (i--)
-		seq_puts(s, "-");
+		seq_putc(s, '-');
 
-	seq_puts(s, "\n");
+	seq_putc(s, '\n');
 }
 
 static int fm10k_dbg_tx_desc_seq_show(struct seq_file *s, void *v)
-- 
2.12.2
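
A minimal user-space sketch of why the replacement is behavior-preserving.
The mock_* names and the fixed-size buffer below are hypothetical stand-ins,
not the kernel's seq_file implementation; the only point illustrated is that
writing one character directly produces the same bytes as writing a
one-character string, without the length scan that a string write needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's struct seq_file. */
struct mock_seq {
	char buf[64];
	size_t count;	/* bytes written so far */
};

/* Append one character; no length scan is needed. */
void mock_seq_putc(struct mock_seq *m, char c)
{
	if (m->count < sizeof(m->buf))
		m->buf[m->count++] = c;
}

/* Append a string; strlen() has to walk it first. */
void mock_seq_puts(struct mock_seq *m, const char *s)
{
	size_t len = strlen(s);

	if (m->count + len <= sizeof(m->buf)) {
		memcpy(m->buf + m->count, s, len);
		m->count += len;
	}
}

/* Mirrors the shape of fm10k_dbg_desc_break() after the patch. */
void mock_desc_break(struct mock_seq *m, int i)
{
	assert(i >= 0);
	while (i--)
		mock_seq_putc(m, '-');

	mock_seq_putc(m, '\n');
}
```

Calling mock_desc_break() with i == 3 on an empty buffer leaves the four
bytes "---" plus a newline in buf, the same as three mock_seq_puts(m, "-")
calls followed by mock_seq_puts(m, "\n").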

^ permalink raw reply related

* Re: [PATCH] net: ipv4: add code comment for clarification
From: Gustavo A. R. Silva @ 2017-05-08 15:44 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel, joe
In-Reply-To: <20170508.113637.783558334411383400.davem@davemloft.net>

Hi David,

Quoting David Miller <davem@davemloft.net>:

> From: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
> Date: Thu, 4 May 2017 14:44:16 -0500
>
>> @@ -389,6 +389,12 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
>>  				  nlmsg_flags, unlh, net_admin);
>>  }
>>
>> +/*
>> + * Ignore the position of the arguments req->id.idiag_dport and
>> + * req->id.idiag_sport in both calls to inet_lookup() and inet6_lookup()
>> + * functions, once this is a locked in behavior exposed to user space.
>> + * Changing this will break things for people.
>> + */
>
> This is implicit for every interface exposed to userspace.
>
> Therefore, saying it here and there in various comments provides
> questionable value.
>
> And in fact I think these arguments are probably in the correct order.
>
> I'm definitely not applying a patch like this, sorry.

I get it, thanks for clarifying.

^ permalink raw reply

* Re: [RFC iproute2 0/8] RDMA tool
From: Dennis Dalessandro @ 2017-05-08 15:55 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Bart Van Assche, jiri-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ram.amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org,
	sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	ariela-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
In-Reply-To: <20170504194242.GF22833-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>

On 05/04/2017 03:42 PM, Leon Romanovsky wrote:
> On Thu, May 04, 2017 at 03:26:13PM -0400, Dennis Dalessandro wrote:
>> On 05/04/2017 02:45 PM, Leon Romanovsky wrote:
>>> On Thu, May 04, 2017 at 06:30:27PM +0000, Bart Van Assche wrote:
>>>> On Thu, 2017-05-04 at 21:25 +0300, Leon Romanovsky wrote:
>>>>> On Thu, May 04, 2017 at 06:10:54PM +0000, Bart Van Assche wrote:
>>>>>> On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
>>>>>>> Following our discussion both in mailing list [1] and at the LPC 2016 [2],
>>>>>>> we would like to propose this RDMA tool to be part of iproute2 package
>>>>>>> and finally improve this situation.
>>>>>>
>>>>>> Hello Leon,
>>>>>>
>>>>>> Although I really appreciate your work: can you clarify why you would like to
>>>>>> add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation
>>>>>> for adding RDMA functionality to iproute2 in [1].
>>>>>
>>>>> We are planning to reuse the same infrastructure provided by iproute2,
>>>>> like netlink parsing, access to distributions, same CLI and same standards.
>>>>>
>>>>> Right now, RDMA is already tied to netdev: iWARP, RoCE, IPoIB, HFI-VNIC.
>>>>> Many drivers (mlx, qed, i40, cxgb) are sharing code between net and
>>>>> RDMA.
>>>>>
>>>>> I do expect that iproute2 will be installed on every machine with any
>>>>> type of connection, including IB and OPA.
>>>>>
>>>>> So I think that it is enough to be part of that suite and don't invent
>>>>> our own for one specific tool.
>>>>
>>>> Hello Leon,
>>>>
>>>> Sorry but to me that sounds like a weak argument for including RDMA functionality
>>>> in iproute2. There is already a library for communication over netlink sockets,
>>>> namely libnl. Is there functionality that is in iproute2 but not in libnl and
>>>> that is needed for the new tool? If so, have you considered to create a new
>>>> library for that functionality?
>>>
>>> It is not hard to create new tool, the hardest part is to ensure that it is
>>> part of the distributions. Did you count how many months we are trying to
>>> add rdma-core to debian?
>>
>> I do agree that it is a strange pairing and am not really a fan. However at
>> the end of the day it's just a name for a repo/package. If the iproute folks
>> are fine to include rdma in their repo/package, great we can leverage their
>> code for CLI and other common stuff.
>>
>> Now if the interface was something like "ip -FlagForRdma ..." I would object
>> to that, but the interface is "rdma ... " so from users perspective it's
>> different tools. They don't need to care that it was sourced from a common
>> git repo.
>>
>> Just as an aside this already works a bit with OPA:
>>
>>  $ ./rdma link
>> 1/1: hfi1_0/1: ifname NONE cap_mask 0x00410022 lid 0x1 lid_mask_count 0
>> link_layer InfiniBand
>>          phys_state 5: LinkUp rate 100 Gb/sec (4X EDR) sm_lid 0x1 sm_sl 0
>> state 4: ACTIVE
>>
>> Leon I'll get you more feedback and testing, I've just been really bogged
>> down this week, sorry.
>
> Thanks Denny,
>
> Before you are starting to test it, can you please provide your feedback
> on my initial questions? Usability and need of sysfs.
> ----
> This is initial phase to understand if user experience for this tool fits
> RDMA and netdev communities' expectations. Also I would like to get feedback
> if it is really worth to provide legacy sysfs for old kernels, or maybe I should
> implement netlink from the beginning and abandon sysfs completely.
> -----

For the initial phase I think you did the right thing by reading sysfs. 
I like the ability for the tool to be compatible with legacy kernels, 
but at the same time I don't know if it's worth the hassle. I won't 
fight too hard either way.

Perhaps we should take a stab at seeing what a dual sysfs/netlink
interface would look like, and see just how hard and complicated it
really is. Then we can make that call. You already have the sysfs
version, and have to do netlink anyway, so let's not rip out what's there
just yet. This is an RFC after all; it's not like you are asking for this
exact version to be merged yet.

As far as usability, I think what's here is a great start and we can 
continue to refine. I'm particularly interested in the stats 
capabilities. In particular being able to filter what stats are shown 
and watch as they change over time. RDMA devices have lots of counters 
and stats. A common tool for users to be able to get that data across HW 
types would be a really good thing.

-Denny

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Patch net v2] ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
From: David Miller @ 2017-05-08 15:37 UTC (permalink / raw)
  To: dsahern; +Cc: xiyou.wangcong, netdev, andreyknvl
In-Reply-To: <11c2aeb4-cf2c-eded-0e56-789da2ba55ee@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Thu, 4 May 2017 13:41:15 -0600

> On 5/4/17 11:36 AM, Cong Wang wrote:
>> For each netns (except init_net), we initialize its null entry
>> in 3 places:
>> 
>> 1) The template itself, as we use kmemdup()
>> 2) Code around dst_init_metrics() in ip6_route_net_init()
>> 3) ip6_route_dev_notify(), which is supposed to initialize it after
>>    loopback registers
>> 
>> Unfortunately the last one still happens in a wrong order because
>> we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
>> net->loopback_dev's idev, so we have to do that after we add
>> idev to it. However, this notifier has priority == 0 same as
>> ipv6_dev_notf, and ipv6_dev_notf is registered after
>> ip6_route_dev_notifier so it is called actually after
>> ip6_route_dev_notifier.
>> 
>> Fix it by picking a smaller priority for ip6_route_dev_notifier.
>> Also, we have to release the refcnt accordingly when unregistering
>> loopback_dev because device exit functions are called before subsys
>> exit functions.
>> 
>> Cc: David Ahern <dsahern@gmail.com>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>> ---
> 
> Commit message needs a tie in to the problem that Andrey reported. It
> solves the same problem for namespaces other than init_net.

Cong, please update the commit message as David is requesting.

Thanks.
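
The ordering problem in the quoted commit message can be sketched in user
space. This is a simplified model, assuming the chain is kept sorted by
descending priority with ties resolved in registration order, which is what
the description above implies. The mock_* names and the value -10 are
hypothetical and chosen for illustration only; they are not taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified notifier block; not the kernel's definition. */
struct mock_notifier {
	const char *name;
	int priority;
	struct mock_notifier *next;
};

/*
 * Insert so that higher priority runs first. The strict '>' comparison
 * places a new entry behind existing entries of equal priority, so ties
 * run in registration order.
 */
void mock_chain_register(struct mock_notifier **head, struct mock_notifier *n)
{
	assert(n != NULL);
	while (*head != NULL) {
		if (n->priority > (*head)->priority)
			break;
		head = &(*head)->next;
	}
	n->next = *head;
	*head = n;
}

/* Return the 0-based position at which 'n' would be called, or -1. */
int mock_chain_position(const struct mock_notifier *head,
			const struct mock_notifier *n)
{
	int pos = 0;

	for (; head != NULL; head = head->next, pos++)
		if (head == n)
			return pos;
	return -1;
}
```

With both notifiers at priority 0 and the route notifier registered first,
the route notifier sits at position 0 and runs before the device notifier.
Giving the route notifier a smaller priority (for example -10) moves it
behind the device notifier regardless of registration order.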

^ permalink raw reply

* Re: [PATCH] net: ipv4: add code comment for clarification
From: David Miller @ 2017-05-08 15:36 UTC (permalink / raw)
  To: garsilva; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel, joe
In-Reply-To: <20170504194415.GA29391@embeddedgus>

From: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
Date: Thu, 4 May 2017 14:44:16 -0500

> @@ -389,6 +389,12 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
>  				  nlmsg_flags, unlh, net_admin);
>  }
>  
> +/*
> + * Ignore the position of the arguments req->id.idiag_dport and
> + * req->id.idiag_sport in both calls to inet_lookup() and inet6_lookup()
> + * functions, once this is a locked in behavior exposed to user space.
> + * Changing this will break things for people.
> + */

This is implicit for every interface exposed to userspace.

Therefore, saying it here and there in various comments provides
questionable value.

And in fact I think these arguments are probably in the correct order.

I'm definitely not applying a patch like this, sorry.

^ permalink raw reply

* Re: [RFC iproute2 0/8] RDMA tool
From: Stephen Hemminger @ 2017-05-08 15:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org,
	leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	jiri-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ram.amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org,
	sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org,
	ariela-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
In-Reply-To: <1494256767.2591.3.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On Mon, 8 May 2017 15:19:28 +0000
Bart Van Assche <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:

> On Sun, 2017-05-07 at 12:20 +0200, Jiri Pirko wrote:
> > Sat, May 06, 2017 at 04:40:24PM CEST, Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org wrote:  
> > > On Sat, 2017-05-06 at 12:40 +0200, Jiri Pirko wrote:  
> > > > Thu, May 04, 2017 at 08:10:54PM CEST, Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org wrote:  
> > > > > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:  
> > > > > > Following our discussion both in mailing list [1] and at the LPC 2016 [2],
> > > > > > we would like to propose this RDMA tool to be part of iproute2 package
> > > > > > and finally improve this situation.  
> > > > > 
> > > > > Although I really appreciate your work: can you clarify why you would like to
> > > > > add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation
> > > > > for adding RDMA functionality to iproute2 in [1].  
> > > > 
> > > > Bart, please realize that iproute2 is much more than "*IP routing* tool".
> > > > I understand you got confused by the name. Please see sources. Your comment
> > > > is totally pointless...  
> > > 
> > > I asked for a clarification that should have been in the cover letter but that
> > > was missing from that cover letter. So I think that was the right thing to do  
> > 
> > I think that was just complete misunderstanding about what iproute2 is.  
> 
> Hello Jiri,
> 
> I do not agree with your reply. The abbreviation "IP" occurs in the package
> name and that is a reference to the "Internet Protocol". As far as I know
> originally the iproute2 package contained only tools related to the Internet
> Protocol. Other tools, e.g. the tipc tool, were added later on. What I'm
> wondering about is whether it really is a good idea to add tools like tipc
> and rdma to the iproute2 package. The iproute2 package is so essential that
> it gets installed on every Linux system, including embedded systems and
> smartphones based on Linux. Several companies maintain embedded Linux
> distributions and tools to build software images. These tools provide a user
> interface that allows to select what packages go into such an image. Adding
> tools like tipc and rdma to the iproute2 package makes it harder than
> necessary for those who build software images for embedded devices to
> minimize the size of such an image. As you probably know even today the size
> of a software image still matters for embedded devices. Something else I have
> been wondering about is whether bundling the tipc and rdma tools in the
> iproute2 package will make the job harder of people who build Android ROMs?
> The ip tool is present in every Android ROM, and the size of these ROMs matters
> because the larger these ROMs are the less space remains for apps.
> 
> Bart.

I would assume embedded world does not use the standard distro model of "got to
have them all". The sane way to do builds for embedded is build everything
in the source, but selectively install components based on a Bill Of Materials
file.

^ permalink raw reply

* RE: [PATCH net-next 9/9] ipvlan: introduce individual MAC addresses
From: Chiappero, Marco @ 2017-05-08 15:29 UTC (permalink / raw)
  To: Jiri Benc
  Cc: Dan Williams, netdev@vger.kernel.org, David S . Miller,
	Kirsher, Jeffrey T, Duyck, Alexander H, Grandhi, Sainath,
	Mahesh Bandewar
In-Reply-To: <20170504184345.3f9afb8e@griffin>

> -----Original Message-----
> From: Jiri Benc [mailto:jbenc@redhat.com]
> Sent: Thursday, May 4, 2017 5:44 PM
> To: Chiappero, Marco <marco.chiappero@intel.com>
> Cc: Dan Williams <dcbw@redhat.com>; netdev@vger.kernel.org; David S .
> Miller <davem@davemloft.net>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; Duyck, Alexander H
> <alexander.h.duyck@intel.com>; Grandhi, Sainath
> <sainath.grandhi@intel.com>; Mahesh Bandewar <maheshb@google.com>
> Subject: Re: [PATCH net-next 9/9] ipvlan: introduce individual MAC addresses
> 
> On Thu, 4 May 2017 09:37:00 +0000, Chiappero, Marco wrote:
> > This looks conceptually wrong. Yes, ipvlan works at L3 (which is an
> > implementation detail anyway), but slaves are Ethernet interfaces and
> > should behave as much as possible as such regardless, with an
> > individual MAC address assigned.
> 
> Isn't the proper fix then converting ipvlan interfaces to be L3 only interfaces?
> I.e., ARPHRD_NONE? There's not much ipvlan can do with arbitrary Ethernet
> frames anyway. Of course, a flag to switch to the new behavior would be
> needed in order to preserve backwards compatibility.

Yes, L3 only interfaces would be a valid solution and a major differentiation with respect to macvlan. In fact it's been considered but abandoned because:
- it would be a break from the past
- VMs and containers usually expect Ethernet devices

That said, I'm ok with this solution if deemed preferable.

> This patchset looks very wrong. For proper support of multiple MAC addresses,
> we have macvlan and it's pointless to add that to ipvlan.

Ipvlan will never have proper support for L2, it's a L3 thing and doesn't aim at replacing macvlan. So, the options are:
1) remove it and provide L3 only interfaces - as you are proposing
2) fully emulate it preserving the single unicast MAC rule - my proposal

I'm open to both solutions, to me the important thing is to address the problems with the current implementation, one way or another, making it comply with basic networking concepts and expectations.

> And doing some kind of weird MAC NAT in ipvlan just to satisfy broken tools that
> can't cope with multiple interfaces with the same MAC address is wrong, too.

Ipvlan has always had the MAC issue regardless; these tools simply make it more apparent. And as I said already, whether they are broken is debatable (I have yet to read a reasonable justification). At the very least, their expectation of unique addresses on the same broadcast domain is hard to argue against. Should ipvlan be considered special? Again, questionable.

> Those tools are already broken anyway, there's nothing preventing anyone to
> set the same MAC address to multiple interfaces.

What are you trying to demonstrate, that you can shoot yourself in the foot? That you can interfere with the controller? That you can intentionally break your network? Yes, of course you can, so what?
As long as network components comply with expected behaviors, including the ability to change the MAC address, every orchestrator will do its job of managing the network and will do it correctly.

> I suppose those tools don't work with bonding and bridge, either?

NaaS software and SDN controllers are built around bridges; a bridge is a well-defined, coherent, fundamental network component. As soon as ipvlan behaves coherently as well, upper layers will handle it properly.
By the way, none of them has the same limitations as ipvlan.

> > So, either we fix this by forcing slaves to stay in sync with master,
> 
> Yes, that's the correct behavior. Well, at least as correct as one can get with the
> ipvlan broken design of pretending that an interface is L2 when in fact, it is not.

Eventually we got there:  "ipvlan broken design". Let's fix it then. Would moving to L3 only interfaces be accepted instead?
--------------------------------------------------------------
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263

This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.

^ permalink raw reply

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
From: Alexander Duyck @ 2017-05-08 15:22 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Casey Leedom, Raj, Ashok, Bjorn Helgaas, Michael Werner,
	Ganesh GR, Arjun V., Asit K Mallick, Patrick J Cramer,
	Suravee Suthikulpanit, Bob Shaw, h, Mark Rutland, Amir Ancel,
	Gabriele Paoloni, Catalin Marinas, Will Deacon, LinuxArm,
	David Laight, Jeff Kirsher, Netdev
In-Reply-To: <2a3f064b-e885-a528-4945-adb32904bd0e@huawei.com>

On Mon, May 8, 2017 at 7:33 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
>
> On 2017/5/7 2:07, Alexander Duyck wrote:
>> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>>
>>>
>>> On 2017/5/5 22:04, Alexander Duyck wrote:
>>>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>>>> | ...
>>>>> | It sounds like we are more or less in agreement. My only concern is
>>>>> | really what we default this to. On x86 I would say we could probably
>>>>> | default this to disabled for existing platforms since my understanding
>>>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>>>> | there right now when performing DMA through the root complex. As far
>>>>> | as peer-to-peer I would say we should probably look at enabling the
>>>>> | ability to have Relaxed Ordering enabled for some channels but not
>>>>> | others. In those cases the hardware needs to be smart enough to allow
>>>>> | for you to indicate you want it disabled by default for most of your
>>>>> | DMA channels, and then enabled for the select channels that are
>>>>> | handling the peer-to-peer traffic.
>>>>>
>>>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>>>> sure that whatever scheme was developed would allow for simultaneously
>>>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>>>> Ordering for others within the same system.  I.e. not simply
>>>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>>>
>>>>>   By the way, I've started our QA folks off looking at what things look like
>>>>> in Linux Virtual Machines under different Hypervisors to see what
>>>>> information they may provide to the VM in the way of what Root Complex Port
>>>>> is being used, etc.  So far they've got Windows HyperV done and there
>>>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>>>> Maybe NULL?
>>>>
>>>> I believe NULL is one of the options. It all depends on what qemu is
>>>> emulating. Most likely you won't find a pcie root port on KVM because
>>>> the default is to emulate an older system that only supports PCI.
>>>>
>>>>>   With your reservations (which I also share), I think that it probably
>>>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>>>> being "No" for any architecture which doesn't implement the predicate.  And
>>>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>>>> False for that as well.  I can't see any reason to pass in the Source End
>>>>> Node but I may be missing something.
>>>>>
>>>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>>>> experience in the PCIe Infrastructure were to want to carry the ball
>>>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>>>> Engagement", so let me know what my next step should be.  Thanks.
>>>>>
>>>>> Casey
>>>>
>>>> For now we can probably keep this on the linux-pci mailing list. Going
>>>> that route is the most straight forward for now since step one is
>>>> probably just making sure we are setting the relaxed ordering bit in
>>>> the setups that make sense. I would say we could probably keep it
>>>> simple. We just need to enable relaxed ordering by default for SPARC
>>>> architectures, on most others we can probably default it to off.
>>>>
>>>
>>> Casey, Alexander:
>>>
>>> Thanks for the wonderful discussion; it is now clearer what to do next.
>>> I agree that enabling relaxed ordering by default only for SPARC and ARM64
>>> is safer for all the other platforms, as no one wants to break anything.
>>>
>>>> I believe this all had started as Ding Tianhong was hoping to enable
>>>> this for the ARM architecture. That is the only one I can think of
>>>> where it might be difficult to figure out which way to default as we
>>>> were attempting to follow the same code that was enabled for SPARC and
>>>> that is what started this tug-of-war about how this should be done.
>>>> What we might do is take care of this in two phases. The first one
>>>> enables the infrastructure generically but leaves it defaulted to off
>>>> for everyone but SPARC. Then we can go through and start enabling it
>>>> for other platforms such as some of those on ARM in the platforms that
>>>> Ding Tianhong was working with.
>>>>
>>>
>>> According to the suggestion, I could only think of this code:
>>>
>>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>>                               quirk_tw686x_class);
>>>
>>> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
>>> +{
>>> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
>>> +     dev->vendor != PCI_VENDOR_ID_SUN)
>>> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>>> +}
>>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
>>> +                       quirk_relaxedordering_disable);
>>> +
>>>  /*
>>>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>>>   * values for the Attribute as were supplied in the header of the
>>>
>>>
>>> What do you think of it?
>>>
>>> Thanks
>>> Ding
>>>
>>
>> This is a bit simplistic but it is a start.
>>
>> The other bit I was getting at is that we need to update the core PCIe
>> code so that when we configure devices and the root complex reports no
>> support for relaxed ordering it should be clearing the relaxed
>> ordering bits in the PCIe configuration registers on the upstream
>> facing devices.
>
> How about this:
> Rename PCI_DEV_FLAGS_NO_RELAXED_ORDERING to PCI_DEV_FLAGS_RELAXED_ORDERING and only set it
> when the PCIe root is configured and supports RO mode; otherwise we leave it unset to indicate
> that the PCIe device does not support RO mode.

The problem is we need to have something that can be communicated
through a VM. Your change doesn't work in that regard. That was why I
suggested just updating the code so that when we initialize PCIe
devices we either set or clear the relaxed ordering bit in
the PCIe device control register. That way, when we direct assign an
interface, it could know just based on the bits in the PCIe
configuration if it could use relaxed ordering or not.

At that point the driver code itself becomes very simple since you
could just enable the relaxed ordering by default in the igb/ixgbe
driver and if the bit is set or cleared in the PCIe configuration then
we are either sending with relaxed ordering requests or not and don't
have to try and locate the root complex.
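
A rough sketch of that idea, operating on a plain 16-bit Device Control
value rather than on real config space. The mock_* helpers and the
root_supports_ro parameter are hypothetical names for illustration; the bit
value 0x0010 is the Enable Relaxed Ordering bit of the PCIe Device Control
register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Enable Relaxed Ordering bit in the PCIe Device Control register. */
#define MOCK_DEVCTL_RELAX_EN 0x0010u

/*
 * Hypothetical helper: compute the Device Control value the PCI core
 * would write at device setup, given whether the root complex is
 * believed to handle Relaxed Ordering.
 */
uint16_t mock_configure_relaxed_ordering(uint16_t devctl, bool root_supports_ro)
{
	if (root_supports_ro)
		return devctl | MOCK_DEVCTL_RELAX_EN;
	return devctl & (uint16_t)~MOCK_DEVCTL_RELAX_EN;
}

/*
 * What a driver, or a guest with the device directly assigned, would
 * check: only the device's own register, no walk up to the root complex.
 */
bool mock_relaxed_ordering_enabled(uint16_t devctl)
{
	return (devctl & MOCK_DEVCTL_RELAX_EN) != 0;
}
```

Because the decision is recorded in the device's own configuration space,
the same check works inside a VM where the root port is not visible.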

>>
>> The last bit we need in all this is a way to allow for setups where
>> peer-to-peer wants to perform relaxed ordering but for writes to the
>> host we have to not use relaxed ordering. For that we need to enable a
>> special case and that isn't handled right now in any of the solutions
>> we have coded up so far.
>>
>
> Sorry, this part is not clear to me. Can you explain more about it or give me
> an example of such a special case? Thanks a lot.

So from the sound of it Casey has a special use case where he doesn't
want to send relaxed ordering frames to the root complex, but instead
would like to send them to another PCIe device. To do that he needs to
have a way to enable the relaxed ordering bit in the PCIe
configuration but then not send any to the root complex. Odds are that
is something he might be able to just implement in the driver, but is
something that may become a more general case in the future. I don't
see our change here impacting it as long as we keep the solution
generic and mostly confined to when we instantiate the devices as the
driver could likely make the decision to change the behavior later.

- Alex

^ permalink raw reply

* Re: [RFC iproute2 0/8] RDMA tool
From: Bart Van Assche @ 2017-05-08 15:19 UTC (permalink / raw)
  To: jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org
  Cc: leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	jiri-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ram.amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org,
	sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org,
	ariela-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
In-Reply-To: <20170507102046.GA1889-6KJVSR23iU488b5SBfVpbw@public.gmane.org>

On Sun, 2017-05-07 at 12:20 +0200, Jiri Pirko wrote:
> Sat, May 06, 2017 at 04:40:24PM CEST, Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org wrote:
> > On Sat, 2017-05-06 at 12:40 +0200, Jiri Pirko wrote:
> > > Thu, May 04, 2017 at 08:10:54PM CEST, Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org wrote:
> > > > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
> > > > > Following our discussion both in mailing list [1] and at the LPC 2016 [2],
> > > > > we would like to propose this RDMA tool to be part of iproute2 package
> > > > > and finally improve this situation.
> > > > 
> > > > Although I really appreciate your work: can you clarify why you would like to
> > > > add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation
> > > > for adding RDMA functionality to iproute2 in [1].
> > > 
> > > Bart, please realize that iproute2 is much more than "*IP routing* tool".
> > > I understand you got confused by the name. Please see sources. Your comment
> > > is totally pointless...
> > 
> > I asked for a clarification that should have been in the cover letter but that
> > was missing from that cover letter. So I think that was the right thing to do
> 
> I think that was just complete misunderstanding about what iproute2 is.

Hello Jiri,

I do not agree with your reply. The abbreviation "IP" occurs in the package
name and that is a reference to the "Internet Protocol". As far as I know
originally the iproute2 package contained only tools related to the Internet
Protocol. Other tools, e.g. the tipc tool, were added later on. What I'm
wondering about is whether it really is a good idea to add tools like tipc
and rdma to the iproute2 package. The iproute2 package is so essential that
it gets installed on every Linux system, including embedded systems and
smartphones based on Linux. Several companies maintain embedded Linux
distributions and tools to build software images. These tools provide a user
interface that allows to select what packages go into such an image. Adding
tools like tipc and rdma to the iproute2 package makes it harder than
necessary for those who build software images for embedded devices to
minimize the size of such an image. As you probably know even today the size
of a software image still matters for embedded devices. Something else I have
been wondering about is whether bundling the tipc and rdma tools in the
iproute2 package will make the job harder of people who build Android ROMs?
The ip tool is present in every Android ROM, and the size of these ROMs matters
because the larger these ROMs are the less space remains for apps.

Bart.

^ permalink raw reply

* Re: bpf pointer alignment validation
From: David Miller @ 2017-05-08 15:04 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <59104D35.8080108@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Mon, 08 May 2017 12:49:25 +0200

> On 05/06/2017 04:47 AM, David Miller wrote:
>> From: David Miller <davem@davemloft.net>
>> Date: Fri, 05 May 2017 16:20:44 -0400 (EDT)
>>
>>> Anyways, I'll play with this design and see what happens...
>>> Feedback is of course welcome.
>>
>> Here is a prototype that works for me with test_pkt_access.c,
>> which otherwise won't load on sparc.
> 
> Code looks good to me as far as I can tell, thanks for working
> on this.
> 
> Could you also add test cases specifically to this for test_verifier
> in bpf selftests? I'm thinking of the cases when we have no pkt id
> and offset originated from reg->off (accumulated through const imm
> ops on reg) and insn->off, where we had i) no pkt id and ii) a
> specific pkt id (so we can probe for aux_off_align rejection as well).
> I believe we do have coverage to some extent in some of the tests
> (more on the map_value_adj though), but it would be good to keep
> tracking this specifically as well.

Yes, I am also working on special tests that parse the verifier
trace to make sure the alignment values were calculated properly.
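
For readers following the thread, the kind of check being discussed can be sketched roughly as below. This is only an illustration under stated assumptions, not the verifier code from the prototype: the names access_is_aligned, known_align and const_off are hypothetical, known_align stands for the power-of-two alignment the variable part of a packet pointer is known to have (what the thread calls aux_off_align), and const_off stands for the sum of the constant offsets (reg->off plus insn->off). Any platform-specific base offset of the packet data is ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for illustration only; not taken from the kernel. */
static bool is_pow2(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Decide whether a memory access of 'size' bytes can be proven aligned.
 *
 * known_align: power-of-two alignment the variable part of the pointer
 *              is known to have (1 means nothing is known).
 * const_off:   sum of all constant offsets applied to the pointer.
 * size:        access size in bytes, must be a power of two.
 */
static bool access_is_aligned(uint32_t known_align, int32_t const_off,
			      uint32_t size)
{
	assert(is_pow2(known_align) && is_pow2(size));

	/* The variable part might break alignment: reject conservatively. */
	if (known_align < size)
		return false;

	/* The constant part must itself be a multiple of the access size. */
	return ((uint32_t)const_off & (size - 1)) == 0;
}
```

With this sketch, a 4-byte load at constant offset 8 from a pointer whose variable part is known to be 4-byte aligned is accepted, while the same load is rejected if only 2-byte alignment is known, or if the constant offset is 2.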

^ permalink raw reply

* Re: [PATCH net] ipv6: make sure dev is not NULL before call ip6_frag_reasm
From: Eric Dumazet @ 2017-05-08 14:50 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev
In-Reply-To: <1494212964-17861-1-git-send-email-liuhangbin@gmail.com>

On Mon, 2017-05-08 at 11:09 +0800, Hangbin Liu wrote:
> Since ip6_frag_reasm() will call __in6_dev_get(dev), which will access
> dev->ip6_ptr. We need to make sure dev is not NULL.
> 
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  net/ipv6/reassembly.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index e1da5b8..e3ebd62 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -348,7 +348,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
>  		fq->q.flags |= INET_FRAG_FIRST_IN;
>  	}
>  
> -	if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
> +	if (dev && fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
>  	    fq->q.meat == fq->q.len) {
>  		int res;
>  		unsigned long orefdst = skb->_skb_refdst;


How could dev possibly be NULL here?

^ permalink raw reply
