Netdev List

* Re: net/core: BUG in unregister_netdevice_many
From: Cong Wang @ 2017-04-21 18:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrey Konovalov, Eric Dumazet, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Alexander Duyck, David Ahern, Daniel Borkmann, tcharding,
	Jiri Pirko, stephen hemminger, Dmitry Vyukov, Kostya Serebryany,
	syzkaller
In-Reply-To: <CA+55aFwg=OtMWmU153uM27Fwk5eVv5ZBGBA9phQqVqn3nNAy6Q@mail.gmail.com>

On Fri, Apr 21, 2017 at 10:25 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Apr 21, 2017 at 5:48 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
>>
>> I've got the following error report while fuzzing the kernel with syzkaller.
>>
>> ------------[ cut here ]------------
>> kernel BUG at net/core/dev.c:6813!
>
> Another useless BUG_ON() that

I think we are double-unregistering the pim6reg device;
we probably need something like:

commit 7dc00c82cbb0119cf4663f65bbaa2cc55f961db2
Author: Wang Chen <wangchen@cn.fujitsu.com>
Date:   Mon Jul 14 20:56:34 2008 -0700

    ipv4: Fix ipmr unregister device oops

    An oops happens during device unregister.

* Re: [PATCH v1] net: phy: fix auto-negotiation stall due to unavailable interrupt
From: Alexander Kochetkov @ 2017-04-21 14:42 UTC (permalink / raw)
  To: Roger Quadros
  Cc: Florian Fainelli, netdev, LKML, Sergei Shtylyov, Madalin Bucur
In-Reply-To: <2f44a2db-da7a-2ca9-ea5e-6cc551c3008c@ti.com>

> On 21 Apr 2017, at 17:18, Roger Quadros <rogerq@ti.com> wrote:
> 
> I think the following commit broke functionality with interrupt driven PHYs
> 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

Probably this one [1] broke it, according to Alexandre's commit [2].
It has been broken since Nov 16, 2015, but the breakage was hidden by
some other commits.

For Roger, the problem became visible after 3c293f4e08b5 ("net: phy:
Trigger state machine on state change and not polling.").

For me, the problem became visible after 529ed1275263 ("net: phy: phy drivers
should not set SUPPORTED_[Asym_]Pause"), because commit 529ed1275263
removed the SUPPORTED_Pause flag from the PHY advertising property and
genphy_config_aneg() began to skip PHY auto-negotiation.

Alexander.

[1] Fixes: 321beec5047a (net: phy: Use interrupts when available in NOLINK state)
     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=321beec5047af83db90c88114b7e664b156f49fe
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=99f81afc139c6edd14d77a91ee91685a414a1c66

* Re: [PATCH V2 net] netdevice: Include NETIF_F_HW_CSUM when intersecting features
From: Alexander Duyck @ 2017-04-21 18:53 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: Michal Kubecek, Vlad Yasevich, Netdev, Tom Herbert
In-Reply-To: <CAGCdqXFM-+OeQFznZ9kcjpnyeBU=1gcdjv1v7RR3KEGPyiZ_GQ@mail.gmail.com>

On Fri, Apr 21, 2017 at 10:33 AM, Vladislav Yasevich
<vyasevich@gmail.com> wrote:
> On Fri, Apr 21, 2017 at 1:33 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
>> On Thu, Apr 20, 2017 at 07:19:55PM -0400, Vlad Yasevich wrote:
>>>
>>> Having said that, the other alternative is to inherit hw_features from
>>> lower devices.  BTW, bonding I think has a similar "issue" you are
>>> describing since it prefers HW_CSUM if any of the slaves have it set.
>>
>> It does but bonding uses netdev_increment_features() to combine slave
>> features and this function handles checksumming like "or", not "and"
>> (not only checksumming, also flags in NETIF_F_ONE_FOR_ALL).
>>
>
> I am not saying that it doesn't.  What I am saying is that if you
> form a bond with two devices: one with only NETIF_F_IP_CSUM and the
> other with NETIF_F_HW_CSUM, then the bonding device will have NETIF_F_HW_CSUM
> set.  This is similar to what is being proposed in the patch.
>
> Alex's objection, at least as I understand it, is that we never want to
> allow the above condition.  However, it looks like we already allow it
> and correctly handle it.

My objection is that the change as you have proposed it doesn't work
that way. It is one thing to advertise NETIF_F_HW_CSUM on an upper
device; the problem is that netdev_intersect_features() isn't used on just
the upper device. It is used to perform what is essentially a logical
AND of the features. What you are doing changes that logic. That is
why I suggested fixing this in the VLAN driver code instead.

>> That said, it's legitimate to ask if we want some of the features to be
>> handled differently when computing features for a vlan device. My point
>> before was that if the helper is called netdev_intersect_features(), it
>> shouldn't return any features that are not supported by both argument
>> sets, even if all its current users would benefit from slightly
>> different behaviour. If it does, it's a trap that someone might one day
>> fall in.
>
> Ok.  I think I understand, but we've always handled the checksum intersection
> strangely.  I'll see what I can figure out.
>

I'm okay with us setting NETIF_F_HW_CSUM if the lower device supports
any checksum offload. I just don't want the change to impact
netif_skb_features() in any way that could cause us to advertise
offload support that isn't there. That was why I suggested updating
vlan_dev_fix_features so that it would zero out the IP_CSUM and
IP6_CSUM flags and set the HW_CSUM if any offload was supported.
Basically it would just consist of adding the following lines after
the calls to netdev_intersect_features:

if (features & NETIF_F_CSUM_MASK) {
        features &= ~NETIF_F_CSUM_MASK;
        features |= NETIF_F_HW_CSUM;
}

Just that should be enough to resolve the issues you were seeing and
make it so that you always advertise NETIF_F_HW_CSUM instead of
IP_CSUM or IPV6_CSUM.
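
To make the effect of those four lines concrete, here is a small
userspace model. It is only a sketch: the F_* bit values are placeholders
picked for illustration (not the kernel's real NETIF_F_* values), and
vlan_fix_csum() is a hypothetical name, not an existing kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions for illustration only; the real NETIF_F_*
 * values are defined in the kernel headers. */
#define F_IP_CSUM    (1u << 0)
#define F_HW_CSUM    (1u << 1)
#define F_IPV6_CSUM  (1u << 2)
#define F_SG         (1u << 3)   /* unrelated feature, must pass through */
#define F_CSUM_MASK  (F_IP_CSUM | F_HW_CSUM | F_IPV6_CSUM)

/* Hypothetical helper: if any checksum offload survived the
 * intersection, advertise HW_CSUM only and clear the protocol-specific
 * flags. Non-checksum bits are left untouched. */
static uint32_t vlan_fix_csum(uint32_t features)
{
	if (features & F_CSUM_MASK) {
		features &= ~F_CSUM_MASK;
		features |= F_HW_CSUM;
	}

	/* Post-condition: IP_CSUM/IPV6_CSUM are never advertised. */
	assert(!(features & (F_IP_CSUM | F_IPV6_CSUM)));
	return features;
}
```

With these placeholder values, vlan_fix_csum(F_IP_CSUM | F_SG) yields
F_HW_CSUM | F_SG, while a feature set with no checksum bit at all comes
back unchanged, so no offload is advertised that was not there before.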

- Alex

* Re: [PATCH net-next 1/5] nfp: make use of the DMA_ATTR_SKIP_CPU_SYNC attr
From: Jakub Kicinski @ 2017-04-21 17:10 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <CAKgT0UfdbLV9tdYZXVQK6U2cF1Ljzc27uuVoNaZF-rHcmLUnPQ@mail.gmail.com>

On Fri, Apr 21, 2017 at 8:07 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Fri, Apr 21, 2017 at 7:20 AM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> DMA unmap may destroy changes CPU made to the buffer.  To make XDP
>> run correctly on non-x86 platforms we should use the
>> DMA_ATTR_SKIP_CPU_SYNC attribute.
>>
>> Thanks to using the attribute we can now push the sync operation to
>> the common code path from XDP handler.
>>
>> A little bit of variable name reshuffling is required to bring the
>> code back to readable state.
>>
>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>
> So I see where you added the sync_single_for_cpu, but what about the
> sync_single_for_device? It needs to be called for the buffer before
> you assign it for Rx. On x86 it won't really matter but for proper
> utilization of the DMA API you need to sync the buffer even for Rx
> just to make certain that the cache lines are evicted prior to the
> device attempting to write to the buffer.

Ah, indeed, thanks for catching this.  We need to invalidate the
caches in case they are dirty and could get written back between the
device DMA and sync_for_cpu, correct?

* Re: [PATCH net-next 1/5] nfp: make use of the DMA_ATTR_SKIP_CPU_SYNC attr
From: Alexander Duyck @ 2017-04-21 15:07 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20170421142052.107388-2-jakub.kicinski@netronome.com>

On Fri, Apr 21, 2017 at 7:20 AM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> DMA unmap may destroy changes CPU made to the buffer.  To make XDP
> run correctly on non-x86 platforms we should use the
> DMA_ATTR_SKIP_CPU_SYNC attribute.
>
> Thanks to using the attribute we can now push the sync operation to
> the common code path from XDP handler.
>
> A little bit of variable name reshuffling is required to bring the
> code back to readable state.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

So I see where you added the sync_single_for_cpu, but what about the
sync_single_for_device? It needs to be called for the buffer before
you assign it for Rx. On x86 it won't really matter but for proper
utilization of the DMA API you need to sync the buffer even for Rx
just to make certain that the cache lines are evicted prior to the
device attempting to write to the buffer.

> ---
>  .../net/ethernet/netronome/nfp/nfp_net_common.c    | 43 +++++++++++++---------
>  1 file changed, 25 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> index e2197160e4dc..1274a70c9a38 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> @@ -87,16 +87,23 @@ void nfp_net_get_fw_version(struct nfp_net_fw_version *fw_ver,
>
>  static dma_addr_t nfp_net_dma_map_rx(struct nfp_net_dp *dp, void *frag)
>  {
> -       return dma_map_single(dp->dev, frag + NFP_NET_RX_BUF_HEADROOM,
> -                             dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
> -                             dp->rx_dma_dir);
> +       return dma_map_single_attrs(dp->dev, frag + NFP_NET_RX_BUF_HEADROOM,
> +                                   dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
> +                                   dp->rx_dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
>  }
>
>  static void nfp_net_dma_unmap_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr)
>  {
> -       dma_unmap_single(dp->dev, dma_addr,
> -                        dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
> -                        dp->rx_dma_dir);
> +       dma_unmap_single_attrs(dp->dev, dma_addr,
> +                              dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
> +                              dp->rx_dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> +}
> +
> +static void nfp_net_dma_sync_cpu_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr,
> +                                   unsigned int len)
> +{
> +       dma_sync_single_for_cpu(dp->dev, dma_addr - NFP_NET_RX_BUF_HEADROOM,
> +                               len, dp->rx_dma_dir);
>  }
>
>  /* Firmware reconfig
> @@ -1569,7 +1576,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>         tx_ring = r_vec->xdp_ring;
>
>         while (pkts_polled < budget) {
> -               unsigned int meta_len, data_len, data_off, pkt_len;
> +               unsigned int meta_len, data_len, meta_off, pkt_len, pkt_off;
>                 u8 meta_prepend[NFP_NET_MAX_PREPEND];
>                 struct nfp_net_rx_buf *rxbuf;
>                 struct nfp_net_rx_desc *rxd;
> @@ -1608,11 +1615,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>                 data_len = le16_to_cpu(rxd->rxd.data_len);
>                 pkt_len = data_len - meta_len;
>
> +               pkt_off = NFP_NET_RX_BUF_HEADROOM + dp->rx_dma_off;
>                 if (dp->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC)
> -                       data_off = NFP_NET_RX_BUF_HEADROOM + meta_len;
> +                       pkt_off += meta_len;
>                 else
> -                       data_off = NFP_NET_RX_BUF_HEADROOM + dp->rx_offset;
> -               data_off += dp->rx_dma_off;
> +                       pkt_off += dp->rx_offset;
> +               meta_off = pkt_off - meta_len;
>
>                 /* Stats update */
>                 u64_stats_update_begin(&r_vec->rx_sync);
> @@ -1621,7 +1629,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>                 u64_stats_update_end(&r_vec->rx_sync);
>
>                 /* Pointer to start of metadata */
> -               meta = rxbuf->frag + data_off - meta_len;
> +               meta = rxbuf->frag + meta_off;
>
>                 if (unlikely(meta_len > NFP_NET_MAX_PREPEND ||
>                              (dp->rx_offset && meta_len > dp->rx_offset))) {
> @@ -1631,6 +1639,9 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>                         continue;
>                 }
>
> +               nfp_net_dma_sync_cpu_rx(dp, rxbuf->dma_addr + meta_off,
> +                                       data_len);
> +
>                 if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF &&
>                                   dp->bpf_offload_xdp)) {
>                         unsigned int dma_off;
> @@ -1638,10 +1649,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>                         int act;
>
>                         hard_start = rxbuf->frag + NFP_NET_RX_BUF_HEADROOM;
> -                       dma_off = data_off - NFP_NET_RX_BUF_HEADROOM;
> -                       dma_sync_single_for_cpu(dp->dev, rxbuf->dma_addr,
> -                                               dma_off + pkt_len,
> -                                               DMA_BIDIRECTIONAL);
>
>                         /* Move prepend out of the way */
>                         if (xdp_prog->xdp_adjust_head) {
> @@ -1650,12 +1657,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>                         }
>
>                         act = nfp_net_run_xdp(xdp_prog, rxbuf->frag, hard_start,
> -                                             &data_off, &pkt_len);
> +                                             &pkt_off, &pkt_len);
>                         switch (act) {
>                         case XDP_PASS:
>                                 break;
>                         case XDP_TX:
> -                               dma_off = data_off - NFP_NET_RX_BUF_HEADROOM;
> +                               dma_off = pkt_off - NFP_NET_RX_BUF_HEADROOM;
>                                 if (unlikely(!nfp_net_tx_xdp_buf(dp, rx_ring,
>                                                                  tx_ring, rxbuf,
>                                                                  dma_off,
> @@ -1689,7 +1696,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>
>                 nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
>
> -               skb_reserve(skb, data_off);
> +               skb_reserve(skb, pkt_off);
>                 skb_put(skb, pkt_len);
>
>                 if (!dp->chained_metadata_format) {
> --
> 2.11.0
>

* Re: [PATCH RFC] sparc64: eBPF JIT
From: Alexei Starovoitov @ 2017-04-21 18:49 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, sparclinux, netdev, ast
In-Reply-To: <20170421.124640.1134810340055330244.davem@davemloft.net>

On Fri, Apr 21, 2017 at 12:46:40PM -0400, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Mon, 17 Apr 2017 20:44:35 +0200
> 
> > There is samples/bpf/sockex3_kern.c, which exercises it. To
> > run it, it would be (clang/llvm needed due to BPF backend not
> > available in gcc):
> > 
> > # cd samples/bpf
> > # make
> > # ./sockex3
> > IP     src.port -> dst.port               bytes      packets
> > 127.0.0.1.12865 -> 127.0.0.1.49711          148            2
> > 127.0.0.1.49711 -> 127.0.0.1.12865          108            2
> > [...]
> > 
> > Inside parse_eth_proto(), it will do tail calls based on the
> > eth protocol. Over time, we'll move such C based tests over to
> > tools/testing/selftests/bpf/.
> 
> Ok, after a lot of work setting up an LLVM/CLANG environment and other
> things, 

was it painful because of sparc environment?
fedora/ubuntu on x86 ship with modern clang already and bpf backend
is compiled-in by default.
redhat folks have been back and forth on adding bpf support to gcc.
The backend itself was fully functional before it was abandoned.
Last time we discussed it the lack of integrated asm in gcc was the main blocker.
Can we bend gcc rules and let bpf backend emit custom binary and/or elf?


* Re: net: heap out-of-bounds in fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone
From: Eric Dumazet @ 2017-04-21 16:47 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrey Konovalov, Dmitry Vyukov, Cong Wang, Mahesh Bandewar,
	Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
In-Reply-To: <e64e9ae6-5de9-1fc8-6a79-1724ef03c252@cumulusnetworks.com>

On Fri, 2017-04-21 at 08:27 -0600, David Ahern wrote:
> On 4/20/17 10:09 AM, Andrey Konovalov wrote:
> > On Thu, Apr 20, 2017 at 5:39 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
> >> On Thu, Apr 20, 2017 at 5:35 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> >>> On 4/20/17 9:28 AM, Andrey Konovalov wrote:
> >>>> This one seems to be much closer to what Dmitry reported initially.
> >>> does not repro here; I ran in a loop and nothing.
> > Here's strace log, maybe it'll help figuring out why it doesn't reproduce:
> 
> reproduced. working on it.

Thanks guys for working on this ;)

* [PATCH net] ip6mr: fix notification device destruction
From: Nikolay Aleksandrov @ 2017-04-21 17:42 UTC (permalink / raw)
  To: netdev
  Cc: davem, yoshfuji, dvyukov, kcc, syzkaller, edumazet, roopa,
	Nikolay Aleksandrov
In-Reply-To: <CAAeHK+zUZMGQuW=7Qhz3EWYjh6Zwv9eMBCiMCU79QrYNWox0-Q@mail.gmail.com>

Andrey Konovalov reported a BUG in the ip6mr code, triggered
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr this was resolved by two commits
a long time ago, which introduced the "notify" parameter to the delete
function and avoided the unregister when called from a notifier, so
let's do the same for ip6mr.
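
The idea behind the "notify" parameter can be sketched as a toy
userspace model. Everything named toy_* below is hypothetical and only
counts unregister requests; it does not reproduce the kernel's data
structures or locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct net_device: only tracks unregistration. */
struct toy_dev {
	bool unregistering;     /* set once an unregister was requested */
	int  unregister_calls;  /* how often unregister was requested */
};

/* Toy stand-in for unregister_netdevice_queue(). */
static void toy_unregister(struct toy_dev *dev)
{
	assert(dev);
	dev->unregistering = true;
	dev->unregister_calls++;
}

/* Model of the delete helper. 'notify' is nonzero when the caller is
 * the NETDEV_UNREGISTER notifier, i.e. the device is already going
 * away and must not be unregistered a second time. */
static void toy_mif_delete(struct toy_dev *dev, int notify)
{
	if (!notify)
		toy_unregister(dev);
}

/* Model of the core teardown path: the device is unregistered first,
 * then the notifier runs and calls the delete helper. */
static void toy_teardown(struct toy_dev *dev, int notify)
{
	toy_unregister(dev);
	toy_mif_delete(dev, notify);
}
```

In this model toy_teardown(&dev, 1) leaves unregister_calls at 1, while
toy_teardown(&dev, 0) requests the unregister twice, which corresponds
to the double unregister that trips the BUG_ON() in the trace below.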

The trace from Andrey:
------------[ cut here ]------------
kernel BUG at net/core/dev.c:6813!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: netns cleanup_net
task: ffff880069208000 task.stack: ffff8800692d8000
RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
FS:  0000000000000000(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
Call Trace:
 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
 unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
 ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
 call_netdevice_notifiers net/core/dev.c:1663
 rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
 unregister_netdevice_many net/core/dev.c:7880
 default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
 ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
 cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
 process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
 worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
 kthread+0x35e/0x430 kernel/kthread.c:231
 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
---[ end trace e0b29c57e9b3292c ]---

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
Andrey, could you please test with this patch applied?
I have run the reproducer locally and can no longer trigger the bug.
I've made "notify" an int instead of a bool only to be closer to the ipmr
code for easier future patches.

 net/ipv6/ip6mr.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index fb4546e80c82..374997d26488 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -774,7 +774,8 @@ static struct net_device *ip6mr_reg_vif(struct net *net, struct mr6_table *mrt)
  *	Delete a VIF entry
  */
 
-static int mif6_delete(struct mr6_table *mrt, int vifi, struct list_head *head)
+static int mif6_delete(struct mr6_table *mrt, int vifi, int notify,
+		       struct list_head *head)
 {
 	struct mif_device *v;
 	struct net_device *dev;
@@ -820,7 +821,7 @@ static int mif6_delete(struct mr6_table *mrt, int vifi, struct list_head *head)
 					     dev->ifindex, &in6_dev->cnf);
 	}
 
-	if (v->flags & MIFF_REGISTER)
+	if ((v->flags & MIFF_REGISTER) && !notify)
 		unregister_netdevice_queue(dev, head);
 
 	dev_put(dev);
@@ -1331,7 +1332,6 @@ static int ip6mr_device_event(struct notifier_block *this,
 	struct mr6_table *mrt;
 	struct mif_device *v;
 	int ct;
-	LIST_HEAD(list);
 
 	if (event != NETDEV_UNREGISTER)
 		return NOTIFY_DONE;
@@ -1340,10 +1340,9 @@ static int ip6mr_device_event(struct notifier_block *this,
 		v = &mrt->vif6_table[0];
 		for (ct = 0; ct < mrt->maxvif; ct++, v++) {
 			if (v->dev == dev)
-				mif6_delete(mrt, ct, &list);
+				mif6_delete(mrt, ct, 1, NULL);
 		}
 	}
-	unregister_netdevice_many(&list);
 
 	return NOTIFY_DONE;
 }
@@ -1552,7 +1551,7 @@ static void mroute_clean_tables(struct mr6_table *mrt, bool all)
 	for (i = 0; i < mrt->maxvif; i++) {
 		if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC))
 			continue;
-		mif6_delete(mrt, i, &list);
+		mif6_delete(mrt, i, 0, &list);
 	}
 	unregister_netdevice_many(&list);
 
@@ -1708,7 +1707,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		if (copy_from_user(&mifi, optval, sizeof(mifi_t)))
 			return -EFAULT;
 		rtnl_lock();
-		ret = mif6_delete(mrt, mifi, NULL);
+		ret = mif6_delete(mrt, mifi, 0, NULL);
 		rtnl_unlock();
 		return ret;
 
-- 
2.1.4

* Re: macvlan: Fix device ref leak when purging bc_queue
From: David Miller @ 2017-04-21 15:02 UTC (permalink / raw)
  To: Joe.Ghalam; +Cc: herbert, Clifford.Wichmann, netdev
In-Reply-To: <1492785652578.53801@Dell.com>


Please DO NOT top-post.

Thank you.

* Re: [PATCH net] ip6mr: fix notification device destruction
From: Andrey Konovalov @ 2017-04-21 18:46 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, David S. Miller, Hideaki YOSHIFUJI, Dmitry Vyukov,
	Kostya Serebryany, syzkaller, Eric Dumazet, roopa, Linus Torvalds,
	LKML
In-Reply-To: <6d7dca26-4bc6-870c-8eb9-409f6c6b8fd5@cumulusnetworks.com>

On Fri, Apr 21, 2017 at 8:30 PM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> On 21/04/17 20:42, Nikolay Aleksandrov wrote:
>> Andrey Konovalov reported a BUG in the ip6mr code, triggered
>> because we call unregister_netdevice_many for a device that is already
>> being destroyed. In IPv4's ipmr this was resolved by two commits
>> a long time ago, which introduced the "notify" parameter to the delete
>> function and avoided the unregister when called from a notifier, so
>> let's do the same for ip6mr.

Hi Nikolay,

Your patch fixes the BUG_ON() being triggered for me.

Tested-by: Andrey Konovalov <andreyknvl@google.com>

Thanks!

>>
>> The trace from Andrey:
>> ------------[ cut here ]------------
>> kernel BUG at net/core/dev.c:6813!
>> invalid opcode: 0000 [#1] SMP KASAN
>> Modules linked in:
>> CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>> 01/01/2011
>> Workqueue: netns cleanup_net
>> task: ffff880069208000 task.stack: ffff8800692d8000
>> RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
>> RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
>> RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
>> RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
>> R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
>> R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
>> FS:  0000000000000000(0000) GS:ffff88006cb00000(0000)
>> knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
>> Call Trace:
>>  unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
>>  unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
>>  ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
>>  notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
>>  __raw_notifier_call_chain kernel/notifier.c:394
>>  raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>>  call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
>>  call_netdevice_notifiers net/core/dev.c:1663
>>  rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
>>  unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
>>  unregister_netdevice_many net/core/dev.c:7880
>>  default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
>>  ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
>>  cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
>>  process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
>>  worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
>>  kthread+0x35e/0x430 kernel/kthread.c:231
>>  ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
>> Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
>> 47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
>> 0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
>> RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
>> ---[ end trace e0b29c57e9b3292c ]---
>>
>> Reported-by: Andrey Konovalov <andreyknvl@google.com>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>
> +CC LKML and Linus
>
>> Andrey could you please test with this patch applied ?
>> I have run the reproducer locally and can no longer trigger the bug.
>> I've made "notify" an int instead of a bool only to be closer to the ipmr
>> code for easier future patches.
>>
>>  net/ipv6/ip6mr.c | 13 ++++++-------
>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
>> index fb4546e80c82..374997d26488 100644
>> --- a/net/ipv6/ip6mr.c
>> +++ b/net/ipv6/ip6mr.c
>> @@ -774,7 +774,8 @@ static struct net_device *ip6mr_reg_vif(struct net *net, struct mr6_table *mrt)
>>   *   Delete a VIF entry
>>   */
>>
>> -static int mif6_delete(struct mr6_table *mrt, int vifi, struct list_head *head)
>> +static int mif6_delete(struct mr6_table *mrt, int vifi, int notify,
>> +                    struct list_head *head)
>>  {
>>       struct mif_device *v;
>>       struct net_device *dev;
>> @@ -820,7 +821,7 @@ static int mif6_delete(struct mr6_table *mrt, int vifi, struct list_head *head)
>>                                            dev->ifindex, &in6_dev->cnf);
>>       }
>>
>> -     if (v->flags & MIFF_REGISTER)
>> +     if ((v->flags & MIFF_REGISTER) && !notify)
>>               unregister_netdevice_queue(dev, head);
>>
>>       dev_put(dev);
>> @@ -1331,7 +1332,6 @@ static int ip6mr_device_event(struct notifier_block *this,
>>       struct mr6_table *mrt;
>>       struct mif_device *v;
>>       int ct;
>> -     LIST_HEAD(list);
>>
>>       if (event != NETDEV_UNREGISTER)
>>               return NOTIFY_DONE;
>> @@ -1340,10 +1340,9 @@ static int ip6mr_device_event(struct notifier_block *this,
>>               v = &mrt->vif6_table[0];
>>               for (ct = 0; ct < mrt->maxvif; ct++, v++) {
>>                       if (v->dev == dev)
>> -                             mif6_delete(mrt, ct, &list);
>> +                             mif6_delete(mrt, ct, 1, NULL);
>>               }
>>       }
>> -     unregister_netdevice_many(&list);
>>
>>       return NOTIFY_DONE;
>>  }
>> @@ -1552,7 +1551,7 @@ static void mroute_clean_tables(struct mr6_table *mrt, bool all)
>>       for (i = 0; i < mrt->maxvif; i++) {
>>               if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC))
>>                       continue;
>> -             mif6_delete(mrt, i, &list);
>> +             mif6_delete(mrt, i, 0, &list);
>>       }
>>       unregister_netdevice_many(&list);
>>
>> @@ -1708,7 +1707,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
>>               if (copy_from_user(&mifi, optval, sizeof(mifi_t)))
>>                       return -EFAULT;
>>               rtnl_lock();
>> -             ret = mif6_delete(mrt, mifi, NULL);
>> +             ret = mif6_delete(mrt, mifi, 0, NULL);
>>               rtnl_unlock();
>>               return ret;
>>
>>
>

* [PATCH net-next 4/7] ibmvnic: Move initialization of the stats token to ibmvnic_open
From: Nathan Fontenot @ 2017-04-21 19:38 UTC (permalink / raw)
  To: netdev; +Cc: brking, jallen, muvic, tlfalcon
In-Reply-To: <20170421193627.11030.34813.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>

We should be initializing the stats token in the same place we
initialize the other resources for the driver.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index cc34bf9..199cccb 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -607,6 +607,10 @@ static int ibmvnic_open(struct net_device *netdev)
 		return -1;
 	}
 
+	rc = init_stats_token(adapter);
+	if (rc)
+		return rc;
+
 	adapter->map_id = 1;
 	adapter->napi = kcalloc(adapter->req_rx_queues,
 				sizeof(struct napi_struct), GFP_KERNEL);
@@ -3241,12 +3245,6 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter)
 		return rc;
 	}
 
-	rc = init_stats_token(adapter);
-	if (rc) {
-		release_crq_queue(adapter);
-		return rc;
-	}
-
 	init_completion(&adapter->init_done);
 	ibmvnic_send_crq_init(adapter);
 	if (!wait_for_completion_timeout(&adapter->init_done, timeout)) {

* [PATCH net-next 0/3] packet: Add option to create new fanout group with unique id.
From: Willem de Bruijn @ 2017-04-21 14:56 UTC (permalink / raw)
  To: netdev; +Cc: maloneykernel, davem, Mike Maloney

From: Mike Maloney <maloney@google.com>

Fanout uses a per-net global namespace. A process that intends to create a
new fanout group can accidentally join an existing group. It is
not possible to detect this.

Add a socket option to specify on the first call to
setsockopt(..., PACKET_FANOUT, ...) to ensure that a new group is created.
Also add tests.
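
For readers unfamiliar with the option, the PACKET_FANOUT argument is a
single int: the low 16 bits carry the group id and the high 16 bits
carry the fanout type and flags. The sketch below only shows how that
word is composed. The TOY_* constants are local stand-ins, and the value
used for the new flag (0x2000) is an assumption, since it is not stated
in this cover letter; check include/uapi/linux/if_packet.h for the real
definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the uapi constants (illustration only). */
#define TOY_FANOUT_HASH           0x0000u
#define TOY_FANOUT_CPU            0x0002u
#define TOY_FANOUT_FLAG_UNIQUEID  0x2000u  /* assumed value */

/* Compose the 32-bit setsockopt(PACKET_FANOUT) argument:
 * group id in bits 0..15, type and flags in bits 16..31. */
static uint32_t fanout_arg(uint16_t group_id, uint16_t type_and_flags)
{
	return (uint32_t)group_id | ((uint32_t)type_and_flags << 16);
}
```

A caller that wants a fresh group would then pass something like
fanout_arg(0, TOY_FANOUT_HASH | TOY_FANOUT_FLAG_UNIQUEID) on its first
setsockopt() call; leaving the id field at 0 so the kernel can choose
one is an assumption about the interface, not something this cover
letter spells out.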

Mike Maloney (3):
  selftests/net: cleanup unused parameter in psock_fanout
  packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.
  selftests/net: add tests for PACKET_FANOUT_FLAG_UNIQUEID

 include/uapi/linux/if_packet.h             |  1 +
 net/packet/af_packet.c                     | 44 ++++++++++++++
 tools/testing/selftests/net/psock_fanout.c | 93 ++++++++++++++++++++++++++----
 3 files changed, 128 insertions(+), 10 deletions(-)

-- 
2.12.2.816.g2cccc81164-goog

* [PATCH net-next V3 2/2] rtnl: Add support for netdev event attribute to link messages
From: Vladislav Yasevich @ 2017-04-21 17:31 UTC (permalink / raw)
  To: netdev; +Cc: dsa, Vladislav Yasevich
In-Reply-To: <1492795881-11914-1-git-send-email-vyasevic@redhat.com>

When netdev events happen, a rtnetlink_event() handler will send
messages for every event in its white list.  These messages contain
current information about a particular device, but they do not include
the information about which event just happened.  So, it is impossible
to tell what just happened for these events.

This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT
that would have an encoding of the event that triggered this
message.  This would allow the message consumer to easily determine
if it needs to perform certain actions.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 include/linux/rtnetlink.h    |  3 ++-
 include/uapi/linux/if_link.h | 11 ++++++++
 net/core/dev.c               |  2 +-
 net/core/rtnetlink.c         | 62 +++++++++++++++++++++++++++++++++++++-------
 4 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 57e5484..0459018 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -18,7 +18,8 @@ extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t flags);
 struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
-				       unsigned change, gfp_t flags);
+				       unsigned change, unsigned long event,
+				       gfp_t flags);
 void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
 		       gfp_t flags);
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8b405af..aceb766 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
 	IFLA_GSO_MAX_SIZE,
 	IFLA_PAD,
 	IFLA_XDP,
+	IFLA_EVENT,
 	__IFLA_MAX
 };
 
@@ -899,4 +900,14 @@ enum {
 
 #define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
 
+enum {
+	IFLA_EVENT_UNSPEC,
+	IFLA_EVENT_REBOOT,
+	IFLA_EVENT_FEAT_CHANGE,
+	IFLA_EVENT_BONDING_FAILOVER,
+	IFLA_EVENT_NOTIFY_PEERS,
+	IFLA_EVENT_RESEND_IGMP,
+	IFLA_EVENT_CHANGE_INFO_DATA,
+};
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index ef9fe60e..7efb417 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6840,7 +6840,7 @@ static void rollback_registered_many(struct list_head *head)
 
 		if (!dev->rtnl_link_ops ||
 		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
-			skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U,
+			skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U, 0,
 						     GFP_KERNEL);
 
 		/*
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e8e6816..9082cdd 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -944,6 +944,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
 	       + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
 	       + rtnl_xdp_size(dev) /* IFLA_XDP */
+	       + nla_total_size(4)  /* IFLA_EVENT */
 	       + nla_total_size(1); /* IFLA_PROTO_DOWN */
 
 }
@@ -1276,9 +1277,40 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 	return err;
 }
 
+static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event)
+{
+	u32 rtnl_event;
+
+	switch (event) {
+	case NETDEV_REBOOT:
+		rtnl_event = IFLA_EVENT_REBOOT;
+		break;
+	case NETDEV_FEAT_CHANGE:
+		rtnl_event = IFLA_EVENT_FEAT_CHANGE;
+		break;
+	case NETDEV_BONDING_FAILOVER:
+		rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
+		break;
+	case NETDEV_NOTIFY_PEERS:
+		rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
+		break;
+	case NETDEV_RESEND_IGMP:
+		rtnl_event = IFLA_EVENT_RESEND_IGMP;
+		break;
+	case NETDEV_CHANGEINFODATA:
+		rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA;
+		break;
+	default:
+		return 0;
+	}
+
+	return nla_put_u32(skb, IFLA_EVENT, rtnl_event);
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
-			    unsigned int flags, u32 ext_filter_mask)
+			    unsigned int flags, u32 ext_filter_mask,
+			    unsigned long event)
 {
 	struct ifinfomsg *ifm;
 	struct nlmsghdr *nlh;
@@ -1327,6 +1359,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	    nla_put_u8(skb, IFLA_PROTO_DOWN, dev->proto_down))
 		goto nla_put_failure;
 
+	if (rtnl_fill_link_event(skb, event))
+		goto nla_put_failure;
+
 	if (rtnl_fill_link_ifmap(skb, dev))
 		goto nla_put_failure;
 
@@ -1461,6 +1496,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_LINK_NETNSID]	= { .type = NLA_S32 },
 	[IFLA_PROTO_DOWN]	= { .type = NLA_U8 },
 	[IFLA_XDP]		= { .type = NLA_NESTED },
+	[IFLA_EVENT]		= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1620,7 +1656,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 					       NETLINK_CB(cb->skb).portid,
 					       cb->nlh->nlmsg_seq, 0,
 					       flags,
-					       ext_filter_mask);
+					       ext_filter_mask, 0);
 			/* If we ran out of room on the first message,
 			 * we're in trouble
 			 */
@@ -2715,7 +2751,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh)
 		return -ENOBUFS;
 
 	err = rtnl_fill_ifinfo(nskb, dev, RTM_NEWLINK, NETLINK_CB(skb).portid,
-			       nlh->nlmsg_seq, 0, 0, ext_filter_mask);
+			       nlh->nlmsg_seq, 0, 0, ext_filter_mask, 0);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in if_nlmsg_size */
 		WARN_ON(err == -EMSGSIZE);
@@ -2787,7 +2823,8 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 }
 
 struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
-				       unsigned int change, gfp_t flags)
+				       unsigned int change,
+				       unsigned long event, gfp_t flags)
 {
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
@@ -2798,7 +2835,7 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
 	if (skb == NULL)
 		goto errout;
 
-	err = rtnl_fill_ifinfo(skb, dev, type, 0, 0, change, 0, 0);
+	err = rtnl_fill_ifinfo(skb, dev, type, 0, 0, change, 0, 0, event);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in if_nlmsg_size() */
 		WARN_ON(err == -EMSGSIZE);
@@ -2819,18 +2856,25 @@ void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev, gfp_t flags)
 	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
 }
 
-void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
-		  gfp_t flags)
+static void rtmsg_ifinfo_event(int type, struct net_device *dev,
+			       unsigned int change, unsigned long event,
+			       gfp_t flags)
 {
 	struct sk_buff *skb;
 
 	if (dev->reg_state != NETREG_REGISTERED)
 		return;
 
-	skb = rtmsg_ifinfo_build_skb(type, dev, change, flags);
+	skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags);
 	if (skb)
 		rtmsg_ifinfo_send(skb, dev, flags);
 }
+
+void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
+		  gfp_t flags)
+{
+	rtmsg_ifinfo_event(type, dev, change, 0, flags);
+}
 EXPORT_SYMBOL(rtmsg_ifinfo);
 
 static int nlmsg_populate_fdb_fill(struct sk_buff *skb,
@@ -4128,7 +4172,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 	case NETDEV_NOTIFY_PEERS:
 	case NETDEV_RESEND_IGMP:
 	case NETDEV_CHANGEINFODATA:
-		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
+		rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, event, GFP_KERNEL);
 		break;
 	default:
 		break;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next 0/3] l3mdev: Improve use with main table
From: Robert Shearman @ 2017-04-21 17:44 UTC (permalink / raw)
  To: David Ahern, davem; +Cc: netdev
In-Reply-To: <3bdb570e-4d7d-6e5c-3270-0bf0064ef73a@cumulusnetworks.com>

On 20/04/17 23:36, David Ahern wrote:
> On 4/10/17 8:21 AM, Robert Shearman wrote:
>> Attempting to create a TCP socket not bound to a VRF device when a TCP
>> socket bound to a VRF device with the same port exists (and vice
>> versa) fails with EADDRINUSE. This limits the ability to use programs
>> in selected mixed VRF/non-VRF contexts.
>>
>> This patch series solves the problem by extending the l3mdev to be aware
>> of the special semantics of the main table and fixing issues arising
>> from the split local/main tables. A VRF master device can be created
>> linking to the main table and used for these programs in the same way
>> as those created for VRF tables.
>>
>> Robert Shearman (3):
>>   ipv6: Fix route handling when using l3mdev set to main table
>>   ipv4: Fix route handling when using l3mdev set to main table
>>   l3mdev: Fix lookup in local table when using main table
>>
>>  net/ipv4/af_inet.c      |  4 +++-
>>  net/ipv4/fib_frontend.c | 14 +++++++++-----
>>  net/ipv4/raw.c          |  5 ++++-
>>  net/ipv6/addrconf.c     | 12 +++++++++---
>>  net/ipv6/route.c        | 23 ++++++++++++++++++-----
>>  net/l3mdev/l3mdev.c     | 26 ++++++++++++++++++++------
>>  6 files changed, 63 insertions(+), 21 deletions(-)
>>
>
> With the change I mentioned earlier to fix the refcnt issue on top of
> this patch set I see a number of failures:
> - local IPv4 with 127.0.0.1 address - ping, tcp, udp tests fail
> - all of the IPv4 multicast tests fail
> - IPv6 linklocal and mcast addresses generally fail
> - IPv6 global address on vrf device

Can you send me some more details of your testing?

These work for me:

$ ping -c1 -I vrf-default 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 vrf-default: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.141 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.141/0.141/0.141/0.000 ms
$ ping6 -c1 -I vrf-default 2001::1
PING 2001::1(2001::1) from 2001::1 vrf-default: 56 data bytes
64 bytes from 2001::1: icmp_seq=1 ttl=64 time=0.069 ms

--- 2001::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.069/0.069/0.069/0.000 ms
$ sudo ip vrf exec vrf-default nc -l -p 4013
TEST
$ sudo ip vrf exec vrf-default nc -l -u -p 4013
TEST
^C

(with a neighbouring host using nc to send the TEST string for the udp 
and tcp cases)

Thanks,
Rob

^ permalink raw reply

* Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-21 18:11 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, jiri, netdev, xiyou.wangcong
In-Reply-To: <20170421.121217.1268145592512455181.davem@davemloft.net>

On 17-04-21 12:12 PM, David Miller wrote:

> Yes for existing attributes we are stuck in the mud because of how
> we've handled things in the past.  I'm not saying we should change
> behavior for existing attributes.
>
> I'm talking about any newly added attribute from here on out, and
> that we need to require checks for them.
>

Please bear with me. I want to make sure to get this right.

Let's say I updated the kernel today to reject transactions with
bits it didn't understand. Let's call this "old kernel". A tc that
understands/sets these bits and nothing else: call it "old tc".
3 months later:
I add one more bit setting to introduce a new feature in a new
kernel version. Let's call this "new kernel". I update tc to
understand the new bits. Call it "new tc".

The possibilities:
a) old tc + old kernel combo. No problem
b) new tc + new kernel combo. No problem.
c) old tc + new kernel combo. No problem.
d) new tc + old kernel. Rejection.

For (d), if I have a smart tc it would retry with a new combination
which restores its behavior to old tc level. Of course this means
apps would have to be rewritten going forward to understand these
mechanics.
An alternative is to request capabilities first and then make a
lowest common denominator request.
But even that is a lot more code and crossing user/kernel twice.

There is a simpler approach that would work going forward.
How about letting the user choose their fate? Set something, maybe
in the netlink header, to tell the kernel "if you don't understand
something I am asking for - please ignore it and do what you can".
This would maintain current behavior but would force the user to
explicitly state so.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: David Miller @ 2017-04-21 16:11 UTC (permalink / raw)
  To: jhs; +Cc: eric.dumazet, jiri, netdev, xiyou.wangcong
In-Reply-To: <c2d61246-b92d-efef-cd07-005f8a8dacc0@mojatatu.com>

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Fri, 21 Apr 2017 11:55:40 -0400

> On 17-04-21 11:38 AM, David Miller wrote:
>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>> Date: Fri, 21 Apr 2017 11:29:19 -0400
>>
>> Even something as benign as "give me larger action dumps" _must_ still
>> have the same behavior because the user has no alternative action plan
>> possible if it cannot tell if the kernel supports the facility or not.
>>

   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I am pretty sure I was clear in my position above.  And then you say:

> But i think there are other cases like "please give me a large
> dump" which require less harsh reaction in particular because
> I have alternative means in the kernel to achieve the dump.
> Would logging or no reaction be fine then?

I clearly said that the large dump should be handled exactly the
same way as other kinds of attributes and requests.  And I told you
why, and it's because the user cannot act upon the situation if it
wants to.

You give the user no way to perform alternative actions.

Any feature whatsoever, even "give me large dumps" may require the
user to take alternative actions.  You give the user no option
whatsoever by silently ignoring stuff, and that is simply
unacceptable.

Please get out of the mindset of "oh, ignoring this and silently
proceeding in situation X is OK".

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 1/2] rtnetlink: Disable notification for NETDEV_NAMECHANGE event
From: David Ahern @ 2017-04-21 18:08 UTC (permalink / raw)
  To: Vladislav Yasevich, netdev; +Cc: Vladislav Yasevich
In-Reply-To: <1492795881-11914-2-git-send-email-vyasevic@redhat.com>

On 4/21/17 11:31 AM, Vladislav Yasevich wrote:
> The data signaling name change is already provided at
> the end of do_setlink().  This event handler just generates
> a duplicate announcement.  Disable it.
> 
> CC: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>  net/core/rtnetlink.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 0ee5479..e8e6816 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -4123,7 +4123,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
>  
>  	switch (event) {
>  	case NETDEV_REBOOT:
> -	case NETDEV_CHANGENAME:
>  	case NETDEV_FEAT_CHANGE:
>  	case NETDEV_BONDING_FAILOVER:
>  	case NETDEV_NOTIFY_PEERS:
> 


I only see one using the ip monitor.

$ ip li set foobar name fubar

generates these 3 messages:

[LINK]12: fubar: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN
group default
    link/ether 76:cd:72:dd:2a:cb brd ff:ff:ff:ff:ff:ff
Unknown message: type=0x00000051(81) flags=0x00000000(0)len=0x0000001c(28)
[NETCONF]ipv4 dev dummy2 forwarding on rp_filter off mc_forwarding off
proxy_neigh off ignore_routes_with_linkdown off
Unknown message: type=0x00000051(81) flags=0x00000000(0)len=0x0000001c(28)
[NETCONF]ipv6 dev dummy2 forwarding on mc_forwarding off proxy_neigh off
ignore_routes_with_linkdown off

do_setlink only sets DO_SETLINK_MODIFIED so a name change alone will not
generate 2 messages.

^ permalink raw reply

* [PATCH net-next 0/7] ibmvnic: Additional updates and bug fixes
From: Nathan Fontenot @ 2017-04-21 19:38 UTC (permalink / raw)
  To: netdev; +Cc: brking, jallen, muvic, tlfalcon

This set of patches is an additional set of updates and bug fixes to
the ibmvnic driver which applies on top of the previous set of updates
sent out on 4/19.

---

Murilo Fossa Vicentini (1):
      ibmvnic: Insert header on VLAN tagged received frame

Nathan Fontenot (4):
      ibmvnic: Only retrieve error info if present
      ibmvnic: Move initialization of the stats token to ibmvnic_open
      ibmvnic: Add set_link_state routine for setting adapter link state
      ibmvnic: Validate napi exist before disabling them

Thomas Falcon (2):
      ibmvnic: Set real number of rx queues
      ibmvnic: Free skb's in cases of failure in transmit


 drivers/net/ethernet/ibm/ibmvnic.c |  223 ++++++++++++++++++++++++++++--------
 drivers/net/ethernet/ibm/ibmvnic.h |    3 
 2 files changed, 175 insertions(+), 51 deletions(-)

^ permalink raw reply

* [PATCH net-next 6/7] ibmvnic: Validate napi exist before disabling them
From: Nathan Fontenot @ 2017-04-21 19:39 UTC (permalink / raw)
  To: netdev; +Cc: brking, jallen, muvic, tlfalcon
In-Reply-To: <20170421193627.11030.34813.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>

Validate that the napi structs exist before trying to disable them
at driver close.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 115f216..5a916a2 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -729,8 +729,10 @@ static int ibmvnic_close(struct net_device *netdev)
 	adapter->closing = true;
 	disable_sub_crqs(adapter);
 
-	for (i = 0; i < adapter->req_rx_queues; i++)
-		napi_disable(&adapter->napi[i]);
+	if (adapter->napi) {
+		for (i = 0; i < adapter->req_rx_queues; i++)
+			napi_disable(&adapter->napi[i]);
+	}
 
 	if (!adapter->failover)
 		netif_tx_stop_all_queues(netdev);

^ permalink raw reply related

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
From: Robert Shearman @ 2017-04-21 17:17 UTC (permalink / raw)
  To: David Ahern, davem; +Cc: netdev
In-Reply-To: <7c395b52-d639-9001-c6fa-ccacec4ce0d9@cumulusnetworks.com>

On 20/04/17 23:18, David Ahern wrote:
> On 4/20/17 6:58 AM, Robert Shearman wrote:
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index acd69cfe2951..f667783ffd19 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -2125,6 +2125,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
>>  		fi = NULL;
>>  	}
>>
>> +	/* If the flag to skip the nh oif check is set then the output
>> +	 * device may not match the nh device, so cannot use or add to
>> +	 * cache in that case.
>> +	 */
>> +	if (unlikely(fl4->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF &&
>> +		     FIB_RES_NH(*res).nh_dev != dev_out))
>> +		do_cache = false;
>> +
>>  	fnhe = NULL;
>>  	do_cache &= fi != NULL;
>>  	if (do_cache) {
>>
>
> I believe this is a better fix:
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 5e1e60546fce..fb74a16958af 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2407,7 +2407,7 @@ struct rtable *__ip_route_output_key_hash(struct
> net *net, struct flowi4 *fl4,
>                 }
>
>                 /* L3 master device is the loopback for that domain */
> -               dev_out = l3mdev_master_dev_rcu(dev_out) ? :
> net->loopback_dev;
> +               dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
> net->loopback_dev;
>                 fl4->flowi4_oif = dev_out->ifindex;
>                 flags |= RTCF_LOCAL;
>                 goto make_route;
>
> Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback")
>
> With your change above, references to vrf devices are still taken
> (dev_out is the vrf device based on the flow struct) even though the
> route's nexthop is in another domain. And the commit log should
> reference the use case which is policy routing overriding the VRF rule.

That is indeed a nicer fix - it survives all of my local testing. Thanks 
for correcting the fixes tag too.

I had included this text in the commit message to capture the condition 
of the rules ordering: "when the rule for the lookup in the local table
is ordered before the rule for lookups using l3mdevs". However, I'll try 
to make it more prominent and expand it to note the policy routing use 
case too.

Thanks,
Rob

^ permalink raw reply

* Re: [PATCH net] ip6mr: fix notification device destruction
From: Nikolay Aleksandrov @ 2017-04-21 18:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, yoshfuji, dvyukov, kcc, syzkaller, edumazet, roopa,
	torvalds, linux-kernel
In-Reply-To: <1492796536-28781-1-git-send-email-nikolay@cumulusnetworks.com>

On 21/04/17 20:42, Nikolay Aleksandrov wrote:
> Andrey Konovalov reported a BUG in the ip6mr code, caused by calling
> unregister_netdevice_many for a device that is already being
> destroyed. In IPv4's ipmr that was resolved by two commits a
> long time ago by introducing the "notify" parameter to the delete
> function and avoiding the unregister when called from a notifier, so
> let's do the same for ip6mr.
> 
> The trace from Andrey:
> ------------[ cut here ]------------
> kernel BUG at net/core/dev.c:6813!
> invalid opcode: 0000 [#1] SMP KASAN
> Modules linked in:
> CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
> 01/01/2011
> Workqueue: netns cleanup_net
> task: ffff880069208000 task.stack: ffff8800692d8000
> RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
> RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
> RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
> RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
> R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
> R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
> FS:  0000000000000000(0000) GS:ffff88006cb00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
> Call Trace:
>  unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
>  unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
>  ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
>  notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
>  __raw_notifier_call_chain kernel/notifier.c:394
>  raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>  call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
>  call_netdevice_notifiers net/core/dev.c:1663
>  rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
>  unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
>  unregister_netdevice_many net/core/dev.c:7880
>  default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
>  ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
>  cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
>  process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
>  worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
>  kthread+0x35e/0x430 kernel/kthread.c:231
>  ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
> Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
> 47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
> 0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
> RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
> ---[ end trace e0b29c57e9b3292c ]---
> 
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---

+CC LKML and Linus

> Andrey could you please test with this patch applied ?
> I have run the reproducer locally and can no longer trigger the bug.
> I've made "notify" an int instead of a bool only to be closer to the ipmr
> code for easier future patches.
> 
>  net/ipv6/ip6mr.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index fb4546e80c82..374997d26488 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -774,7 +774,8 @@ static struct net_device *ip6mr_reg_vif(struct net *net, struct mr6_table *mrt)
>   *	Delete a VIF entry
>   */
>  
> -static int mif6_delete(struct mr6_table *mrt, int vifi, struct list_head *head)
> +static int mif6_delete(struct mr6_table *mrt, int vifi, int notify,
> +		       struct list_head *head)
>  {
>  	struct mif_device *v;
>  	struct net_device *dev;
> @@ -820,7 +821,7 @@ static int mif6_delete(struct mr6_table *mrt, int vifi, struct list_head *head)
>  					     dev->ifindex, &in6_dev->cnf);
>  	}
>  
> -	if (v->flags & MIFF_REGISTER)
> +	if ((v->flags & MIFF_REGISTER) && !notify)
>  		unregister_netdevice_queue(dev, head);
>  
>  	dev_put(dev);
> @@ -1331,7 +1332,6 @@ static int ip6mr_device_event(struct notifier_block *this,
>  	struct mr6_table *mrt;
>  	struct mif_device *v;
>  	int ct;
> -	LIST_HEAD(list);
>  
>  	if (event != NETDEV_UNREGISTER)
>  		return NOTIFY_DONE;
> @@ -1340,10 +1340,9 @@ static int ip6mr_device_event(struct notifier_block *this,
>  		v = &mrt->vif6_table[0];
>  		for (ct = 0; ct < mrt->maxvif; ct++, v++) {
>  			if (v->dev == dev)
> -				mif6_delete(mrt, ct, &list);
> +				mif6_delete(mrt, ct, 1, NULL);
>  		}
>  	}
> -	unregister_netdevice_many(&list);
>  
>  	return NOTIFY_DONE;
>  }
> @@ -1552,7 +1551,7 @@ static void mroute_clean_tables(struct mr6_table *mrt, bool all)
>  	for (i = 0; i < mrt->maxvif; i++) {
>  		if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC))
>  			continue;
> -		mif6_delete(mrt, i, &list);
> +		mif6_delete(mrt, i, 0, &list);
>  	}
>  	unregister_netdevice_many(&list);
>  
> @@ -1708,7 +1707,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
>  		if (copy_from_user(&mifi, optval, sizeof(mifi_t)))
>  			return -EFAULT;
>  		rtnl_lock();
> -		ret = mif6_delete(mrt, mifi, NULL);
> +		ret = mif6_delete(mrt, mifi, 0, NULL);
>  		rtnl_unlock();
>  		return ret;
>  
> 

^ permalink raw reply

* Re: net/core: BUG in unregister_netdevice_many
From: Nikolay Aleksandrov @ 2017-04-21 18:30 UTC (permalink / raw)
  To: Linus Torvalds, Andrey Konovalov, Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Alexander Duyck,
	David Ahern, Daniel Borkmann, tcharding, Jiri Pirko,
	stephen hemminger, Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <CA+55aFyTFzML+_8E_kDjX=YFkaDi_ejGo15UgGtAT29244UwgA@mail.gmail.com>

On 21/04/17 20:42, Linus Torvalds wrote:
> On Fri, Apr 21, 2017 at 10:25 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I'm assuming that the real cause is simply that "dev->reg_state" ends
>> up being NETREG_UNREGISTERING or something. Maybe the BUG_ON() could
>> be just removed, and replaced by the previous warning about
>> NETREG_UNINITIALIZED.
>>
>> Something like the attached (TOTALLY UNTESTED) patch.
> 
> .. might as well test it.
> 
> That patch doesn't fix the problem, but it does show that yes, it was
> NETREG_UNREGISTERING:
> 
>   unregister_netdevice: device pim6reg/ffff962dc4606000 was not registered (2)
> 
> but then immediately afterwards we get
> 
>   general protection fault: 0000 [#1] SMP
>   Workqueue: netns cleanup_net
>   RIP: 0010:dev_shutdown+0xe/0xc0
>   Call Trace:
>      rollback_registered_many+0x2a5/0x440
>      unregister_netdevice_many+0x1e/0xb0
>      default_device_exit_batch+0x145/0x170
> 
> which is due to a
> 
>         mov    0x388(%rdi),%eax
> 
> where %rdi is 0xdead000000000090. That is at the very beginning of
> dev_shutdown, it's "dev" itself that has that value, so it comes from
> (_another_) invocation of rollback_registered_many(), when it does
> that
> 
>         list_for_each_entry(dev, head, unreg_list) {
> 
> so it seems to be a case of another "list_del() leaves list in bad
> state", and it was the added test for "dev->reg_state !=
> NETREG_REGISTERED" that did that
> 
>         list_del(&dev->unreg_list);
> 
> and left random contents in the unreg_list.
> 
> So that "handle error case" was almost certainly just buggy too.
> 
> And the bug seems to be that we're trying to unregister a netdevice
> that has already been unregistered.
> 
> Over to Eric and networking people. This oops is user-triggerable, and
> leaves the machine in a bad state (the original BUG_ON() and the new
> GP fault both happen while holding the RTNL, so networking is not
> healthy afterwards).
> 
>                       Linus
> 

Right, I've already posted a patch for ip6mr that should fix the issue.
CCed you and LKML just now.
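
For reference, the 0xdead000000000090 value in the quoted fault can be
reproduced with plain pointer arithmetic. This is a sketch under two
assumptions that are not stated in the thread: that LIST_POISON1 is
0xdead000000000100 (the usual x86-64 value) and that unreg_list sits at offset
0x70 inside struct net_device. struct fake_dev and entry_addr are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 poison value written to ->next by list_del(). */
#define LIST_POISON1 0xdead000000000100ULL

struct list_node {
	struct list_node *next, *prev;
};

/* Stand-in for struct net_device; only the offset of unreg_list matters.
 * The 0x70 offset is an assumption chosen to match the reported value.
 */
struct fake_dev {
	char pad[0x70];
	struct list_node unreg_list;
};

/* What list_for_each_entry() computes for a node address: the address of
 * the containing structure.
 */
static uint64_t entry_addr(uint64_t node_addr)
{
	assert(node_addr >= offsetof(struct fake_dev, unreg_list));
	return node_addr - offsetof(struct fake_dev, unreg_list);
}
```

With these assumptions, entry_addr(LIST_POISON1) yields 0xdead000000000090,
which would be consistent with the list walk following a ->next pointer that
list_del() had already poisoned.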

Thanks,
 Nik

^ permalink raw reply

* [PATCH v3 net-next 0/2] rtnetlink: Updates to rtnetlink_event()
From: Vladislav Yasevich @ 2017-04-21 17:31 UTC (permalink / raw)
  To: netdev; +Cc: dsa, Vladislav Yasevich

This version 3 series came out of the conversation that started as a
result of my first attempt to add netdevice event info to netlink messages.

First, let's clean up the duplicate notifications for NETDEV_CHANGENAME
event.  We get a message from the event handler as well as a result of
do_setlink().  The two messages are identical, so remove the event.

Second, update the original patch to add IFLA_EVENT attribute
to the link message to only support currently white-listed events.
Like before, this is just an attribute that gets added to the rtnetlink
message only when the message was generated as a result of a netdev event.
In my case, this is necessary since I want to trap NETDEV_NOTIFY_PEERS
event (also possibly NETDEV_RESEND_IGMP event) and perform certain actions
in user space.  This is not possible since the messages generated as
a result of netdev events do not usually contain any changed data.  They
are just notifications.  This patch exposes this notification type to
userspace.
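
A userspace consumer could look for the new attribute roughly like this. It is
a minimal sketch, not the iproute change: find_u32_attr is a hypothetical
helper, the attribute type is passed in by the caller, and the numeric values
of IFLA_EVENT and of the individual events depend on the enum order in the
final uapi header, so they are left to the caller here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Walk a flat run of rtattr TLVs (for example the attributes that follow
 * struct ifinfomsg in an RTM_NEWLINK message) and copy out the u32 payload
 * of the first attribute with the given type.
 * Returns 1 if found, 0 otherwise.
 */
static int find_u32_attr(const void *buf, int len, unsigned short type,
			 uint32_t *out)
{
	const struct rtattr *rta = buf;

	assert(buf != NULL && out != NULL);
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == type &&
		    (size_t)RTA_PAYLOAD(rta) >= sizeof(*out)) {
			memcpy(out, RTA_DATA(rta), sizeof(*out));
			return 1;
		}
	}
	return 0;
}
```

A monitor that only cares about NETDEV_NOTIFY_PEERS would call this on each
RTM_NEWLINK message and ignore messages where the attribute is absent, since
the patch omits it when no event triggered the message.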

I will also update my patch to iproute that will show this data
through 'ip monitor'. 

V3: Rebased.  Cleaned-up duplicate event.

V2: Added missed events (from David Ahern)

Vladislav Yasevich (2):
  rtnetlink: Convert rtnetlink_event to white list
  rtnl: Add support for netdev event to link messages

 include/linux/rtnetlink.h    |   3 +-
 include/uapi/linux/if_link.h |  21 ++++++++
 net/core/dev.c               |   2 +-
 net/core/rtnetlink.c         | 121 +++++++++++++++++++++++++++++++++++--------
 4 files changed, 123 insertions(+), 24 deletions(-)

-- 
2.7.4

Vladislav Yasevich (2):
  rtnetlink: Disable notification for NETDEV_NAMECHANGE event
  rtnl: Add support for netdev event to link messages

 include/linux/rtnetlink.h    |  3 ++-
 include/uapi/linux/if_link.h | 11 ++++++++
 net/core/dev.c               |  2 +-
 net/core/rtnetlink.c         | 63 +++++++++++++++++++++++++++++++++++++-------
 4 files changed, 67 insertions(+), 12 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH net-next 7/7] ibmvnic: Free skb's in cases of failure in transmit
From: Nathan Fontenot @ 2017-04-21 19:39 UTC (permalink / raw)
  To: netdev; +Cc: brking, jallen, muvic, tlfalcon
In-Reply-To: <20170421193627.11030.34813.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

When an error is encountered during transmit we need to free the
skb instead of returning TX_BUSY.
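
The ownership rule behind this change can be modelled in userspace as below.
It is only a sketch: fake_skb, fake_stats, fake_kfree_skb and fake_xmit are
hypothetical stand-ins, not driver code. The enum values mirror
NETDEV_TX_OK (0x00) and NETDEV_TX_BUSY (0x10).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's transmit return codes. */
enum tx_ret { TX_OK = 0x00, TX_BUSY = 0x10 };

struct fake_skb { bool freed; };
struct fake_stats { unsigned long tx_dropped; };

static void fake_kfree_skb(struct fake_skb *skb)
{
	skb->freed = true;
}

/* On an unrecoverable error the driver drops the packet: it frees the
 * buffer, counts the drop and returns TX_OK. Returning TX_BUSY instead
 * would tell the stack it still owns the buffer and should retry it.
 */
static enum tx_ret fake_xmit(struct fake_skb *skb, bool hw_error,
			     struct fake_stats *st)
{
	assert(skb != NULL && st != NULL);
	if (hw_error) {
		fake_kfree_skb(skb);
		st->tx_dropped++;
		return TX_OK;
	}
	/* Success path: the hardware would take the buffer (omitted). */
	return TX_OK;
}
```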

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5a916a2..51bf337 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -908,9 +908,13 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 				   be32_to_cpu(adapter->login_rsp_buf->
 					       off_txsubm_subcrqs));
 	if (adapter->migrated) {
+		if (!netif_subqueue_stopped(netdev, skb))
+			netif_stop_subqueue(netdev, queue_num);
+		dev_kfree_skb_any(skb);
+
 		tx_send_failed++;
 		tx_dropped++;
-		ret = NETDEV_TX_BUSY;
+		ret = NETDEV_TX_OK;
 		goto out;
 	}
 
@@ -976,11 +980,13 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 						    sizeof(tx_buff->indir_arr),
 						    DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, tx_buff->indir_dma)) {
+			dev_kfree_skb_any(skb);
+			tx_buff->skb = NULL;
 			if (!firmware_has_feature(FW_FEATURE_CMO))
 				dev_err(dev, "tx: unable to map descriptor array\n");
 			tx_map_failed++;
 			tx_dropped++;
-			ret = NETDEV_TX_BUSY;
+			ret = NETDEV_TX_OK;
 			goto out;
 		}
 		lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
@@ -999,9 +1005,15 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 		else
 			tx_pool->consumer_index--;
 
+		dev_kfree_skb_any(skb);
+		tx_buff->skb = NULL;
+
+		if (lpar_rc == H_CLOSED)
+			netif_stop_subqueue(netdev, queue_num);
+
 		tx_send_failed++;
 		tx_dropped++;
-		ret = NETDEV_TX_BUSY;
+		ret = NETDEV_TX_OK;
 		goto out;
 	}
 

^ permalink raw reply related
