Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] ipv6: fix possible deadlock in rt6_age_examine_exception()
From: Wei Wang @ 2018-03-23 16:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Linux Kernel Network Developers, Eric Dumazet,
	Martin KaFai Lau
In-Reply-To: <20180323145658.154636-1-edumazet@google.com>

On Fri, Mar 23, 2018 at 7:57 AM Eric Dumazet <edumazet@google.com> wrote:

> syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception()

> rt6_age_examine_exception() is called while rt6_exception_lock is held.
> This lock is the lower one in the lock hierarchy, thus we can not
> call dst_neigh_lookup() function, as it can fallback to neigh_create()

> We should instead do a pure RCU lookup. As a bonus we avoid
> a pair of atomic operations on neigh refcount.

> [1]

> WARNING: possible circular locking dependency detected
> 4.16.0-rc4+ #277 Not tainted

> syz-executor7/4015 is trying to acquire lock:
>   (&ndev->lock){++--}, at: [<00000000416dce19>]
__ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928

> but task is already holding lock:
>   (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250
net/core/neighbour.c:292

> which lock already depends on the new lock.

> the existing dependency chain (in reverse order) is:

> -> #3 (&tbl->lock){++-.}:
>         __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
>         _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
>         __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528
>         neigh_create include/net/neighbour.h:315 [inline]
>         ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228
>         dst_neigh_lookup include/net/dst.h:405 [inline]
>         rt6_age_examine_exception net/ipv6/route.c:1609 [inline]
>         rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645
>         fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033
>         fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919
>         fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845
>         fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893
>         fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970
>         __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986
>         fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline]
>         fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053
>         ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781
>         notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
>         __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>         raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>         call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
>         call_netdevice_notifiers net/core/dev.c:1725 [inline]
>         __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
>         dev_change_flags+0xf5/0x140 net/core/dev.c:6994
>         devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
>         inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
>         sock_do_ioctl+0xef/0x390 net/socket.c:957
>         sock_ioctl+0x36b/0x610 net/socket.c:1081
>         vfs_ioctl fs/ioctl.c:46 [inline]
>         do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>         SYSC_ioctl fs/ioctl.c:701 [inline]
>         SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>         do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>         entry_SYSCALL_64_after_hwframe+0x42/0xb7

> -> #2 (rt6_exception_lock){+.-.}:
>         __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>         _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
>         spin_lock_bh include/linux/spinlock.h:315 [inline]
>         rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367
>         fib6_del_route net/ipv6/ip6_fib.c:1677 [inline]
>         fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761
>         __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980
>         ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993
>         __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332
>         ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline]
>         ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200
>         inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433
>         sock_release+0x8d/0x1e0 net/socket.c:594
>         sock_close+0x16/0x20 net/socket.c:1149
>         __fput+0x327/0x7e0 fs/file_table.c:209
>         ____fput+0x15/0x20 fs/file_table.c:243
>         task_work_run+0x199/0x270 kernel/task_work.c:113
>         exit_task_work include/linux/task_work.h:22 [inline]
>         do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>         do_group_exit+0x149/0x400 kernel/exit.c:968
>         get_signal+0x73a/0x16d0 kernel/signal.c:2469
>         do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>         exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>         prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>         syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>         do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>         entry_SYSCALL_64_after_hwframe+0x42/0xb7

> -> #1 (&(&tb->tb6_lock)->rlock){+.-.}:
>         __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>         _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
>         spin_lock_bh include/linux/spinlock.h:315 [inline]
>         __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007
>         ip6_route_add+0x141/0x190 net/ipv6/route.c:2955
>         addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359
>         fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline]
>         addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline]
>         addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460
>         notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
>         __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>         raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>         call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
>         call_netdevice_notifiers net/core/dev.c:1725 [inline]
>         __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958
>         dev_change_flags+0xf5/0x140 net/core/dev.c:6994
>         do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357
>         rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965
>         rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641
>         netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444
>         rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659
>         netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
>         netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
>         netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
>         sock_sendmsg_nosec net/socket.c:629 [inline]
>         sock_sendmsg+0xca/0x110 net/socket.c:639
>         ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
>         __sys_sendmsg+0xe5/0x210 net/socket.c:2081
>         SYSC_sendmsg net/socket.c:2092 [inline]
>         SyS_sendmsg+0x2d/0x50 net/socket.c:2088
>         do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>         entry_SYSCALL_64_after_hwframe+0x42/0xb7

> -> #0 (&ndev->lock){++--}:
>         lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>         __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
>         _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
>         __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
>         ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
>         pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
>         pneigh_ifdown net/core/neighbour.c:695 [inline]
>         neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
>         rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
>         addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
>         addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
>         notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
>         __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>         raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>         call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
>         call_netdevice_notifiers net/core/dev.c:1725 [inline]
>         __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
>         dev_change_flags+0xf5/0x140 net/core/dev.c:6994
>         devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
>         inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
>         packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
>         sock_do_ioctl+0xef/0x390 net/socket.c:957
>         sock_ioctl+0x36b/0x610 net/socket.c:1081
>         vfs_ioctl fs/ioctl.c:46 [inline]
>         do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>         SYSC_ioctl fs/ioctl.c:701 [inline]
>         SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>         do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>         entry_SYSCALL_64_after_hwframe+0x42/0xb7

> other info that might help us debug this:

> Chain exists of:
>    &ndev->lock --> rt6_exception_lock --> &tbl->lock

>   Possible unsafe locking scenario:

>         CPU0                    CPU1
>         ----                    ----
>    lock(&tbl->lock);
>                                 lock(rt6_exception_lock);
>                                 lock(&tbl->lock);
>    lock(&ndev->lock);

>   *** DEADLOCK ***

> 2 locks held by syz-executor7/4015:
>   #0:  (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
>   #1:  (&tbl->lock){++-.}, at: [<00000000b5cb1d65>]
neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292

> stack backtrace:
> CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:17 [inline]
>   dump_stack+0x194/0x24d lib/dump_stack.c:53
>   print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
>   check_prev_add kernel/locking/lockdep.c:1863 [inline]
>   check_prevs_add kernel/locking/lockdep.c:1976 [inline]
>   validate_chain kernel/locking/lockdep.c:2417 [inline]
>   __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431
>   lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>   __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
>   _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
>   __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
>   ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
>   pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
>   pneigh_ifdown net/core/neighbour.c:695 [inline]
>   neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
>   rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
>   addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
>   addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
>   notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
>   __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>   raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>   call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
>   call_netdevice_notifiers net/core/dev.c:1725 [inline]
>   __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
>   dev_change_flags+0xf5/0x140 net/core/dev.c:6994
>   devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
>   inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
>   packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
>   sock_do_ioctl+0xef/0x390 net/socket.c:957
>   sock_ioctl+0x36b/0x610 net/socket.c:1081
>   vfs_ioctl fs/ioctl.c:46 [inline]
>   do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>   SYSC_ioctl fs/ioctl.c:701 [inline]
>   SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>   do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>   entry_SYSCALL_64_after_hwframe+0x42/0xb7

> Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for exception table")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Wei Wang <weiwan@google.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> ---

Nice fix. Thanks Eric.

Acked-by: Wei Wang <weiwan@google.com>

>   net/ipv6/route.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)

> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index
b0d5c64e19780ce94feb112285ed1d85dbe07e9e..b33d057ac5eb2a85e19be59f0bceacf547cc9e59
100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1626,11 +1626,10 @@ static void rt6_age_examine_exception(struct
rt6_exception_bucket *bucket,
>                  struct neighbour *neigh;
>                  __u8 neigh_flags = 0;

> -               neigh = dst_neigh_lookup(&rt->dst, &rt->rt6i_gateway);
> -               if (neigh) {
> +               neigh = __ipv6_neigh_lookup_noref(rt->dst.dev,
&rt->rt6i_gateway);
> +               if (neigh)
>                          neigh_flags = neigh->flags;
> -                       neigh_release(neigh);
> -               }
> +
>                  if (!(neigh_flags & NTF_ROUTER)) {
>                          RT6_TRACE("purging route %p via non-router but
gateway\n",
>                                    rt);
> @@ -1654,7 +1653,8 @@ void rt6_age_exceptions(struct rt6_info *rt,
>          if (!rcu_access_pointer(rt->rt6i_exception_bucket))
>                  return;

> -       spin_lock_bh(&rt6_exception_lock);
> +       rcu_read_lock_bh();
> +       spin_lock(&rt6_exception_lock);
>          bucket = rcu_dereference_protected(rt->rt6i_exception_bucket,
>                                      lockdep_is_held(&rt6_exception_lock));

> @@ -1668,7 +1668,8 @@ void rt6_age_exceptions(struct rt6_info *rt,
>                          bucket++;
>                  }
>          }
> -       spin_unlock_bh(&rt6_exception_lock);
> +       spin_unlock(&rt6_exception_lock);
> +       rcu_read_unlock_bh();
>   }

>   struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
> --
> 2.17.0.rc0.231.g781580f067-goog

^ permalink raw reply

* Re: [PATCH net 0/3] mlxsw: Handle changes to MTU in GRE tunnels
From: David Miller @ 2018-03-23 16:56 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jiri, lucien.xin, mlxsw
In-Reply-To: <20180322175335.26232-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Thu, 22 Mar 2018 19:53:32 +0200

> Petr says:
> 
> When offloading GRE tunnels, the MTU setting is kept fixed after the
> initial offload even as the slow-path configuration changed. Worse: the
> offloaded MTU setting is actually just a transient value set at the time
> of NETDEV_REGISTER of the tunnel. As of commit ffc2b6ee4174 ("ip_gre:
> fix IFLA_MTU ignored on NEWLINK"), that transient value is zero, and
> unless there's e.g. a VRF migration that prompts re-offload, it stays at
> zero, and all GRE packets end up trapping.
> 
> Thus, in patch #1, change the way the MTU is changed post-registration,
> so that the full event protocol is observed. That way the drivers get to
> see the change and have a chance to react.
> 
> In the remaining two patches, implement support for MTU change in mlxsw
> driver.

Series applied, thanks.

^ permalink raw reply

* Re: [bpf-next V5 PATCH 10/15] xdp: rhashtable with allocator ID to pointer mapping
From: Alexander Duyck @ 2018-03-23 16:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Netdev, BjörnTöpel, Karlsson, Magnus, Eugenia Emantayev,
	Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed,
	Gal Pressman, Daniel Borkmann, Alexei Starovoitov, Tariq Toukan
In-Reply-To: <152180752969.20167.10855856167552164450.stgit@firesoul>

On Fri, Mar 23, 2018 at 5:18 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> Use the IDA infrastructure for getting a cyclic increasing ID number,
> that is used for keeping track of each registered allocator per
> RX-queue xdp_rxq_info.  Instead of using the IDR infrastructure, which
> uses a radix tree, use a dynamic rhashtable, for creating ID to
> pointer lookup table, because this is faster.
>
> The problem that is being solved here is that, the xdp_rxq_info
> pointer (stored in xdp_buff) cannot be used directly, as the
> guaranteed lifetime is too short.  The info is needed on a
> (potentially) remote CPU during DMA-TX completion time . In an
> xdp_frame the xdp_mem_info is stored, when it got converted from an
> xdp_buff, which is sufficient for the simple page refcnt based recycle
> schemes.
>
> For more advanced allocators there is a need to store a pointer to the
> registered allocator.  Thus, there is a need to guard the lifetime or
> validity of the allocator pointer, which is done through this
> rhashtable ID map to pointer. The removal and validity of of the
> allocator and helper struct xdp_mem_allocator is guarded by RCU.  The
> allocator will be created by the driver, and registered with
> xdp_rxq_info_reg_mem_model().
>
> It is up-to debate who is responsible for freeing the allocator
> pointer or invoking the allocator destructor function.  In any case,
> this must happen via RCU freeing.
>
> Use the IDA infrastructure for getting a cyclic increasing ID number,
> that is used for keeping track of each registered allocator per
> RX-queue xdp_rxq_info.
>
> V4: Per req of Jason Wang
> - Use xdp_rxq_info_reg_mem_model() in all drivers implementing
>   XDP_REDIRECT, even-though it's not strictly necessary when
>   allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero).
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    9 +
>  drivers/net/tun.c                             |    6 +
>  drivers/net/virtio_net.c                      |    7 +
>  include/net/xdp.h                             |   15 --
>  net/core/xdp.c                                |  230 ++++++++++++++++++++++++-
>  5 files changed, 248 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 45520eb503ee..ff069597fccf 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -6360,7 +6360,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
>         struct device *dev = rx_ring->dev;
>         int orig_node = dev_to_node(dev);
>         int ring_node = -1;
> -       int size;
> +       int size, err;
>
>         size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count;
>
> @@ -6397,6 +6397,13 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
>                              rx_ring->queue_index) < 0)
>                 goto err;
>
> +       err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq,
> +                                        MEM_TYPE_PAGE_SHARED, NULL);
> +       if (err) {
> +               xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
> +               goto err;
> +       }
> +
>         rx_ring->xdp_prog = adapter->xdp_prog;
>
>         return 0;
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 6750980d9f30..81fddf9cc58f 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -846,6 +846,12 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>                                        tun->dev, tfile->queue_index);
>                 if (err < 0)
>                         goto out;
> +               err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq,
> +                                                MEM_TYPE_PAGE_SHARED, NULL);
> +               if (err < 0) {
> +                       xdp_rxq_info_unreg(&tfile->xdp_rxq);
> +                       goto out;
> +               }
>                 err = 0;
>         }
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6c4220450506..48c86accd3b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1312,6 +1312,13 @@ static int virtnet_open(struct net_device *dev)
>                 if (err < 0)
>                         return err;
>
> +               err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
> +                                                MEM_TYPE_PAGE_SHARED, NULL);
> +               if (err < 0) {
> +                       xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> +                       return err;
> +               }
> +
>                 virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
>                 virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
>         }
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index bc0cb97e20dc..859aa9b737fe 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -41,7 +41,7 @@ enum mem_type {
>
>  struct xdp_mem_info {
>         u32 type; /* enum mem_type, but known size type */
> -       /* u32 id; will be added later in this patchset */
> +       u32 id;
>  };
>
>  struct xdp_rxq_info {
> @@ -100,18 +100,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
>         return xdp_frame;
>  }
>
> -static inline
> -void xdp_return_frame(void *data, struct xdp_mem_info *mem)
> -{
> -       if (mem->type == MEM_TYPE_PAGE_SHARED)
> -               page_frag_free(data);
> -
> -       if (mem->type == MEM_TYPE_PAGE_ORDER0) {
> -               struct page *page = virt_to_page(data); /* Assumes order0 page*/
> -
> -               put_page(page);
> -       }
> -}
> +void xdp_return_frame(void *data, struct xdp_mem_info *mem);
>
>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
>                      struct net_device *dev, u32 queue_index);
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 9eee0c431126..06a5b39491ad 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -5,6 +5,9 @@
>   */
>  #include <linux/types.h>
>  #include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/idr.h>
> +#include <linux/rhashtable.h>
>
>  #include <net/xdp.h>
>
> @@ -13,6 +16,99 @@
>  #define REG_STATE_UNREGISTERED 0x2
>  #define REG_STATE_UNUSED       0x3
>
> +DEFINE_IDA(mem_id_pool);
> +static DEFINE_MUTEX(mem_id_lock);
> +#define MEM_ID_MAX 0xFFFE
> +#define MEM_ID_MIN 1
> +static int mem_id_next = MEM_ID_MIN;
> +
> +static bool mem_id_init; /* false */
> +static struct rhashtable *mem_id_ht;
> +
> +struct xdp_mem_allocator {
> +       struct xdp_mem_info mem;
> +       void *allocator;
> +       struct rhash_head node;
> +       struct rcu_head rcu;
> +};
> +
> +static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
> +{
> +       const u32 *k = data;
> +       const u32 key = *k;
> +
> +       BUILD_BUG_ON(FIELD_SIZEOF(struct xdp_mem_allocator, mem.id)
> +                    != sizeof(u32));
> +
> +       /* Use cyclic increasing ID as direct hash key, see rht_bucket_index */
> +       return key << RHT_HASH_RESERVED_SPACE;
> +}
> +
> +static int xdp_mem_id_cmp(struct rhashtable_compare_arg *arg,
> +                         const void *ptr)
> +{
> +       const struct xdp_mem_allocator *xa = ptr;
> +       u32 mem_id = *(u32 *)arg->key;
> +
> +       return xa->mem.id != mem_id;
> +}
> +
> +static const struct rhashtable_params mem_id_rht_params = {
> +       .nelem_hint = 64,
> +       .head_offset = offsetof(struct xdp_mem_allocator, node),
> +       .key_offset  = offsetof(struct xdp_mem_allocator, mem.id),
> +       .key_len = FIELD_SIZEOF(struct xdp_mem_allocator, mem.id),
> +       .max_size = MEM_ID_MAX,
> +       .min_size = 8,
> +       .automatic_shrinking = true,
> +       .hashfn    = xdp_mem_id_hashfn,
> +       .obj_cmpfn = xdp_mem_id_cmp,
> +};
> +
> +void __xdp_mem_allocator_rcu_free(struct rcu_head *rcu)
> +{
> +       struct xdp_mem_allocator *xa;
> +
> +       xa = container_of(rcu, struct xdp_mem_allocator, rcu);
> +
> +       /* Allow this ID to be reused */
> +       ida_simple_remove(&mem_id_pool, xa->mem.id);
> +
> +       /* TODO: Depending on allocator type/pointer free resources */
> +
> +       /* Poison memory */
> +       xa->mem.id = 0xFFFF;
> +       xa->mem.type = 0xF0F0;
> +       xa->allocator = (void *)0xDEAD9001;
> +
> +       kfree(xa);
> +}
> +
> +void __xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq)
> +{
> +       struct xdp_mem_allocator *xa;
> +       int id = xdp_rxq->mem.id;
> +       int err;
> +
> +       if (id == 0)
> +               return;
> +
> +       mutex_lock(&mem_id_lock);
> +
> +       xa = rhashtable_lookup(mem_id_ht, &id, mem_id_rht_params);
> +       if (!xa) {
> +               mutex_unlock(&mem_id_lock);
> +               return;
> +       }
> +
> +       err = rhashtable_remove_fast(mem_id_ht, &xa->node, mem_id_rht_params);
> +       WARN_ON(err);
> +
> +       call_rcu(&xa->rcu, __xdp_mem_allocator_rcu_free);
> +
> +       mutex_unlock(&mem_id_lock);
> +}
> +
>  void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq)
>  {
>         /* Simplify driver cleanup code paths, allow unreg "unused" */
> @@ -21,8 +117,14 @@ void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq)
>
>         WARN(!(xdp_rxq->reg_state == REG_STATE_REGISTERED), "Driver BUG");
>
> +       __xdp_rxq_info_unreg_mem_model(xdp_rxq);
> +
>         xdp_rxq->reg_state = REG_STATE_UNREGISTERED;
>         xdp_rxq->dev = NULL;
> +
> +       /* Reset mem info to defaults */
> +       xdp_rxq->mem.id = 0;
> +       xdp_rxq->mem.type = 0;
>  }
>  EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg);
>
> @@ -72,20 +174,138 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq)
>  }
>  EXPORT_SYMBOL_GPL(xdp_rxq_info_is_reg);
>
> +int __mem_id_init_hash_table(void)
> +{
> +       struct rhashtable *rht;
> +       int ret;
> +
> +       if (unlikely(mem_id_init))
> +               return 0;
> +
> +       rht = kzalloc(sizeof(*rht), GFP_KERNEL);
> +       if (!rht)
> +               return -ENOMEM;
> +
> +       ret = rhashtable_init(rht, &mem_id_rht_params);
> +       if (ret < 0) {
> +               kfree(rht);
> +               return ret;
> +       }
> +       mem_id_ht = rht;
> +       smp_mb(); /* mutex lock should provide enough pairing */
> +       mem_id_init = true;
> +
> +       return 0;
> +}
> +
> +/* Allocate a cyclic ID that maps to allocator pointer.
> + * See: https://www.kernel.org/doc/html/latest/core-api/idr.html
> + *
> + * Caller must lock mem_id_lock.
> + */
> +static int __mem_id_cyclic_get(gfp_t gfp)
> +{
> +       int retries = 1;
> +       int id;
> +
> +again:
> +       id = ida_simple_get(&mem_id_pool, mem_id_next, MEM_ID_MAX, gfp);
> +       if (id < 0) {
> +               if (id == -ENOSPC) {
> +                       /* Cyclic allocator, reset next id */
> +                       if (retries--) {
> +                               mem_id_next = MEM_ID_MIN;
> +                               goto again;
> +                       }
> +               }
> +               return id; /* errno */
> +       }
> +       mem_id_next = id + 1;
> +
> +       return id;
> +}
> +
>  int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>                                enum mem_type type, void *allocator)
>  {
> +       struct xdp_mem_allocator *xdp_alloc;
> +       gfp_t gfp = GFP_KERNEL;
> +       int id, errno, ret;
> +       void *ptr;
> +
> +       if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
> +               WARN(1, "Missing register, driver bug");
> +               return -EFAULT;
> +       }
> +
>         if (type >= MEM_TYPE_MAX)
>                 return -EINVAL;
>
>         xdp_rxq->mem.type = type;
>
> -       if (allocator)
> -               return -EOPNOTSUPP;
> +       if (!allocator)
> +               return 0;
> +
> +       /* Delay init of rhashtable to save memory if feature isn't used */
> +       if (!mem_id_init) {
> +               mutex_lock(&mem_id_lock);
> +               ret = __mem_id_init_hash_table();
> +               mutex_unlock(&mem_id_lock);
> +               if (ret < 0) {
> +                       WARN_ON(1);
> +                       return ret;
> +               }
> +       }
> +
> +       xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
> +       if (!xdp_alloc)
> +               return -ENOMEM;
> +
> +       mutex_lock(&mem_id_lock);
> +       id = __mem_id_cyclic_get(gfp);
> +       if (id < 0) {
> +               errno = id;
> +               goto err;
> +       }
> +       xdp_rxq->mem.id = id;
> +       xdp_alloc->mem  = xdp_rxq->mem;
> +       xdp_alloc->allocator = allocator;
> +
> +       /* Insert allocator into ID lookup table */
> +       ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
> +       if (IS_ERR(ptr)) {
> +               errno = PTR_ERR(ptr);
> +               goto err;
> +       }
> +
> +       mutex_unlock(&mem_id_lock);
>
> -       /* TODO: Allocate an ID that maps to allocator pointer
> -        * See: https://www.kernel.org/doc/html/latest/core-api/idr.html
> -        */
>         return 0;
> +err:
> +       mutex_unlock(&mem_id_lock);
> +       kfree(xdp_alloc);
> +       return errno;
>  }
>  EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
> +
> +void xdp_return_frame(void *data, struct xdp_mem_info *mem)
> +{
> +       struct xdp_mem_allocator *xa;
> +
> +       rcu_read_lock();
> +       if (mem->id)
> +               xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> +       rcu_read_unlock();
> +
> +       if (mem->type == MEM_TYPE_PAGE_SHARED) {
> +               page_frag_free(data);
> +               return;
> +       }
> +
> +       if (mem->type == MEM_TYPE_PAGE_ORDER0) {
> +               struct page *page = virt_to_page(data); /* Assumes order0 page*/
> +
> +               put_page(page);
> +       }
> +}
> +EXPORT_SYMBOL_GPL(xdp_return_frame);
>

I'm not sure what the point is of getting the xa value if it is not
going to be used. Also I would assume there are types that won't even
need the hash table lookup. I would prefer to see this bit held off on
until you have something that actually needs it.

^ permalink raw reply

* Re: [PATCH net-next v2] tc-testing: add selftests for 'bpf' action
From: David Miller @ 2018-03-23 16:59 UTC (permalink / raw)
  To: dcaratti; +Cc: mrv, bjb, jhs, netdev
In-Reply-To: <678d3ddd35780ade1c67c33de84f548858f45f15.1521741967.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Thu, 22 Mar 2018 19:12:19 +0100

> Test d959: Add cBPF action with valid bytecode
> Test f84a: Add cBPF action with invalid bytecode
> Test e939: Add eBPF action with valid object-file
> Test 282d: Add eBPF action with invalid object-file
> Test d819: Replace cBPF bytecode and action control
> Test 6ae3: Delete cBPF action
> Test 3e0d: List cBPF actions
> Test 55ce: Flush BPF actions
> Test ccc3: Add cBPF action with duplicate index
> Test 89c7: Add cBPF action with invalid index
> Test 7ab9: Add cBPF action with cookie
> 
> Changes since v1:
>  - use index=2^32-1 in test ccc3, add tests 7a89, 89c7 (thanks Roman Mashak)
>  - added test 282d
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied, thanks Davide.

^ permalink raw reply

* Re: [PATCH net-next] mlxsw: spectrum_span: Fix initialization of struct mlxsw_sp_span_parms
From: David Miller @ 2018-03-23 17:00 UTC (permalink / raw)
  To: idosch; +Cc: netdev, petrm, jiri, mlxsw
In-Reply-To: <20180322181447.27554-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Thu, 22 Mar 2018 20:14:47 +0200

> From: Petr Machata <petrm@mellanox.com>
> 
> Since the first element of struct mlxsw_sp_span_parms is a pointer,
> to zero-initialize this structure the correct notation is not = {0}, but
> rather = {NULL}, as reported by sparse.
> 
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/2] Converting pernet_operations (part #11)
From: David Miller @ 2018-03-23 17:01 UTC (permalink / raw)
  To: ktkhai; +Cc: yoshfuji, dhowells, netdev
In-Reply-To: <152174314653.22875.17369758119928369672.stgit@localhost.localdomain>

From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Thu, 22 Mar 2018 21:34:37 +0300

> this series continues to review and to convert pernet_operations
> to make them possible to be executed in parallel for several
> net namespaces at the same time.
> 
> I thought last series was last, but there is one
> new pernet_operations came to kernel. This is
> udp_sysctl_ops, and here we convert it.
> 
> Also, David Howells acked rxrpc_net_ops, so I resend
> the patch in case of it should be queued by patchwork:
> 
> https://www.spinics.net/lists/netdev/msg490678.html

Series applied, thank you.

^ permalink raw reply

* Re: [bpf-next V5 PATCH 12/15] xdp: allow page_pool as an allocator type in xdp_return_frame
From: Alexander Duyck @ 2018-03-23 17:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Netdev, BjörnTöpel, Karlsson, Magnus, Eugenia Emantayev,
	Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed,
	Gal Pressman, Daniel Borkmann, Alexei Starovoitov, Tariq Toukan
In-Reply-To: <152180753987.20167.7085747286116186761.stgit@firesoul>

On Fri, Mar 23, 2018 at 5:18 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> New allocator type MEM_TYPE_PAGE_POOL for page_pool usage.
>
> The registered allocator page_pool pointer is not available directly
> from xdp_rxq_info, but it could be (if needed).  For now, the driver
> should keep separate track of the page_pool pointer, which it should
> use for RX-ring page allocation.
>
> As suggested by Saeed, to maintain a symmetric API it is the drivers
> responsibility to allocate/create and free/destroy the page_pool.
> Thus, after the driver have called xdp_rxq_info_unreg(), it is drivers
> responsibility to free the page_pool, but with a RCU free call.  This
> is done easily via the page_pool helper page_pool_destroy_rcu() (which
> avoids touching any driver code during the RCU callback, which could
> happen after the driver have been unloaded).
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/net/xdp.h |    3 +++
>  net/core/xdp.c    |   23 ++++++++++++++++++++---
>  2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 859aa9b737fe..98b55eaf8fd7 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -36,6 +36,7 @@
>  enum mem_type {
>         MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
>         MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
> +       MEM_TYPE_PAGE_POOL,
>         MEM_TYPE_MAX,
>  };
>
> @@ -44,6 +45,8 @@ struct xdp_mem_info {
>         u32 id;
>  };
>
> +struct page_pool;
> +
>  struct xdp_rxq_info {
>         struct net_device *dev;
>         u32 queue_index;
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 06a5b39491ad..fe8e87abc266 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -8,6 +8,7 @@
>  #include <linux/slab.h>
>  #include <linux/idr.h>
>  #include <linux/rhashtable.h>
> +#include <net/page_pool.h>
>
>  #include <net/xdp.h>
>
> @@ -27,7 +28,10 @@ static struct rhashtable *mem_id_ht;
>
>  struct xdp_mem_allocator {
>         struct xdp_mem_info mem;
> -       void *allocator;
> +       union {
> +               void *allocator;
> +               struct page_pool *page_pool;
> +       };
>         struct rhash_head node;
>         struct rcu_head rcu;
>  };
> @@ -74,7 +78,9 @@ void __xdp_mem_allocator_rcu_free(struct rcu_head *rcu)
>         /* Allow this ID to be reused */
>         ida_simple_remove(&mem_id_pool, xa->mem.id);
>
> -       /* TODO: Depending on allocator type/pointer free resources */
> +       /* Notice, driver is expected to free the *allocator,
> +        * e.g. page_pool, and MUST also use RCU free.
> +        */
>
>         /* Poison memory */
>         xa->mem.id = 0xFFFF;
> @@ -290,11 +296,21 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
>
>  void xdp_return_frame(void *data, struct xdp_mem_info *mem)
>  {
> -       struct xdp_mem_allocator *xa;
> +       struct xdp_mem_allocator *xa = NULL;
>
>         rcu_read_lock();
>         if (mem->id)
>                 xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> +
> +       if (mem->type == MEM_TYPE_PAGE_POOL) {
> +               struct page *page = virt_to_head_page(data);
> +
> +               if (xa)
> +                       page_pool_put_page(xa->page_pool, page);
> +               else
> +                       put_page(page);
> +               return;
> +       }

So at this point there need to be a few changes.

I would suggest doing a switch based on mem->type instead of this mess
of if statements.

Also the PAGE_POOL is the only one that needs the reference to the
memory allocator. You should move all of the logic for accessing the
hash table into the section specific to the page pool with a
fall-through into the other paths.

>         rcu_read_unlock();
>
>         if (mem->type == MEM_TYPE_PAGE_SHARED) {
> @@ -306,6 +322,7 @@ void xdp_return_frame(void *data, struct xdp_mem_info *mem)
>                 struct page *page = virt_to_page(data); /* Assumes order0 page*/
>
>                 put_page(page);
> +               return;

Why add a return here? This was the end of the function already wasn't it?

>         }
>  }
>  EXPORT_SYMBOL_GPL(xdp_return_frame);
>

^ permalink raw reply

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
From: David Miller @ 2018-03-23 17:04 UTC (permalink / raw)
  To: okaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel
In-Reply-To: <4bd9ccd2-df8f-acad-2513-eefe065dc852@codeaurora.org>

From: Sinan Kaya <okaya@codeaurora.org>
Date: Fri, 23 Mar 2018 12:51:47 -0400

> It could if txdata->tx_db was not a union. There is a data dependency
> between txdata->tx_db.data.prod and txdata->tx_db.raw. 
> 
> So, no reordering.

I don't see it that way, the code requires that:

 	txdata->tx_db.data.prod += nbd;

is visible before the doorbell update.

barrier() doesn't provide that.

Neither does writel_relaxed().  However plain writel() does.

Therefore the code is only correct as-is, and your change potentially
adds a reordering problem.

^ permalink raw reply

* Re: [PATCH v2] net/mlx5: Fix use-after-free
From: David Miller @ 2018-03-23 17:06 UTC (permalink / raw)
  To: gustavo
  Cc: yuval.shaia, ilant, borisp, saeedm, matanb, leon, netdev,
	linux-rdma, linux-kernel
In-Reply-To: <20180322184456.GA22259@embeddedgus>

From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Thu, 22 Mar 2018 13:44:56 -0500

> _rule_ is being freed and then dereferenced by accessing rule->ctx
> 
> Fix this by copying the value returned by PTR_ERR(rule->ctx) into a local
> variable for its safe use after freeing _rule_
> 
> Addresses-Coverity-ID: 1466041 ("Read from pointer after free")
> Fixes: 05564d0ae075 ("net/mlx5: Add flow-steering commands for FPGA IPSec implementation")
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> ---
> Changes in v2:
>  - Use a short subject prefix as suggested by Yuval Shaia.
>  - Add Yuval's Reviewed-by.

Applied to net-next.

Thank you.

^ permalink raw reply

* Re: [patch net-next RFC 04/12] dsa: set devlink port attrs for dsa ports
From: Florian Fainelli @ 2018-03-23 17:09 UTC (permalink / raw)
  To: Jiri Pirko, Andrew Lunn
  Cc: netdev, davem, idosch, jakub.kicinski, mlxsw, vivien.didelot,
	michael.chan, ganeshgr, saeedm, simon.horman,
	pieter.jansenvanvuuren, john.hurley, dirk.vandermerwe,
	alexander.h.duyck, ogerlitz, dsahern, vijaya.guvva,
	satananda.burla, raghu.vatsavayi, felix.manlunas, gospo,
	sathya.perla, vasundhara-v.volam, tariqt, eranbe,
	jeffrey.t.kirsher
In-Reply-To: <20180323144914.GA2125@nanopsycho>

On 03/23/2018 07:49 AM, Jiri Pirko wrote:
> Fri, Mar 23, 2018 at 02:30:02PM CET, andrew@lunn.ch wrote:
>> On Thu, Mar 22, 2018 at 11:55:14AM +0100, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Set the attrs and allow to expose port flavour to user via devlink.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  net/dsa/dsa2.c | 23 +++++++++++++++++++++++
>>>  1 file changed, 23 insertions(+)
>>>
>>> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
>>> index adf50fbc4c13..49453690696d 100644
>>> --- a/net/dsa/dsa2.c
>>> +++ b/net/dsa/dsa2.c
>>> @@ -270,7 +270,27 @@ static int dsa_port_setup(struct dsa_port *dp)
>>>  	case DSA_PORT_TYPE_UNUSED:
>>>  		break;
>>>  	case DSA_PORT_TYPE_CPU:
>>> +		/* dp->index is used now as port_number. However
>>> +		 * CPU ports should have separate numbering
>>> +		 * independent from front panel port numbers.
>>> +		 */
>>> +		devlink_port_attrs_set(&dp->devlink_port,
>>> +				       DEVLINK_PORT_FLAVOUR_CPU,
>>> +				       dp->index, false, 0);
>>> +		err = dsa_port_link_register_of(dp);
>>> +		if (err) {
>>> +			dev_err(ds->dev, "failed to setup link for port %d.%d\n",
>>> +				ds->index, dp->index);
>>> +			return err;
>>> +		}
>>
>> Ah, i get it. These used to be two case statements with one code
>> block. But you split them apart, so needed to duplicate the
>> dsa_port_link_register.
>>
>> Unfortunately, you forgot to add a 'break;', so it still falls
>> through, and overwrites the port flavour to DSA.
> 
> ah, crap. Don't have hw to test this :/
> Will fix. Thanks!

You don't need hardware, there is drivers/net/dsa/dsa_loop.c which will
emulate a DSA switch. It won't create interconnect ports, since only one
switch can be created with the method chosen, but this would have helped
you catch the missing break since the "CPU" port would have been
displayed as "DSA" anyway.

If you need hardware, I am sure this can be somehow arranged. By that, I
mean something on which you can run upstream Linux on without out of
tree patches.
-- 
Florian

^ permalink raw reply

* Re: [bpf-next V5 PATCH 05/15] xdp: introduce a new xdp_frame type
From: Alexander Duyck @ 2018-03-23 17:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Netdev, BjörnTöpel, Karlsson, Magnus, Eugenia Emantayev,
	Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed,
	Gal Pressman, Daniel Borkmann, Alexei Starovoitov, Tariq Toukan
In-Reply-To: <152180750428.20167.335753789470211325.stgit@firesoul>

On Fri, Mar 23, 2018 at 5:18 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> This is needed to convert drivers tuntap and virtio_net.
>
> This is a generalization of what is done inside cpumap, which will be
> converted later.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/net/xdp.h |   40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 1ee154fe0be6..13f71a15c79f 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -59,6 +59,46 @@ struct xdp_buff {
>         struct xdp_rxq_info *rxq;
>  };
>
> +struct xdp_frame {
> +       void *data;
> +       u16 len;
> +       u16 headroom;
> +       u16 metasize;
> +       /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
> +        * while mem info is valid on remote CPU.
> +        */
> +       struct xdp_mem_info mem;
> +};
> +
> +/* Convert xdp_buff to xdp_frame */
> +static inline
> +struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
> +{
> +       struct xdp_frame *xdp_frame;
> +       int metasize;
> +       int headroom;
> +
> +       /* Assure headroom is available for storing info */
> +       headroom = xdp->data - xdp->data_hard_start;
> +       metasize = xdp->data - xdp->data_meta;
> +       metasize = metasize > 0 ? metasize : 0;
> +       if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
> +               return NULL;
> +
> +       /* Store info in top of packet */
> +       xdp_frame = xdp->data_hard_start;
> +

I thought the point of this stuff was supposed to be performance. This
is going to hurt. You are adding yet another cacheline to the number
of lines accessed to process the frame.

I would like to know how of a slow down is being added after you apply
this to the Tx path of ixgbe. I suspect there should be a noticeable
difference for your basic xmit loopback test.

> +       xdp_frame->data = xdp->data;
> +       xdp_frame->len  = xdp->data_end - xdp->data;
> +       xdp_frame->headroom = headroom - sizeof(*xdp_frame);
> +       xdp_frame->metasize = metasize;
> +
> +       /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
> +       xdp_frame->mem = xdp->rxq->mem;
> +
> +       return xdp_frame;
> +}
> +
>  static inline
>  void xdp_return_frame(void *data, struct xdp_mem_info *mem)
>  {
>

^ permalink raw reply

* Re: [net-next 0/8] tipc: introduce 128-bit auto-configurable node id
From: David Miller @ 2018-03-23 17:12 UTC (permalink / raw)
  To: jon.maloy; +Cc: netdev, tipc-discussion, mohan.krishna.ghanta.krishnamurthy
In-Reply-To: <1521747772-7727-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 22 Mar 2018 20:42:44 +0100

> We introduce a 128-bit free-format node identity as an alternative to
> the legacy <Zone.Cluster.Node> structured 32-bit node address.
> 
> We also make configuration of this identity optional; if a bearer is
> enabled without a pre-configured node id it will be set automatically
> based on the used interface's MAC or IP address.

Looks good, series applied, thanks Jon.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

^ permalink raw reply

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 17:13 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel
In-Reply-To: <20180323.130418.2223623186761161723.davem@davemloft.net>

On 3/23/2018 1:04 PM, David Miller wrote:
> From: Sinan Kaya <okaya@codeaurora.org>
> Date: Fri, 23 Mar 2018 12:51:47 -0400
> 
>> It could if txdata->tx_db was not a union. There is a data dependency
>> between txdata->tx_db.data.prod and txdata->tx_db.raw. 
>>
>> So, no reordering.
> 
> I don't see it that way, the code requires that:
> 
>  	txdata->tx_db.data.prod += nbd;
> 
> is visible before the doorbell update.> 
> barrier() doesn't provide that.
> 
> Neither does writel_relaxed().  However plain writel() does.

Correct for some architectures including ARM but not correct universally.

writel() just guarantees register read/writes before and after to be
ordered when HW observes it. 

writel() doesn't guarantee that the memory update is visible to the HW
on all architectures.

If you need memory update visibility, that barrier() should have been a
wmb()

A correct multi-arch pattern is

wmb()
writel_relaxed()
mmiowb()

We have decided to drop a similar patch on Infiniband due to incorrect
usage of barrier(). If you feel strongly about it, I can remove the
DOORBELL change.

> 
> Therefore the code is only correct as-is, and your change potentially
> adds a reordering problem.
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
From: David Miller @ 2018-03-23 17:16 UTC (permalink / raw)
  To: okaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel
In-Reply-To: <b4553d26-b326-15be-dcac-12594ea7c5b5@codeaurora.org>

From: Sinan Kaya <okaya@codeaurora.org>
Date: Fri, 23 Mar 2018 13:13:46 -0400

> We have decided to drop a similar patch on Infiniband due to incorrect
> usage of barrier(). If you feel strongly about it, I can remove the
> DOORBELL change.

Either remove the DOORBELL change or properly adjust the barrier() to
be a wmb().

Be sure to mention this in the commit message, explicitly.

Thank you.

^ permalink raw reply

* Re: [PATCH] dpaa_eth: use true and false for boolean values
From: David Miller @ 2018-03-23 17:18 UTC (permalink / raw)
  To: gustavo; +Cc: madalin.bucur, netdev, linux-kernel
In-Reply-To: <20180322195927.GA26854@embeddedgus>

From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Thu, 22 Mar 2018 14:59:27 -0500

> Assign true or false to boolean variables instead of an integer value.
> 
> This issue was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Applied.

^ permalink raw reply

* Re: [PATCH] qed: Use true and false for boolean values
From: David Miller @ 2018-03-23 17:18 UTC (permalink / raw)
  To: gustavo; +Cc: Ariel.Elior, everest-linux-l2, netdev, linux-kernel
In-Reply-To: <20180322200849.GA28033@embeddedgus>

From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Thu, 22 Mar 2018 15:08:49 -0500

> Assign true or false to boolean variables instead of an integer value.
> 
> This issue was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Applied.

^ permalink raw reply

* Re: [RFC PATCH net-next] net: hns3: hclge_inform_reset_assert_to_vf() can be static
From: David Miller @ 2018-03-23 17:18 UTC (permalink / raw)
  To: fengguang.wu
  Cc: salil.mehta, kbuild-all, netdev, yisen.zhuang, lipeng321,
	liangfuyun1, shenjian15, linyunsheng, linux-kernel
In-Reply-To: <20180322213107.GA29844@roam>

From: kbuild test robot <fengguang.wu@intel.com>
Date: Fri, 23 Mar 2018 05:31:07 +0800

> 
> Fixes: 2bfbd35d8ecd ("net: hns3: Changes required in PF mailbox to support VF reset")
> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] udp6: set dst cache for a connected sk before udp_v6_send_skb
From: Alexey Kodanev @ 2018-03-23 17:13 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: David Miller
In-Reply-To: <538ef522-619b-becd-7391-0770d19fa33d@gmail.com>

On 03/23/2018 06:50 PM, Eric Dumazet wrote:
> 
> 
> On 03/23/2018 07:39 AM, Alexey Kodanev wrote:
>> After commit 33c162a980fe ("ipv6: datagram: Update dst cache of a
>> connected datagram sk during pmtu update"), when the error occurs on
>> sending datagram in udpv6_sendmsg() due to ICMPV6_PKT_TOOBIG type,
>> error handler can trigger the following path and call ip6_dst_store():
>>
>>     udpv6_err()
>>         ip6_sk_update_pmtu()
>>             ip6_datagram_dst_update()
>>                 ip6_dst_lookup_flow(), can create a RTF_CACHE clone
>>                 ...
>>                 ip6_dst_store()
>>
>> It can happen before a connected UDP socket invokes ip6_dst_store()
>> in the end of udpv6_sendmsg(), on destination release, as a result,
>> the last one changes dst to the old one, preventing getting updated
>> dst cache on the next udpv6_sendmsg() call.
>>
>> This patch moves ip6_dst_store() in udpv6_sendmsg(), so that it is
>> invoked after ip6_sk_dst_lookup_flow() and before udp_v6_send_skb().
>>
> 
> 
> A Fixes: tag would be nice, for automatic tools (and humans as well)

Hi Eric,

I see, will add the commit 33c162a980fe.

...
>>  
>> +	if (connected)
>> +		ip6_dst_store(sk, dst,
>> +			      ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr) ?
>> +			      &sk->sk_v6_daddr : NULL,
>> +#ifdef CONFIG_IPV6_SUBTREES
>> +			      ipv6_addr_equal(&fl6.saddr, &np->saddr) ?
>> +			      &np->saddr :
>> +#endif
>> +			      NULL);
>> +
> 
> What about the MSG_CONFIRM stuff ?
> 
>>  	if (msg->msg_flags&MSG_CONFIRM)
>>  		goto do_confirm;
>>  back_from_confirm:
> 
> Should not you move the above code here instead ?

Ah, you are right, it can release that dst if it go to "do_confirm".

> > Also ip6_dst_store() does not increment dst refcount.
> 
> I fear that as soon as dst is visible to other cpus, it might be stolen.
> 

So we should pass dst_clone(dst) to ip6_dst_store() instead of dst,
because udpv6_err() can release it if it's set the new one.

Then, I guess, we could left udpv6_sendmsg()/ip6_dst_store() where it
is now in the patch and remove the check for "connected" before dst_relase(),
similar to udp_sendmsg(), right?

Thanks,
Alexey

^ permalink raw reply

* Re: [PATCH 1/2] net: phy: intel-xway: add VR9 version number
From: David Miller @ 2018-03-23 17:20 UTC (permalink / raw)
  To: dev; +Cc: andrew, f.fainelli, hauke, netdev, linux-kernel
In-Reply-To: <1521757899-27813-1-git-send-email-dev@kresin.me>

From: Mathias Kresin <dev@kresin.me>
Date: Thu, 22 Mar 2018 23:31:38 +0100

> The VR9 phy ids are matching only for the SoC version 1.2. Rename the
> macros and change the names to take this into account.
> 
> Signed-off-by: Mathias Kresin <dev@kresin.me>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] net: phy: intel-xway: add VR9 v1.1 phy ids
From: David Miller @ 2018-03-23 17:20 UTC (permalink / raw)
  To: dev; +Cc: andrew, f.fainelli, hauke, netdev, linux-kernel
In-Reply-To: <1521757899-27813-2-git-send-email-dev@kresin.me>

From: Mathias Kresin <dev@kresin.me>
Date: Thu, 22 Mar 2018 23:31:39 +0100

> The phys embedded into the v1.1 of the VR9 SoC are using different phy
> ids. Add the phy ids to use the driver for this VR9 version as well.
> 
> Signed-off-by: Mathias Kresin <dev@kresin.me>

Also applied, thank you.

^ permalink raw reply

* Re: [GIT] 'net' merged into 'net-next'
From: Jason Gunthorpe @ 2018-03-23 17:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, dledford, idosch, dsahern, sd
In-Reply-To: <20180323.114059.297397691681485266.davem@davemloft.net>

On Fri, Mar 23, 2018 at 11:40:59AM -0400, David Miller wrote:
> 
> This merge was a little bit more hectic than usual.
> 
> But thankfully, I had some sample conflict resolutions to work
> with, in particular for the mlx5 infiniband changes which were
> the most difficult to resolve.
> 
> Please double check my work and provide any fixup patches if
> necessary.

The drivers/infiniband looks OK, and I also checked that merging
netdev and rdma together gets us to the right result.

Jason

^ permalink raw reply

* Re: [bpf-next V5 PATCH 14/15] xdp: transition into using xdp_frame for return API
From: Alexander Duyck @ 2018-03-23 17:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Netdev, BjörnTöpel, Karlsson, Magnus, Eugenia Emantayev,
	Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed,
	Gal Pressman, Daniel Borkmann, Alexei Starovoitov, Tariq Toukan
In-Reply-To: <152180755002.20167.11254036418774114364.stgit@firesoul>

On Fri, Mar 23, 2018 at 5:19 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> Changing API xdp_return_frame() to take struct xdp_frame as argument,
> seems like a natural choice. But there are some subtle performance
> details here that needs extra care, which is a deliberate choice.
>
> When de-referencing xdp_frame on a remote CPU during DMA-TX
> completion, result in the cache-line is change to "Shared"
> state. Later when the page is reused for RX, then this xdp_frame
> cache-line is written, which change the state to "Modified".
>
> This situation already happens (naturally) for, virtio_net, tun and
> cpumap as the xdp_frame pointer is the queued object.  In tun and
> cpumap, the ptr_ring is used for efficiently transferring cache-lines
> (with pointers) between CPUs. Thus, the only option is to
> de-referencing xdp_frame.
>
> It is only the ixgbe driver that had an optimization, in which it can
> avoid doing the de-reference of xdp_frame.  The driver already have
> TX-ring queue, which (in case of remote DMA-TX completion) have to be
> transferred between CPUs anyhow.  In this data area, we stored a
> struct xdp_mem_info and a data pointer, which allowed us to avoid
> de-referencing xdp_frame.
>
> To compensate for this, a prefetchw is used for telling the cache
> coherency protocol about our access pattern.  My benchmarks show that
> this prefetchw is enough to compensate the ixgbe driver.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

I'm really not a fan of this patch. It seems like it is adding a ton
of overhead for one case that is going to penalize all of the other
use cases for XDP. Just the extra prefetch is going to have a
significant impact on things like the XDP_DROP case.

> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h        |    4 +---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |   17 +++++++++++------
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |    1 +
>  drivers/net/tun.c                               |    4 ++--
>  drivers/net/virtio_net.c                        |    2 +-
>  include/net/xdp.h                               |    2 +-
>  kernel/bpf/cpumap.c                             |    6 +++---
>  net/core/xdp.c                                  |    4 +++-
>  8 files changed, 23 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index cbc20f199364..dfbc15a45cb4 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -240,8 +240,7 @@ struct ixgbe_tx_buffer {
>         unsigned long time_stamp;
>         union {
>                 struct sk_buff *skb;
> -               /* XDP uses address ptr on irq_clean */
> -               void *data;
> +               struct xdp_frame *xdpf;
>         };
>         unsigned int bytecount;
>         unsigned short gso_segs;
> @@ -249,7 +248,6 @@ struct ixgbe_tx_buffer {
>         DEFINE_DMA_UNMAP_ADDR(dma);
>         DEFINE_DMA_UNMAP_LEN(len);
>         u32 tx_flags;
> -       struct xdp_mem_info xdp_mem;
>  };
>
>  struct ixgbe_rx_buffer {

I suppose this makes most of my earlier suggestions pointless, though
I would be interested in seeing what the extra cost is we are taking
for having to initialize one more cache line than we were before to
support the xdp_frame format.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index ff069597fccf..e6e9b28ecfba 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1207,7 +1207,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>
>                 /* free the skb */
>                 if (ring_is_xdp(tx_ring))
> -                       xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem);
> +                       xdp_return_frame(tx_buffer->xdpf);
>                 else
>                         napi_consume_skb(tx_buffer->skb, napi_budget);
>
> @@ -2376,6 +2376,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>                         xdp.data_hard_start = xdp.data -
>                                               ixgbe_rx_offset(rx_ring);
>                         xdp.data_end = xdp.data + size;
> +                       prefetchw(xdp.data_hard_start); /* xdp_frame write */
>
>                         skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
>                 }

Do we really need to be prefetching yet another cache line? This is
going to hurt general performance since for most cases you are now
prefetching a cache line that will never be used.

> @@ -5787,7 +5788,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
>
>                 /* Free all the Tx ring sk_buffs */
>                 if (ring_is_xdp(tx_ring))
> -                       xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem);
> +                       xdp_return_frame(tx_buffer->xdpf);
>                 else
>                         dev_kfree_skb_any(tx_buffer->skb);
>
> @@ -8333,16 +8334,21 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
>         struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
>         struct ixgbe_tx_buffer *tx_buffer;
>         union ixgbe_adv_tx_desc *tx_desc;
> +       struct xdp_frame *xdpf;
>         u32 len, cmd_type;
>         dma_addr_t dma;
>         u16 i;
>
> -       len = xdp->data_end - xdp->data;
> +       xdpf = convert_to_xdp_frame(xdp);
> +       if (unlikely(!xdpf))
> +               return -EOVERFLOW;
> +
> +       len = xdpf->len;
>
>         if (unlikely(!ixgbe_desc_unused(ring)))
>                 return IXGBE_XDP_CONSUMED;
>
> -       dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
> +       dma = dma_map_single(ring->dev, xdpf->data, len, DMA_TO_DEVICE);
>         if (dma_mapping_error(ring->dev, dma))
>                 return IXGBE_XDP_CONSUMED;
>
> @@ -8357,8 +8363,7 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
>
>         dma_unmap_len_set(tx_buffer, len, len);
>         dma_unmap_addr_set(tx_buffer, dma, dma);
> -       tx_buffer->data = xdp->data;
> -       tx_buffer->xdp_mem = xdp->rxq->mem;
> +       tx_buffer->xdpf = xdpf;
>
>         tx_desc->read.buffer_addr = cpu_to_le64(dma);
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 2ac78b88fc3d..00b9b13d9fea 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -896,6 +896,7 @@ struct sk_buff *skb_from_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
>                                       di->addr + wi->offset,
>                                       0, frag_size,
>                                       DMA_FROM_DEVICE);
> +       prefetchw(va); /* xdp_frame data area */
>         prefetch(data);
>         wi->offset += frag_size;
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 81fddf9cc58f..a7e42ae1b220 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -663,7 +663,7 @@ static void tun_ptr_free(void *ptr)
>         if (tun_is_xdp_frame(ptr)) {
>                 struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
>
> -               xdp_return_frame(xdpf->data, &xdpf->mem);
> +               xdp_return_frame(xdpf);
>         } else {
>                 __skb_array_destroy_skb(ptr);
>         }
> @@ -2188,7 +2188,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
>                 struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
>
>                 ret = tun_put_user_xdp(tun, tfile, xdpf, to);
> -               xdp_return_frame(xdpf->data, &xdpf->mem);
> +               xdp_return_frame(xdpf);
>         } else {
>                 struct sk_buff *skb = ptr;
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 48c86accd3b8..479a80339fad 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -430,7 +430,7 @@ static bool __virtnet_xdp_xmit(struct virtnet_info *vi,
>
>         /* Free up any pending old buffers before queueing new ones. */
>         while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
> -               xdp_return_frame(xdpf_sent->data, &xdpf_sent->mem);
> +               xdp_return_frame(xdpf_sent);
>
>         xdpf = convert_to_xdp_frame(xdp);
>         if (unlikely(!xdpf))
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 98b55eaf8fd7..35aa9825fdd0 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -103,7 +103,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
>         return xdp_frame;
>  }
>
> -void xdp_return_frame(void *data, struct xdp_mem_info *mem);
> +void xdp_return_frame(struct xdp_frame *xdpf);
>
>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
>                      struct net_device *dev, u32 queue_index);
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index bcdc4dea5ce7..c95b04ec103e 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -219,7 +219,7 @@ static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
>
>         while ((xdpf = ptr_ring_consume(ring)))
>                 if (WARN_ON_ONCE(xdpf))
> -                       xdp_return_frame(xdpf->data, &xdpf->mem);
> +                       xdp_return_frame(xdpf);
>  }
>
>  static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
> @@ -275,7 +275,7 @@ static int cpu_map_kthread_run(void *data)
>
>                         skb = cpu_map_build_skb(rcpu, xdpf);
>                         if (!skb) {
> -                               xdp_return_frame(xdpf->data, &xdpf->mem);
> +                               xdp_return_frame(xdpf);
>                                 continue;
>                         }
>
> @@ -578,7 +578,7 @@ static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu,
>                 err = __ptr_ring_produce(q, xdpf);
>                 if (err) {
>                         drops++;
> -                       xdp_return_frame(xdpf->data, &xdpf->mem);
> +                       xdp_return_frame(xdpf);
>                 }
>                 processed++;
>         }
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index fe8e87abc266..6ed3d73a73be 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -294,9 +294,11 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>  }
>  EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
>
> -void xdp_return_frame(void *data, struct xdp_mem_info *mem)
> +void xdp_return_frame(struct xdp_frame *xdpf)
>  {
>         struct xdp_mem_allocator *xa = NULL;
> +       struct xdp_mem_info *mem = &xdpf->mem;
> +       void *data = xdpf->data;
>
>         rcu_read_lock();
>         if (mem->id)
>

^ permalink raw reply

* Re: [PATCH V3 net-next 01/14] tcp: Add clean acked data hook
From: David Miller @ 2018-03-23 17:29 UTC (permalink / raw)
  To: saeedm; +Cc: netdev, davejwatson, borisp, ilyal, aviadye
In-Reply-To: <20180322223351.31801-2-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 22 Mar 2018 15:33:38 -0700

> +#if IS_ENABLED(CONFIG_TLS_DEVICE)
> +DEFINE_STATIC_KEY_FALSE(clean_acked_data_enabled);
> +EXPORT_SYMBOL_GPL(clean_acked_data_enabled);
> +#endif

Please mark this symbol static (otherwise it's name pollutes the global
name space, "clean_acked" what?  for what subsystem? etc.)

And add two helper functions, icsk_clean_acked_enable() and
icsk_clean_acked_disable() which adjust the static key as
needed.

For the enable helper, you can pass in an icsk and the function
pointer, so that the helper can do the assignment and the
connection between the static key and registering on an icsk
is clear.

^ permalink raw reply

* Re: [PATCH V3 net-next 06/14] net/tls: Add generic NIC offload infrastructure
From: David Miller @ 2018-03-23 17:31 UTC (permalink / raw)
  To: saeedm; +Cc: netdev, davejwatson, borisp, ilyal, aviadye
In-Reply-To: <20180322223351.31801-7-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 22 Mar 2018 15:33:43 -0700

> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> new file mode 100644
> index 000000000000..34555ac0b959
> --- /dev/null
> +++ b/net/tls/tls_device.c
 ...
> +static inline void tls_append_frag(struct tls_record_info *record,
> +				   struct page_frag *pfrag,
> +				   int size)

Do not use the inline keyword in foo.c files, please let the compiler
make that determination.

Please audit this entire submission for this problem.

Thank you.

^ permalink raw reply

* Re: [PATCH net] udp6: set dst cache for a connected sk before udp_v6_send_skb
From: Alexey Kodanev @ 2018-03-23 17:43 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: David Miller
In-Reply-To: <6e3bf7fc-572c-9d6f-3d15-85a3b7699f3b@oracle.com>

On 03/23/2018 08:13 PM, Alexey Kodanev wrote:
> On 03/23/2018 06:50 PM, Eric Dumazet wrote:
...
>>> +	if (connected)
>>> +		ip6_dst_store(sk, dst,
>>> +			      ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr) ?
>>> +			      &sk->sk_v6_daddr : NULL,
>>> +#ifdef CONFIG_IPV6_SUBTREES
>>> +			      ipv6_addr_equal(&fl6.saddr, &np->saddr) ?
>>> +			      &np->saddr :
>>> +#endif
>>> +			      NULL);
>>> +
>>
>> What about the MSG_CONFIRM stuff ?
>>
>>>  	if (msg->msg_flags&MSG_CONFIRM)
>>>  		goto do_confirm;
>>>  back_from_confirm:
>>
>> Should not you move the above code here instead ?
> 
> Ah, you are right, it can release that dst if it go to "do_confirm".
> 
>>> Also ip6_dst_store() does not increment dst refcount.
>>
>> I fear that as soon as dst is visible to other cpus, it might be stolen.
>>
> 
> So we should pass dst_clone(dst) to ip6_dst_store() instead of dst,
> because udpv6_err() can release it if it's set the new one.
> 
> Then, I guess, we could left udpv6_sendmsg()/ip6_dst_store() where it
> is now in the patch and remove the check for "connected" before dst_relase(),
> similar to udp_sendmsg(), right?

And the section "release_dst:" looks redundant after that too.

Thanks,
Alexey

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox