Netdev List
 help / color / mirror / Atom feed
* Re: [RFC PATCH v2] net: phy: Added device tree binding for dev-addr and dev-addr code check-up
From: Andrew Lunn @ 2018-03-23 15:44 UTC (permalink / raw)
  To: Vicentiu Galanopulo
  Cc: netdev, linux-kernel, robh+dt, mark.rutland, davem, marcel,
	devicetree, madalin.bucur, alexandru.marginean
In-Reply-To: <20180323150522.9603-1-vicentiu.galanopulo@nxp.com>

> --- a/drivers/of/of_mdio.c
> +++ b/drivers/of/of_mdio.c
> @@ -24,6 +24,8 @@
>  
>  #define DEFAULT_GPIO_RESET_DELAY	10	/* in microseconds */
>  
> +struct phy_c45_device_ids mdio_c45_ids = {0};

You do know that Linux is multi-threaded. It could be probing two MDIO
busses at once.

	Andrew

^ permalink raw reply

* Re: [PATCH net] udp6: set dst cache for a connected sk before udp_v6_send_skb
From: Eric Dumazet @ 2018-03-23 15:50 UTC (permalink / raw)
  To: Alexey Kodanev, netdev; +Cc: David Miller
In-Reply-To: <1521815989-26412-1-git-send-email-alexey.kodanev@oracle.com>



On 03/23/2018 07:39 AM, Alexey Kodanev wrote:
> After commit 33c162a980fe ("ipv6: datagram: Update dst cache of a
> connected datagram sk during pmtu update"), when the error occurs on
> sending datagram in udpv6_sendmsg() due to ICMPV6_PKT_TOOBIG type,
> error handler can trigger the following path and call ip6_dst_store():
> 
>     udpv6_err()
>         ip6_sk_update_pmtu()
>             ip6_datagram_dst_update()
>                 ip6_dst_lookup_flow(), can create a RTF_CACHE clone
>                 ...
>                 ip6_dst_store()
> 
> It can happen before a connected UDP socket invokes ip6_dst_store()
> in the end of udpv6_sendmsg(), on destination release, as a result,
> the last one changes dst to the old one, preventing getting updated
> dst cache on the next udpv6_sendmsg() call.
> 
> This patch moves ip6_dst_store() in udpv6_sendmsg(), so that it is
> invoked after ip6_sk_dst_lookup_flow() and before udp_v6_send_skb().
> 


A Fixes: tag would be nice, for automatic tools (and humans as well)

> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/ipv6/udp.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 52e3ea0..0d413c6 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1299,6 +1299,16 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  	if (ipc6.hlimit < 0)
>  		ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
>  
> +	if (connected)
> +		ip6_dst_store(sk, dst,
> +			      ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr) ?
> +			      &sk->sk_v6_daddr : NULL,
> +#ifdef CONFIG_IPV6_SUBTREES
> +			      ipv6_addr_equal(&fl6.saddr, &np->saddr) ?
> +			      &np->saddr :
> +#endif
> +			      NULL);
> +

What about the MSG_CONFIRM stuff ?

>  	if (msg->msg_flags&MSG_CONFIRM)
>  		goto do_confirm;
>  back_from_confirm:

Should not you move the above code here instead ?

Also ip6_dst_store() does not increment dst refcount.

I fear that as soon as dst is visible to other cpus, it might be stolen.

> @@ -1350,18 +1360,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  
>  release_dst:
>  	if (dst) {
> -		if (connected) {
> -			ip6_dst_store(sk, dst,
> -				      ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr) ?
> -				      &sk->sk_v6_daddr : NULL,
> -#ifdef CONFIG_IPV6_SUBTREES
> -				      ipv6_addr_equal(&fl6.saddr, &np->saddr) ?
> -				      &np->saddr :
> -#endif
> -				      NULL);
> -		} else {
> +		if (!connected)
>  			dst_release(dst);
> -		}
>  		dst = NULL;
>  	}
>  
> 

^ permalink raw reply

* Re: [patch iproute2] devlink: fix port new monitoring message typo
From: David Ahern @ 2018-03-23 16:11 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: stephen
In-Reply-To: <20180323121913.12598-1-jiri@resnulli.us>

On 3/23/18 6:19 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> s/net/new/
> 
> Fixes: a3c4b484a1ed ("add devlink tool")
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  devlink/devlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next] devlink: Remove top_hierarchy arg for DEVLINK disabled path
From: David Miller @ 2018-03-23 16:14 UTC (permalink / raw)
  To: dsahern; +Cc: netdev
In-Reply-To: <20180323150948.6087-1-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Fri, 23 Mar 2018 08:09:48 -0700

> Earlier change missed the path where CONFIG_NET_DEVLINK is disabled.
> Thanks to Jiri for spotting.
> 
> Fixes: 145307460ba9 ("devlink: Remove top_hierarchy arg to devlink_resource_register")
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied, thanks David.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: Drop NETDEV_UNREGISTER_FINAL
From: Jason Gunthorpe @ 2018-03-23 16:15 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: davem, Michal.Kalderon, Ariel.Elior, dledford, benve, 1dgoodell,
	daniel, jakub.kicinski, ast, edumazet, linux, john.fastabend,
	brouer, dsahern, netdev
In-Reply-To: <152179796956.13076.17482079697536507473.stgit@localhost.localdomain>

On Fri, Mar 23, 2018 at 12:39:33PM +0300, Kirill Tkhai wrote:
> Last user is gone after bdf5bd7f2132 "rds: tcp: remove
> register_netdevice_notifier infrastructure.", so we can
> remove this netdevice command. This allows to delete
> rtnl_lock() in netdev_run_todo(), which is hot path for
> net namespace unregistration.
> 
> dev_change_net_namespace() and netdev_wait_allrefs()
> have rcu_barrier() before NETDEV_UNREGISTER_FINAL call,
> and the source commits say they were introduced to
> delemit the call with NETDEV_UNREGISTER, but this patch
> leaves them on the places, since they require additional
> analysis, whether we need in them for something else.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>  drivers/infiniband/hw/qedr/main.c |    4 ++--
>  include/linux/netdevice.h         |    1 -
>  include/rdma/ib_verbs.h           |    4 ++--
>  net/core/dev.c                    |    6 ------
>  4 files changed, 4 insertions(+), 11 deletions(-)

No problem to take the rdma part of this through netdev

Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Jason

^ permalink raw reply

* Re: [PATCH v2 net-next] virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
From: David Miller @ 2018-03-23 16:17 UTC (permalink / raw)
  To: jay.vosburgh; +Cc: netdev, mst, jasowang, ben
In-Reply-To: <23981.1521729761@nyx>

From: Jay Vosburgh <jay.vosburgh@canonical.com>
Date: Thu, 22 Mar 2018 14:42:41 +0000

> 	The operstate update logic will leave an interface in the
> default UNKNOWN operstate if the interface carrier state never changes
> from the default carrier up state set at creation.  This includes the
> case of an explicit call to netif_carrier_on, as the carrier on to on
> transition has no effect on operstate.
> 
> 	This affects virtio-net for the case that the virtio peer does
> not support VIRTIO_NET_F_STATUS (the feature that provides carrier state
> updates).  Without this feature, the virtio specification states that
> "the link should be assumed active," so, logically, the operstate should
> be UP instead of UNKNOWN.  This has impact on user space applications
> that use the operstate to make availability decisions for the interface.
> 
> 	Resolve this by changing the virtio probe logic slightly to call
> netif_carrier_off for both the "with" and "without" VIRTIO_NET_F_STATUS
> cases, and then the existing call to netif_carrier_on for the "without"
> case will cause an operstate transition.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Ben Hutchings <ben@decadent.org.uk>
> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] bridge: Allow max MTU when multiple VLANs present
From: David Miller @ 2018-03-23 16:17 UTC (permalink / raw)
  To: 3chas3; +Cc: netdev, stephen
In-Reply-To: <20180322153406.17760-1-3chas3@gmail.com>

From: Chas Williams <3chas3@gmail.com>
Date: Thu, 22 Mar 2018 11:34:06 -0400

> If the bridge is allowing multiple VLANs, some VLANs may have
> different MTUs.  Instead of choosing the minimum MTU for the
> bridge interface, choose the maximum MTU of the bridge members.
> With this the user only needs to set a larger MTU on the member
> ports that are participating in the large MTU VLANS.
> 
> Signed-off-by: Chas Williams <3chas3@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [bpf-next V5 PATCH 09/15] mlx5: register a memory model when XDP is enabled
From: Sergei Shtylyov @ 2018-03-23 16:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, BjörnTöpel,
	magnus.karlsson
  Cc: eugenia, Jason Wang, John Fastabend, Eran Ben Elisha,
	Saeed Mahameed, galp, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan
In-Reply-To: <152180752460.20167.1098864148126196703.stgit@firesoul>

Hello!

On 03/23/2018 03:18 PM, Jesper Dangaard Brouer wrote:

> Now all the users of ndo_xdp_xmit have been converted to use xdp_return_frame.
> This enable a different memory model, thus activating another code path
> in the xdp_return_frame API.
> 
> V2: Fixed issues pointed out by Tariq.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index da94c8cba5ee..2e4ca0f15b62 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -506,6 +506,14 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>  		rq->mkey_be = c->mkey_be;
>  	}
>  
> +	/* This must only be activate for order-0 pages */

   Activated?

[...]

MBR, Sergei

^ permalink raw reply

* Re: [net-next v2] intel: add SPDX identifiers to all the Intel drivers
From: David Miller @ 2018-03-23 16:18 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180322170848.21107-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 22 Mar 2018 10:08:48 -0700

> Add the SPDX identifiers to all the Intel wired LAN driver files, as
> outlined in Documentation/process/license-rules.rst.
> 
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> ---
> v2: based on community feedback, went back to the v2.6 version of the
>     SPDX identifier string until the kernel documentation gets updated

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] bridge: Allow max MTU when multiple VLANs present
From: Nikolay Aleksandrov @ 2018-03-23 16:20 UTC (permalink / raw)
  To: David Miller, 3chas3; +Cc: netdev, stephen
In-Reply-To: <20180323.121747.2120157916968005999.davem@davemloft.net>

On 23/03/18 18:17, David Miller wrote:
> From: Chas Williams <3chas3@gmail.com>
> Date: Thu, 22 Mar 2018 11:34:06 -0400
> 
>> If the bridge is allowing multiple VLANs, some VLANs may have
>> different MTUs.  Instead of choosing the minimum MTU for the
>> bridge interface, choose the maximum MTU of the bridge members.
>> With this the user only needs to set a larger MTU on the member
>> ports that are participating in the large MTU VLANS.
>>
>> Signed-off-by: Chas Williams <3chas3@gmail.com>
> 
> Applied, thanks.
> 

Argh, this will break on builds without vlans because br->vlan_enabled shouldn't
be accessed directly. I missed that when reviewing.
I'll send a follow up fix in a second that uses br_vlan_enabled().

^ permalink raw reply

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
From: David Miller @ 2018-03-23 16:20 UTC (permalink / raw)
  To: okaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel
In-Reply-To: <1521738603-23596-4-git-send-email-okaya@codeaurora.org>

From: Sinan Kaya <okaya@codeaurora.org>
Date: Thu, 22 Mar 2018 13:10:00 -0400

> Code includes wmb() followed by writel(). writel() already has a
> barrier on some architectures like arm64.
 ...
> @@ -4155,7 +4155,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	txdata->tx_db.data.prod += nbd;
>  	barrier();
>  
> -	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
> +	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
>  
>  	mmiowb();
 ...
> @@ -2592,7 +2592,7 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
>  
>  	txdata->tx_db.data.prod += 2;
>  	barrier();
> -	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
> +	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);

These are compiler barriers being used here, not wmb().

Look, if I can't see a clear:

	wmb()
	writel()

sequence in the patch hunks, I am going to keep pushing back on
these changes.

Thank you.

^ permalink raw reply

* [PATCH net-next] net: bridge: fix direct access to bridge vlan_enabled and use helper
From: Nikolay Aleksandrov @ 2018-03-23 16:27 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov, roopa, bridge, 3chas3, davem
In-Reply-To: <20180322153406.17760-1-3chas3@gmail.com>

We need to use br_vlan_enabled() helper otherwise we'll break builds
without bridge vlans:
net/bridge//br_if.c: In function ‘br_mtu’:
net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
member named ‘vlan_enabled’
  if (br->vlan_enabled)
        ^
net/bridge//br_if.c:462:1: warning: control reaches end of non-void
function [-Wreturn-type]
 }
 ^
scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
failed

Fixes: 419d14af9e07 ("bridge: Allow max MTU when multiple VLANs present")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 net/bridge/br_if.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 48dc4d2e2be3..87b2afd455c7 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -455,7 +455,7 @@ static int __br_mtu(const struct net_bridge *br, bool (compare_fn)(int, int))
 
 int br_mtu(const struct net_bridge *br)
 {
-	if (br->vlan_enabled)
+	if (br_vlan_enabled(br->dev))
 		return __br_mtu(br, max_mtu);
 	else
 		return __br_mtu(br, min_mtu);
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH v2 net-next 0/6] TLS Rx
From: David Miller @ 2018-03-23 16:27 UTC (permalink / raw)
  To: davejwatson
  Cc: tom, alexei.starovoitov, herbert, linux-crypto, netdev, borisp,
	atul.gupta, vakul.garg, hannes, steffen.klassert, john.fastabend,
	daniel
In-Reply-To: <20180322170944.GA67793@GeorgeHnsiPhone.dhcp.thefacebook.com>

From: Dave Watson <davejwatson@fb.com>
Date: Thu, 22 Mar 2018 10:09:44 -0700

> TLS tcp socket RX implementation, to match existing TX code.

Looks great Dave.  Applied, and once my build tests finish I'll push
this out to net-next.

In some of the new functions, reverse christmas tree ordering of local
variables was not adhered to.

I know it can be difficult in some situations, but please fix this up in
followon patches.  You can move the assignments down into the function
body to accomodate this, if necessary.

Thanks.

^ permalink raw reply

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 16:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel
In-Reply-To: <20180323.122035.1380806748695640531.davem@davemloft.net>

On 3/23/2018 12:20 PM, David Miller wrote:
> From: Sinan Kaya <okaya@codeaurora.org>
> Date: Thu, 22 Mar 2018 13:10:00 -0400
> 
>> Code includes wmb() followed by writel(). writel() already has a
>> barrier on some architectures like arm64.
>  ...
>> @@ -4155,7 +4155,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>  	txdata->tx_db.data.prod += nbd;
>>  	barrier();
>>  
>> -	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
>> +	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
>>  
>>  	mmiowb();
>  ...
>> @@ -2592,7 +2592,7 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
>>  
>>  	txdata->tx_db.data.prod += 2;
>>  	barrier();
>> -	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
>> +	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
> 
> These are compiler barriers being used here, not wmb().
> 
> Look, if I can't see a clear:
> 
> 	wmb()
> 	writel()
> 
> sequence in the patch hunks, I am going to keep pushing back on
> these changes.

Sorry, you got me confused now.

If you look at the code closer, you'll see this.

	wmb();

	txdata->tx_db.data.prod += nbd;
	barrier();

	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);

and you also asked me to rename DOORBELL to DOORBELL_RELAXED() to make
it obvious that we have a relaxed operator inside the macro.

Did I miss something?

of course, treating barrier() universally as a write barrier is wrong.

> 
> Thank you.
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: [bpf-next V5 PATCH 02/15] xdp: introduce xdp_return_frame API and use in cpumap
From: Alexander Duyck @ 2018-03-23 16:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Netdev, BjörnTöpel, Karlsson, Magnus, Eugenia Emantayev,
	Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed,
	Gal Pressman, Daniel Borkmann, Alexei Starovoitov, Tariq Toukan
In-Reply-To: <152180748903.20167.14050388865469930299.stgit@firesoul>

On Fri, Mar 23, 2018 at 5:18 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> Introduce an xdp_return_frame API, and convert over cpumap as
> the first user, given it have queued XDP frame structure to leverage.
>
> V3: Cleanup and remove C99 style comments, pointed out by Alex Duyck.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/net/xdp.h   |   28 ++++++++++++++++++++++++
>  kernel/bpf/cpumap.c |   60 +++++++++++++++++++++++++++++++--------------------
>  net/core/xdp.c      |   18 +++++++++++++++
>  3 files changed, 82 insertions(+), 24 deletions(-)
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index b2362ddfa694..15b546325e31 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -33,16 +33,44 @@
>   * also mandatory during RX-ring setup.
>   */
>
> +enum mem_type {
> +       MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
> +       MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
> +       MEM_TYPE_MAX,
> +};
> +
> +struct xdp_mem_info {
> +       u32 type; /* enum mem_type, but known size type */

Do you really need to make t his a full u32 value for something that
is likely never going to exceed a single digit value?

> +       /* u32 id; will be added later in this patchset */

Wouldn't it be better to just hold off and add it then instead of
adding it as a comment?

> +};
> +
>  struct xdp_rxq_info {
>         struct net_device *dev;
>         u32 queue_index;
>         u32 reg_state;
> +       struct xdp_mem_info mem;
>  } ____cacheline_aligned; /* perf critical, avoid false-sharing */
>
> +
> +static inline
> +void xdp_return_frame(void *data, struct xdp_mem_info *mem)
> +{
> +       if (mem->type == MEM_TYPE_PAGE_SHARED)
> +               page_frag_free(data);
> +
> +       if (mem->type == MEM_TYPE_PAGE_ORDER0) {
> +               struct page *page = virt_to_page(data); /* Assumes order0 page*/
> +
> +               put_page(page);
> +       }

Actually page_frag_free would probably work for either one. Also is it
safe to assume that the page is order 0? Are the only users of
compound pages that support XDP also only supporting the page fragment
setup?

Also you probably don't need put_page. It might be better to use
__free_page if you are certain the pages are coming from the Rx path
of drivers and don't have any special destructors associated with
them.

> +}
> +
>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
>                      struct net_device *dev, u32 queue_index);
>  void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
>  void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
>  bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
> +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
> +                              enum mem_type type, void *allocator);
>
>  #endif /* __LINUX_NET_XDP_H__ */
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index a4bb0b34375a..3e4bbcbe3e86 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -19,6 +19,7 @@
>  #include <linux/bpf.h>
>  #include <linux/filter.h>
>  #include <linux/ptr_ring.h>
> +#include <net/xdp.h>
>
>  #include <linux/sched.h>
>  #include <linux/workqueue.h>
> @@ -137,27 +138,6 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
>         return ERR_PTR(err);
>  }
>
> -static void __cpu_map_queue_destructor(void *ptr)
> -{
> -       /* The tear-down procedure should have made sure that queue is
> -        * empty.  See __cpu_map_entry_replace() and work-queue
> -        * invoked cpu_map_kthread_stop(). Catch any broken behaviour
> -        * gracefully and warn once.
> -        */
> -       if (WARN_ON_ONCE(ptr))
> -               page_frag_free(ptr);
> -}
> -
> -static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
> -{
> -       if (atomic_dec_and_test(&rcpu->refcnt)) {
> -               /* The queue should be empty at this point */
> -               ptr_ring_cleanup(rcpu->queue, __cpu_map_queue_destructor);
> -               kfree(rcpu->queue);
> -               kfree(rcpu);
> -       }
> -}
> -
>  static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
>  {
>         atomic_inc(&rcpu->refcnt);
> @@ -188,6 +168,10 @@ struct xdp_pkt {
>         u16 len;
>         u16 headroom;
>         u16 metasize;
> +       /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
> +        * while mem info is valid on remote CPU.
> +        */
> +       struct xdp_mem_info mem;
>         struct net_device *dev_rx;
>  };
>
> @@ -213,6 +197,9 @@ static struct xdp_pkt *convert_to_xdp_pkt(struct xdp_buff *xdp)
>         xdp_pkt->headroom = headroom - sizeof(*xdp_pkt);
>         xdp_pkt->metasize = metasize;
>
> +       /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
> +       xdp_pkt->mem = xdp->rxq->mem;
> +
>         return xdp_pkt;
>  }
>
> @@ -265,6 +252,31 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
>         return skb;
>  }
>
> +static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
> +{
> +       /* The tear-down procedure should have made sure that queue is
> +        * empty.  See __cpu_map_entry_replace() and work-queue
> +        * invoked cpu_map_kthread_stop(). Catch any broken behaviour
> +        * gracefully and warn once.
> +        */
> +       struct xdp_pkt *xdp_pkt;
> +
> +       while ((xdp_pkt = ptr_ring_consume(ring)))
> +               if (WARN_ON_ONCE(xdp_pkt))
> +                       xdp_return_frame(xdp_pkt, &xdp_pkt->mem);
> +}
> +
> +static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
> +{
> +       if (atomic_dec_and_test(&rcpu->refcnt)) {
> +               /* The queue should be empty at this point */
> +               __cpu_map_ring_cleanup(rcpu->queue);
> +               ptr_ring_cleanup(rcpu->queue, NULL);
> +               kfree(rcpu->queue);
> +               kfree(rcpu);
> +       }
> +}
> +
>  static int cpu_map_kthread_run(void *data)
>  {
>         struct bpf_cpu_map_entry *rcpu = data;
> @@ -307,7 +319,7 @@ static int cpu_map_kthread_run(void *data)
>
>                         skb = cpu_map_build_skb(rcpu, xdp_pkt);
>                         if (!skb) {
> -                               page_frag_free(xdp_pkt);
> +                               xdp_return_frame(xdp_pkt, &xdp_pkt->mem);
>                                 continue;
>                         }
>
> @@ -604,13 +616,13 @@ static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu,
>         spin_lock(&q->producer_lock);
>
>         for (i = 0; i < bq->count; i++) {
> -               void *xdp_pkt = bq->q[i];
> +               struct xdp_pkt *xdp_pkt = bq->q[i];
>                 int err;
>
>                 err = __ptr_ring_produce(q, xdp_pkt);
>                 if (err) {
>                         drops++;
> -                       page_frag_free(xdp_pkt); /* Free xdp_pkt */
> +                       xdp_return_frame(xdp_pkt->data, &xdp_pkt->mem);
>                 }
>                 processed++;
>         }
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 097a0f74e004..9eee0c431126 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -71,3 +71,21 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq)
>         return (xdp_rxq->reg_state == REG_STATE_REGISTERED);
>  }
>  EXPORT_SYMBOL_GPL(xdp_rxq_info_is_reg);
> +
> +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
> +                              enum mem_type type, void *allocator)
> +{
> +       if (type >= MEM_TYPE_MAX)
> +               return -EINVAL;
> +
> +       xdp_rxq->mem.type = type;
> +
> +       if (allocator)
> +               return -EOPNOTSUPP;
> +
> +       /* TODO: Allocate an ID that maps to allocator pointer
> +        * See: https://www.kernel.org/doc/html/latest/core-api/idr.html
> +        */
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
>

^ permalink raw reply

* Re: [PATCH net-next] net: bridge: fix direct access to bridge vlan_enabled and use helper
From: David Miller @ 2018-03-23 16:41 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, roopa, bridge, 3chas3
In-Reply-To: <1521822426-18563-1-git-send-email-nikolay@cumulusnetworks.com>

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Fri, 23 Mar 2018 18:27:06 +0200

> We need to use br_vlan_enabled() helper otherwise we'll break builds
> without bridge vlans:
> net/bridge//br_if.c: In function ‘br_mtu’:
> net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
> member named ‘vlan_enabled’
>   if (br->vlan_enabled)
>         ^
> net/bridge//br_if.c:462:1: warning: control reaches end of non-void
> function [-Wreturn-type]
>  }
>  ^
> scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
> failed
> 
> Fixes: 419d14af9e07 ("bridge: Allow max MTU when multiple VLANs present")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Applied, thanks for responding to this issue so quickly.

^ permalink raw reply

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
From: David Miller @ 2018-03-23 16:43 UTC (permalink / raw)
  To: okaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel
In-Reply-To: <cb6a0522-2c53-b252-49db-5779d24963c6@codeaurora.org>

From: Sinan Kaya <okaya@codeaurora.org>
Date: Fri, 23 Mar 2018 12:31:12 -0400

> Sorry, you got me confused now.
> 
> If you look at the code closer, you'll see this.
> 
> 	wmb();
> 
> 	txdata->tx_db.data.prod += nbd;
> 	barrier();
> 
> 	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
> 
> and you also asked me to rename DOORBELL to DOORBELL_RELAXED() to make
> it obvious that we have a relaxed operator inside the macro.

This still doesn't match the stated pattern.

	wmb();
	/* no other memory or I/O or IOMEM operation */
	writel();

There is a write to a producer index there and then no non-compiler
barrier or any kind before the writel().

So, in fact, it might really need that implicit writel() barrier here!

^ permalink raw reply

* Re: [bpf-next V5 PATCH 03/15] ixgbe: use xdp_return_frame API
From: Alexander Duyck @ 2018-03-23 16:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Netdev, BjörnTöpel, Karlsson, Magnus, Eugenia Emantayev,
	Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed,
	Gal Pressman, Daniel Borkmann, Alexei Starovoitov, Tariq Toukan
In-Reply-To: <152180749411.20167.16232784042872304317.stgit@firesoul>

On Fri, Mar 23, 2018 at 5:18 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> Extend struct ixgbe_tx_buffer to store the xdp_mem_info.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    1 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    6 ++++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index c1e3a0039ea5..cbc20f199364 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -249,6 +249,7 @@ struct ixgbe_tx_buffer {
>         DEFINE_DMA_UNMAP_ADDR(dma);
>         DEFINE_DMA_UNMAP_LEN(len);
>         u32 tx_flags;
> +       struct xdp_mem_info xdp_mem;

Instead of increasing the size of this structure it might make more
sense to find free space somewhere in side of it.

One thing you could probably look at doing is making it so that the
gso_segs, protocol, and tx_flags fields could be put into an anonymous
structure that is then part of a union with the xdp_mem_info
structure. Then you would just have to tweak things slightly so that
the else section for the ring_is_xdp block pulls the code just above
it into that section so that those fields are not read.

You would need to flip the dma, length, and type field ordering but
that shouldn't be much of an issue and it should have no effect on
performance since they are all in the same 16 byte aligned block
anyway.

>  };
>
>  struct ixgbe_rx_buffer {
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 85369423452d..45520eb503ee 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1207,7 +1207,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>
>                 /* free the skb */
>                 if (ring_is_xdp(tx_ring))
> -                       page_frag_free(tx_buffer->data);
> +                       xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem);

Based on my suggestion above this section would have:
    total_packets++;

>                 else
>                         napi_consume_skb(tx_buffer->skb, napi_budget);

This section would pull the lines that read gso_segs and tx_flags down
into this else statement so that the fields go unused.

>
> @@ -5787,7 +5787,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
>
>                 /* Free all the Tx ring sk_buffs */
>                 if (ring_is_xdp(tx_ring))
> -                       page_frag_free(tx_buffer->data);
> +                       xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem);
>                 else
>                         dev_kfree_skb_any(tx_buffer->skb);
>
> @@ -8351,6 +8351,8 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
>         dma_unmap_len_set(tx_buffer, len, len);
>         dma_unmap_addr_set(tx_buffer, dma, dma);
>         tx_buffer->data = xdp->data;
> +       tx_buffer->xdp_mem = xdp->rxq->mem;
> +

It would work out better if you moved this up to the lines that are
setting the tx_buffer fields. Really you could probably pull the line
that is recording xdp->data up as well.

>         tx_desc->read.buffer_addr = cpu_to_le64(dma);
>
>         /* put descriptor type bits */
>

^ permalink raw reply

* [PATCH net-next v2 0/3] Drop NETDEV_UNREGISTER_FINAL (was unnamed)
From: Kirill Tkhai @ 2018-03-23 16:47 UTC (permalink / raw)
  To: davem, Michal.Kalderon, Ariel.Elior, dledford, jgg, benve,
	1dgoodell, daniel, jakub.kicinski, ast, edumazet, linux,
	john.fastabend, brouer, dsahern, netdev, ktkhai

This series drops unused NETDEV_UNREGISTER_FINAL
after some preparations.

v2: New patch [2/3]. Use switch() in [1/3].

The first version was acked by Jason Gunthorpe,
and [1/3] was acked by David Ahern.

Since there are differences to v1, I haven't added
Acked-by tags of people. It would be nice, if you
fill OK to tag v2 too.

Thanks,
Kirill
---

Kirill Tkhai (3):
      net: Make NETDEV_XXX commands enum { }
      infiniband: Replace usnic_ib_netdev_event_to_string() with netdev_cmd_to_name()
      net: Drop NETDEV_UNREGISTER_FINAL


 drivers/infiniband/hw/qedr/main.c           |    4 +-
 drivers/infiniband/hw/usnic/usnic_ib_main.c |   28 ++---------
 include/linux/netdevice.h                   |   68 ++++++++++++++-------------
 include/rdma/ib_verbs.h                     |    4 +-
 net/core/dev.c                              |   25 ++++++++--
 5 files changed, 63 insertions(+), 66 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

^ permalink raw reply

* Re: [PATCH 0/2] bpf: Change print_bpf_insn interface
From: Daniel Borkmann @ 2018-03-23 16:47 UTC (permalink / raw)
  To: Quentin Monnet, Jiri Olsa, Alexei Starovoitov; +Cc: lkml, netdev
In-Reply-To: <f5586965-1447-ba8e-a27c-398b221f18e3@netronome.com>

On 03/23/2018 02:34 PM, Quentin Monnet wrote:
> 2018-03-23 11:41 UTC+0100 ~ Jiri Olsa <jolsa@kernel.org>
>> hi,
>> this patchset removes struct bpf_verifier_env argument
>> from print_bpf_insn function (patch 1) and changes user
>> space bpftool user to use it that way (patch 2).
>>
>> thanks,
>> jirka
>>
>> ---
>> Jiri Olsa (2):
>>       bpf: Remove struct bpf_verifier_env argument from print_bpf_insn
>>       bpftool: Adjust to new print_bpf_insn interface
>>
>>  kernel/bpf/disasm.c               | 52 ++++++++++++++++++++++++++--------------------------
>>  kernel/bpf/disasm.h               |  5 +----
>>  kernel/bpf/verifier.c             | 44 +++++++++++++++++++++++++++-----------------
>>  tools/bpf/bpftool/xlated_dumper.c | 12 ++++++------
>>  4 files changed, 60 insertions(+), 53 deletions(-)
>>
> 
> Thanks, this version looks good to me. Please keep the "Reviewed-by"
> tags when resubmitting new versions of your patch sets :)
> 
> For the series:
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Applied to bpf-next, thanks everyone!

^ permalink raw reply

* [PATCH net-next v2 3/3] net: Drop NETDEV_UNREGISTER_FINAL
From: Kirill Tkhai @ 2018-03-23 16:47 UTC (permalink / raw)
  To: davem, Michal.Kalderon, Ariel.Elior, dledford, jgg, benve,
	1dgoodell, daniel, jakub.kicinski, ast, edumazet, linux,
	john.fastabend, brouer, dsahern, netdev, ktkhai
In-Reply-To: <152182320819.2767.12723786642825837986.stgit@localhost.localdomain>

Last user is gone after bdf5bd7f2132 "rds: tcp: remove
register_netdevice_notifier infrastructure.", so we can
remove this netdevice command. This allows to delete
rtnl_lock() in netdev_run_todo(), which is hot path for
net namespace unregistration.

dev_change_net_namespace() and netdev_wait_allrefs()
have rcu_barrier() before NETDEV_UNREGISTER_FINAL call,
and the source commits say they were introduced to
delemit the call with NETDEV_UNREGISTER, but this patch
leaves them on the places, since they require additional
analysis, whether we need in them for something else.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 drivers/infiniband/hw/qedr/main.c |    4 ++--
 include/linux/netdevice.h         |    1 -
 include/rdma/ib_verbs.h           |    4 ++--
 net/core/dev.c                    |    7 -------
 4 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index db4bf97c0e15..eb32abb0099a 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -90,8 +90,8 @@ static struct net_device *qedr_get_netdev(struct ib_device *dev, u8 port_num)
 	dev_hold(qdev->ndev);
 
 	/* The HW vendor's device driver must guarantee
-	 * that this function returns NULL before the net device reaches
-	 * NETDEV_UNREGISTER_FINAL state.
+	 * that this function returns NULL before the net device has finished
+	 * NETDEV_UNREGISTER state.
 	 */
 	return qdev->ndev;
 }
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dd5a04c971d5..2a2d9cf50aa2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2336,7 +2336,6 @@ enum netdev_cmd {
 	NETDEV_PRE_TYPE_CHANGE,
 	NETDEV_POST_TYPE_CHANGE,
 	NETDEV_POST_INIT,
-	NETDEV_UNREGISTER_FINAL,
 	NETDEV_RELEASE,
 	NETDEV_NOTIFY_PEERS,
 	NETDEV_JOIN,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 73b2387e3f74..a1c5e8320f86 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2126,8 +2126,8 @@ struct ib_device {
 	 * net device of device @device at port @port_num or NULL if such
 	 * a net device doesn't exist. The vendor driver should call dev_hold
 	 * on this net device. The HW vendor's device driver must guarantee
-	 * that this function returns NULL before the net device reaches
-	 * NETDEV_UNREGISTER_FINAL state.
+	 * that this function returns NULL before the net device has finished
+	 * NETDEV_UNREGISTER state.
 	 */
 	struct net_device	  *(*get_netdev)(struct ib_device *device,
 						 u8 port_num);
diff --git a/net/core/dev.c b/net/core/dev.c
index eb8ff44f46d6..cb4d2ceb36fb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1584,7 +1584,6 @@ const char *netdev_cmd_to_name(enum netdev_cmd cmd)
 	N(RESEND_IGMP) N(PRECHANGEMTU) N(CHANGEINFODATA) N(BONDING_INFO)
 	N(PRECHANGEUPPER) N(CHANGELOWERSTATE) N(UDP_TUNNEL_PUSH_INFO)
 	N(UDP_TUNNEL_DROP_INFO) N(CHANGE_TX_QUEUE_LEN)
-	N(UNREGISTER_FINAL)
 	};
 #undef N
 	return "UNKNOWN_NETDEV_EVENT";
@@ -8089,7 +8088,6 @@ static void netdev_wait_allrefs(struct net_device *dev)
 			rcu_barrier();
 			rtnl_lock();
 
-			call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
 			if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
 				     &dev->state)) {
 				/* We must not have linkwatch events
@@ -8161,10 +8159,6 @@ void netdev_run_todo(void)
 			= list_first_entry(&list, struct net_device, todo_list);
 		list_del(&dev->todo_list);
 
-		rtnl_lock();
-		call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
-		__rtnl_unlock();
-
 		if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
 			pr_err("network todo '%s' but state %d\n",
 			       dev->name, dev->reg_state);
@@ -8606,7 +8600,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	 */
 	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
 	rcu_barrier();
-	call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
 
 	new_nsid = peernet2id_alloc(dev_net(dev), net);
 	/* If there is an ifindex conflict assign a new one */

^ permalink raw reply related

* [PATCH net-next v2 2/3] infiniband: Replace usnic_ib_netdev_event_to_string() with netdev_cmd_to_name()
From: Kirill Tkhai @ 2018-03-23 16:47 UTC (permalink / raw)
  To: davem, Michal.Kalderon, Ariel.Elior, dledford, jgg, benve,
	1dgoodell, daniel, jakub.kicinski, ast, edumazet, linux,
	john.fastabend, brouer, dsahern, netdev, ktkhai
In-Reply-To: <152182320819.2767.12723786642825837986.stgit@localhost.localdomain>

This function just calls netdev_cmd_to_name().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 drivers/infiniband/hw/usnic/usnic_ib_main.c |   15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 5bf3b20eba25..ca5638091b55 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -95,11 +95,6 @@ void usnic_ib_log_vf(struct usnic_ib_vf *vf)
 }
 
 /* Start of netdev section */
-static inline const char *usnic_ib_netdev_event_to_string(unsigned long event)
-{
-	return netdev_cmd_to_name(event);
-}
-
 static void usnic_ib_qp_grp_modify_active_to_err(struct usnic_ib_dev *us_ibdev)
 {
 	struct usnic_ib_ucontext *ctx;
@@ -172,7 +167,7 @@ static void usnic_ib_handle_usdev_event(struct usnic_ib_dev *us_ibdev,
 			ib_dispatch_event(&ib_event);
 		} else {
 			usnic_dbg("Ignoring %s on %s\n",
-					usnic_ib_netdev_event_to_string(event),
+					netdev_cmd_to_name(event),
 					us_ibdev->ib_dev.name);
 		}
 		break;
@@ -209,7 +204,7 @@ static void usnic_ib_handle_usdev_event(struct usnic_ib_dev *us_ibdev,
 		break;
 	default:
 		usnic_dbg("Ignoring event %s on %s",
-				usnic_ib_netdev_event_to_string(event),
+				netdev_cmd_to_name(event),
 				us_ibdev->ib_dev.name);
 	}
 	mutex_unlock(&us_ibdev->usdev_lock);
@@ -251,7 +246,7 @@ static int usnic_ib_handle_inet_event(struct usnic_ib_dev *us_ibdev,
 	switch (event) {
 	case NETDEV_DOWN:
 		usnic_info("%s via ip notifiers",
-				usnic_ib_netdev_event_to_string(event));
+				netdev_cmd_to_name(event));
 		usnic_fwd_del_ipaddr(us_ibdev->ufdev);
 		usnic_ib_qp_grp_modify_active_to_err(us_ibdev);
 		ib_event.event = IB_EVENT_GID_CHANGE;
@@ -262,7 +257,7 @@ static int usnic_ib_handle_inet_event(struct usnic_ib_dev *us_ibdev,
 	case NETDEV_UP:
 		usnic_fwd_add_ipaddr(us_ibdev->ufdev, ifa->ifa_address);
 		usnic_info("%s via ip notifiers: ip %pI4",
-				usnic_ib_netdev_event_to_string(event),
+				netdev_cmd_to_name(event),
 				&us_ibdev->ufdev->inaddr);
 		ib_event.event = IB_EVENT_GID_CHANGE;
 		ib_event.device = &us_ibdev->ib_dev;
@@ -271,7 +266,7 @@ static int usnic_ib_handle_inet_event(struct usnic_ib_dev *us_ibdev,
 		break;
 	default:
 		usnic_info("Ignoring event %s on %s",
-				usnic_ib_netdev_event_to_string(event),
+				netdev_cmd_to_name(event),
 				us_ibdev->ib_dev.name);
 	}
 	mutex_unlock(&us_ibdev->usdev_lock);

^ permalink raw reply related

* Re: [PATCH net-next v2 0/3] Drop NETDEV_UNREGISTER_FINAL (was unnamed)
From: Jason Gunthorpe @ 2018-03-23 16:50 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: davem, Michal.Kalderon, Ariel.Elior, dledford, benve, 1dgoodell,
	daniel, jakub.kicinski, ast, edumazet, linux, john.fastabend,
	brouer, dsahern, netdev
In-Reply-To: <152182320819.2767.12723786642825837986.stgit@localhost.localdomain>

On Fri, Mar 23, 2018 at 07:47:09PM +0300, Kirill Tkhai wrote:
> This series drops unused NETDEV_UNREGISTER_FINAL
> after some preparations.
> 
> v2: New patch [2/3]. Use switch() in [1/3].
> 
> The first version was acked by Jason Gunthorpe,
> and [1/3] was acked by David Ahern.
> 
> Since there are differences to v1, I haven't added
> Acked-by tags of people. It would be nice, if you
> fill OK to tag v2 too.
>
> Thanks,
> Kirill
> 
> Kirill Tkhai (3):
>       net: Make NETDEV_XXX commands enum { }
>       infiniband: Replace usnic_ib_netdev_event_to_string() with netdev_cmd_to_name()
>       net: Drop NETDEV_UNREGISTER_FINAL
> 
>  drivers/infiniband/hw/qedr/main.c           |    4 +-
>  drivers/infiniband/hw/usnic/usnic_ib_main.c |   28 ++---------
>  include/linux/netdevice.h                   |   68 ++++++++++++++-------------
>  include/rdma/ib_verbs.h                     |    4 +-
>  net/core/dev.c                              |   25 ++++++++--
>  5 files changed, 63 insertions(+), 66 deletions(-)

All the infiniband stuff is OK to merge through netdev:

Acked-by: Jason Gunthorpe <jgg@mellanox>

Jason

^ permalink raw reply

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 16:51 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel
In-Reply-To: <20180323.124326.2170503491903886041.davem@davemloft.net>

On 3/23/2018 12:43 PM, David Miller wrote:
> From: Sinan Kaya <okaya@codeaurora.org>
> Date: Fri, 23 Mar 2018 12:31:12 -0400
> 
>> Sorry, you got me confused now.
>>
>> If you look at the code closer, you'll see this.
>>
>> 	wmb();
>>
>> 	txdata->tx_db.data.prod += nbd;
>> 	barrier();
>>
>> 	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
>>
>> and you also asked me to rename DOORBELL to DOORBELL_RELAXED() to make
>> it obvious that we have a relaxed operator inside the macro.
> 
> This still doesn't match the stated pattern.

I can certainly update the commit text for this or spin into its own
patch to make it obvious.

> 
> 	wmb();
> 	/* no other memory or I/O or IOMEM operation */
> 	writel();
> 
> There is a write to a producer index there and then no non-compiler
> barrier or any kind before the writel().
> 
> So, in fact, it might really need that implicit writel() barrier here!
> 

It could if txdata->tx_db was not a union. There is a data dependency
between txdata->tx_db.data.prod and txdata->tx_db.raw. 

So, no reordering.

I can argue that barrier() here is useless in fact.

Anyhow, I'll spin this piece out of this patch so that we pay special
attention with a better description.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: [PATCH net] ipv6: fix possible deadlock in rt6_age_examine_exception()
From: Wei Wang @ 2018-03-23 16:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Linux Kernel Network Developers, Eric Dumazet,
	Martin KaFai Lau
In-Reply-To: <20180323145658.154636-1-edumazet@google.com>

On Fri, Mar 23, 2018 at 7:57 AM Eric Dumazet <edumazet@google.com> wrote:

> syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception()

> rt6_age_examine_exception() is called while rt6_exception_lock is held.
> This lock is the lower one in the lock hierarchy, thus we can not
> call dst_neigh_lookup() function, as it can fallback to neigh_create()

> We should instead do a pure RCU lookup. As a bonus we avoid
> a pair of atomic operations on neigh refcount.

> [1]

> WARNING: possible circular locking dependency detected
> 4.16.0-rc4+ #277 Not tainted

> syz-executor7/4015 is trying to acquire lock:
>   (&ndev->lock){++--}, at: [<00000000416dce19>]
__ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928

> but task is already holding lock:
>   (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250
net/core/neighbour.c:292

> which lock already depends on the new lock.

> the existing dependency chain (in reverse order) is:

> -> #3 (&tbl->lock){++-.}:
>         __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
>         _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
>         __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528
>         neigh_create include/net/neighbour.h:315 [inline]
>         ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228
>         dst_neigh_lookup include/net/dst.h:405 [inline]
>         rt6_age_examine_exception net/ipv6/route.c:1609 [inline]
>         rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645
>         fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033
>         fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919
>         fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845
>         fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893
>         fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970
>         __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986
>         fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline]
>         fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053
>         ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781
>         notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
>         __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>         raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>         call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
>         call_netdevice_notifiers net/core/dev.c:1725 [inline]
>         __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
>         dev_change_flags+0xf5/0x140 net/core/dev.c:6994
>         devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
>         inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
>         sock_do_ioctl+0xef/0x390 net/socket.c:957
>         sock_ioctl+0x36b/0x610 net/socket.c:1081
>         vfs_ioctl fs/ioctl.c:46 [inline]
>         do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>         SYSC_ioctl fs/ioctl.c:701 [inline]
>         SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>         do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>         entry_SYSCALL_64_after_hwframe+0x42/0xb7

> -> #2 (rt6_exception_lock){+.-.}:
>         __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>         _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
>         spin_lock_bh include/linux/spinlock.h:315 [inline]
>         rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367
>         fib6_del_route net/ipv6/ip6_fib.c:1677 [inline]
>         fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761
>         __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980
>         ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993
>         __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332
>         ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline]
>         ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200
>         inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433
>         sock_release+0x8d/0x1e0 net/socket.c:594
>         sock_close+0x16/0x20 net/socket.c:1149
>         __fput+0x327/0x7e0 fs/file_table.c:209
>         ____fput+0x15/0x20 fs/file_table.c:243
>         task_work_run+0x199/0x270 kernel/task_work.c:113
>         exit_task_work include/linux/task_work.h:22 [inline]
>         do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>         do_group_exit+0x149/0x400 kernel/exit.c:968
>         get_signal+0x73a/0x16d0 kernel/signal.c:2469
>         do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>         exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>         prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>         syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>         do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>         entry_SYSCALL_64_after_hwframe+0x42/0xb7

> -> #1 (&(&tb->tb6_lock)->rlock){+.-.}:
>         __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>         _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
>         spin_lock_bh include/linux/spinlock.h:315 [inline]
>         __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007
>         ip6_route_add+0x141/0x190 net/ipv6/route.c:2955
>         addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359
>         fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline]
>         addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline]
>         addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460
>         notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
>         __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>         raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>         call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
>         call_netdevice_notifiers net/core/dev.c:1725 [inline]
>         __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958
>         dev_change_flags+0xf5/0x140 net/core/dev.c:6994
>         do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357
>         rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965
>         rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641
>         netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444
>         rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659
>         netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
>         netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
>         netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
>         sock_sendmsg_nosec net/socket.c:629 [inline]
>         sock_sendmsg+0xca/0x110 net/socket.c:639
>         ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
>         __sys_sendmsg+0xe5/0x210 net/socket.c:2081
>         SYSC_sendmsg net/socket.c:2092 [inline]
>         SyS_sendmsg+0x2d/0x50 net/socket.c:2088
>         do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>         entry_SYSCALL_64_after_hwframe+0x42/0xb7

> -> #0 (&ndev->lock){++--}:
>         lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>         __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
>         _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
>         __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
>         ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
>         pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
>         pneigh_ifdown net/core/neighbour.c:695 [inline]
>         neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
>         rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
>         addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
>         addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
>         notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
>         __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>         raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>         call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
>         call_netdevice_notifiers net/core/dev.c:1725 [inline]
>         __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
>         dev_change_flags+0xf5/0x140 net/core/dev.c:6994
>         devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
>         inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
>         packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
>         sock_do_ioctl+0xef/0x390 net/socket.c:957
>         sock_ioctl+0x36b/0x610 net/socket.c:1081
>         vfs_ioctl fs/ioctl.c:46 [inline]
>         do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>         SYSC_ioctl fs/ioctl.c:701 [inline]
>         SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>         do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>         entry_SYSCALL_64_after_hwframe+0x42/0xb7

> other info that might help us debug this:

> Chain exists of:
>    &ndev->lock --> rt6_exception_lock --> &tbl->lock

>   Possible unsafe locking scenario:

>         CPU0                    CPU1
>         ----                    ----
>    lock(&tbl->lock);
>                                 lock(rt6_exception_lock);
>                                 lock(&tbl->lock);
>    lock(&ndev->lock);

>   *** DEADLOCK ***

> 2 locks held by syz-executor7/4015:
>   #0:  (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
>   #1:  (&tbl->lock){++-.}, at: [<00000000b5cb1d65>]
neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292

> stack backtrace:
> CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:17 [inline]
>   dump_stack+0x194/0x24d lib/dump_stack.c:53
>   print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
>   check_prev_add kernel/locking/lockdep.c:1863 [inline]
>   check_prevs_add kernel/locking/lockdep.c:1976 [inline]
>   validate_chain kernel/locking/lockdep.c:2417 [inline]
>   __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431
>   lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>   __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
>   _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
>   __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
>   ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
>   pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
>   pneigh_ifdown net/core/neighbour.c:695 [inline]
>   neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
>   rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
>   addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
>   addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
>   notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
>   __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>   raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>   call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
>   call_netdevice_notifiers net/core/dev.c:1725 [inline]
>   __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
>   dev_change_flags+0xf5/0x140 net/core/dev.c:6994
>   devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
>   inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
>   packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
>   sock_do_ioctl+0xef/0x390 net/socket.c:957
>   sock_ioctl+0x36b/0x610 net/socket.c:1081
>   vfs_ioctl fs/ioctl.c:46 [inline]
>   do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>   SYSC_ioctl fs/ioctl.c:701 [inline]
>   SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>   do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>   entry_SYSCALL_64_after_hwframe+0x42/0xb7

> Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for exception table")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Wei Wang <weiwan@google.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> ---

Nice fix. Thanks Eric.

Acked-by: Wei Wang <weiwan@google.com>

>   net/ipv6/route.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)

> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index
b0d5c64e19780ce94feb112285ed1d85dbe07e9e..b33d057ac5eb2a85e19be59f0bceacf547cc9e59
100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1626,11 +1626,10 @@ static void rt6_age_examine_exception(struct
rt6_exception_bucket *bucket,
>                  struct neighbour *neigh;
>                  __u8 neigh_flags = 0;

> -               neigh = dst_neigh_lookup(&rt->dst, &rt->rt6i_gateway);
> -               if (neigh) {
> +               neigh = __ipv6_neigh_lookup_noref(rt->dst.dev,
&rt->rt6i_gateway);
> +               if (neigh)
>                          neigh_flags = neigh->flags;
> -                       neigh_release(neigh);
> -               }
> +
>                  if (!(neigh_flags & NTF_ROUTER)) {
>                          RT6_TRACE("purging route %p via non-router but
gateway\n",
>                                    rt);
> @@ -1654,7 +1653,8 @@ void rt6_age_exceptions(struct rt6_info *rt,
>          if (!rcu_access_pointer(rt->rt6i_exception_bucket))
>                  return;

> -       spin_lock_bh(&rt6_exception_lock);
> +       rcu_read_lock_bh();
> +       spin_lock(&rt6_exception_lock);
>          bucket = rcu_dereference_protected(rt->rt6i_exception_bucket,
>                                      lockdep_is_held(&rt6_exception_lock));

> @@ -1668,7 +1668,8 @@ void rt6_age_exceptions(struct rt6_info *rt,
>                          bucket++;
>                  }
>          }
> -       spin_unlock_bh(&rt6_exception_lock);
> +       spin_unlock(&rt6_exception_lock);
> +       rcu_read_unlock_bh();
>   }

>   struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
> --
> 2.17.0.rc0.231.g781580f067-goog

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox