Netdev List

Netdev List
 help / color / mirror / Atom feed

* greetings
From: Miss Zeliha ömer Faruk @ 2018-05-30 22:34 UTC (permalink / raw)




-- 
Hello

I have been trying to contact you. Did you get my business proposal?

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215 Sisli - Istanbul, Turke

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Jakub Kicinski @ 2018-05-30 22:36 UTC (permalink / raw)
  To: Michael Chan; +Cc: Samudrala, Sridhar, David Miller, Netdev
In-Reply-To: <CACKFLi=-QVePYwAiG+boXoH0tdWTaM1ZRyV4qxsj0Au1w5PKBQ@mail.gmail.com>

On Wed, 30 May 2018 00:18:39 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 11:33 PM, Jakub Kicinski wrote:
> > At some points you (Broadcom) were working whole bunch of devlink
> > configuration options for the PCIe side of the ASIC.  The number of
> > queues relates to things like number of allocated MSI-X vectors, which
> > if memory serves me was in your devlink patch set.  In an ideal world
> > we would try to keep all those in one place :)  
> 
> Yeah, another colleague is now working with Mellanox on something similar.
> 
> One difference between those devlink parameters and these queue
> parameters is that the former are more permanent and global settings.
> For example, number of VFs or number of MSIX per VF are persistent
> settings once they are set and after PCIe reset.  On the other hand,
> these queue settings are pure run-time settings and may be unique for
> each VF.  These are not stored as there is no room in NVRAM to store
> 128 sets or more of these parameters.

Indeed, I think the API must be flexible as to what is persistent and
what is not because HW will certainly differ in that regard.  And
agreed, queues may be a bit of a stretch here, but worth a try.

> Anyway, let me discuss this with my colleague to see if there is a
> natural fit for these queue parameters in the devlink infrastructure
> that they are working on.

Thank you!

^ permalink raw reply

* [PATCH] rtnetlink: Remove VLA usage
From: Kees Cook @ 2018-05-30 22:20 UTC (permalink / raw)
  To: David S. Miller; +Cc: Florian Westphal, David Ahern, netdev, linux-kernel

In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible types and adds
sanity-checks at both registration and usage to make sure nothing gets
out of sync. This matches the proposed VLA solution for nfnetlink[2]. The
values chosen here were based on finding assignments for .maxtype and
.slave_maxtype and manually counting the enums:

slave_maxtype (max 33):
	IFLA_BRPORT_MAX     33
	IFLA_BOND_SLAVE_MAX  9

maxtype (max 45):
	IFLA_BOND_MAX       28
	IFLA_BR_MAX         45
	__IFLA_CAIF_HSI_MAX  8
	IFLA_CAIF_MAX        4
	IFLA_CAN_MAX        16
	IFLA_GENEVE_MAX     12
	IFLA_GRE_MAX        25
	IFLA_GTP_MAX         5
	IFLA_HSR_MAX         7
	IFLA_IPOIB_MAX       4
	IFLA_IPTUN_MAX      21
	IFLA_IPVLAN_MAX      3
	IFLA_MACSEC_MAX     15
	IFLA_MACVLAN_MAX     7
	IFLA_PPP_MAX         2
	__IFLA_RMNET_MAX     4
	IFLA_VLAN_MAX        6
	IFLA_VRF_MAX         2
	IFLA_VTI_MAX         7
	IFLA_VXLAN_MAX      28
	VETH_INFO_MAX        2
	VXCAN_INFO_MAX       2

This additionally changes maxtype and slave_maxtype fields to unsigned,
since they're only ever using positive values.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://patchwork.kernel.org/patch/10439647/

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/net/rtnetlink.h |  4 ++--
 net/core/rtnetlink.c    | 18 ++++++++++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 14b6b3af8918..0bbaa5488423 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -64,7 +64,7 @@ struct rtnl_link_ops {
 	size_t			priv_size;
 	void			(*setup)(struct net_device *dev);
 
-	int			maxtype;
+	unsigned int		maxtype;
 	const struct nla_policy	*policy;
 	int			(*validate)(struct nlattr *tb[],
 					    struct nlattr *data[],
@@ -92,7 +92,7 @@ struct rtnl_link_ops {
 	unsigned int		(*get_num_tx_queues)(void);
 	unsigned int		(*get_num_rx_queues)(void);
 
-	int			slave_maxtype;
+	unsigned int		slave_maxtype;
 	const struct nla_policy	*slave_policy;
 	int			(*slave_changelink)(struct net_device *dev,
 						    struct net_device *slave_dev,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 45936922d7e2..4ede33719ca9 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -59,6 +59,9 @@
 #include <net/rtnetlink.h>
 #include <net/net_namespace.h>
 
+#define RTNL_MAX_TYPE		48
+#define RTNL_SLAVE_MAX_TYPE	36
+
 struct rtnl_link {
 	rtnl_doit_func		doit;
 	rtnl_dumpit_func	dumpit;
@@ -389,6 +392,11 @@ int rtnl_link_register(struct rtnl_link_ops *ops)
 {
 	int err;
 
+	/* Sanity-check max sizes to avoid stack buffer overflow. */
+	if (WARN_ON(ops->maxtype > RTNL_MAX_TYPE ||
+		    ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE))
+		return -EINVAL;
+
 	rtnl_lock();
 	err = __rtnl_link_register(ops);
 	rtnl_unlock();
@@ -2900,13 +2908,16 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 
 	if (1) {
-		struct nlattr *attr[ops ? ops->maxtype + 1 : 1];
-		struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 1];
+		struct nlattr *attr[RTNL_MAX_TYPE + 1];
+		struct nlattr *slave_attr[RTNL_SLAVE_MAX_TYPE + 1];
 		struct nlattr **data = NULL;
 		struct nlattr **slave_data = NULL;
 		struct net *dest_net, *link_net = NULL;
 
 		if (ops) {
+			if (ops->maxtype > RTNL_MAX_TYPE)
+				return -EINVAL;
+
 			if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
 				err = nla_parse_nested(attr, ops->maxtype,
 						       linkinfo[IFLA_INFO_DATA],
@@ -2923,6 +2934,9 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		}
 
 		if (m_ops) {
+			if (ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
+				return -EINVAL;
+
 			if (m_ops->slave_maxtype &&
 			    linkinfo[IFLA_INFO_SLAVE_DATA]) {
 				err = nla_parse_nested(slave_attr,
-- 
2.17.0


-- 
Kees Cook
Pixel Security

^ permalink raw reply related

* Re: [bpf-next V1 PATCH 0/8] bpf/xdp: add flags argument to ndo_xdp_xmit and flag flush operation
From: Song Liu @ 2018-05-30 22:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	makita.toshiaki
In-Reply-To: <152770312703.20510.5854417568847239931.stgit@firesoul>

Overall, this set looks good to me. The only suggestion I have is to add more
documentation on the expected behavior of XDP_XMIT_FLUSH in netdevice.h
(as part of 01/08).

Thanks,
Song


On Wed, May 30, 2018 at 11:00 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> As I mentioned in merge commit 10f678683e4 ("Merge branch 'xdp_xmit-bulking'")
> I plan to change the API for ndo_xdp_xmit once more, by adding a flags
> argument, which is done in this patchset.
>
> I know it is late in the cycle (currently at rc7), but it would be
> nice to avoid changing NDOs over several kernel releases, as it is
> annoying to vendors and distro backporters, but it is not strictly
> UAPI so it is allowed (according to Alexei).
>
> The end-goal is getting rid of the ndo_xdp_flush operation, as it will
> make it possible for drivers to implement a TXQ synchronization mechanism
> that is not necessarily derived from the CPU id (smp_processor_id).
>
> This patchset removes all callers of the ndo_xdp_flush operation, but
> it doesn't take the last step of removing it from all drivers.  This
> can be done later, or I can update the patchset on request.
>
> Micro-benchmarks only show a very small performance improvement, for
> map-redirect around ~2 ns, and for non-map redirect ~7 ns.  I've not
> benchmarked this with CONFIG_RETPOLINE, but the performance benefit
> should be more visible given we end-up removing an indirect call.
>
> ---
>
> Jesper Dangaard Brouer (8):
>       xdp: add flags argument to ndo_xdp_xmit API
>       i40e: implement flush flag for ndo_xdp_xmit
>       ixgbe: implement flush flag for ndo_xdp_xmit
>       tun: implement flush flag for ndo_xdp_xmit
>       virtio_net: implement flush flag for ndo_xdp_xmit
>       xdp: done implementing ndo_xdp_xmit flush flag for all drivers
>       bpf/xdp: non-map redirect can avoid calling ndo_xdp_flush
>       bpf/xdp: devmap can avoid calling ndo_xdp_flush
>
>
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |    9 ++++++++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    3 ++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   23 +++++++++++++++++------
>  drivers/net/tun.c                             |   25 ++++++++++++++++++-------
>  drivers/net/virtio_net.c                      |    9 ++++++++-
>  include/linux/netdevice.h                     |    7 ++++---
>  include/net/xdp.h                             |    4 ++++
>  kernel/bpf/devmap.c                           |   20 +++++++-------------
>  net/core/filter.c                             |    3 +--
>  9 files changed, 69 insertions(+), 34 deletions(-)
>
> --

^ permalink raw reply

* [PATCH net-next] net: dsa: mv88e6xxx: Be explicit about DT or pdata
From: Andrew Lunn @ 2018-05-30 22:15 UTC (permalink / raw)
  To: David Miller
  Cc: Vivien Didelot, Florian Fainelli, dan.carpenter, netdev,
	Andrew Lunn

Make it explicit that either device tree is used or platform data.  If
neither is available, abort the probe.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 877b7cb0b6f2 ("net: dsa: mv88e6xxx: Add minimal platform_data support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 12df00f593b7..437cd6eb4faa 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4389,6 +4389,9 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	int port;
 	int err;
 
+	if (!np && !pdata)
+		return -EINVAL;
+
 	if (np)
 		compat_info = of_device_get_match_data(dev);
 
-- 
2.17.0

^ permalink raw reply related

* Re: [bpf-next V1 PATCH 8/8] bpf/xdp: devmap can avoid calling ndo_xdp_flush
From: Song Liu @ 2018-05-30 22:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	makita.toshiaki
In-Reply-To: <152770327343.20510.17045843025201198801.stgit@firesoul>

On Wed, May 30, 2018 at 11:01 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> The XDP_REDIRECT map devmap can avoid using ndo_xdp_flush, by instead
> instructing ndo_xdp_xmit to flush via XDP_XMIT_FLUSH flag in
> appropriate places.
>
> Notice after this patch it is possible to remove ndo_xdp_flush
> completely, as this is the last user of ndo_xdp_flush. This is left
> for later patches, to keep driver changes separate.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  kernel/bpf/devmap.c |   20 +++++++-------------
>  1 file changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 04fbd75a5274..9c846a7a8cff 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -217,7 +217,7 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
>  }
>
>  static int bq_xmit_all(struct bpf_dtab_netdev *obj,
> -                        struct xdp_bulk_queue *bq)
> +                      struct xdp_bulk_queue *bq, bool flush)

How about we use "int flags" instead of "bool flush" for easier extension?

Thanks,
Song

>  {
>         struct net_device *dev = obj->dev;
>         int sent = 0, drops = 0, err = 0;
> @@ -232,7 +232,8 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
>                 prefetch(xdpf);
>         }
>
> -       sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, 0);
> +       sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q,
> +                                            flush ? XDP_XMIT_FLUSH : 0);
>         if (sent < 0) {
>                 err = sent;
>                 sent = 0;
> @@ -276,7 +277,6 @@ void __dev_map_flush(struct bpf_map *map)
>         for_each_set_bit(bit, bitmap, map->max_entries) {
>                 struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]);
>                 struct xdp_bulk_queue *bq;
> -               struct net_device *netdev;
>
>                 /* This is possible if the dev entry is removed by user space
>                  * between xdp redirect and flush op.
> @@ -287,10 +287,7 @@ void __dev_map_flush(struct bpf_map *map)
>                 __clear_bit(bit, bitmap);
>
>                 bq = this_cpu_ptr(dev->bulkq);
> -               bq_xmit_all(dev, bq);
> -               netdev = dev->dev;
> -               if (likely(netdev->netdev_ops->ndo_xdp_flush))
> -                       netdev->netdev_ops->ndo_xdp_flush(netdev);
> +               bq_xmit_all(dev, bq, true);
>         }
>  }
>
> @@ -320,7 +317,7 @@ static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
>         struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
>
>         if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
> -               bq_xmit_all(obj, bq);
> +               bq_xmit_all(obj, bq, false);
>
>         /* Ingress dev_rx will be the same for all xdp_frame's in
>          * bulk_queue, because bq stored per-CPU and must be flushed
> @@ -359,8 +356,7 @@ static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
>
>  static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
>  {
> -       if (dev->dev->netdev_ops->ndo_xdp_flush) {
> -               struct net_device *fl = dev->dev;
> +       if (dev->dev->netdev_ops->ndo_xdp_xmit) {
>                 struct xdp_bulk_queue *bq;
>                 unsigned long *bitmap;
>
> @@ -371,9 +367,7 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
>                         __clear_bit(dev->bit, bitmap);
>
>                         bq = per_cpu_ptr(dev->bulkq, cpu);
> -                       bq_xmit_all(dev, bq);
> -
> -                       fl->netdev_ops->ndo_xdp_flush(dev->dev);
> +                       bq_xmit_all(dev, bq, true);
>                 }
>         }
>  }
>

^ permalink raw reply

* Re: [PATCH net-next] net: bridge: Lock before br_fdb_find()
From: Petr Machata @ 2018-05-30 22:03 UTC (permalink / raw)
  To: David Miller; +Cc: bridge, netdev, stephen
In-Reply-To: <20180530.124215.898586103229215718.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Petr Machata <petrm@mellanox.com>
> Date: Mon, 28 May 2018 17:44:16 +0200
>
>> Callers of br_fdb_find() need to hold the hash lock, which
>> br_fdb_find_port() doesn't do. Add the missing lock/unlock
>> pair.
>> 
>> Signed-off-by: Petr Machata <petrm@mellanox.com>
>
> If all of the these uses of br_fdb_find_port() are safe, then it
> should use the RCU fdb lookup variant.
>
> So I basically agree with Stephen that this locking doesn't make any
> sense.
>
> The lock is needed when you are going to add or delete an FDB entry.
>
> Here we are doing a lookup and returning a device pointer via the FDB
> entry found in the lookup.
>
> The RTNL assertion assures that the device returned won't disappear.
>
> If the device can disappear, the spinlock added by this patch doesn't
> change that at all.

OK, I'll take another look at this.

Thanks,
Petr

^ permalink raw reply

* Re: [bpf-next V1 PATCH 2/8] i40e: implement flush flag for ndo_xdp_xmit
From: Song Liu @ 2018-05-30 21:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	makita.toshiaki
In-Reply-To: <152770324304.20510.3553783983496502887.stgit@firesoul>

On Wed, May 30, 2018 at 11:00 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

I guess we still need to say something in the commit message?

> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index c0451d6e0790..03c1446f0465 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -3685,7 +3685,7 @@ int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
>         if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs)
>                 return -ENXIO;
>
> -       if (unlikely(flags & ~XDP_XMIT_FLAGS_NONE))
> +       if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
>                 return -EINVAL;
>
>         for (i = 0; i < n; i++) {
> @@ -3699,6 +3699,9 @@ int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
>                 }
>         }
>
> +       if (unlikely(flags & XDP_XMIT_FLUSH))
> +               i40e_xdp_ring_update_tail(vsi->xdp_rings[queue_index]);
> +
>         return n - drops;

Do we still flush when drops > 0?

Thanks,
Song

>  }
>
>

^ permalink raw reply

* Re: [bpf-next V1 PATCH 1/8] xdp: add flags argument to ndo_xdp_xmit API
From: Song Liu @ 2018-05-30 21:55 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	makita.toshiaki
In-Reply-To: <152770323794.20510.15169206222123614428.stgit@firesoul>

On Wed, May 30, 2018 at 11:00 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> This patch only change the API and reject any use of flags. This is an
> intermediate step that allows us to implement the flush flag operation
> later, for each individual driver in a separate patch.
>
> The plan is to implement flush operation via XDP_XMIT_FLUSH flag
> and then remove XDP_XMIT_FLAGS_NONE when done.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |    6 +++++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    3 ++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    5 ++++-
>  drivers/net/tun.c                             |    8 ++++++--
>  drivers/net/virtio_net.c                      |    5 ++++-
>  include/linux/netdevice.h                     |    7 ++++---
>  include/net/xdp.h                             |    5 +++++
>  kernel/bpf/devmap.c                           |    2 +-
>  net/core/filter.c                             |    2 +-
>  9 files changed, 32 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 9b698c5acd05..c0451d6e0790 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -3670,7 +3670,8 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>   * For error cases, a negative errno code is returned and no-frames
>   * are transmitted (caller must handle freeing frames).
>   **/
> -int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
> +int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
> +                 u32 flags)
>  {
>         struct i40e_netdev_priv *np = netdev_priv(dev);
>         unsigned int queue_index = smp_processor_id();
> @@ -3684,6 +3685,9 @@ int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
>         if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs)
>                 return -ENXIO;
>
> +       if (unlikely(flags & ~XDP_XMIT_FLAGS_NONE))
> +               return -EINVAL;
> +
>         for (i = 0; i < n; i++) {
>                 struct xdp_frame *xdpf = frames[i];
>                 int err;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index eb8804b3d7b6..820f76db251b 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -487,7 +487,8 @@ u32 i40e_get_tx_pending(struct i40e_ring *ring, bool in_sw);
>  void i40e_detect_recover_hung(struct i40e_vsi *vsi);
>  int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size);
>  bool __i40e_chk_linearize(struct sk_buff *skb);
> -int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames);
> +int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
> +                 u32 flags);
>  void i40e_xdp_flush(struct net_device *dev);
>
>  /**
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 031d65c4178d..87f088f4af52 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -10023,7 +10023,7 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>  }
>
>  static int ixgbe_xdp_xmit(struct net_device *dev, int n,
> -                         struct xdp_frame **frames)
> +                         struct xdp_frame **frames, u32 flags)
>  {
>         struct ixgbe_adapter *adapter = netdev_priv(dev);
>         struct ixgbe_ring *ring;
> @@ -10033,6 +10033,9 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n,
>         if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
>                 return -ENETDOWN;
>
> +       if (unlikely(flags & ~XDP_XMIT_FLAGS_NONE))
> +               return -EINVAL;
> +
>         /* During program transitions its possible adapter->xdp_prog is assigned
>          * but ring has not been configured yet. In this case simply abort xmit.
>          */
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2265d2ccea47..b182b8cdd219 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1285,7 +1285,8 @@ static const struct net_device_ops tun_netdev_ops = {
>         .ndo_get_stats64        = tun_net_get_stats64,
>  };
>
> -static int tun_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
> +static int tun_xdp_xmit(struct net_device *dev, int n,
> +                       struct xdp_frame **frames, u32 flags)
>  {
>         struct tun_struct *tun = netdev_priv(dev);
>         struct tun_file *tfile;
> @@ -1294,6 +1295,9 @@ static int tun_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames
>         int cnt = n;
>         int i;
>
> +       if (unlikely(flags & ~XDP_XMIT_FLAGS_NONE))
> +               return -EINVAL;
> +
>         rcu_read_lock();
>
>         numqueues = READ_ONCE(tun->numqueues);
> @@ -1332,7 +1336,7 @@ static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
>         if (unlikely(!frame))
>                 return -EOVERFLOW;
>
> -       return tun_xdp_xmit(dev, 1, &frame);
> +       return tun_xdp_xmit(dev, 1, &frame, 0);
>  }
>
>  static void tun_xdp_flush(struct net_device *dev)
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b2647dd5d302..4ed823625953 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -468,7 +468,7 @@ static int __virtnet_xdp_tx_xmit(struct virtnet_info *vi,
>  }
>
>  static int virtnet_xdp_xmit(struct net_device *dev,
> -                           int n, struct xdp_frame **frames)
> +                           int n, struct xdp_frame **frames, u32 flags)
>  {
>         struct virtnet_info *vi = netdev_priv(dev);
>         struct receive_queue *rq = vi->rq;
> @@ -481,6 +481,9 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>         int err;
>         int i;
>
> +       if (unlikely(flags & ~XDP_XMIT_FLAGS_NONE))
> +               return -EINVAL;
> +
>         qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
>         sq = &vi->sq[qp];
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8452f72087ef..7f17785a59d7 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1185,13 +1185,13 @@ struct dev_ifalias {
>   *     This function is used to set or query state related to XDP on the
>   *     netdevice and manage BPF offload. See definition of
>   *     enum bpf_netdev_command for details.
> - * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp);
> + * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp,
> + *                     u32 flags);
>   *     This function is used to submit @n XDP packets for transmit on a
>   *     netdevice. Returns number of frames successfully transmitted, frames
>   *     that got dropped are freed/returned via xdp_return_frame().
>   *     Returns negative number, means general error invoking ndo, meaning
>   *     no frames were xmit'ed and core-caller will free all frames.
> - *     TODO: Consider add flag to allow sending flush operation.
>   * void (*ndo_xdp_flush)(struct net_device *dev);
>   *     This function is used to inform the driver to flush a particular
>   *     xdp tx queue. Must be called on same CPU as xdp_xmit.
> @@ -1380,7 +1380,8 @@ struct net_device_ops {
>         int                     (*ndo_bpf)(struct net_device *dev,
>                                            struct netdev_bpf *bpf);
>         int                     (*ndo_xdp_xmit)(struct net_device *dev, int n,
> -                                               struct xdp_frame **xdp);
> +                                               struct xdp_frame **xdp,
> +                                               u32 flags);
>         void                    (*ndo_xdp_flush)(struct net_device *dev);
>  };
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 7ad779237ae8..308a4b30b484 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -40,6 +40,11 @@ enum xdp_mem_type {
>         MEM_TYPE_MAX,
>  };
>
> +/* XDP flags for ndo_xdp_xmit */
> +#define XDP_XMIT_FLAGS_NONE    0U
> +#define XDP_XMIT_FLUSH         (1U << 0)
> +#define XDP_XMIT_FLAGS_MASK    XDP_XMIT_FLUSH
> +

I guess we need more documentation here on what XDP_XMIT_FLUSH does.

Other than this, it looks good to me.

Acked-by: Song Liu <songliubraving@fb.com>


>  struct xdp_mem_info {
>         u32 type; /* enum xdp_mem_type, but known size type */
>         u32 id;
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index ae16d0c373ef..04fbd75a5274 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -232,7 +232,7 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
>                 prefetch(xdpf);
>         }
>
> -       sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q);
> +       sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, 0);
>         if (sent < 0) {
>                 err = sent;
>                 sent = 0;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 81bd2e9fe8fc..6a21dbcad350 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3056,7 +3056,7 @@ static int __bpf_tx_xdp(struct net_device *dev,
>         if (unlikely(!xdpf))
>                 return -EOVERFLOW;
>
> -       sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf);
> +       sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, 0);
>         if (sent <= 0)
>                 return sent;
>         dev->netdev_ops->ndo_xdp_flush(dev);
>

^ permalink raw reply

* [PATCH] net: ti: cpsw: include gpio/consumer.h
From: Arnd Bergmann @ 2018-05-30 21:51 UTC (permalink / raw)
  To: David S. Miller, Grygorii Strashko
  Cc: linux-gpio, Florian Fainelli, Arnd Bergmann, Ivan Khoronzhuk,
	Keerthy, linux-omap, netdev, linux-kernel

On platforms that don't always enable CONFIG_GPIOLIB, we run into
a build failure:

drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_probe':
drivers/net/ethernet/ti/cpsw.c:3006:9: error: implicit declaration of function 'devm_gpiod_get_array_optional' [-Werror=implicit-function-declaration]
  mode = devm_gpiod_get_array_optional(&pdev->dev, "mode", GPIOD_OUT_LOW);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/ti/cpsw.c:3006:59: error: 'GPIOD_OUT_LOW' undeclared (first use in this function); did you mean 'GPIOF_INIT_LOW'?
  mode = devm_gpiod_get_array_optional(&pdev->dev, "mode", GPIOD_OUT_LOW);

Since we cannot rely on this to be visible from gpio.h, we have to include
gpio/consumer.h directly.

Fixes: 2652113ff043 ("net: ethernet: ti: Allow most drivers with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 643cd2d9dfb6..534596ce00d3 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -29,7 +29,7 @@
 #include <linux/workqueue.h>
 #include <linux/delay.h>
 #include <linux/pm_runtime.h>
-#include <linux/gpio.h>
+#include <linux/gpio/consumer.h>
 #include <linux/of.h>
 #include <linux/of_mdio.h>
 #include <linux/of_net.h>
-- 
2.9.0

^ permalink raw reply related

* [PATCH] ieee802154: mcr20a: add missing includes
From: Arnd Bergmann @ 2018-05-30 21:47 UTC (permalink / raw)
  To: Xue Liu, Alexander Aring, Stefan Schmidt, David S. Miller
  Cc: Arnd Bergmann, Colin Ian King, Gustavo A. R. Silva, linux-wpan,
	netdev, linux-kernel

Without CONFIG_GPIOLIB, some headers are not included implicitly,
leading to a build failure:

drivers/net/ieee802154/mcr20a.c: In function 'mcr20a_probe':
drivers/net/ieee802154/mcr20a.c:1347:13: error: implicit declaration of function 'irq_get_trigger_type'; did you mean 'irq_get_irqchip_state'? [-Werror=implicit-function-declaration]

This includes gpio/consumer.h and irq.h directly rather through the
gpiolib header.

Fixes: 8c6ad9cc5157 ("ieee802154: Add NXP MCR20A IEEE 802.15.4 transceiver driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ieee802154/mcr20a.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ieee802154/mcr20a.c b/drivers/net/ieee802154/mcr20a.c
index de0d7f28a181..e428277781ac 100644
--- a/drivers/net/ieee802154/mcr20a.c
+++ b/drivers/net/ieee802154/mcr20a.c
@@ -15,10 +15,11 @@
  */
 #include <linux/kernel.h>
 #include <linux/module.h>
-#include <linux/gpio.h>
+#include <linux/gpio/consumer.h>
 #include <linux/spi/spi.h>
 #include <linux/workqueue.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/skbuff.h>
 #include <linux/of_gpio.h>
 #include <linux/regmap.h>
-- 
2.9.0

^ permalink raw reply related

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2018-05-30 21:38 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20180529132548.7f71c2ff@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 781 bytes --]

Hi all,

On Tue, 29 May 2018 13:25:48 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> After merging the net-next tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> x86_64-linux-ld: unknown architecture of input file `net/bpfilter/bpfilter_umh.o' is incompatible with i386:x86-64 output
> 
> Caused by commit
> 
>   d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
> 
> In my builds, the host is PowerPC 64 LE ...
> 
> I have reverted that commit along with
> 
>   61a552eb487f ("bpfilter: fix build dependency")
>   13405468f49d ("bpfilter: don't pass O_CREAT when opening console for debug")
> 
> for today.

I am still getting this failure (well, at least yesterday I did).

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Eric Dumazet @ 2018-05-30 21:30 UTC (permalink / raw)
  To: Qing Huang
  Cc: David Miller, netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Daniel Jurgens, Zhu Yanjun, Tariq Toukan, linux-rdma,
	Santosh Shilimkar
In-Reply-To: <2cd40c9e-16a5-ba96-db7c-3aafc5af0957@oracle.com>

On Wed, May 30, 2018 at 5:08 PM Qing Huang <qing.huang@oracle.com> wrote:
>
>
>
> On 5/30/2018 1:50 PM, Eric Dumazet wrote:
> > On Wed, May 30, 2018 at 4:30 PM Qing Huang <qing.huang@oracle.com> wrote:
> >>
> >>
> >> On 5/29/2018 9:11 PM, Eric Dumazet wrote:
> >>> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> >>> brought a regression caught in our regression suite, thanks to KASAN.
> >> If KASAN reported issue was really caused by smaller chunk sizes,
> >> changing allocation
> >> order dynamically will eventually hit the same issue.
> > Sigh, you have little idea of what your patch really did...
> >
> > The KASAN part only shows the tip of the iceberg, but our main concern
> > is an increase of memory overhead.
>
> Well, the commit log only mentioned KASAN and but the change here didn't
> seem to solve
> the issue.

Can you elaborate ?

My patch solves our problems.

Both the memory overhead and KASAN splats are gone.

>
> >
> > Alternative is to revert your patch, since we are now very late in 4.17 cycle.
> >
> > Memory usage has grown a lot with your patch, since each 4KB page needs a full
> > struct mlx4_icm_chunk (256 bytes of overhead !)
>
> Going to smaller chunks will have some overhead. It depends on the
> application though.
> What's the total increased memory consumption in your env?


As I explained, your patch adds 256 bytes of overhead per 4KB.

Your changelog did not mentioned that at all, and we discovered this
the hard way.

That is pretty intolerable, and is a blocker for us, memory is precious.

^ permalink raw reply

* [PATCH bpf-next] bpf: prevent non-IPv4 socket to be added into sock hash
From: Wei Wang @ 2018-05-30 21:29 UTC (permalink / raw)
  To: netdev, John Fastabend; +Cc: Eric Dumazet, Willem de Bruijn, Wei Wang

From: Wei Wang <weiwan@google.com>

Sock hash only supports IPv4 socket proto right now.
If a non-IPv4 socket gets stored in the BPF map, sk->sk_prot gets
overwritten with the v4 tcp prot.

Syskaller reported the following related issue on an IPv6 socket:
BUG: KASAN: slab-out-of-bounds in ip6_dst_idev include/net/ip6_fib.h:203 [inline]
BUG: KASAN: slab-out-of-bounds in ip6_xmit+0x2002/0x23f0 net/ipv6/ip6_output.c:264
Read of size 8 at addr ffff8801b300edb0 by task syz-executor888/4522

CPU: 0 PID: 4522 Comm: syz-executor888 Not tainted 4.17.0-rc4+ #17
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 ip6_dst_idev include/net/ip6_fib.h:203 [inline]
 ip6_xmit+0x2002/0x23f0 net/ipv6/ip6_output.c:264
 inet6_csk_xmit+0x377/0x630 net/ipv6/inet6_connection_sock.c:139
 tcp_transmit_skb+0x1be0/0x3e40 net/ipv4/tcp_output.c:1159
 tcp_send_syn_data net/ipv4/tcp_output.c:3441 [inline]
 tcp_connect+0x2207/0x45a0 net/ipv4/tcp_output.c:3480
 tcp_v4_connect+0x1934/0x1d50 net/ipv4/tcp_ipv4.c:272
 __inet_stream_connect+0x943/0x1120 net/ipv4/af_inet.c:655
 tcp_sendmsg_fastopen net/ipv4/tcp.c:1162 [inline]
 tcp_sendmsg_locked+0x2859/0x3ee0 net/ipv4/tcp.c:1209
 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
 inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:639
 ___sys_sendmsg+0x805/0x940 net/socket.c:2117
 __sys_sendmsg+0x115/0x270 net/socket.c:2155
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43ff99
RSP: 002b:00007ffc00bd1cf8 EFLAGS: 00000217 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ff99
RDX: 0000000020000000 RSI: 0000000020000580 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000217 R12: 00000000004018c0
R13: 0000000000401950 R14: 0000000000000000 R15: 0000000000000000

Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
---
 kernel/bpf/sockmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 3b28955a6383..0e7b88bc3e3f 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2300,6 +2300,11 @@ static int sock_hash_update_elem(struct bpf_map *map,
 		return -EINVAL;
 	}
 
+	if (skops.sk->sk_family != AF_INET) {
+		fput(socket->file);
+		return -EAFNOSUPPORT;
+	}
+
 	err = sock_hash_ctx_update_elem(&skops, map, key, flags);
 	fput(socket->file);
 	return err;
-- 
2.17.1.1185.g55be947832-goog

^ permalink raw reply related

* [PATCH bpf] bpf: prevent non-ipv4 socket to be added into sock map
From: Wei Wang @ 2018-05-30 21:28 UTC (permalink / raw)
  To: netdev, John Fastabend; +Cc: Eric Dumazet, Willem de Bruijn, Wei Wang

From: Wei Wang <weiwan@google.com>

Sock map only supports IPv4 socket proto right now.
If a non-IPv4 socket gets stored in the BPF map, sk->sk_prot gets
overwritten with the v4 tcp prot.
It could potentially cause issues when invoking functions from
sk->sk_prot later in the stack.

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
---
 kernel/bpf/sockmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 95a84b2f10ce..1984922f99ee 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1873,6 +1873,11 @@ static int sock_map_update_elem(struct bpf_map *map,
 		return -EOPNOTSUPP;
 	}
 
+	if (skops.sk->sk_family != AF_INET) {
+		fput(socket->file);
+		return -EAFNOSUPPORT;
+	}
+
 	err = sock_map_ctx_update_elem(&skops, map, key, flags);
 	fput(socket->file);
 	return err;
-- 
2.17.1.1185.g55be947832-goog

^ permalink raw reply related

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Samudrala, Sridhar @ 2018-05-30 21:23 UTC (permalink / raw)
  To: Jakub Kicinski, Michael Chan; +Cc: David Miller, Netdev, Or Gerlitz
In-Reply-To: <CAJpBn1yyyJueWrtR3frO1NkntTCZxsEZ4heW_sx5--7vpfF_Qw@mail.gmail.com>

On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
> On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
>> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:
>>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>>>>> extended?
>>> +1 it's painful to see this feature being added to the legacy
>>> API :(  Another duplicated configuration knob.
>>>
>>>> I didn't know about that.
>>>>
>>>>> Shouldn't we enable this via ethtool on the port representor netdev?
>>>> We discussed about this.  ethtool on the VF representor will only work
>>>> in switchdev mode and also will not support min/max values.
>>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
>>> feel like implementing switchdev mode and rewriting features in devlink
>>> may be too much to ask.
>> Totally agreed.  And switchdev mode doesn't seem to be that widely
>> used at the moment.  Do you have other suggestions besides NDO?
> At some points you (Broadcom) were working whole bunch of devlink
> configuration options for the PCIe side of the ASIC.  The number of
> queues relates to things like number of allocated MSI-X vectors, which
> if memory serves me was in your devlink patch set.  In an ideal world
> we would try to keep all those in one place :)
>
> For PCIe config there is always the question of what can be configured
> at runtime, and what requires a HW reset.  Therefore that devlink API
> which could configure current as well as persistent device settings was
> quite nice.  I'm not sure if reallocating queues would ever require
> PCIe block reset but maybe...  Certainly it seems the notion of min
> queues would make more sense in PCIe configuration devlink API than
> ethtool channel API to me as well.
>
> Queues are in the grey area between netdev and non-netdev constructs.
> They make sense both from PCIe resource allocation perspective (i.e.
> devlink PCIe settings) and netdev perspective (ethtool) because they
> feed into things like qdisc offloads, maybe per-queue stats etc.
>
> So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
> API and/or switchdev representors.  But neither of those are really an
> option for you today so IDK :)

One reason why 'switchdev' mode is not yet widely used or enabled by default
could be due to the requirement to program the flow rules only via slow path.

Would it make sense to relax this requirement and support a mode where port
representors are created and let the PF driver implement a default policy that
adds flow rules for all the VFs to enable connectivity and let the user
add/modify the rules via port representors?

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: Change bpf_fib_lookup to return -EAFNOSUPPORT for unsupported address families
From: Alexei Starovoitov @ 2018-05-30 21:19 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, borkmann, ast, David Ahern
In-Reply-To: <20180530192417.9945-1-dsahern@kernel.org>

On Wed, May 30, 2018 at 12:24:17PM -0700, dsahern@kernel.org wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update bpf_fib_lookup to return -EAFNOSUPPORT for unsupported address
> families. Allows userspace to probe for support as more are added
> (e.g., AF_MPLS).
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied, Thanks

^ permalink raw reply

* Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Qing Huang @ 2018-05-30 21:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Daniel Jurgens, Zhu Yanjun, Tariq Toukan, linux-rdma,
	santosh.shilimkar@oracle.com
In-Reply-To: <CANn89i+KRHvsxH9Gs4QhtMt4Ex2OH2h120gxjYkBf1_27O27_g@mail.gmail.com>



On 5/30/2018 1:50 PM, Eric Dumazet wrote:
> On Wed, May 30, 2018 at 4:30 PM Qing Huang <qing.huang@oracle.com> wrote:
>>
>>
>> On 5/29/2018 9:11 PM, Eric Dumazet wrote:
>>> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
>>> brought a regression caught in our regression suite, thanks to KASAN.
>> If KASAN reported issue was really caused by smaller chunk sizes,
>> changing allocation
>> order dynamically will eventually hit the same issue.
> Sigh, you have little idea of what your patch really did...
>
> The KASAN part only shows the tip of the iceberg, but our main concern
> is an increase of memory overhead.

Well, the commit log only mentioned KASAN and but the change here didn't 
seem to solve
the issue.

>
> Alternative is to revert your patch, since we are now very late in 4.17 cycle.
>
> Memory usage has grown a lot with your patch, since each 4KB page needs a full
> struct mlx4_icm_chunk (256 bytes of overhead !)

Going to smaller chunks will have some overhead. It depends on the 
application though.
What's the total increased memory consumption in your env?

>
> Really we have no choice here, your patch went too far and increased
> memory consumption quite a lot.



>
> My patch is simply the best way to address your original concern, and
> not increase overall overhead.
>
> ( each struct mlx4_icm_chunk should be able to store
> MLX4_ICM_CHUNK_LEN pages, instead of one page of 4KB )

^ permalink raw reply

* Re: [PATCH][next] bpf: devmap: remove redundant assignment of dev = dev
From: Alexei Starovoitov @ 2018-05-30 21:07 UTC (permalink / raw)
  To: Colin King
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20180530150916.16808-1-colin.king@canonical.com>

On Wed, May 30, 2018 at 04:09:16PM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> The assignment dev = dev is redundant and should be removed.
> 
> Detected by CoverityScan, CID#1469486 ("Evaluation order violation")
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied, Thanks.

Please use [PATCH bpf-next] in the subject next time.
See Documentation/bpf/bpf_devel_QA.rst

^ permalink raw reply

* Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Eric Dumazet @ 2018-05-30 20:50 UTC (permalink / raw)
  To: Qing Huang
  Cc: David Miller, netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Daniel Jurgens, Zhu Yanjun, Tariq Toukan
In-Reply-To: <c87db444-266d-eb76-d8d5-d8a0c11038b1@oracle.com>

On Wed, May 30, 2018 at 4:30 PM Qing Huang <qing.huang@oracle.com> wrote:
>
>
>
> On 5/29/2018 9:11 PM, Eric Dumazet wrote:
> > Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> > brought a regression caught in our regression suite, thanks to KASAN.
>
> If KASAN reported issue was really caused by smaller chunk sizes,
> changing allocation
> order dynamically will eventually hit the same issue.

Sigh, you have little idea of what your patch really did...

The KASAN part only shows the tip of the iceberg, but our main concern
is an increase of memory overhead.

Alternative is to revert your patch, since we are now very late in 4.17 cycle.

Memory usage has grown a lot with your patch, since each 4KB page needs a full
struct mlx4_icm_chunk (256 bytes of overhead !)

Really we have no choice here, your patch went too far and increased
memory consumption quite a lot.

My patch is simply the best way to address your original concern, and
not increase overall overhead.

( each struct mlx4_icm_chunk should be able to store
MLX4_ICM_CHUNK_LEN pages, instead of one page of 4KB )

^ permalink raw reply

* Re: [PATCH v3 11/11] net: sched: change action API to use array of pointers to actions
From: Jiri Pirko @ 2018-05-30 20:37 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-12-git-send-email-vladbu@mellanox.com>

Sun, May 27, 2018 at 11:17:29PM CEST, vladbu@mellanox.com wrote:
>Act API used linked list to pass set of actions to functions. It is
>intrusive data structure that stores list nodes inside action structure
>itself, which means it is not safe to modify such list concurrently.
>However, action API doesn't use any linked list specific operations on this
>set of actions, so it can be safely refactored into plain pointer array.
>
>Refactor action API to use array of pointers to tc_actions instead of
>linked list. Change argument 'actions' type of exported action init,
>destroy and dump functions.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Even with the nit Marcelo found, this looks fine to me.

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next v2 0/2] net: phy: improve PHY suspend/resume
From: Andrew Lunn @ 2018-05-30 20:35 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <5b9eceab-35b4-922c-7410-d8968d8364c7@gmail.com>

> I think we need a better solution than spending the effort needed
> to make the MDIO ops runtime-pm-aware. In general there seems to be
> just one network driver using both phylib and runtime pm, so most
> drivers aren't affected (yet).
> 
> I will spend few more thoughts on a solution ..

Hi Heiner

Please keep in mind that MDIO is a generic bus. Many Ethernet switches
are connected via MDIO. Some of those switches have MDIO busses of
their own. Also, some Broadcom devices have USB-PHYs controlled over
MDIO, etc.

So you need a generic solution here.

   Andrew

^ permalink raw reply

* Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Qing Huang @ 2018-05-30 20:30 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Daniel Jurgens, Zhu Yanjun, Tariq Toukan
In-Reply-To: <20180530041152.113393-1-edumazet@google.com>



On 5/29/2018 9:11 PM, Eric Dumazet wrote:
> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> brought a regression caught in our regression suite, thanks to KASAN.

If KASAN reported issue was really caused by smaller chunk sizes, 
changing allocation
order dynamically will eventually hit the same issue.


> Note that mlx4_alloc_icm() is already able to try high order allocations
> and fallback to low-order allocations under high memory pressure.
>
> We only have to tweak gfp_mask a bit, to help falling back faster,
> without risking OOM killings.
>
> BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
> Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585
>
> CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G           O
> Call Trace:
>   [<ffffffffba80d7bb>] dump_stack+0x4d/0x72
>   [<ffffffffb951dc5f>] print_address_description+0x6f/0x260
>   [<ffffffffb951e1c7>] kasan_report+0x257/0x370
>   [<ffffffffb951e339>] __asan_report_load4_noabort+0x19/0x20
>   [<ffffffffc0256d28>] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
>   [<ffffffffc02785b3>] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
>   [<ffffffffc02dbfdb>] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
>   [<ffffffffc02dc3ea>] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
>   [<ffffffffb95f125c>] seq_read+0xa9c/0x1230
>   [<ffffffffb96e0821>] proc_reg_read+0xc1/0x180
>   [<ffffffffb9577918>] __vfs_read+0xe8/0x730
>   [<ffffffffb9578057>] vfs_read+0xf7/0x300
>   [<ffffffffb95794d2>] SyS_read+0xd2/0x1b0
>   [<ffffffffb8e06b16>] do_syscall_64+0x186/0x420
>   [<ffffffffbaa00071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7f851a7bb30d
> RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
> RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
> RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
> R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
> R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000
>
> Allocated by task 4488:
>   save_stack+0x46/0xd0
>   kasan_kmalloc+0xad/0xe0
>   __kmalloc+0x101/0x5e0
>   ib_register_device+0xc03/0x1250 [ib_core]
>   mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
>   mlx4_add_device+0xa9/0x340 [mlx4_core]
>   mlx4_register_interface+0x16e/0x390 [mlx4_core]
>   xhci_pci_remove+0x7a/0x180 [xhci_pci]
>   do_one_initcall+0xa0/0x230
>   do_init_module+0x1b9/0x5a4
>   load_module+0x63e6/0x94c0
>   SYSC_init_module+0x1a4/0x1c0
>   SyS_init_module+0xe/0x10
>   do_syscall_64+0x186/0x420
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> Freed by task 0:
> (stack is not available)
>
> The buggy address belongs to the object at ffff8817df584f40
>   which belongs to the cache kmalloc-32 of size 32
> The buggy address is located 8 bytes to the right of
>   32-byte region [ffff8817df584f40, ffff8817df584f60)
> The buggy address belongs to the page:
> page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000 index:0xffff8817df584fc1
> flags: 0x880000000000100(slab)
> raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
> raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
> page dumped because: kasan: bad access detected
> page->mem_cgroup:ffff883ff78d26c0
>
> Memory state around the buggy address:
>   ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
>   ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
>> ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
>                                                            ^
>   ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
>   ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
> Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: John Sperbeck <jsperbeck@google.com>
> Cc: Tarick Bedeir <tarick@google.com>
> Cc: Qing Huang <qing.huang@oracle.com>
> Cc: Daniel Jurgens <danielj@mellanox.com>
> Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/icm.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 685337d58276fc91baeeb64387c52985e1bc6dda..cae33d5c7dbd9ba7929adcf2127b104f6796fa5a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,13 @@
>   #include "fw.h"
>   
>   /*
> - * We allocate in page size (default 4KB on many archs) chunks to avoid high
> - * order memory allocations in fragmented/high usage memory situation.
> + * We allocate in as big chunks as we can, up to a maximum of 256 KB
> + * per chunk. Note that the chunks are not necessarily in contiguous
> + * physical memory.
>    */
>   enum {
> -	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
> -	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
> +	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
>   };
>   
>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   	struct mlx4_icm *icm;
>   	struct mlx4_icm_chunk *chunk = NULL;
>   	int cur_order;
> +	gfp_t mask;
>   	int ret;
>   
>   	/* We use sg_set_buf for coherent allocs, which assumes low memory */
> @@ -178,13 +180,16 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   		while (1 << cur_order > npages)
>   			--cur_order;
>   
> +		mask = gfp_mask;
> +		if (cur_order)
> +			mask = (mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
>   		if (coherent)
>   			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
>   						      &chunk->mem[chunk->npages],
> -						      cur_order, gfp_mask);
> +						      cur_order, mask);
>   		else
>   			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
> -						   cur_order, gfp_mask,
> +						   cur_order, mask,
>   						   dev->numa_node);
>   
>   		if (ret) {

^ permalink raw reply

* Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress
From: Jiri Pirko @ 2018-05-30 20:29 UTC (permalink / raw)
  To: John Hurley
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <CAK+XE=n9EejO2Cx8VLnRTcRqD2Ci+t0c4VqO1c07FeyFgPFSFg@mail.gmail.com>

Wed, May 30, 2018 at 11:26:23AM CEST, john.hurley@netronome.com wrote:
>On Tue, May 29, 2018 at 11:09 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, May 29, 2018 at 04:08:48PM CEST, john.hurley@netronome.com wrote:
>>>On Sat, May 26, 2018 at 3:47 AM, Jakub Kicinski
>>><jakub.kicinski@netronome.com> wrote:
>>>> On Fri, 25 May 2018 08:48:09 +0200, Jiri Pirko wrote:
>>>>> Thu, May 24, 2018 at 04:22:47AM CEST, jakub.kicinski@netronome.com wrote:
>>>>> >Hi!
>>>>> >
>>>>> >This series from John adds bond offload to the nfp driver.  Patch 5
>>>>> >exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
>>>>> >hashing matches that of the software LAG.  This may be unnecessarily
>>>>> >conservative, let's see what LAG maintainers think :)
>>>>>
>>>>> So you need to restrict offload to only certain hash algo? In mlxsw, we
>>>>> just ignore the lag setting and do some hw default hashing. Would not be
>>>>> enough? Note that there's a good reason for it, as you see, in team, the
>>>>> hashing is done in a BPF function and could be totally arbitrary.
>>>>> Your patchset effectively disables team offload for nfp.
>>>>
>>>> My understanding is that the project requirements only called for L3/L4
>>>> hash algorithm offload, hence the temptation to err on the side of
>>>> caution and not offload all the bond configurations.  John can provide
>>>> more details.  Not being able to offload team is unfortunate indeed.
>>>
>>>Hi Jiri,
>>>Yes, as Jakub mentions, we restrict ourselves to L3/L4 hash algorithm
>>>as this is currently what is supported in fw.
>>
>> In mlxsw, a default l3/l4 is used always, no matter what the
>> bonding/team sets. It is not correct, but it works with team as well.
>> Perhaps we can have NETDEV_LAG_HASH_UNKNOWN to indicate to the driver to
>> do some default? That would make the "team" offload functional.
>>
>
>yes, I would agree with that.
>Thanks

Okay, would you please adjust your driver?

I will teka care of mlxsw bits.

Thanks!

>
>>>Hopefully this will change as fw features are expanded.
>>>I understand the issue this presents with offloading team.
>>>Perhaps resorting to a default hw hash for team is acceptable.
>>>John

^ permalink raw reply

* Re: suspicius csum initialization in vmxnet3_rx_csum
From: Paolo Abeni @ 2018-05-30 20:29 UTC (permalink / raw)
  To: Guolin Yang, Ronak Doshi; +Cc: Neil Horman, Boon Ang, Louis Luo, netdev
In-Reply-To: <BLUPR05MB1907699C14C2F900BA57323DD96A0@BLUPR05MB1907.namprd05.prod.outlook.com>

Hi,

On Thu, 2018-05-24 at 21:48 +0000, Guolin Yang wrote:
> Yes, that code  is not correct, we should fix that code

Did you have any chance to address the issue and/or to give a more in-
deepth look to the change proposed in my initial email?

Thanks,

Paolo 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox