Netdev List


* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: David Miller @ 2019-08-29 22:08 UTC (permalink / raw)
  To: idosch
  Cc: andrew, jiri, horatiu.vultur, alexandre.belloni, UNGLinuxDriver,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829175759.GA19471@splinter>

From: Ido Schimmel <idosch@idosch.org>
Date: Thu, 29 Aug 2019 20:57:59 +0300

> On a software switch, when you run tcpdump without '-p', do you incur
> major packet loss? No. Will this happen when you punt several Tbps to
> your CPU on the hardware switch? Yes.
> 
> Extending the definition of promiscuous mode to mean punt all traffic to
> the CPU is wrong, IMO. You will not be able to capture all the packets

This is so illogical, it is mind boggling.

How different is this from using tcpdump/wireshark on a 100Gb or 1Tb
network interface?

There is no difference.

Please stop portraying switches as special in this regard, they are
not.


* Re: [PATCH v2 net-next 05/15] net: sgi: ioc3-eth: allocate space for desc rings only once
From: Jakub Kicinski @ 2019-08-29 22:05 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190830000058.882feb357058437cddc71315@suse.de>

On Fri, 30 Aug 2019 00:00:58 +0200, Thomas Bogendoerfer wrote:
> On Thu, 29 Aug 2019 14:05:37 -0700
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> 
> > On Thu, 29 Aug 2019 17:50:03 +0200, Thomas Bogendoerfer wrote:  
> > > +		if (skb)
> > > +			dev_kfree_skb_any(skb);  
> > 
> > I think dev_kfree_skb_any() accepts NULL  
> 
> yes, I'll drop the if
> 
> > > +
> > > +	/* Allocate and rx ring.  4kb = 512 entries  */
> > > +	ip->rxr = (unsigned long *)get_zeroed_page(GFP_ATOMIC);
> > > +	if (!ip->rxr) {
> > > +		pr_err("ioc3-eth: rx ring allocation failed\n");
> > > +		err = -ENOMEM;
> > > +		goto out_stop;
> > > +	}
> > > +
> > > +	/* Allocate tx rings.  16kb = 128 bufs.  */
> > > +	ip->txr = (struct ioc3_etxd *)__get_free_pages(GFP_KERNEL, 2);
> > > +	if (!ip->txr) {
> > > +		pr_err("ioc3-eth: tx ring allocation failed\n");
> > > +		err = -ENOMEM;
> > > +		goto out_stop;
> > > +	}  
> > 
> > Please just use kcalloc()/kmalloc_array() here,  
> 
> both allocation will be replaced in patch 11 with dma_direct_alloc_pages.
> So I hope I don't need to change it here.

Ah, missed that!

> Out of curiosity does kcalloc/kmalloc_array give me the same guarantees about
> alignment ? rx ring needs to be 4KB aligned, tx ring 16KB aligned.

I don't think so. Actually, I was mostly worried that you were passing
an address from get_page() into kfree() here ;) But patch 11 cures that,
so that's good, too.

> >, and make sure the flags
> > are set to GFP_KERNEL whenever possible. Here and in ioc3_alloc_rings()
> > it looks like GFP_ATOMIC is unnecessary.  
> 
> yes, I'll change it
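
To illustrate the alignment requirement discussed above (rx ring 4KB
aligned, tx ring 16KB aligned), here is a minimal userspace sketch. It
assumes C11 aligned_alloc() as a stand-in for the page allocator, which
hands out blocks of 2^order pages aligned to their own size. The names
PAGE_SZ, order_to_bytes(), alloc_ring() and is_aligned() are hypothetical
and are not part of the ioc3-eth driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SZ 4096u	/* assumed page size, matching the 4KB rx ring */

/* Hypothetical helper: bytes covered by an allocation of the given order. */
static size_t order_to_bytes(unsigned int order)
{
	return (size_t)PAGE_SZ << order;
}

/* Hypothetical stand-in for __get_free_pages(): returns a block of
 * 2^order pages whose address is aligned to the block size.
 * Release it with free().
 */
static void *alloc_ring(unsigned int order)
{
	size_t bytes = order_to_bytes(order);

	return aligned_alloc(bytes, bytes);
}

/* True if p is aligned to align, which must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

With these helpers, alloc_ring(0) models the 4KB rx ring and
alloc_ring(2) the 16KB tx ring; is_aligned() can confirm the property
the hardware needs.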


* Re: [PATCH bpf-next 01/13] bpf: add bpf_map_value_size and bpf_map_copy_value helper functions
From: Song Liu @ 2019-08-29 22:04 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Networking, Alexei Starovoitov, Brian Vazquez,
	Daniel Borkmann, Kernel Team
In-Reply-To: <20190829064502.2750359-1-yhs@fb.com>



> On Aug 28, 2019, at 11:45 PM, Yonghong Song <yhs@fb.com> wrote:
> 
> From: Brian Vazquez <brianvv@google.com>
> 
> Move reusable code from map_lookup_elem to helper functions to avoid code
> duplication in kernel/bpf/syscall.c
> 
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>


Acked-by: Song Liu <songliubraving@fb.com>

Yonghong, we also need your SoB. 




* Re: [PATCH v2 net-next 05/15] net: sgi: ioc3-eth: allocate space for desc rings only once
From: Thomas Bogendoerfer @ 2019-08-29 22:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190829140537.68abfc9f@cakuba.netronome.com>

On Thu, 29 Aug 2019 14:05:37 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Thu, 29 Aug 2019 17:50:03 +0200, Thomas Bogendoerfer wrote:
> > +		if (skb)
> > +			dev_kfree_skb_any(skb);
> 
> I think dev_kfree_skb_any() accepts NULL

yes, I'll drop the if

> > +
> > +	/* Allocate and rx ring.  4kb = 512 entries  */
> > +	ip->rxr = (unsigned long *)get_zeroed_page(GFP_ATOMIC);
> > +	if (!ip->rxr) {
> > +		pr_err("ioc3-eth: rx ring allocation failed\n");
> > +		err = -ENOMEM;
> > +		goto out_stop;
> > +	}
> > +
> > +	/* Allocate tx rings.  16kb = 128 bufs.  */
> > +	ip->txr = (struct ioc3_etxd *)__get_free_pages(GFP_KERNEL, 2);
> > +	if (!ip->txr) {
> > +		pr_err("ioc3-eth: tx ring allocation failed\n");
> > +		err = -ENOMEM;
> > +		goto out_stop;
> > +	}
> 
> Please just use kcalloc()/kmalloc_array() here,

Both allocations will be replaced in patch 11 with dma_direct_alloc_pages,
so I hope I don't need to change them here.

Out of curiosity does kcalloc/kmalloc_array give me the same guarantees about
alignment ? rx ring needs to be 4KB aligned, tx ring 16KB aligned.

>, and make sure the flags
> are set to GFP_KERNEL whenever possible. Here and in ioc3_alloc_rings()
> it looks like GFP_ATOMIC is unnecessary.

yes, I'll change it

Thomas.

-- 
SUSE Software Solutions Germany GmbH
HRB 247165 (AG München)
Geschäftsführer: Felix Imendörffer


* ANNOUNCE: rpld, another RPL implementation for Linux
From: Alexander Aring @ 2019-08-29 21:57 UTC (permalink / raw)
  To: open list:NETWORKING [GENERAL]
  Cc: Michael Richardson, Jamal Hadi Salim, Robert Kaiser,
	Martin Gergeleit, Kai Beckmann, koen, linux-wpan - ML, reubenhwk,
	BlueZ development, Stefan Schmidt, sebastian.meiling,
	Marcel Holtmann, Werner Almesberger, Jukka Rissanen

Hi,

I had some free time and wanted to know how RPL [0] works, so I wrote an
implementation. It's _very_ basic, as it only gives you a "routable"
(is that a word?) thing afterwards in a very constrained setup of RPL
messages.

It took ~1 month to implement, and I reused some great code from radvd
[1]. I released it under the same license (BSD?). Anyway, I know there
are a lot of memory leaks, and the parameters are crazy and not
practical in a real environment, BUT it works.

I changed the dependencies a little compared to radvd (because fancy new things):

- lua for config handling
- libev for event loop handling
- libmnl for netlink handling

The code is available at:

https://github.com/linux-wpan/rpld

With a recent kernel (I think 4.19 and above) and necessary user space
dependencies, just build it and run the start script. It will create
some virtual IEEE 802.15.4 6LoWPAN interfaces and you can run
traceroute from namespace ns0 (which is the RPL DODAG root) to any
other node e.g. namespace ns5. With more knowledge of the scripts you
can change the underlying topology, everybody is welcome to improve
them.

I will work more on it when I have time... to have at least something
running means the real fun can begin (but it was already fun before).

The big thing that everybody wants is source routing, which requires
some control plane for RPL in the kernel to say how and when to put
source routing headers in IPv6. I think I know what's necessary
now... but I didn't implement it. This simple implementation just
fills up routing tables, as RPL supports storing (routing table) or
non-storing (source routing) modes. People tell me to look at
frrouting to see how they do source routing; I will if I get the
chance.

It doesn't run on Bluetooth yet; I know there is a lack of UAPI to
figure out the link layer which is used by 6LoWPAN. I need something
like a SLAVE_INFO attribute in netlink to figure this out and tell me
some 6LoWPAN-specific attributes. I am sorry Bluetooth people, but I think
you are also more interested in source routing because I heard
somebody saying it's the more common approach outside (but I never saw
any other RPL implementation than unstrung running).

Also, I did something in my master's thesis to improve parent
selection; if this implementation becomes stable I can look into
getting that migrated.

Please, radvd maintainer, let me know if everything is okay from your
side. As I said, I reused some code from radvd. I also operate on
ICMPv6 sockets. The same goes for Michael Richardson's unstrung [2]. If
there is anything to talk about or you have complaints, I can change it.

Thanks, I really only wanted to get more knowledge about routing
protocols and how to implement such. Besides all known issues, I still
think it's a good starting point.

- Alex

[0] https://tools.ietf.org/html/rfc6550
[1] https://github.com/reubenhwk/radvd
[2] https://github.com/AnimaGUS-minerva/unstrung


* Re: [PATCH net-next v2 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag
From: Jakub Kicinski @ 2019-08-29 21:56 UTC (permalink / raw)
  To: Davide Caratti
  Cc: borisp, Eric Dumazet, aviadye, davejwatson, davem, john.fastabend,
	Matthieu Baerts, netdev
In-Reply-To: <22da29aa0d0c683afeba7549cabc64c5e073d308.1567095873.git.dcaratti@redhat.com>

On Thu, 29 Aug 2019 18:48:04 +0200, Davide Caratti wrote:
> When an application configures kernel TLS on top of a TCP socket, it's
> now possible for inet_diag_handler() to collect information regarding the
> protocol version, the cipher type and TX / RX configuration, in case
> INET_DIAG_INFO is requested.
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

> diff --git a/include/net/tls.h b/include/net/tls.h
> index 4997742475cd..990f1d9182a3 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -431,6 +431,25 @@ static inline bool is_tx_ready(struct tls_sw_context_tx *ctx)
>  	return READ_ONCE(rec->tx_ready);
>  }
>  
> +static inline u16 tls_user_config(struct tls_context *ctx, bool tx)
> +{
> +	u16 config = tx ? ctx->tx_conf : ctx->rx_conf;
> +
> +	switch (config) {
> +	case TLS_BASE:
> +		return TLS_CONF_BASE;
> +	case TLS_SW:
> +		return TLS_CONF_SW;
> +#ifdef CONFIG_TLS_DEVICE

Recently the TLS_HW define was taken out of the ifdef, so the ifdef
around this is no longer necessary.

> +	case TLS_HW:
> +		return TLS_CONF_HW;
> +#endif
> +	case TLS_HW_RECORD:
> +		return TLS_CONF_HW_RECORD;
> +	}
> +	return 0;
> +}
> +
>  struct sk_buff *
>  tls_validate_xmit_skb(struct sock *sk, struct net_device *dev,
>  		      struct sk_buff *skb);

> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index f8f2d2c3d627..3351a2ace369 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -39,6 +39,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/sched/signal.h>
>  #include <linux/inetdevice.h>
> +#include <linux/inet_diag.h>
>  
>  #include <net/tls.h>
>  
> @@ -835,6 +836,67 @@ static void tls_update(struct sock *sk, struct proto *p)
>  	}
>  }
>  
> +static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
> +{
> +	struct tls_context *ctx;
> +	u16 version, cipher_type;

Unfortunately reverse Christmas tree ordering will be needed :(

> +	struct nlattr *start;
> +	int err;
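
For readers unfamiliar with the convention requested above: netdev code
orders local variable declarations from the longest line to the
shortest. The sketch below shows the four locals of the quoted function
in that order. The struct definitions and tls_get_info_sketch() are
hypothetical stand-ins so the fragment compiles in userspace; they are
not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;

/* Hypothetical stand-ins for the kernel types named in the quoted patch. */
struct tls_context {
	u16 version;
	u16 cipher_type;
};

struct nlattr {
	u16 nla_len;
};

static int tls_get_info_sketch(struct tls_context *in, struct nlattr *attr)
{
	/* Reverse Christmas tree: longest declaration first, shortest last. */
	u16 version, cipher_type;
	struct tls_context *ctx;
	struct nlattr *start;
	int err;

	ctx = in;
	start = attr;
	version = ctx->version;
	cipher_type = ctx->cipher_type;

	/* Placeholder logic only, so that every local is used. */
	err = (version && cipher_type && start) ? 0 : -1;
	return err;
}
```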


* RE: linux-next: Tree for Aug 29 (mlx5)
From: Haiyang Zhang @ 2019-08-29 21:48 UTC (permalink / raw)
  To: Saeed Mahameed, sfr@canb.auug.org.au, Eran Ben Elisha,
	linux-next@vger.kernel.org, rdunlap@infradead.org
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Leon Romanovsky
In-Reply-To: <c92d20e27268f515e0d4c8a28f92c0da041c2acc.camel@mellanox.com>



> -----Original Message-----
> From: Saeed Mahameed <saeedm@mellanox.com>
> Sent: Thursday, August 29, 2019 2:32 PM
> To: sfr@canb.auug.org.au; Eran Ben Elisha <eranbe@mellanox.com>; linux-
> next@vger.kernel.org; rdunlap@infradead.org; Haiyang Zhang
> <haiyangz@microsoft.com>
> Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Leon
> Romanovsky <leonro@mellanox.com>
> Subject: Re: linux-next: Tree for Aug 29 (mlx5)
> 
> On Thu, 2019-08-29 at 12:55 -0700, Randy Dunlap wrote:
> > On 8/29/19 12:54 PM, Randy Dunlap wrote:
> > > On 8/29/19 4:08 AM, Stephen Rothwell wrote:
> > > > Hi all,
> > > >
> > > > Changes since 20190828:
> > > >
> > >
> > > on x86_64:
> > > when CONFIG_PCI_HYPERV=m
> >
> > and CONFIG_PCI_HYPERV_INTERFACE=m
> >
> 
> Haiyang and Eran, I think CONFIG_PCI_HYPERV_INTERFACE was never
> supposed to be a module ? it supposed to provide an always available
> interface to drivers ..
> 
> Anyway, maybe we need to imply CONFIG_PCI_HYPERV_INTERFACE in mlx5.

The symbol dependency of the mlx5e driver automatically triggers loading
of the pci_hyperv_interface module, and this module can be loaded on any
platform.

Currently, mlx5e driver has #if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
around the code using the interface.

I agree --
Adding "select PCI_HYPERV_INTERFACE" for mlx5e will clean up these #if's.

Thanks,
- Haiyang


* Re: [PATCH bpf-next 2/2] nfp: bpf: add simple map op cache
From: Jakub Kicinski @ 2019-08-29 21:36 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, oss-drivers,
	jaco.gericke, Quentin Monnet
In-Reply-To: <CAPhsuW5ExXPXYi5D2MND5JREh8EKNHUvSNoBEJ7L3-XK3GD9mA@mail.gmail.com>

On Thu, 29 Aug 2019 14:29:44 -0700, Song Liu wrote:
> On Tue, Aug 27, 2019 at 10:40 PM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> >
> > Each get_next and lookup call requires a round trip to the device.
> > However, the device is capable of giving us a few entries back,
> > instead of just one.
> >
> > In this patch we ask for a small yet reasonable number of entries
> > (4) on every get_next call, and on subsequent get_next/lookup calls
> > check this little cache for a hit. The cache is only kept for 250us,
> > and is invalidated on every operation which may modify the map
> > (e.g. delete or update call). Note that operations may be performed
> > simultaneously, so we have to keep track of operations in flight.
> >
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
> > ---
> >  drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 179 +++++++++++++++++-
> >  drivers/net/ethernet/netronome/nfp/bpf/fw.h   |   1 +
> >  drivers/net/ethernet/netronome/nfp/bpf/main.c |  18 ++
> >  drivers/net/ethernet/netronome/nfp/bpf/main.h |  23 +++
> >  .../net/ethernet/netronome/nfp/bpf/offload.c  |   3 +
> >  5 files changed, 215 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
> > index fcf880c82f3f..0e2db6ea79e9 100644
> > --- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
> > +++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/bug.h>
> >  #include <linux/jiffies.h>
> >  #include <linux/skbuff.h>
> > +#include <linux/timekeeping.h>
> >
> >  #include "../ccm.h"
> >  #include "../nfp_app.h"
> > @@ -175,29 +176,151 @@ nfp_bpf_ctrl_reply_val(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply,
> >         return &reply->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n];
> >  }
> >
> > +static bool nfp_bpf_ctrl_op_cache_invalidate(enum nfp_ccm_type op)
> > +{
> > +       return op == NFP_CCM_TYPE_BPF_MAP_UPDATE ||
> > +              op == NFP_CCM_TYPE_BPF_MAP_DELETE;
> > +}
> > +
> > +static bool nfp_bpf_ctrl_op_cache_capable(enum nfp_ccm_type op)
> > +{
> > +       return op == NFP_CCM_TYPE_BPF_MAP_LOOKUP ||
> > +              op == NFP_CCM_TYPE_BPF_MAP_GETNEXT;
> > +}
> > +
> > +static bool nfp_bpf_ctrl_op_cache_fill(enum nfp_ccm_type op)
> > +{
> > +       return op == NFP_CCM_TYPE_BPF_MAP_GETFIRST ||
> > +              op == NFP_CCM_TYPE_BPF_MAP_GETNEXT;
> > +}
> > +
> > +static unsigned int
> > +nfp_bpf_ctrl_op_cache_get(struct nfp_bpf_map *nfp_map, enum nfp_ccm_type op,
> > +                         const u8 *key, u8 *out_key, u8 *out_value,
> > +                         u32 *cache_gen)
> > +{
> > +       struct bpf_map *map = &nfp_map->offmap->map;
> > +       struct nfp_app_bpf *bpf = nfp_map->bpf;
> > +       unsigned int i, count, n_entries;
> > +       struct cmsg_reply_map_op *reply;
> > +
> > +       n_entries = nfp_bpf_ctrl_op_cache_fill(op) ? bpf->cmsg_cache_cnt : 1;
> > +
> > +       spin_lock(&nfp_map->cache_lock);
> > +       *cache_gen = nfp_map->cache_gen;
> > +       if (nfp_map->cache_blockers)
> > +               n_entries = 1;
> > +
> > +       if (nfp_bpf_ctrl_op_cache_invalidate(op))
> > +               goto exit_block;
> > +       if (!nfp_bpf_ctrl_op_cache_capable(op))
> > +               goto exit_unlock;
> > +
> > +       if (!nfp_map->cache)
> > +               goto exit_unlock;
> > +       if (nfp_map->cache_to < ktime_get_ns())
> > +               goto exit_invalidate;
> > +
> > +       reply = (void *)nfp_map->cache->data;
> > +       count = be32_to_cpu(reply->count);  
> 
> Do we need to check whether count is too big (from firmware bug)?

It's validated below, when the skb is received (see my "here" below)
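
The idea behind that check can be sketched in userspace C: the entry
count reported by the firmware is only trusted once the received length
matches exactly what that count implies. This is a simplified sketch,
assuming host byte order and no error-reply path; struct reply_hdr,
reply_size() and validate_reply() are hypothetical names, and the real
struct cmsg_reply_map_op layout differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reply header; only the count field matters here. */
struct reply_hdr {
	uint32_t rc;
	uint32_t count;
};

/* Bytes a well-formed reply carrying n key/value entries must occupy. */
static size_t reply_size(size_t key_sz, size_t val_sz, unsigned int n)
{
	return sizeof(struct reply_hdr) + (key_sz + val_sz) * n;
}

/* Returns 0 when len is consistent with hdr->count, -1 otherwise. */
static int validate_reply(size_t len, const struct reply_hdr *hdr,
			  size_t key_sz, size_t val_sz)
{
	/* Too short to even contain the header: do not read count. */
	if (len < sizeof(*hdr))
		return -1;

	/* A bogus count makes the expected size differ from len. */
	if (len != reply_size(key_sz, val_sz, hdr->count))
		return -1;

	return 0;
}
```

Because a message that passes this check is the one later stored in the
cache, a lookup that walks count entries stays inside the buffer.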

> > +
> > +       for (i = 0; i < count; i++) {
> > +               void *cached_key;
> > +
> > +               cached_key = nfp_bpf_ctrl_reply_key(bpf, reply, i);
> > +               if (memcmp(cached_key, key, map->key_size))
> > +                       continue;
> > +
> > +               if (op == NFP_CCM_TYPE_BPF_MAP_LOOKUP)
> > +                       memcpy(out_value, nfp_bpf_ctrl_reply_val(bpf, reply, i),
> > +                              map->value_size);
> > +               if (op == NFP_CCM_TYPE_BPF_MAP_GETNEXT) {
> > +                       if (i + 1 == count)
> > +                               break;
> > +
> > +                       memcpy(out_key,
> > +                              nfp_bpf_ctrl_reply_key(bpf, reply, i + 1),
> > +                              map->key_size);
> > +               }
> > +
> > +               n_entries = 0;
> > +               goto exit_unlock;
> > +       }
> > +       goto exit_unlock;
> > +
> > +exit_block:
> > +       nfp_map->cache_blockers++;
> > +exit_invalidate:
> > +       dev_consume_skb_any(nfp_map->cache);
> > +       nfp_map->cache = NULL;
> > +exit_unlock:
> > +       spin_unlock(&nfp_map->cache_lock);
> > +       return n_entries;
> > +}

> >  static int
> >  nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, enum nfp_ccm_type op,
> >                       u8 *key, u8 *value, u64 flags, u8 *out_key, u8 *out_value)
> >  {
> >         struct nfp_bpf_map *nfp_map = offmap->dev_priv;
> > +       unsigned int n_entries, reply_entries, count;
> >         struct nfp_app_bpf *bpf = nfp_map->bpf;
> >         struct bpf_map *map = &offmap->map;
> >         struct cmsg_reply_map_op *reply;
> >         struct cmsg_req_map_op *req;
> >         struct sk_buff *skb;
> > +       u32 cache_gen;
> >         int err;
> >
> >         /* FW messages have no space for more than 32 bits of flags */
> >         if (flags >> 32)
> >                 return -EOPNOTSUPP;
> >
> > +       /* Handle op cache */
> > +       n_entries = nfp_bpf_ctrl_op_cache_get(nfp_map, op, key, out_key,
> > +                                             out_value, &cache_gen);
> > +       if (!n_entries)
> > +               return 0;
> > +
> >         skb = nfp_bpf_cmsg_map_req_alloc(bpf, 1);
> > -       if (!skb)
> > -               return -ENOMEM;
> > +       if (!skb) {
> > +               err = -ENOMEM;
> > +               goto err_cache_put;
> > +       }
> >
> >         req = (void *)skb->data;
> >         req->tid = cpu_to_be32(nfp_map->tid);
> > -       req->count = cpu_to_be32(1);
> > +       req->count = cpu_to_be32(n_entries);
> >         req->flags = cpu_to_be32(flags);
> >
> >         /* Copy inputs */
> > @@ -207,16 +330,38 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, enum nfp_ccm_type op,
> >                 memcpy(nfp_bpf_ctrl_req_val(bpf, req, 0), value,
> >                        map->value_size);
> >
> > -       skb = nfp_ccm_communicate(&bpf->ccm, skb, op,
> > -                                 nfp_bpf_cmsg_map_reply_size(bpf, 1));
> > -       if (IS_ERR(skb))
> > -               return PTR_ERR(skb);
> > +       skb = nfp_ccm_communicate(&bpf->ccm, skb, op, 0);
> > +       if (IS_ERR(skb)) {
> > +               err = PTR_ERR(skb);
> > +               goto err_cache_put;
> > +       }
> > +
> > +       if (skb->len < sizeof(*reply)) {
> > +               cmsg_warn(bpf, "cmsg drop - type 0x%02x too short %d!\n",
> > +                         op, skb->len);
> > +               err = -EIO;
> > +               goto err_free;
> > +       }
> >
> >         reply = (void *)skb->data;
> > +       count = be32_to_cpu(reply->count);
> >         err = nfp_bpf_ctrl_rc_to_errno(bpf, &reply->reply_hdr);
> > +       /* FW responds with message sized to hold the good entries,
> > +        * plus one extra entry if there was an error.
> > +        */
> > +       reply_entries = count + !!err;
> > +       if (n_entries > 1 && count)
> > +               err = 0;
> >         if (err)
> >                 goto err_free;
> >
> > +       if (skb->len != nfp_bpf_cmsg_map_reply_size(bpf, reply_entries)) {

here, reply_entries is derived directly from reply->count

> > +               cmsg_warn(bpf, "cmsg drop - type 0x%02x too short %d for %d entries!\n",
> > +                         op, skb->len, reply_entries);
> > +               err = -EIO;
> > +               goto err_free;
> > +       }
> > +
> >         /* Copy outputs */
> >         if (out_key)
> >                 memcpy(out_key, nfp_bpf_ctrl_reply_key(bpf, reply, 0),
> > @@ -225,11 +370,13 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, enum nfp_ccm_type op,
> >                 memcpy(out_value, nfp_bpf_ctrl_reply_val(bpf, reply, 0),
> >                        map->value_size);
> >
> > -       dev_consume_skb_any(skb);
> > +       nfp_bpf_ctrl_op_cache_put(nfp_map, op, skb, cache_gen);
> >
> >         return 0;
> >  err_free:
> >         dev_kfree_skb_any(skb);
> > +err_cache_put:
> > +       nfp_bpf_ctrl_op_cache_put(nfp_map, op, NULL, cache_gen);
> >         return err;
> >  }
> >
> > @@ -275,7 +422,21 @@ unsigned int nfp_bpf_ctrl_cmsg_min_mtu(struct nfp_app_bpf *bpf)
> >
> >  unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf)
> >  {
> > -       return max(NFP_NET_DEFAULT_MTU, nfp_bpf_ctrl_cmsg_min_mtu(bpf));
> > +       return max3(NFP_NET_DEFAULT_MTU,
> > +                   nfp_bpf_cmsg_map_req_size(bpf, NFP_BPF_MAP_CACHE_CNT),
> > +                   nfp_bpf_cmsg_map_reply_size(bpf, NFP_BPF_MAP_CACHE_CNT));
> > +}
> > +
> > +unsigned int nfp_bpf_ctrl_cmsg_cache_cnt(struct nfp_app_bpf *bpf)
> > +{
> > +       unsigned int mtu, req_max, reply_max, entry_sz;
> > +
> > +       mtu = bpf->app->ctrl->dp.mtu;
> > +       entry_sz = bpf->cmsg_key_sz + bpf->cmsg_val_sz;
> > +       req_max = (mtu - sizeof(struct cmsg_req_map_op)) / entry_sz;
> > +       reply_max = (mtu - sizeof(struct cmsg_reply_map_op)) / entry_sz;
> > +
> > +       return min3(req_max, reply_max, NFP_BPF_MAP_CACHE_CNT);
> >  }
> >
> >  void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
> > diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
> > index 06c4286bd79e..a83a0ad5e27d 100644
> > --- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
> > +++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
> > @@ -24,6 +24,7 @@ enum bpf_cap_tlv_type {
> >         NFP_BPF_CAP_TYPE_QUEUE_SELECT   = 5,
> >         NFP_BPF_CAP_TYPE_ADJUST_TAIL    = 6,
> >         NFP_BPF_CAP_TYPE_ABI_VERSION    = 7,
> > +       NFP_BPF_CAP_TYPE_CMSG_MULTI_ENT = 8,
> >  };
> >
> >  struct nfp_bpf_cap_tlv_func {
> > diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
> > index 2b1773ed3de9..8f732771d3fa 100644
> > --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
> > +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
> > @@ -299,6 +299,14 @@ nfp_bpf_parse_cap_adjust_tail(struct nfp_app_bpf *bpf, void __iomem *value,
> >         return 0;
> >  }
> >
> > +static int
> > +nfp_bpf_parse_cap_cmsg_multi_ent(struct nfp_app_bpf *bpf, void __iomem *value,
> > +                                u32 length)
> > +{
> > +       bpf->cmsg_multi_ent = true;
> > +       return 0;
> > +}
> > +
> >  static int
> >  nfp_bpf_parse_cap_abi_version(struct nfp_app_bpf *bpf, void __iomem *value,
> >                               u32 length)
> > @@ -375,6 +383,11 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
> >                                                           length))
> >                                 goto err_release_free;
> >                         break;
> > +               case NFP_BPF_CAP_TYPE_CMSG_MULTI_ENT:
> > +                       if (nfp_bpf_parse_cap_cmsg_multi_ent(app->priv, value,
> > +                                                            length))  
> 
> Do we plan to extend nfp_bpf_parse_cap_cmsg_multi_ent() to return
> non-zero in the
> future?

Yes, the TLV format allows for the entry to be extended and then
parsing may fail. It's mostly a pattern the BPF TLV parsing follows,
though.
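
The caching scheme described in the commit message (short-lived cache,
invalidated by modifying operations, with in-flight operations tracked)
can be sketched in single-threaded userspace C. This is only an
illustration under stated assumptions: there is no locking, time is
passed in explicitly, and an int stands in for the cached reply. The
names struct op_cache, cache_lookup(), cache_block(), cache_unblock()
and cache_fill() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_TIME_NS 250000ull	/* 250us, as in the commit message */

struct op_cache {
	bool valid;		/* a cached reply is present */
	uint64_t expires;	/* absolute expiry time in ns */
	uint32_t gen;		/* bumped when a modifying op completes */
	unsigned int blockers;	/* modifying ops currently in flight */
	int payload;		/* stand-in for the cached reply */
};

/* Start of a read op: snapshot the generation, report a hit if fresh. */
static bool cache_lookup(struct op_cache *c, uint64_t now,
			 uint32_t *gen, int *out)
{
	*gen = c->gen;
	if (!c->valid)
		return false;
	if (c->expires < now) {
		c->valid = false;	/* expired: drop the entry */
		return false;
	}
	*out = c->payload;
	return true;
}

/* Start of an update/delete: drop the cache and block refills. */
static void cache_block(struct op_cache *c)
{
	c->blockers++;
	c->valid = false;
}

/* End of an update/delete: unblock and move to a new generation. */
static void cache_unblock(struct op_cache *c)
{
	c->blockers--;
	c->gen++;
}

/* End of a read op: store the reply only if no modification raced. */
static bool cache_fill(struct op_cache *c, uint64_t now,
		       uint32_t gen, int payload)
{
	if (c->blockers || c->gen != gen)
		return false;
	c->payload = payload;
	c->expires = now + CACHE_TIME_NS;
	c->valid = true;
	return true;
}
```

The generation snapshot is what rejects a reply that was requested
before a modification finished but arrives after it.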


* Re: linux-next: Tree for Aug 29 (mlx5)
From: Saeed Mahameed @ 2019-08-29 21:31 UTC (permalink / raw)
  To: sfr@canb.auug.org.au, Eran Ben Elisha, linux-next@vger.kernel.org,
	rdunlap@infradead.org, haiyangz
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Leon Romanovsky
In-Reply-To: <52bcddef-fcf2-8de5-d15a-9e7ee2d5b14d@infradead.org>

On Thu, 2019-08-29 at 12:55 -0700, Randy Dunlap wrote:
> On 8/29/19 12:54 PM, Randy Dunlap wrote:
> > On 8/29/19 4:08 AM, Stephen Rothwell wrote:
> > > Hi all,
> > > 
> > > Changes since 20190828:
> > > 
> > 
> > on x86_64:
> > when CONFIG_PCI_HYPERV=m
> 
> and CONFIG_PCI_HYPERV_INTERFACE=m
> 

Haiyang and Eran, I think CONFIG_PCI_HYPERV_INTERFACE was never
supposed to be a module? It is supposed to provide an always-available
interface to drivers...

Anyway, maybe we need to imply CONFIG_PCI_HYPERV_INTERFACE in mlx5.

Thanks,
Saeed.
 
> > and mlx5 is builtin (=y).
> > 
> > ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function
> > `mlx5_unload':
> > main.c:(.text+0x5d): undefined reference to `mlx5_hv_vhca_cleanup'
> > ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function
> > `mlx5_cleanup_once':
> > main.c:(.text+0x158): undefined reference to `mlx5_hv_vhca_destroy'
> > ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function
> > `mlx5_load_one':
> > main.c:(.text+0x4191): undefined reference to `mlx5_hv_vhca_create'
> > ld: main.c:(.text+0x4772): undefined reference to
> > `mlx5_hv_vhca_init'
> > ld: main.c:(.text+0x4b07): undefined reference to
> > `mlx5_hv_vhca_cleanup'
> > 
> > 
> 
> 


* Re: [PATCH bpf-next 1/2] nfp: bpf: rework MTU checking
From: Song Liu @ 2019-08-29 21:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, oss-drivers,
	jaco.gericke, Quentin Monnet
In-Reply-To: <20190828053629.28658-2-jakub.kicinski@netronome.com>

On Tue, Aug 27, 2019 at 10:38 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> If control channel MTU is too low to support map operations a warning
> will be printed. This is not enough, we want to make sure probe fails
> in such scenario, as this would clearly be a faulty configuration.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Acked-by: Song Liu <songliubraving@fb.com>


* Re: [PATCH bpf-next 2/2] nfp: bpf: add simple map op cache
From: Song Liu @ 2019-08-29 21:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, oss-drivers,
	jaco.gericke, Quentin Monnet
In-Reply-To: <20190828053629.28658-3-jakub.kicinski@netronome.com>

On Tue, Aug 27, 2019 at 10:40 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> Each get_next and lookup call requires a round trip to the device.
> However, the device is capable of giving us a few entries back,
> instead of just one.
>
> In this patch we ask for a small yet reasonable number of entries
> (4) on every get_next call, and on subsequent get_next/lookup calls
> check this little cache for a hit. The cache is only kept for 250us,
> and is invalidated on every operation which may modify the map
> (e.g. delete or update call). Note that operations may be performed
> simultaneously, so we have to keep track of operations in flight.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
>  drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 179 +++++++++++++++++-
>  drivers/net/ethernet/netronome/nfp/bpf/fw.h   |   1 +
>  drivers/net/ethernet/netronome/nfp/bpf/main.c |  18 ++
>  drivers/net/ethernet/netronome/nfp/bpf/main.h |  23 +++
>  .../net/ethernet/netronome/nfp/bpf/offload.c  |   3 +
>  5 files changed, 215 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
> index fcf880c82f3f..0e2db6ea79e9 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
> @@ -6,6 +6,7 @@
>  #include <linux/bug.h>
>  #include <linux/jiffies.h>
>  #include <linux/skbuff.h>
> +#include <linux/timekeeping.h>
>
>  #include "../ccm.h"
>  #include "../nfp_app.h"
> @@ -175,29 +176,151 @@ nfp_bpf_ctrl_reply_val(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply,
>         return &reply->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n];
>  }
>
> +static bool nfp_bpf_ctrl_op_cache_invalidate(enum nfp_ccm_type op)
> +{
> +       return op == NFP_CCM_TYPE_BPF_MAP_UPDATE ||
> +              op == NFP_CCM_TYPE_BPF_MAP_DELETE;
> +}
> +
> +static bool nfp_bpf_ctrl_op_cache_capable(enum nfp_ccm_type op)
> +{
> +       return op == NFP_CCM_TYPE_BPF_MAP_LOOKUP ||
> +              op == NFP_CCM_TYPE_BPF_MAP_GETNEXT;
> +}
> +
> +static bool nfp_bpf_ctrl_op_cache_fill(enum nfp_ccm_type op)
> +{
> +       return op == NFP_CCM_TYPE_BPF_MAP_GETFIRST ||
> +              op == NFP_CCM_TYPE_BPF_MAP_GETNEXT;
> +}
> +
> +static unsigned int
> +nfp_bpf_ctrl_op_cache_get(struct nfp_bpf_map *nfp_map, enum nfp_ccm_type op,
> +                         const u8 *key, u8 *out_key, u8 *out_value,
> +                         u32 *cache_gen)
> +{
> +       struct bpf_map *map = &nfp_map->offmap->map;
> +       struct nfp_app_bpf *bpf = nfp_map->bpf;
> +       unsigned int i, count, n_entries;
> +       struct cmsg_reply_map_op *reply;
> +
> +       n_entries = nfp_bpf_ctrl_op_cache_fill(op) ? bpf->cmsg_cache_cnt : 1;
> +
> +       spin_lock(&nfp_map->cache_lock);
> +       *cache_gen = nfp_map->cache_gen;
> +       if (nfp_map->cache_blockers)
> +               n_entries = 1;
> +
> +       if (nfp_bpf_ctrl_op_cache_invalidate(op))
> +               goto exit_block;
> +       if (!nfp_bpf_ctrl_op_cache_capable(op))
> +               goto exit_unlock;
> +
> +       if (!nfp_map->cache)
> +               goto exit_unlock;
> +       if (nfp_map->cache_to < ktime_get_ns())
> +               goto exit_invalidate;
> +
> +       reply = (void *)nfp_map->cache->data;
> +       count = be32_to_cpu(reply->count);

Do we need to check whether count is too big (from firmware bug)?

> +
> +       for (i = 0; i < count; i++) {
> +               void *cached_key;
> +
> +               cached_key = nfp_bpf_ctrl_reply_key(bpf, reply, i);
> +               if (memcmp(cached_key, key, map->key_size))
> +                       continue;
> +
> +               if (op == NFP_CCM_TYPE_BPF_MAP_LOOKUP)
> +                       memcpy(out_value, nfp_bpf_ctrl_reply_val(bpf, reply, i),
> +                              map->value_size);
> +               if (op == NFP_CCM_TYPE_BPF_MAP_GETNEXT) {
> +                       if (i + 1 == count)
> +                               break;
> +
> +                       memcpy(out_key,
> +                              nfp_bpf_ctrl_reply_key(bpf, reply, i + 1),
> +                              map->key_size);
> +               }
> +
> +               n_entries = 0;
> +               goto exit_unlock;
> +       }
> +       goto exit_unlock;
> +
> +exit_block:
> +       nfp_map->cache_blockers++;
> +exit_invalidate:
> +       dev_consume_skb_any(nfp_map->cache);
> +       nfp_map->cache = NULL;
> +exit_unlock:
> +       spin_unlock(&nfp_map->cache_lock);
> +       return n_entries;
> +}
> +
> +static void
> +nfp_bpf_ctrl_op_cache_put(struct nfp_bpf_map *nfp_map, enum nfp_ccm_type op,
> +                         struct sk_buff *skb, u32 cache_gen)
> +{
> +       bool blocker, filler;
> +
> +       blocker = nfp_bpf_ctrl_op_cache_invalidate(op);
> +       filler = nfp_bpf_ctrl_op_cache_fill(op);
> +       if (blocker || filler) {
> +               u64 to = 0;
> +
> +               if (filler)
> +                       to = ktime_get_ns() + NFP_BPF_MAP_CACHE_TIME_NS;
> +
> +               spin_lock(&nfp_map->cache_lock);
> +               if (blocker) {
> +                       nfp_map->cache_blockers--;
> +                       nfp_map->cache_gen++;
> +               }
> +               if (filler && !nfp_map->cache_blockers &&
> +                   nfp_map->cache_gen == cache_gen) {
> +                       nfp_map->cache_to = to;
> +                       swap(nfp_map->cache, skb);
> +               }
> +               spin_unlock(&nfp_map->cache_lock);
> +       }
> +
> +       dev_consume_skb_any(skb);
> +}
> +
>  static int
>  nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, enum nfp_ccm_type op,
>                       u8 *key, u8 *value, u64 flags, u8 *out_key, u8 *out_value)
>  {
>         struct nfp_bpf_map *nfp_map = offmap->dev_priv;
> +       unsigned int n_entries, reply_entries, count;
>         struct nfp_app_bpf *bpf = nfp_map->bpf;
>         struct bpf_map *map = &offmap->map;
>         struct cmsg_reply_map_op *reply;
>         struct cmsg_req_map_op *req;
>         struct sk_buff *skb;
> +       u32 cache_gen;
>         int err;
>
>         /* FW messages have no space for more than 32 bits of flags */
>         if (flags >> 32)
>                 return -EOPNOTSUPP;
>
> +       /* Handle op cache */
> +       n_entries = nfp_bpf_ctrl_op_cache_get(nfp_map, op, key, out_key,
> +                                             out_value, &cache_gen);
> +       if (!n_entries)
> +               return 0;
> +
>         skb = nfp_bpf_cmsg_map_req_alloc(bpf, 1);
> -       if (!skb)
> -               return -ENOMEM;
> +       if (!skb) {
> +               err = -ENOMEM;
> +               goto err_cache_put;
> +       }
>
>         req = (void *)skb->data;
>         req->tid = cpu_to_be32(nfp_map->tid);
> -       req->count = cpu_to_be32(1);
> +       req->count = cpu_to_be32(n_entries);
>         req->flags = cpu_to_be32(flags);
>
>         /* Copy inputs */
> @@ -207,16 +330,38 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, enum nfp_ccm_type op,
>                 memcpy(nfp_bpf_ctrl_req_val(bpf, req, 0), value,
>                        map->value_size);
>
> -       skb = nfp_ccm_communicate(&bpf->ccm, skb, op,
> -                                 nfp_bpf_cmsg_map_reply_size(bpf, 1));
> -       if (IS_ERR(skb))
> -               return PTR_ERR(skb);
> +       skb = nfp_ccm_communicate(&bpf->ccm, skb, op, 0);
> +       if (IS_ERR(skb)) {
> +               err = PTR_ERR(skb);
> +               goto err_cache_put;
> +       }
> +
> +       if (skb->len < sizeof(*reply)) {
> +               cmsg_warn(bpf, "cmsg drop - type 0x%02x too short %d!\n",
> +                         op, skb->len);
> +               err = -EIO;
> +               goto err_free;
> +       }
>
>         reply = (void *)skb->data;
> +       count = be32_to_cpu(reply->count);
>         err = nfp_bpf_ctrl_rc_to_errno(bpf, &reply->reply_hdr);
> +       /* FW responds with message sized to hold the good entries,
> +        * plus one extra entry if there was an error.
> +        */
> +       reply_entries = count + !!err;
> +       if (n_entries > 1 && count)
> +               err = 0;
>         if (err)
>                 goto err_free;
>
> +       if (skb->len != nfp_bpf_cmsg_map_reply_size(bpf, reply_entries)) {
> +               cmsg_warn(bpf, "cmsg drop - type 0x%02x too short %d for %d entries!\n",
> +                         op, skb->len, reply_entries);
> +               err = -EIO;
> +               goto err_free;
> +       }
> +
>         /* Copy outputs */
>         if (out_key)
>                 memcpy(out_key, nfp_bpf_ctrl_reply_key(bpf, reply, 0),
> @@ -225,11 +370,13 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, enum nfp_ccm_type op,
>                 memcpy(out_value, nfp_bpf_ctrl_reply_val(bpf, reply, 0),
>                        map->value_size);
>
> -       dev_consume_skb_any(skb);
> +       nfp_bpf_ctrl_op_cache_put(nfp_map, op, skb, cache_gen);
>
>         return 0;
>  err_free:
>         dev_kfree_skb_any(skb);
> +err_cache_put:
> +       nfp_bpf_ctrl_op_cache_put(nfp_map, op, NULL, cache_gen);
>         return err;
>  }
>
> @@ -275,7 +422,21 @@ unsigned int nfp_bpf_ctrl_cmsg_min_mtu(struct nfp_app_bpf *bpf)
>
>  unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf)
>  {
> -       return max(NFP_NET_DEFAULT_MTU, nfp_bpf_ctrl_cmsg_min_mtu(bpf));
> +       return max3(NFP_NET_DEFAULT_MTU,
> +                   nfp_bpf_cmsg_map_req_size(bpf, NFP_BPF_MAP_CACHE_CNT),
> +                   nfp_bpf_cmsg_map_reply_size(bpf, NFP_BPF_MAP_CACHE_CNT));
> +}
> +
> +unsigned int nfp_bpf_ctrl_cmsg_cache_cnt(struct nfp_app_bpf *bpf)
> +{
> +       unsigned int mtu, req_max, reply_max, entry_sz;
> +
> +       mtu = bpf->app->ctrl->dp.mtu;
> +       entry_sz = bpf->cmsg_key_sz + bpf->cmsg_val_sz;
> +       req_max = (mtu - sizeof(struct cmsg_req_map_op)) / entry_sz;
> +       reply_max = (mtu - sizeof(struct cmsg_reply_map_op)) / entry_sz;
> +
> +       return min3(req_max, reply_max, NFP_BPF_MAP_CACHE_CNT);
>  }
>
>  void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
> index 06c4286bd79e..a83a0ad5e27d 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
> @@ -24,6 +24,7 @@ enum bpf_cap_tlv_type {
>         NFP_BPF_CAP_TYPE_QUEUE_SELECT   = 5,
>         NFP_BPF_CAP_TYPE_ADJUST_TAIL    = 6,
>         NFP_BPF_CAP_TYPE_ABI_VERSION    = 7,
> +       NFP_BPF_CAP_TYPE_CMSG_MULTI_ENT = 8,
>  };
>
>  struct nfp_bpf_cap_tlv_func {
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
> index 2b1773ed3de9..8f732771d3fa 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
> @@ -299,6 +299,14 @@ nfp_bpf_parse_cap_adjust_tail(struct nfp_app_bpf *bpf, void __iomem *value,
>         return 0;
>  }
>
> +static int
> +nfp_bpf_parse_cap_cmsg_multi_ent(struct nfp_app_bpf *bpf, void __iomem *value,
> +                                u32 length)
> +{
> +       bpf->cmsg_multi_ent = true;
> +       return 0;
> +}
> +
>  static int
>  nfp_bpf_parse_cap_abi_version(struct nfp_app_bpf *bpf, void __iomem *value,
>                               u32 length)
> @@ -375,6 +383,11 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
>                                                           length))
>                                 goto err_release_free;
>                         break;
> +               case NFP_BPF_CAP_TYPE_CMSG_MULTI_ENT:
> +                       if (nfp_bpf_parse_cap_cmsg_multi_ent(app->priv, value,
> +                                                            length))

Do we plan to extend nfp_bpf_parse_cap_cmsg_multi_ent() to return
non-zero in the future?

> +                               goto err_release_free;
> +                       break;
>                 default:
>                         nfp_dbg(cpp, "unknown BPF capability: %d\n", type);
>                         break;
> @@ -426,6 +439,11 @@ static int nfp_bpf_start(struct nfp_app *app)
>                 return -EINVAL;
>         }
>
> +       if (bpf->cmsg_multi_ent)
> +               bpf->cmsg_cache_cnt = nfp_bpf_ctrl_cmsg_cache_cnt(bpf);
> +       else
> +               bpf->cmsg_cache_cnt = 1;
> +
>         return 0;
>  }
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> index f4802036eb42..fac9c6f9e197 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> @@ -99,6 +99,7 @@ enum pkt_vec {
>   * @maps_neutral:      hash table of offload-neutral maps (on pointer)
>   *
>   * @abi_version:       global BPF ABI version
> + * @cmsg_cache_cnt:    number of entries to read for caching
>   *
>   * @adjust_head:       adjust head capability
>   * @adjust_head.flags:         extra flags for adjust head
> @@ -124,6 +125,7 @@ enum pkt_vec {
>   * @pseudo_random:     FW initialized the pseudo-random machinery (CSRs)
>   * @queue_select:      BPF can set the RX queue ID in packet vector
>   * @adjust_tail:       BPF can simply trunc packet size for adjust tail
> + * @cmsg_multi_ent:    FW can pack multiple map entries in a single cmsg
>   */
>  struct nfp_app_bpf {
>         struct nfp_app *app;
> @@ -134,6 +136,8 @@ struct nfp_app_bpf {
>         unsigned int cmsg_key_sz;
>         unsigned int cmsg_val_sz;
>
> +       unsigned int cmsg_cache_cnt;
> +
>         struct list_head map_list;
>         unsigned int maps_in_use;
>         unsigned int map_elems_in_use;
> @@ -169,6 +173,7 @@ struct nfp_app_bpf {
>         bool pseudo_random;
>         bool queue_select;
>         bool adjust_tail;
> +       bool cmsg_multi_ent;
>  };
>
>  enum nfp_bpf_map_use {
> @@ -183,11 +188,21 @@ struct nfp_bpf_map_word {
>         unsigned char non_zero_update   :1;
>  };
>
> +#define NFP_BPF_MAP_CACHE_CNT          4U
> +#define NFP_BPF_MAP_CACHE_TIME_NS      (250 * 1000)
> +
>  /**
>   * struct nfp_bpf_map - private per-map data attached to BPF maps for offload
>   * @offmap:    pointer to the offloaded BPF map
>   * @bpf:       back pointer to bpf app private structure
>   * @tid:       table id identifying map on datapath
> + *
> + * @cache_lock:        protects @cache_blockers, @cache_to, @cache
> + * @cache_blockers:    number of ops in flight which block caching
> + * @cache_gen: counter incremented by every blocker on exit
> + * @cache_to:  time when cache will no longer be valid (ns)
> + * @cache:     skb with cached response
> + *
>   * @l:         link on the nfp_app_bpf->map_list list
>   * @use_map:   map of how the value is used (in 4B chunks)
>   */
> @@ -195,6 +210,13 @@ struct nfp_bpf_map {
>         struct bpf_offloaded_map *offmap;
>         struct nfp_app_bpf *bpf;
>         u32 tid;
> +
> +       spinlock_t cache_lock;
> +       u32 cache_blockers;
> +       u32 cache_gen;
> +       u64 cache_to;
> +       struct sk_buff *cache;
> +
>         struct list_head l;
>         struct nfp_bpf_map_word use_map[];
>  };
> @@ -566,6 +588,7 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv);
>
>  unsigned int nfp_bpf_ctrl_cmsg_min_mtu(struct nfp_app_bpf *bpf);
>  unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf);
> +unsigned int nfp_bpf_ctrl_cmsg_cache_cnt(struct nfp_app_bpf *bpf);
>  long long int
>  nfp_bpf_ctrl_alloc_map(struct nfp_app_bpf *bpf, struct bpf_map *map);
>  void
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> index 39c9fec222b4..88fab6a82acf 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> @@ -385,6 +385,7 @@ nfp_bpf_map_alloc(struct nfp_app_bpf *bpf, struct bpf_offloaded_map *offmap)
>         offmap->dev_priv = nfp_map;
>         nfp_map->offmap = offmap;
>         nfp_map->bpf = bpf;
> +       spin_lock_init(&nfp_map->cache_lock);
>
>         res = nfp_bpf_ctrl_alloc_map(bpf, &offmap->map);
>         if (res < 0) {
> @@ -407,6 +408,8 @@ nfp_bpf_map_free(struct nfp_app_bpf *bpf, struct bpf_offloaded_map *offmap)
>         struct nfp_bpf_map *nfp_map = offmap->dev_priv;
>
>         nfp_bpf_ctrl_free_map(bpf, nfp_map);
> +       dev_consume_skb_any(nfp_map->cache);
> +       WARN_ON_ONCE(nfp_map->cache_blockers);
>         list_del_init(&nfp_map->l);
>         bpf->map_elems_in_use -= offmap->map.max_entries;
>         bpf->maps_in_use--;
> --
> 2.21.0
>

^ permalink raw reply

* Re: [PATCH v2 net-next 02/15] MIPS: SGI-IP27: restructure ioc3 register access
From: Shannon Nelson @ 2019-08-29 21:26 UTC (permalink / raw)
  To: Thomas Bogendoerfer, Ralf Baechle, Paul Burton, James Hogan,
	David S. Miller, linux-mips, linux-kernel, netdev
In-Reply-To: <20190829155014.9229-3-tbogendoerfer@suse.de>

On 8/29/19 8:50 AM, Thomas Bogendoerfer wrote:
> Break up the big ioc3 register struct into functional pieces to
> make use in sub-function drivers more straightforward. And while
> doing that get rid of all volatile access by using readX/writeX.
>
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> ---

> diff --git a/arch/mips/sgi-ip27/ip27-console.c b/arch/mips/sgi-ip27/ip27-console.c
> index 6bdb48d41276..5886bee89d06 100644
> --- a/arch/mips/sgi-ip27/ip27-console.c
> +++ b/arch/mips/sgi-ip27/ip27-console.c
> @@ -35,6 +35,7 @@ void prom_putchar(char c)
>   {
>   	struct ioc3_uartregs *uart = console_uart();
>   
> -	while ((uart->iu_lsr & 0x20) == 0);
> -	uart->iu_thr = c;
> +	while ((readb(&uart->iu_lsr) & 0x20) == 0)
> +		;
> +	writeb(c, &uart->iu_thr);
>   }

Is it possible that the bit never gets set?
Instead of a tight, unbounded spin, you might add a short delay and a
retry limit.

I see this pattern several other times in the following code as well.  It
might be interesting to see how many times through and perhaps how many
usecs are normally spent in these loops.

Not a binding request, just a thought...
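
A rough sketch of the bounded variant, written as standalone C so it can
be tried outside the kernel. wait_tx_ready(), read_lsr_fn and fake_lsr()
are hypothetical names; a real driver would presumably call readb() on
iu_lsr directly and put a udelay() between reads.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_READY_BIT 0x20	/* the bit tested in the quoted loop */

/* Hypothetical accessor type standing in for readb(&uart->iu_lsr). */
typedef uint8_t (*read_lsr_fn)(void *ctx);

/* Poll until the bit is set or max_tries reads have been made.
 * Reports the number of reads performed through tries_used.
 */
static bool wait_tx_ready(read_lsr_fn read_lsr, void *ctx,
			  unsigned int max_tries, unsigned int *tries_used)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		if (read_lsr(ctx) & TX_READY_BIT) {
			*tries_used = i + 1;
			return true;
		}
		/* a kernel version would delay briefly here */
	}
	*tries_used = max_tries;
	return false;
}

/* Fake UART for the demo: reports ready once *ctx reads have elapsed. */
static uint8_t fake_lsr(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return TX_READY_BIT;
	(*remaining)--;
	return 0;
}
```

Returning the read count also gives the "how many times through" number
for free, which could be logged during bring-up.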

sln



^ permalink raw reply

* Re: [PATCH v3 03/11] net/mlx5e: Remove unlikely() from WARN*() condition
From: Saeed Mahameed @ 2019-08-29 21:23 UTC (permalink / raw)
  To: efremov@linux.com, linux-kernel@vger.kernel.org,
	davem@davemloft.net
  Cc: joe@perches.com, Boris Pismenny, netdev@vger.kernel.org,
	leon@kernel.org, akpm@linux-foundation.org
In-Reply-To: <20190829165025.15750-3-efremov@linux.com>

On Thu, 2019-08-29 at 19:50 +0300, Denis Efremov wrote:
> "unlikely(WARN_ON_ONCE(x))" is excessive. WARN_ON_ONCE() already uses
> unlikely() internally.
> 
> Signed-off-by: Denis Efremov <efremov@linux.com>
> Cc: Boris Pismenny <borisp@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Joe Perches <joe@perches.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: netdev@vger.kernel.org
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git
> a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> index 7833ddef0427..e5222d17df35 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> @@ -408,7 +408,7 @@ struct sk_buff *mlx5e_ktls_handle_tx_skb(struct
> net_device *netdev,
>  		goto out;
>  
>  	tls_ctx = tls_get_ctx(skb->sk);
> -	if (unlikely(WARN_ON_ONCE(tls_ctx->netdev != netdev)))
> +	if (WARN_ON_ONCE(tls_ctx->netdev != netdev))
>  		goto err_out;
>  
>  	priv_tx = mlx5e_get_ktls_tx_priv_ctx(tls_ctx);

Acked-by: Saeed Mahameed <saeedm@mellanox.com>

Dave, you can take this one.

^ permalink raw reply

* Re: [PATCH v2 net-next 00/15] ioc3-eth improvements
From: Jakub Kicinski @ 2019-08-29 21:15 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190829155014.9229-1-tbogendoerfer@suse.de>

On Thu, 29 Aug 2019 17:49:58 +0200, Thomas Bogendoerfer wrote:
> In my patch series for splitting out the serial code from ioc3-eth
> by using a MFD device there was one big patch for ioc3-eth.c,
> which wasn't really usefull for reviews. This series contains the
> ioc3-eth changes splitted in smaller steps and few more cleanups.
> Only the conversion to MFD will be done later in a different series.
> 
> Changes in v2:
> - use net_err_ratelimited for printing various ioc3 errors
> - added missing clearing of rx buf valid flags into ioc3_alloc_rings
> - use __func__ for printing out of memory messages

Only a few more comments on patch 5, otherwise looks good!

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/3] capability: introduce CAP_BPF and CAP_TRACING
From: Alexei Starovoitov @ 2019-08-29 21:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, luto, davem,
	peterz, rostedt, netdev, bpf, kernel-team, linux-api
In-Reply-To: <20190829222530.3c6163ac@carbon>

On Thu, Aug 29, 2019 at 10:25:30PM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 29 Aug 2019 20:05:49 +0200
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > 
> > > On Thu, Aug 29, 2019 at 09:44:18AM +0200, Toke Høiland-Jørgensen wrote:  
> > >> Alexei Starovoitov <ast@kernel.org> writes:
> > >>   
> > >> > CAP_BPF allows the following BPF operations:
> > >> > - Loading all types of BPF programs
> > >> > - Creating all types of BPF maps except:
> > >> >    - stackmap that needs CAP_TRACING
> > >> >    - devmap that needs CAP_NET_ADMIN
> > >> >    - cpumap that needs CAP_SYS_ADMIN  
> > >> 
> > >> Why CAP_SYS_ADMIN instead of CAP_NET_ADMIN for cpumap?  
> > >
> > > Currently it's cap_sys_admin and I think it should stay this way
> > > because it creates kthreads.  
> > 
> > Ah, right. I can sorta see that makes sense because of the kthreads, but
> > it also means that you can use all of XDP *except* cpumap with
> > CAP_NET_ADMIN+CAP_BPF. That is bound to create confusion, isn't it?
>  
> Hmm... I see 'cpumap' primarily as a network stack feature.  It is about
> starting the network stack on a specific CPU, allocating and building
> SKBs on that remote CPU.  It can only be used together with XDP_REDIRECT.
> I would prefer CAP_NET_ADMIN like the devmap, to keep the XDP
> capabilities consistent.

I don't mind relaxing cpumap to cap_net_admin.
Looking at the reaction to the rest of the set, I'd rather discuss it
and do it later, after basic cap_bpf is in.


^ permalink raw reply

* Re: [PATCH v2 net-next 05/15] net: sgi: ioc3-eth: allocate space for desc rings only once
From: Jakub Kicinski @ 2019-08-29 21:05 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190829155014.9229-6-tbogendoerfer@suse.de>

On Thu, 29 Aug 2019 17:50:03 +0200, Thomas Bogendoerfer wrote:
> Memory for descriptor rings are allocated/freed, when interface is
> brought up/down. Since the size of the rings is not changeable by
> hardware, we now allocate rings now during probe and free it, when
> device is removed.
> 
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> ---
>  drivers/net/ethernet/sgi/ioc3-eth.c | 103 ++++++++++++++++++------------------
>  1 file changed, 51 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sgi/ioc3-eth.c b/drivers/net/ethernet/sgi/ioc3-eth.c
> index ba18a53fbbe6..d9d94a55ac34 100644
> --- a/drivers/net/ethernet/sgi/ioc3-eth.c
> +++ b/drivers/net/ethernet/sgi/ioc3-eth.c
> @@ -803,25 +803,17 @@ static void ioc3_free_rings(struct ioc3_private *ip)
>  	struct sk_buff *skb;
>  	int rx_entry, n_entry;
>  
> -	if (ip->txr) {
> -		ioc3_clean_tx_ring(ip);
> -		free_pages((unsigned long)ip->txr, 2);
> -		ip->txr = NULL;
> -	}
> +	ioc3_clean_tx_ring(ip);
>  
> -	if (ip->rxr) {
> -		n_entry = ip->rx_ci;
> -		rx_entry = ip->rx_pi;
> +	n_entry = ip->rx_ci;
> +	rx_entry = ip->rx_pi;
>  
> -		while (n_entry != rx_entry) {
> -			skb = ip->rx_skbs[n_entry];
> -			if (skb)
> -				dev_kfree_skb_any(skb);
> +	while (n_entry != rx_entry) {
> +		skb = ip->rx_skbs[n_entry];
> +		if (skb)
> +			dev_kfree_skb_any(skb);

I think dev_kfree_skb_any() accepts NULL

>  
> -			n_entry = (n_entry + 1) & RX_RING_MASK;
> -		}
> -		free_page((unsigned long)ip->rxr);
> -		ip->rxr = NULL;
> +		n_entry = (n_entry + 1) & RX_RING_MASK;
>  	}
>  }
>  
> @@ -829,49 +821,34 @@ static void ioc3_alloc_rings(struct net_device *dev)
>  {
>  	struct ioc3_private *ip = netdev_priv(dev);
>  	struct ioc3_erxbuf *rxb;
> -	unsigned long *rxr;
>  	int i;
>  
> -	if (!ip->rxr) {
> -		/* Allocate and initialize rx ring.  4kb = 512 entries  */
> -		ip->rxr = (unsigned long *)get_zeroed_page(GFP_ATOMIC);
> -		rxr = ip->rxr;
> -		if (!rxr)
> -			pr_err("%s: get_zeroed_page() failed!\n", __func__);
> -
> -		/* Now the rx buffers.  The RX ring may be larger but
> -		 * we only allocate 16 buffers for now.  Need to tune
> -		 * this for performance and memory later.
> -		 */
> -		for (i = 0; i < RX_BUFFS; i++) {
> -			struct sk_buff *skb;
> +	/* Now the rx buffers.  The RX ring may be larger but
> +	 * we only allocate 16 buffers for now.  Need to tune
> +	 * this for performance and memory later.
> +	 */
> +	for (i = 0; i < RX_BUFFS; i++) {
> +		struct sk_buff *skb;
>  
> -			skb = ioc3_alloc_skb(RX_BUF_ALLOC_SIZE, GFP_ATOMIC);
> -			if (!skb) {
> -				show_free_areas(0, NULL);
> -				continue;
> -			}
> +		skb = ioc3_alloc_skb(RX_BUF_ALLOC_SIZE, GFP_ATOMIC);
> +		if (!skb) {
> +			show_free_areas(0, NULL);
> +			continue;
> +		}
>  
> -			ip->rx_skbs[i] = skb;
> +		ip->rx_skbs[i] = skb;
>  
> -			/* Because we reserve afterwards. */
> -			skb_put(skb, (1664 + RX_OFFSET));
> -			rxb = (struct ioc3_erxbuf *)skb->data;
> -			rxr[i] = cpu_to_be64(ioc3_map(rxb, 1));
> -			skb_reserve(skb, RX_OFFSET);
> -		}
> -		ip->rx_ci = 0;
> -		ip->rx_pi = RX_BUFFS;
> +		/* Because we reserve afterwards. */
> +		skb_put(skb, (1664 + RX_OFFSET));
> +		rxb = (struct ioc3_erxbuf *)skb->data;
> +		ip->rxr[i] = cpu_to_be64(ioc3_map(rxb, 1));
> +		skb_reserve(skb, RX_OFFSET);
>  	}
> +	ip->rx_ci = 0;
> +	ip->rx_pi = RX_BUFFS;
>  
> -	if (!ip->txr) {
> -		/* Allocate and initialize tx rings.  16kb = 128 bufs.  */
> -		ip->txr = (struct ioc3_etxd *)__get_free_pages(GFP_KERNEL, 2);
> -		if (!ip->txr)
> -			pr_err("%s: __get_free_pages() failed!\n", __func__);
> -		ip->tx_pi = 0;
> -		ip->tx_ci = 0;
> -	}
> +	ip->tx_pi = 0;
> +	ip->tx_ci = 0;
>  }
>  
>  static void ioc3_init_rings(struct net_device *dev)
> @@ -1239,6 +1216,23 @@ static int ioc3_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	timer_setup(&ip->ioc3_timer, ioc3_timer, 0);
>  
>  	ioc3_stop(ip);
> +
> +	/* Allocate and rx ring.  4kb = 512 entries  */
> +	ip->rxr = (unsigned long *)get_zeroed_page(GFP_ATOMIC);
> +	if (!ip->rxr) {
> +		pr_err("ioc3-eth: rx ring allocation failed\n");
> +		err = -ENOMEM;
> +		goto out_stop;
> +	}
> +
> +	/* Allocate tx rings.  16kb = 128 bufs.  */
> +	ip->txr = (struct ioc3_etxd *)__get_free_pages(GFP_KERNEL, 2);
> +	if (!ip->txr) {
> +		pr_err("ioc3-eth: tx ring allocation failed\n");
> +		err = -ENOMEM;
> +		goto out_stop;
> +	}

Please just use kcalloc()/kmalloc_array() here, and make sure the flags
are set to GFP_KERNEL whenever possible. Here and in ioc3_alloc_rings()
it looks like GFP_ATOMIC is unnecessary.
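
To illustrate what the array helpers buy you, here is a userspace
stand-in (a sketch only, zalloc_array_demo() is a hypothetical name):
zeroed memory plus a guard against the count * size product overflowing.
In the driver the equivalent would presumably be a kcalloc() call with
GFP_KERNEL.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for a kcalloc()-style helper: returns zeroed memory
 * for n elements of `size` bytes, or NULL if n * size would overflow.
 */
static void *zalloc_array_demo(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return calloc(n, size);
}
```

For the rx ring described in the comment above that would be 512 entries
of 8 bytes each.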

^ permalink raw reply

* Re: [PATCH v2 1/1] netfilter: nf_tables: fib: Drop IPV6 packages if IPv6 is disabled on boot
From: Florian Westphal @ 2019-08-29 20:58 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Pablo Neira Ayuso, Florian Westphal, David S. Miller,
	netfilter-devel, coreteam, netdev, linux-kernel, Jozsef Kadlecsik,
	Alexey Kuznetsov, Hideaki YOSHIFUJI
In-Reply-To: <b6585989069fd832a65b73d1c4f4319a10714165.camel@linux.ibm.com>

Leonardo Bras <leonardo@linux.ibm.com> wrote:
> On Thu, 2019-08-29 at 17:04 -0300, Leonardo Bras wrote:
> > > Thats a good point -- Leonardo, is the
> > > "net.bridge.bridge-nf-call-ip6tables" sysctl on?
> > 
> > Running
> > # sudo sysctl -a
> > I can see:
> > net.bridge.bridge-nf-call-ip6tables = 1
> 
> Also, doing
> # echo 0 >  /proc/sys/net/bridge/bridge-nf-call-ip6tables 
> And then trying to boot the guest will not crash the host.
> 
> Which would make sense, since host iptables is not dealing with guest
> IPv6 packets.

Yes.

> So, the real cause of this bug is the bridge making host ip6tables deal
> with guest IPv6 packets ? 
> If so, would it be ok if write a patch testing ipv6_mod_enabled()
> before passing guest ipv6 packets to host ip6tables? 

I'm not sure.  This switch is very old, it was added 10 years ago
in v2.6.31-rc1.

Even if we disable call-ip6tables in br_netfilter we will at least
in addition need a patch for nft_fib_netdev.c.

From an "avoid calls to ipv6 stack when it's disabled" standpoint,
the safest fix is to disable call-ip6tables functionality if the ipv6
module is off *and* fix nft_fib_netdev.c to BREAK when ipv6 is off.

I started to place a list of suspicious modules here, but that got out
of hand quickly.

So, given I don't want to plaster ipv6_mod_enabled() everywhere, I
would suggest this course of action:

1. add a patch to BREAK in nft_fib_netdev.c for !ipv6_mod_enabled()
2. change net/bridge/br_netfilter_hooks.c, br_nf_pre_routing() to
   make sure ipv6_mod_enabled() is true before doing the ipv6 stack
   "emulation".

Makes sense?
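
Both steps have the same shape: an early exit before any IPv6 code is
reached. A minimal standalone sketch, where ipv6_enabled_demo stands in
for ipv6_mod_enabled() and the verdict names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical verdicts; the real code would use the nft verdict codes. */
enum demo_verdict { DEMO_CONTINUE, DEMO_BREAK };

/* Stand-in for ipv6_mod_enabled(); set by the caller in this sketch. */
static bool ipv6_enabled_demo;

/* IPv6 packets are not handed to any IPv6 code path while the stand-in
 * flag reports the module as disabled.
 */
static enum demo_verdict fib_eval_demo(bool pkt_is_ipv6)
{
	if (pkt_is_ipv6 && !ipv6_enabled_demo)
		return DEMO_BREAK;
	/* ... the normal lookup would run here ... */
	return DEMO_CONTINUE;
}
```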

Thanks,
Florian

^ permalink raw reply

* Re: [bpf-next, v2] samples: bpf: add max_pckt_size option at xdp_adjust_tail
From: Song Liu @ 2019-08-29 20:41 UTC (permalink / raw)
  To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, Networking
In-Reply-To: <20190826162517.8082-1-danieltimlee@gmail.com>

On Mon, Aug 26, 2019 at 9:52 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
>
> Currently, at xdp_adjust_tail_kern.c, MAX_PCKT_SIZE is limited
> to 600. To make this size flexible, a new map 'pcktsz' is added.
>
> By updating new packet size to this map from the userland,
> xdp_adjust_tail_kern.o will use this value as a new max_pckt_size.
>
> If no '-P <MAX_PCKT_SIZE>' option is used, the size of maximum packet
> will be 600 as a default.

Please also cc bpf@vger.kernel.org for bpf patches.

>
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

With a nit below.

[...]

> diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
> index a3596b617c4c..29ade7caf841 100644
> --- a/samples/bpf/xdp_adjust_tail_user.c
> +++ b/samples/bpf/xdp_adjust_tail_user.c
> @@ -72,6 +72,7 @@ static void usage(const char *cmd)
>         printf("Usage: %s [...]\n", cmd);
>         printf("    -i <ifname|ifindex> Interface\n");
>         printf("    -T <stop-after-X-seconds> Default: 0 (forever)\n");
> +       printf("    -P <MAX_PCKT_SIZE> Default: 600\n");

nit: printf("    -P <MAX_PCKT_SIZE> Default: %u\n", MAX_PCKT_SIZE);

^ permalink raw reply

* Re: [PATCH v2 1/1] netfilter: nf_tables: fib: Drop IPV6 packages if IPv6 is disabled on boot
From: Leonardo Bras @ 2019-08-29 20:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, David S. Miller
  Cc: netfilter-devel, coreteam, netdev, linux-kernel, Jozsef Kadlecsik,
	Alexey Kuznetsov, Hideaki YOSHIFUJI
In-Reply-To: <db0f02c5b1a995fde174f036540a3d11008cf116.camel@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

On Thu, 2019-08-29 at 17:04 -0300, Leonardo Bras wrote:
> > Thats a good point -- Leonardo, is the
> > "net.bridge.bridge-nf-call-ip6tables" sysctl on?
> 
> Running
> # sudo sysctl -a
> I can see:
> net.bridge.bridge-nf-call-ip6tables = 1

Also, doing
# echo 0 >  /proc/sys/net/bridge/bridge-nf-call-ip6tables 
And then trying to boot the guest will not crash the host.

Which would make sense, since host iptables is not dealing with guest
IPv6 packets.

So, the real cause of this bug is the bridge making host ip6tables deal
with guest IPv6 packets?
If so, would it be ok if I write a patch testing ipv6_mod_enabled()
before passing guest IPv6 packets to host ip6tables?

Best regards,

>  
> So this packets are sent to host iptables for processing?
> 
> 
> (Sorry for the delay, I did not received the previous e-mails.
> Please include me in to/cc.)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/1] netfilter: nf_tables: fib: Drop IPV6 packages if IPv6 is disabled on boot
From: Florian Westphal @ 2019-08-29 20:29 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Pablo Neira Ayuso, Florian Westphal, David S. Miller,
	netfilter-devel, coreteam, netdev, linux-kernel, Jozsef Kadlecsik,
	Alexey Kuznetsov, Hideaki YOSHIFUJI
In-Reply-To: <db0f02c5b1a995fde174f036540a3d11008cf116.camel@linux.ibm.com>

Leonardo Bras <leonardo@linux.ibm.com> wrote:
> > Thats a good point -- Leonardo, is the
> > "net.bridge.bridge-nf-call-ip6tables" sysctl on?
> 
> Running
> # sudo sysctl -a
> I can see:
> net.bridge.bridge-nf-call-ip6tables = 1
>  
> So this packets are sent to host iptables for processing?

Yes, this is an old hack that was made because ebtables is
very feature-limited.

However, as I mentioned before I don't think there is anything
we can do here except audit all affected nft expressions and ip6tables
matches and add this check where needed.  ip6t_rpfilter.c comes to mind.

In any case your patch looks ok to me.

> (Sorry for the delay, I did not received the previous e-mails.
> Please include me in to/cc.)

Sorry about that.

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/3] capability: introduce CAP_BPF and CAP_TRACING
From: Jesper Dangaard Brouer @ 2019-08-29 20:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Alexei Starovoitov, luto, davem, peterz,
	rostedt, netdev, bpf, kernel-team, linux-api, brouer
In-Reply-To: <87imqfhmo2.fsf@toke.dk>

On Thu, 29 Aug 2019 20:05:49 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Thu, Aug 29, 2019 at 09:44:18AM +0200, Toke Høiland-Jørgensen wrote:  
> >> Alexei Starovoitov <ast@kernel.org> writes:
> >>   
> >> > CAP_BPF allows the following BPF operations:
> >> > - Loading all types of BPF programs
> >> > - Creating all types of BPF maps except:
> >> >    - stackmap that needs CAP_TRACING
> >> >    - devmap that needs CAP_NET_ADMIN
> >> >    - cpumap that needs CAP_SYS_ADMIN  
> >> 
> >> Why CAP_SYS_ADMIN instead of CAP_NET_ADMIN for cpumap?  
> >
> > Currently it's cap_sys_admin and I think it should stay this way
> > because it creates kthreads.  
> 
> Ah, right. I can sorta see that makes sense because of the kthreads, but
> it also means that you can use all of XDP *except* cpumap with
> CAP_NET_ADMIN+CAP_BPF. That is bound to create confusion, isn't it?
 
Hmm... I see 'cpumap' primarily as a network stack feature.  It is about
starting the network stack on a specific CPU, allocating and building
SKBs on that remote CPU.  It can only be used together with XDP_REDIRECT.
I would prefer CAP_NET_ADMIN like the devmap, to keep the XDP
capabilities consistent.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v2 1/1] netfilter: nf_tables: fib: Drop IPV6 packages if IPv6 is disabled on boot
From: Leonardo Bras @ 2019-08-29 20:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, David S. Miller
  Cc: netfilter-devel, coreteam, netdev, linux-kernel, Jozsef Kadlecsik,
	Alexey Kuznetsov, Hideaki YOSHIFUJI
In-Reply-To: <20190821141505.2394-1-leonardo@linux.ibm.com>

> Thats a good point -- Leonardo, is the
> "net.bridge.bridge-nf-call-ip6tables" sysctl on?

Running
# sudo sysctl -a
I can see:
net.bridge.bridge-nf-call-ip6tables = 1
 
So these packets are sent to host iptables for processing?


(Sorry for the delay, I did not receive the previous e-mails.
Please include me in To/Cc.)

^ permalink raw reply

* Re: [PATCH bpf-next] tools: libbpf: update extended attributes version of bpf_object__open()
From: Song Liu @ 2019-08-29 20:02 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Martin Lau,
	Yonghong Song, Networking, bpf, linux-kernel@vger.kernel.org
In-Reply-To: <20190815000330.12044-1-a.s.protopopov@gmail.com>



> On Aug 14, 2019, at 5:03 PM, Anton Protopopov <a.s.protopopov@gmail.com> wrote:
> 

[...]

> 
> 
> int bpf_object__unload(struct bpf_object *obj)
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index e8f70977d137..634f278578dd 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -63,8 +63,13 @@ LIBBPF_API libbpf_print_fn_t libbpf_set_print(libbpf_print_fn_t fn);
> struct bpf_object;
> 
> struct bpf_object_open_attr {
> -	const char *file;
> +	union {
> +		const char *file;
> +		const char *obj_name;
> +	};
> 	enum bpf_prog_type prog_type;
> +	void *obj_buf;
> +	size_t obj_buf_sz;
> };

I think this would break dynamically linked libbpf. No?
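
A minimal sketch of the concern, assuming an application that was
compiled against the old header and is then run against an updated
shared library.  The struct and enum names below are hypothetical
stand-ins; only the field layout is taken from the diff above.

```c
#include <stddef.h>

/* Hypothetical stand-in for enum bpf_prog_type. */
enum prog_type_sketch { PROG_TYPE_UNSPEC_SKETCH = 0 };

/* Layout before the patch: what an already-built application allocates. */
struct open_attr_old {
	const char *file;
	enum prog_type_sketch prog_type;
};

/* Layout after the patch: what an updated library would expect. */
struct open_attr_new {
	union {
		const char *file;
		const char *obj_name;
	};
	enum prog_type_sketch prog_type;
	void *obj_buf;
	size_t obj_buf_sz;
};

/* Bytes the new layout extends beyond the old caller's struct. */
static size_t bytes_past_old_struct(void)
{
	return sizeof(struct open_attr_new) - sizeof(struct open_attr_old);
}

/* True when obj_buf starts at or beyond the end of the old struct. */
static int obj_buf_outside_old_struct(void)
{
	return offsetof(struct open_attr_new, obj_buf) >=
	       sizeof(struct open_attr_old);
}
```

The existing fields keep their offsets, but obj_buf and obj_buf_sz sit
beyond the end of the old struct, so a library that reads them from an
old caller's attr could pick up whatever happens to follow it in memory.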

Thanks,
Song


^ permalink raw reply

* Re: linux-next: Tree for Aug 29 (mlx5)
From: Randy Dunlap @ 2019-08-29 19:55 UTC (permalink / raw)
  To: Stephen Rothwell, Linux Next Mailing List
  Cc: Linux Kernel Mailing List, netdev@vger.kernel.org, Saeed Mahameed,
	Leon Romanovsky
In-Reply-To: <3cbf3e88-53b5-0eb3-9863-c4031b9aed9f@infradead.org>

On 8/29/19 12:54 PM, Randy Dunlap wrote:
> On 8/29/19 4:08 AM, Stephen Rothwell wrote:
>> Hi all,
>>
>> Changes since 20190828:
>>
> 
> 
> on x86_64:
> when CONFIG_PCI_HYPERV=m

and CONFIG_PCI_HYPERV_INTERFACE=m

> and mlx5 is builtin (=y).
> 
> ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function `mlx5_unload':
> main.c:(.text+0x5d): undefined reference to `mlx5_hv_vhca_cleanup'
> ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function `mlx5_cleanup_once':
> main.c:(.text+0x158): undefined reference to `mlx5_hv_vhca_destroy'
> ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function `mlx5_load_one':
> main.c:(.text+0x4191): undefined reference to `mlx5_hv_vhca_create'
> ld: main.c:(.text+0x4772): undefined reference to `mlx5_hv_vhca_init'
> ld: main.c:(.text+0x4b07): undefined reference to `mlx5_hv_vhca_cleanup'
> 
> 


-- 
~Randy

^ permalink raw reply

* Re: linux-next: Tree for Aug 29 (mlx5)
From: Randy Dunlap @ 2019-08-29 19:54 UTC (permalink / raw)
  To: Stephen Rothwell, Linux Next Mailing List
  Cc: Linux Kernel Mailing List, netdev@vger.kernel.org, Saeed Mahameed,
	Leon Romanovsky
In-Reply-To: <20190829210845.41a9e193@canb.auug.org.au>

On 8/29/19 4:08 AM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20190828:
> 


on x86_64:
when CONFIG_PCI_HYPERV=m
and mlx5 is builtin (=y).

ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function `mlx5_unload':
main.c:(.text+0x5d): undefined reference to `mlx5_hv_vhca_cleanup'
ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function `mlx5_cleanup_once':
main.c:(.text+0x158): undefined reference to `mlx5_hv_vhca_destroy'
ld: drivers/net/ethernet/mellanox/mlx5/core/main.o: in function `mlx5_load_one':
main.c:(.text+0x4191): undefined reference to `mlx5_hv_vhca_create'
ld: main.c:(.text+0x4772): undefined reference to `mlx5_hv_vhca_init'
ld: main.c:(.text+0x4b07): undefined reference to `mlx5_hv_vhca_cleanup'


-- 
~Randy

^ permalink raw reply
