Netdev List

* Re: [net regression] "fib_rules: move common handling of newrule delrule msgs into fib_nl2rule" breaks suppress_prefixlength
From: Roopa Prabhu @ 2018-06-25 15:23 UTC
  To: Jason A. Donenfeld; +Cc: Netdev
In-Reply-To: <CAHmME9rdxf+577u-=qx1Ss1YAz_zOPAHa6TM6ThewtybBE_R_g@mail.gmail.com>

On Sat, Jun 23, 2018 at 8:46 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hey Roopa,
>
> On a kernel with a minimal networking config,
> CONFIG_IP_MULTIPLE_TABLES appears to be broken for certain rules after
> f9d4b0c1e9695e3de7af3768205bacc27312320c.
>
> Try, for example, running:
>
> $ ip -4 rule add table main suppress_prefixlength 0
>
> It returns with EEXIST.
>
> Perhaps the reason is that the new rule_find function does not match
> on suppress_prefixlength? However, rule_exist from before didn't do
> that either. I'll keep playing and see if I can track it down myself,
> but thought I should let you know first.

I am surprised by that too. I cannot find where the old rule_exist looked at
suppress_prefixlength either.
I will also dig deeper today. But your patch LGTM, with one small change
that I commented on.

>
> A relevant .config can be found at https://א.cc/iq5HoUY0
>

thanks.

* [PATCH net] netfilter: nf_log: don't hold nf_log_mutex during user access
From: Jann Horn @ 2018-06-25 15:22 UTC
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, netfilter-devel, coreteam, jannh
  Cc: netdev, linux-kernel, security

The old code would indefinitely block other users of nf_log_mutex if
a userspace access in proc_dostring() blocked, e.g. due to a userfaultfd
region. Fix it by moving proc_dostring() out of the locked region.

This is a followup to commit 266d07cb1c9a ("netfilter: nf_log: fix
sleeping function called from invalid context"), which changed this code
from using rcu_read_lock() to taking nf_log_mutex.

Fixes: 266d07cb1c9a ("netfilter: nf_log: fix sleeping function calle[...]")
Signed-off-by: Jann Horn <jannh@google.com>
---
 net/netfilter/nf_log.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 426457047578..95b92954b896 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -442,14 +442,17 @@ static int nf_log_proc_dostring(struct ctl_table *table, int write,
 		rcu_assign_pointer(net->nf.nf_loggers[tindex], logger);
 		mutex_unlock(&nf_log_mutex);
 	} else {
+		struct ctl_table tmp = *table;
+
+		tmp.data = buf;
 		mutex_lock(&nf_log_mutex);
 		logger = nft_log_dereference(net->nf.nf_loggers[tindex]);
 		if (!logger)
-			table->data = "NONE";
+			strlcpy(buf, "NONE", sizeof(buf));
 		else
-			table->data = logger->name;
-		r = proc_dostring(table, write, buffer, lenp, ppos);
+			strlcpy(buf, logger->name, sizeof(buf));
 		mutex_unlock(&nf_log_mutex);
+		r = proc_dostring(&tmp, write, buffer, lenp, ppos);
 	}
 
 	return r;
-- 
2.18.0.rc2.346.g013aa6912e-goog
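The fix follows a common locking pattern: copy the shared data into a private buffer while holding the lock, drop the lock, and only then perform the slow or possibly blocking operation on the copy. Below is a minimal user-space sketch of that pattern, assuming POSIX threads; `logger_mutex`, `current_logger`, `copy_to_reader()` and `LOGGER_NAME_LEN` are hypothetical stand-ins for nf_log_mutex, the logger pointer, proc_dostring() and the kernel's buffer, not kernel code.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define LOGGER_NAME_LEN 64  /* hypothetical buffer size */

static pthread_mutex_t logger_mutex = PTHREAD_MUTEX_INITIALIZER;
static const char *current_logger;  /* NULL plays the role of "no logger" */

/* Stand-in for proc_dostring(): in the kernel this touches user memory and
 * may block for a long time, so it must run without logger_mutex held. */
static size_t copy_to_reader(char *dst, size_t dstlen, const char *src)
{
	snprintf(dst, dstlen, "%s", src);
	return strlen(dst);
}

static size_t read_logger_name(char *out, size_t outlen)
{
	char buf[LOGGER_NAME_LEN];

	/* Snapshot the name into a private buffer while holding the lock... */
	pthread_mutex_lock(&logger_mutex);
	snprintf(buf, sizeof(buf), "%s", current_logger ? current_logger : "NONE");
	pthread_mutex_unlock(&logger_mutex);

	/* ...then do the potentially blocking copy-out with the lock dropped. */
	return copy_to_reader(out, outlen, buf);
}
```

Because the snapshot lives in `buf`, a reader that blocks in the copy-out can no longer stall writers waiting for the mutex, which is the point of moving proc_dostring() after mutex_unlock() in the patch.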

* Re: Suspend of SDIO function devices
From: Ulf Hansson @ 2018-06-25 15:00 UTC
  To: Daniel Mack
  Cc: Chris Ball, linux-mmc@vger.kernel.org,
	libertas-dev@lists.infradead.org, linux-wireless,
	netdev@vger.kernel.org
In-Reply-To: <b8a41e3a-7100-ae17-1275-d1b40a026201@zonque.org>

On 24 June 2018 at 22:46, Daniel Mack <daniel@zonque.org> wrote:
> Hi,
>
> I'm currently looking into the suspend callbacks of drivers of hardware that
> use an SDIO interface, specifically the libertas_sdio driver:
>
>    drivers/net/wireless/marvell/libertas/if_sdio.c

Great news, I am happy to help!

>
> The comments in if_sdio_suspend() suggest that by returning -ENOSYS due to
> runtime-dependent circumstances, the MMC core will remove the card entirely
> at suspend time. I then searched for the bits that do that and failed, until
> I came across this old commit, which first appeared in 3.16:
>
>    573185cc7e6 mmc: core: Invoke sdio func driver's PM callbacks from the
> sdio bus

Oh, so it's been broken for quite some time. :-(

My bad!

>
> Before that commit, the mmc core did in fact invoke the card's .suspend()
> callback manually and if it returned a non-zero result, it would remove the
> card. Now that the generic PM functions are in place, this no longer
> happens because the host and its clients are independent entities.
> Consequently, systems fail to suspend when the libertas_sdio module is
> loaded.
>
> The pm notifier code in drivers/mmc/core/core.c does still handle cases
> where no pm functions are provided at all (in which case it removes the
> card), but it doesn't handle -ENOSYS return values at runtime.

Correctly observed!

>
> Now I'm wondering how this is supposed to work, and which end needs fixing.
> The mmc/sdio core by restoring the old logic from before 573185cc7e6, or the
> libertas driver.

I believe the proper solution is to fix the libertas driver. In any case,
we don't want to go back to the previous scheme of SDIO drivers
returning -ENOSYS.

However, let's see what fits best here.

>
> The platform I'm working on does not retain power for the SDIO slaves, so a
> complete re-init is necessary after resume.

Right.

>
> Please advise, I'm happy to test approaches and send patches.

From a top-level point of view, I think the following needs to be changed:

1)
In cases where the libertas SDIO driver's ->suspend() callback would
return -ENOSYS, it should instead call if_sdio_power_off(). If
if_sdio_power_save() has already been called, this step should be
skipped.

The important thing here is to disable the SDIO func device and to
release the SDIO irq.

2)
During resume, if the earlier ->suspend() callback invoked
if_sdio_power_off(), the libertas SDIO driver's ->resume() callback
should call if_sdio_power_on().

This should re-initialize the libertas SDIO device and re-program the
firmware. To complete these actions, the firmware file also needs to
be fetched, which requires file system access to have been resumed
as well.

We also need to wait for the firmware programming to complete, so a
"wait_event(card->pwron_waitq, priv->fw_ready);" is also needed
somewhere.
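The two steps above amount to a small state machine: power off in ->suspend() instead of returning -ENOSYS, remember that this was done, and power back on in ->resume() only in that case. The sketch below is a user-space model under stated assumptions: `struct sdio_card_model`, `model_power_off()`, `model_power_on()` and the `keep_power_possible` flag are hypothetical names, not the real libertas or MMC core API.

```c
#include <stdbool.h>

/* Hypothetical model of the driver state; not the real libertas structs. */
struct sdio_card_model {
	bool power_save_done;  /* if_sdio_power_save() already ran */
	bool off_in_suspend;   /* ->suspend() powered the card off */
	bool powered;          /* SDIO func enabled and SDIO IRQ claimed */
	bool fw_ready;         /* firmware programmed after power-on */
};

/* Stand-in for if_sdio_power_off(): disable the func, release the IRQ. */
static void model_power_off(struct sdio_card_model *c)
{
	c->powered = false;
	c->fw_ready = false;
}

/* Stand-in for if_sdio_power_on() plus the firmware wait. The real driver
 * loads firmware asynchronously and would block in something like
 * wait_event(card->pwron_waitq, priv->fw_ready); here it is immediate. */
static void model_power_on(struct sdio_card_model *c)
{
	c->powered = true;
	c->fw_ready = true;
}

/* ->suspend(): where the driver used to return -ENOSYS, power off instead. */
static int model_suspend(struct sdio_card_model *c, bool keep_power_possible)
{
	if (keep_power_possible)
		return 0;                  /* card stays powered; nothing to do */
	if (!c->power_save_done) {
		model_power_off(c);
		c->off_in_suspend = true;  /* remember to undo this on resume */
	}
	return 0;
}

/* ->resume(): only power on if our own ->suspend() powered the card off. */
static int model_resume(struct sdio_card_model *c)
{
	if (c->off_in_suspend) {
		model_power_on(c);
		c->off_in_suspend = false;
	}
	return 0;
}
```

Tracking `off_in_suspend` separately from `power_save_done` keeps resume from powering up a card that was already in power-save before the system suspended.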

Kind regards
Uffe

* Re: [PATCH] fib_rules: match rules based on suppress_* properties too
From: Roopa Prabhu @ 2018-06-25 14:58 UTC
  To: Jason A. Donenfeld; +Cc: Netdev
In-Reply-To: <20180623155930.25983-1-Jason@zx2c4.com>

On Sat, Jun 23, 2018 at 8:59 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Two rules with different values of suppress_prefix or suppress_ifgroup
> are not the same. This fixes an -EEXIST when running:
>
>    $ ip -4 rule add table main suppress_prefixlength 0
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
> ---
>  net/core/fib_rules.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index 126ffc5bc630..665799311b98 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -416,6 +416,12 @@ static struct fib_rule *rule_find(struct fib_rules_ops *ops,
>                 if (rule->mark && r->mark != rule->mark)
>                         continue;
>
> +               if (r->suppress_ifgroup != rule->suppress_ifgroup)
> +                       continue;
> +
> +               if (r->suppress_prefixlen != rule->suppress_prefixlen)
> +                       continue;
> +
>                 if (rule->mark_mask && r->mark_mask != rule->mark_mask)
>                         continue;
>

Can you please change the check to compare only if the new rule has
the attributes set?

e.g.:

	if (rule->suppress_ifgroup != -1 &&
	    r->suppress_ifgroup != rule->suppress_ifgroup)
		continue;

and the same for suppress_prefixlen.
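With Roopa's suggestion, -1 acts as "attribute not set" in the new rule, so an unset attribute behaves as a wildcard, while a set one must match exactly. A simplified user-space model of that matching, assuming only the table and the two suppress_* fields matter; `struct rule_model` and `rule_find_model()` are hypothetical and omit most of the fields the real rule_find() compares:

```c
#include <stddef.h>

/* Simplified model of a FIB rule; only the fields relevant here. */
struct rule_model {
	int table;
	int suppress_prefixlen;  /* -1 means "not set" */
	int suppress_ifgroup;    /* -1 means "not set" */
};

/* Return the first existing rule matching `rule`, or NULL. Attributes left
 * at -1 in the new rule are not compared. */
static const struct rule_model *rule_find_model(const struct rule_model *rules,
						size_t n,
						const struct rule_model *rule)
{
	for (size_t i = 0; i < n; i++) {
		const struct rule_model *r = &rules[i];

		if (r->table != rule->table)
			continue;
		if (rule->suppress_ifgroup != -1 &&
		    r->suppress_ifgroup != rule->suppress_ifgroup)
			continue;
		if (rule->suppress_prefixlen != -1 &&
		    r->suppress_prefixlen != rule->suppress_prefixlen)
			continue;
		return r;
	}
	return NULL;
}
```

In this model, adding "table main suppress_prefixlength 0" next to the default "table main" rule (table 254) no longer finds a duplicate, so the -EEXIST from the report would not occur.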

* Re: [PATCH rdma-next 09/12] RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq
From: Jason Gunthorpe @ 2018-06-25 14:58 UTC
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Hadar Hen Zion, Matan Barak,
	Michael J Ruhl, Noa Osherovich, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180625081041.GI17747@mtr-leonro.mtl.com>

On Mon, Jun 25, 2018 at 11:10:41AM +0300, Leon Romanovsky wrote:
> On Sun, Jun 24, 2018 at 01:56:24PM -0600, Jason Gunthorpe wrote:
> > On Sun, Jun 24, 2018 at 11:23:50AM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@mellanox.com>
> > >
> > > [   61.182439] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:5366:34
> > > [   61.183673] shift exponent 4294967288 is too large for 32-bit type 'unsigned int'
> > > [   61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96
> > > [   61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
> > > [   61.188315] Call Trace:
> > > [   61.188661]  dump_stack+0xc7/0x13b
> > > [   61.190427]  ubsan_epilogue+0x9/0x49
> > > [   61.190899]  __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f
> > > [   61.197040]  mlx5_ib_create_wq+0x1c99/0x1d50
> > > [   61.206632]  ib_uverbs_ex_create_wq+0x499/0x820
> > > [   61.213892]  ib_uverbs_write+0x77e/0xae0
> > > [   61.248018]  vfs_write+0x121/0x3b0
> > > [   61.249831]  ksys_write+0xa1/0x120
> > > [   61.254024]  do_syscall_64+0x7c/0x2a0
> > > [   61.256178]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [   61.259211] RIP: 0033:0x7f54bab70e99
> > > [   61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89
> > > [   61.268678] RSP: 002b:00007ffe1541c318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > [   61.271076] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54bab70e99
> > > [   61.273795] RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
> > > [   61.276982] RBP: 00007ffe1541c330 R08: 00000000200078e0 R09: 0000000000000002
> > > [   61.280035] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005c0
> > > [   61.283279] R13: 00007ffe1541c420 R14: 0000000000000000 R15: 0000000000000000
> > >
> > > Cc: <stable@vger.kernel.org> # 4.7
> > > Fixes: 79b20a6c3014 ("IB/mlx5: Add receive Work Queue verbs")
> > > Cc: syzkaller <syzkaller@googlegroups.com>
> > > Reported-by: Noa Osherovich <noaos@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > >  drivers/infiniband/hw/mlx5/qp.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
> > > index 6034a670859f..8e40263fd40e 100644
> > > +++ b/drivers/infiniband/hw/mlx5/qp.c
> > > @@ -5377,7 +5377,11 @@ static int set_user_rq_size(struct mlx5_ib_dev *dev,
> > >
> > >  	rwq->wqe_count = ucmd->rq_wqe_count;
> > >  	rwq->wqe_shift = ucmd->rq_wqe_shift;
> > > -	rwq->buf_size = (rwq->wqe_count << rwq->wqe_shift);
> > > +	rwq->buf_size =
> > > +		shift_overflow((size_t)rwq->wqe_count, (size_t)rwq->wqe_shift);
> >
> > The casts are redundant, the function argument is already size_t so
> > implicit promotion is guaranteed.
> 
> rwq->wqe_count and rwq->wqe_shift are declared as u32 and not as size_t.
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/hw/mlx5/mlx5_ib.h#L296

It doesn't matter: passing them to a function that takes size_t
parameters converts them implicitly, exactly as the explicit cast does.
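To make both points concrete: a u32 argument passed to a size_t parameter is converted implicitly, and a shift by a count such as 4294967288 (the exponent in the UBSAN report) must be rejected before it is evaluated. A sketch of such a checked shift, where `checked_shl_size()` is a hypothetical helper and not the shift_overflow() used in the patch, whose definition is not shown here:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes a << s into *res and returns true, or
 * returns false (leaving *res untouched) instead of invoking undefined
 * behaviour when s is too large or set bits would be shifted out. */
static bool checked_shl_size(size_t a, size_t s, size_t *res)
{
	const size_t width = sizeof(size_t) * CHAR_BIT;

	if (s >= width)            /* shifting by >= the type width is UB */
		return false;
	if (a > (SIZE_MAX >> s))   /* some set bits would be lost */
		return false;
	*res = a << s;
	return true;
}
```

Calling it with uint32_t variables and no casts, e.g. `checked_shl_size(wqe_count, wqe_shift, &buf_size)`, behaves the same as with explicit `(size_t)` casts.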

Jason

* [PATCH] selftests: bpf: enable NET_SCHED
From: Anders Roxell @ 2018-06-25 14:56 UTC
  To: ast, daniel, shuah; +Cc: netdev, linux-kernel, linux-kselftest, Anders Roxell

CONFIG_NET_SCHED is enabled in the x86 defconfig but not in arm64's,
so the bpf/test_tunnel.sh tests fail with:
RTNETLINK answers: Operation not supported
RTNETLINK answers: Operation not supported
We have an error talking to the kernel, -1
Enable NET_SCHED in the selftest config so that more tests pass.

Fixes: 3bce593ac06b ("selftests: bpf: config: add config fragments")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
 tools/testing/selftests/bpf/config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 1e0c547caf3c..7a6d92562dc6 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -6,6 +6,7 @@ CONFIG_TEST_BPF=m
 CONFIG_CGROUP_BPF=y
 CONFIG_NETDEVSIM=m
 CONFIG_NET_CLS_ACT=y
+CONFIG_NET_SCHED=y
 CONFIG_NET_SCH_INGRESS=y
 CONFIG_NET_IPIP=y
 CONFIG_IPV6=y
-- 
2.18.0

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: kbuild test robot @ 2018-06-25 14:52 UTC
  To: Ka-Cheong Poon; +Cc: kbuild-all, netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <7f4f460079d3d78a18f7d759488048798e99c4db.1529922794.git.ka-cheong.poon@oracle.com>

Hi Ka-Cheong,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Ka-Cheong-Poon/rds-IPv6-support/20180625-190047
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   net/rds/tcp_listen.c:86:22: sparse: expression using sizeof(void)
>> net/rds/tcp_listen.c:288:33: sparse: incorrect type in assignment (different base types) @@    expected restricted __be16 [usertype] sin6_port @@    got unsignedrestricted __be16 [usertype] sin6_port @@
   net/rds/tcp_listen.c:288:33:    expected restricted __be16 [usertype] sin6_port
   net/rds/tcp_listen.c:288:33:    got unsigned short [unsigned] [usertype] <noident>
>> net/rds/tcp_listen.c:295:38: sparse: incorrect type in assignment (different base types) @@    expected restricted __be32 [usertype] s_addr @@    got ricted __be32 [usertype] s_addr @@
   net/rds/tcp_listen.c:295:38:    expected restricted __be32 [usertype] s_addr
   net/rds/tcp_listen.c:295:38:    got unsigned long [unsigned] <noident>
>> net/rds/tcp_listen.c:296:31: sparse: incorrect type in assignment (different base types) @@    expected restricted __be16 [usertype] sin_port @@    got unsignedrestricted __be16 [usertype] sin_port @@
   net/rds/tcp_listen.c:296:31:    expected restricted __be16 [usertype] sin_port
   net/rds/tcp_listen.c:296:31:    got unsigned short [unsigned] [usertype] <noident>

vim +288 net/rds/tcp_listen.c

   258	
   259	struct socket *rds_tcp_listen_init(struct net *net, bool isv6)
   260	{
   261		struct socket *sock = NULL;
   262		struct sockaddr_storage ss;
   263		struct sockaddr_in6 *sin6;
   264		struct sockaddr_in *sin;
   265		int addr_len;
   266		int ret;
   267	
   268		ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
   269				       IPPROTO_TCP, &sock);
   270		if (ret < 0) {
   271			rdsdebug("could not create %s listener socket: %d\n",
   272				 isv6 ? "IPv6" : "IPv4", ret);
   273			goto out;
   274		}
   275	
   276		sock->sk->sk_reuse = SK_CAN_REUSE;
   277		rds_tcp_nonagle(sock);
   278	
   279		write_lock_bh(&sock->sk->sk_callback_lock);
   280		sock->sk->sk_user_data = sock->sk->sk_data_ready;
   281		sock->sk->sk_data_ready = rds_tcp_listen_data_ready;
   282		write_unlock_bh(&sock->sk->sk_callback_lock);
   283	
   284		if (isv6) {
   285			sin6 = (struct sockaddr_in6 *)&ss;
   286			sin6->sin6_family = PF_INET6;
   287			sin6->sin6_addr = in6addr_any;
 > 288			sin6->sin6_port = (__force u16)htons(RDS_TCP_PORT);
   289			sin6->sin6_scope_id = 0;
   290			sin6->sin6_flowinfo = 0;
   291			addr_len = sizeof(*sin6);
   292		} else {
   293			sin = (struct sockaddr_in *)&ss;
   294			sin->sin_family = PF_INET;
 > 295			sin->sin_addr.s_addr = INADDR_ANY;
 > 296			sin->sin_port = (__force u16)htons(RDS_TCP_PORT);
   297			addr_len = sizeof(*sin);
   298		}
   299	
   300		ret = sock->ops->bind(sock, (struct sockaddr *)&ss, addr_len);
   301		if (ret < 0) {
   302			rdsdebug("could not bind %s listener socket: %d\n",
   303				 isv6 ? "IPv6" : "IPv4", ret);
   304			goto out;
   305		}
   306	
   307		ret = sock->ops->listen(sock, 64);
   308		if (ret < 0)
   309			goto out;
   310	
   311		return sock;
   312	out:
   313		if (sock)
   314			sock_release(sock);
   315		return NULL;
   316	}
   317	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
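Sparse is flagging byte-order annotations: htons() already yields a big-endian __be16, so the `(__force u16)` cast strips that annotation before it is stored into a __be16 field, and INADDR_ANY is a plain host-order constant being stored into a __be32. A user-space sketch of the byte-order-correct form follows; `fill_any_addr()` is a hypothetical helper, and the port is passed in rather than using RDS_TCP_PORT. In kernel code the analogous change would presumably be dropping the casts and using htonl(INADDR_ANY), but that is an assumption, not the patch author's fix.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build a wildcard listen address for `port` (host byte order) with every
 * on-the-wire field stored in network byte order. Returns the length to
 * pass to bind(). */
static socklen_t fill_any_addr(struct sockaddr_storage *ss, int isv6,
			       unsigned short port)
{
	memset(ss, 0, sizeof(*ss));  /* also clears sin_zero, scope, flowinfo */

	if (isv6) {
		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

		sin6->sin6_family = AF_INET6;
		sin6->sin6_addr = in6addr_any;
		sin6->sin6_port = htons(port);  /* already big-endian: no cast */
		return sizeof(*sin6);
	}

	struct sockaddr_in *sin = (struct sockaddr_in *)ss;

	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = htonl(INADDR_ANY);  /* convert, even though 0 */
	sin->sin_port = htons(port);
	return sizeof(*sin);
}
```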

* Re: [PATCH] ipv6: avoid copy_from_user() via ipv6_renew_options_kern()
From: Paul Moore @ 2018-06-25 14:49 UTC
  To: davem; +Cc: viro, Paul Moore, netdev, selinux, linux-security-module
In-Reply-To: <20180624.164837.37612664745856114.davem@davemloft.net>

On Sun, Jun 24, 2018 at 3:48 AM David Miller <davem@davemloft.net> wrote:
>
> From: Al Viro <viro@ZenIV.linux.org.uk>
> Date: Sat, 23 Jun 2018 23:21:07 +0100
>
> > BTW, I wonder if the life would be simpler with do_ipv6_setsockopt() doing
> > the copy-in and verifying ipv6_optlen(*hdr) <= newoptlen; that would've
> > simplified ipv6_renew_option{,s}() quite a bit and completely eliminated
> > ipv6_renew_options_kern()...
>
> I agree that this makes things a lot simpler.

I had looked at moving the userspace copy up, but feared it was a bit
too invasive.  It sounds like you are open to the idea so I'll code
something up.

> One thing that drives me crazy though is this inherit stuff:
>
> > +     ipv6_renew_option(newtype == IPV6_HOPOPTS ? newopt :
> > +                             opt ? opt->hopopt : NULL,
>
> Why don't we pass the type into ipv6_renew_option() and have it
> do this pointer dance instead?
>
> That's going to definitely be easier to read.

I agree, that struck me as a little odd.  I'll rework that too.  I'll
send you guys something this week to take a look at.
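David's suggestion amounts to moving the `newtype == X ? newopt : (opt ? opt->X : NULL)` choice into the helper itself, so each call site only names the slot it is renewing. A user-space model of that selection, where `enum opt_slot`, `struct txopts_model` and `renew_slot()` are hypothetical names and not the real ipv6_renew_option() signature:

```c
#include <stddef.h>

/* Hypothetical model of the four IPv6 extension-header option slots. */
enum opt_slot { OPT_HOPOPTS, OPT_RTHDRDSTOPTS, OPT_RTHDR, OPT_DSTOPTS, OPT_NSLOTS };

struct txopts_model {
	const void *opt[OPT_NSLOTS];
};

/* Choose what the renewed option set should hold in `slot`: the caller's
 * new option if `slot` is the one being set, otherwise whatever the old
 * option set had, or nothing if there is no old set. */
static const void *renew_slot(const struct txopts_model *old,
			      enum opt_slot newtype, const void *newopt,
			      enum opt_slot slot)
{
	if (slot == newtype)
		return newopt;
	return old ? old->opt[slot] : NULL;
}
```

Each caller then reads as `renew_slot(old, newtype, newopt, OPT_HOPOPTS)` instead of repeating the nested conditional.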

Thanks.

> I don't know enough about this code to give feedback about the
> option length handling wrt. copies, sorry.

-- 
paul moore
www.paul-moore.com

* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-25 14:49 UTC
  To: davem, rds-devel, santosh.shilimkar, netdev; +Cc: syzkaller-bugs
In-Reply-To: <1529934085-181126-1-git-send-email-sowmini.varadhan@oracle.com>

On (06/25/18 06:41), Sowmini Varadhan wrote:
  :
> Add the changes aligned with the changes from
> commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
> netns/module teardown and rds connection/workq management") for
> rds_loop_transport

FWIW, I am optimistic that this will take care of a number
of the use-after-free panics reported by syzbot (I have not
marked the patch with the recommended syzkaller Reported-by 
tags because I was not able to reproduce each original issue, 
but inspection of the traces suggests this missing patch may 
be behind the races that cause the reports).

--Sowmini

* Re: [PATCH ipsec] xfrm: free skb if nlsk pointer is NULL
From: Steffen Klassert @ 2018-06-25 14:45 UTC
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20180625120007.13345-1-fw@strlen.de>

On Mon, Jun 25, 2018 at 02:00:07PM +0200, Florian Westphal wrote:
> nlmsg_multicast() always frees the skb, so in case we cannot call
> it we must do that ourselves.
> 
> Fixes: 21ee543edc0dea ("xfrm: fix race between netns cleanup and state expire notification")
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied, thanks Florian!
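The rule behind this fix is buffer ownership: nlmsg_multicast() consumes the skb on every path, so a caller that returns early without calling it still owns the skb and must free it itself. A user-space model of that rule, where `struct msg`, `multicast_consumes()`, `notify()` and the -ENODEV return value are hypothetical (the error code the actual patch returns is not shown here):

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins: "msg" plays the role of an skb. */
struct msg {
	int payload;
};

static int frees;  /* counts releases so ownership can be checked */

static void msg_free(struct msg *m)
{
	free(m);
	frees++;
}

/* Like nlmsg_multicast(): consumes (frees) the message on every path. */
static int multicast_consumes(struct msg *m)
{
	msg_free(m);
	return 0;
}

static int notify(const void *sink, struct msg *m)
{
	if (!sink) {
		msg_free(m);      /* the consumer never runs, so we free m */
		return -ENODEV;   /* hypothetical error code */
	}
	return multicast_consumes(m);  /* ownership passes to the callee */
}
```

Every path frees the message exactly once, which is the invariant the patch restores for the NULL nlsk case.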

* [PATCH net-next v2] selftests: net: Test headroom handling of ip6_gre devices
From: Petr Machata @ 2018-06-25 14:43 UTC
  To: netdev, linux-kselftest; +Cc: davem, shuah, u9012063

Commit 5691484df961 ("net: ip6_gre: Fix headroom request in
ip6erspan_tunnel_xmit()") and commit 01b8d064d58b ("net: ip6_gre:
Request headroom in __gre6_xmit()") fix problems in reserving headroom
in the packets tunneled through ip6gre/tap and ip6erspan netdevices.

These two patches included snippets that reproduced the issues. This
patch elevates the snippets to a full-fledged test case.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Petr Machata <petrm@mellanox.com>
---

Notes:
    Changes between v1 and v2:
    
    - Move tunnel construction to setup() and destruction to cleanup().

 tools/testing/selftests/net/ip6_gre_headroom.sh | 65 +++++++++++++++++++++++++
 1 file changed, 65 insertions(+)
 create mode 100755 tools/testing/selftests/net/ip6_gre_headroom.sh

diff --git a/tools/testing/selftests/net/ip6_gre_headroom.sh b/tools/testing/selftests/net/ip6_gre_headroom.sh
new file mode 100755
index 000000000000..5b41e8bb6e2d
--- /dev/null
+++ b/tools/testing/selftests/net/ip6_gre_headroom.sh
@@ -0,0 +1,65 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test that enough headroom is reserved for the first packet passing through an
+# IPv6 GRE-like netdevice.
+
+setup_prepare()
+{
+	ip link add h1 type veth peer name swp1
+	ip link add h3 type veth peer name swp3
+
+	ip link set dev h1 up
+	ip address add 192.0.2.1/28 dev h1
+
+	ip link add dev vh3 type vrf table 20
+	ip link set dev h3 master vh3
+	ip link set dev vh3 up
+	ip link set dev h3 up
+
+	ip link set dev swp3 up
+	ip address add dev swp3 2001:db8:2::1/64
+	ip address add dev swp3 2001:db8:2::3/64
+
+	ip link set dev swp1 up
+	tc qdisc add dev swp1 clsact
+
+	ip link add name er6 type ip6erspan \
+	   local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
+	ip link set dev er6 up
+
+	ip link add name gt6 type ip6gretap \
+	   local 2001:db8:2::3 remote 2001:db8:2::4
+	ip link set dev gt6 up
+
+	sleep 1
+}
+
+cleanup()
+{
+	ip link del dev gt6
+	ip link del dev er6
+	ip link del dev swp1
+	ip link del dev swp3
+	ip link del dev vh3
+}
+
+test_headroom()
+{
+	local type=$1; shift
+	local tundev=$1; shift
+
+	tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
+		action mirred egress mirror dev $tundev
+	ping -I h1 192.0.2.2 -c 1 -w 2 &> /dev/null
+	tc filter del dev swp1 ingress pref 1000
+
+	# If it doesn't panic, it passes.
+	printf "TEST: %-60s  [PASS]\n" "$type headroom"
+}
+
+trap cleanup EXIT
+
+setup_prepare
+test_headroom ip6gretap gt6
+test_headroom ip6erspan er6
-- 
2.4.11

* Re: [PATCH ipsec-next] xfrm: policy: remove pcpu policy cache
From: Steffen Klassert @ 2018-06-25 14:42 UTC
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20180625115753.13161-1-fw@strlen.de>

On Mon, Jun 25, 2018 at 01:57:53PM +0200, Florian Westphal wrote:
> Kristian Evensen says:
>   In a project I am involved in, we are running ipsec (Strongswan) on
>   different mt7621-based routers. Each router is configured as an
>   initiator and has around ~30 tunnels to different responders (running
>   on misc. devices). Before the flow cache was removed (kernel 4.9), we
>   got a combined throughput of around 70Mbit/s for all tunnels on one
>   router. However, we recently switched to kernel 4.14 (4.14.48), and
>   the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
>   drop of around 20%. Reverting the flow cache removal restores, as
>   expected, performance levels to that of kernel 4.9.
> 
> When pcpu xdst exists, it has to be validated first before it can be
> used.
> 
> A negative hit thus increases cost vs. no-cache.
> 
> As number of tunnels increases, hit rate decreases so this pcpu caching
> isn't a viable strategy.
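A rough cost model makes the "negative hit" argument concrete. Under simplifying assumptions of our own (not Florian's): a per-CPU entry is always present and must be validated at cost $c_v$, a full policy lookup and bundle creation costs $c_\ell$, and $h$ is the fraction of packets for which the cached xdst turns out to be usable. Then

```latex
\begin{aligned}
\mathbb{E}[C_{\text{cache}}] &= c_v + (1-h)\,c_\ell, \qquad C_{\text{no cache}} = c_\ell,\\
\mathbb{E}[C_{\text{cache}}] < C_{\text{no cache}} &\iff c_v < h\,c_\ell \iff h > \frac{c_v}{c_\ell}.
\end{aligned}
```

With around 30 tunnels sharing one per-CPU slot, consecutive packets on a CPU plausibly belong to different flows, pushing $h$ below the break-even point, at which point the cache only adds the validation cost.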
> 
> Furthermore, the xdst cache also needs to run with BH off, so when
> removing this the bh disable/enable pairs can be removed too.
> 
> Kristian tested a 4.14.y backport of this change and reported
> increased performance:
> 
>   In our tests, the throughput reduction has been reduced from around -20%
>   to -5%. We also see that the overall throughput is independent of the
>   number of tunnels, while before the throughput was reduced as the number
>   of tunnels increased.
> 
> Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Can you please rebase this to ipsec-next current?

It does not apply cleanly after the merge of the
xfrm interface patches.

Thanks!

* [PATCH 4/4] net: lan78xx: Use s/w csum check on VLANs without tag stripping
From: Dave Stevenson @ 2018-06-25 14:07 UTC
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>

Packets on VLANs have been observed being dropped due to invalid
checksums when VLAN tag stripping on receive is not offloaded.
With VLAN tag stripping enabled, no issue is observed.

Fall back to s/w checksums if VLAN offload is disabled.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index f72a8f5..6f2ea84 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3052,8 +3052,13 @@ static void lan78xx_rx_csum_offload(struct lan78xx_net *dev,
 				    struct sk_buff *skb,
 				    u32 rx_cmd_a, u32 rx_cmd_b)
 {
+	/* HW Checksum offload appears to be flawed if used when not stripping
+	 * VLAN headers. Drop back to S/W checksums under these conditions.
+	 */
 	if (!(dev->net->features & NETIF_F_RXCSUM) ||
-	    unlikely(rx_cmd_a & RX_CMD_A_ICSM_)) {
+	    unlikely(rx_cmd_a & RX_CMD_A_ICSM_) ||
+	    ((rx_cmd_a & RX_CMD_A_FVTG_) &&
+	     !(dev->net->features & NETIF_F_HW_VLAN_CTAG_RX))) {
 		skb->ip_summed = CHECKSUM_NONE;
 	} else {
 		skb->csum = ntohs((u16)(rx_cmd_b >> RX_CMD_B_CSUM_SHIFT_));
-- 
2.7.4
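The patched condition can be read as a small decision function: the hardware RX checksum is trusted only if RX checksumming is on, the RX_CMD_A_ICSM_ status bit (which the existing code already treats as "do not use the hardware checksum") is clear, and the frame is not a VLAN-tagged frame whose tag was left in place. A sketch with plain booleans; `trust_hw_rx_csum()` and its parameters are hypothetical stand-ins for the feature flags and RX command bits in the patch, with RX_CMD_A_FVTG_ apparently indicating a VLAN-tagged frame:

```c
#include <stdbool.h>

/* Returns true if the hardware RX checksum may be used. The inputs stand
 * in for NETIF_F_RXCSUM, RX_CMD_A_ICSM_, RX_CMD_A_FVTG_ and
 * NETIF_F_HW_VLAN_CTAG_RX as tested in the patch. */
static bool trust_hw_rx_csum(bool rxcsum_feature, bool icsm_bit,
			     bool frame_vlan_tagged, bool vlan_strip_feature)
{
	if (!rxcsum_feature || icsm_bit)
		return false;  /* same as before the patch */
	if (frame_vlan_tagged && !vlan_strip_feature)
		return false;  /* new: tagged frame, tag not stripped by HW */
	return true;
}
```

When the function returns false, the driver's equivalent path sets skb->ip_summed = CHECKSUM_NONE so the stack verifies the checksum in software.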

* Re: [PATCH RFC v2 ipsec-next 0/3] Virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-25 14:34 UTC
  To: netdev, David Miller
  Cc: Eyal Birger, Antony Antony, Benedict Wong, Lorenzo Colitti,
	Shannon Nelson
In-Reply-To: <20180612075610.2000-1-steffen.klassert@secunet.com>

On Tue, Jun 12, 2018 at 09:56:07AM +0200, Steffen Klassert wrote:
> This patchset introduces new virtual xfrm interfaces.
> The design of virtual xfrm interfaces was
> discussed at the Linux IPsec workshop 2018. This patchset
> implements these interfaces as the IPsec userspace and
> kernel developers agreed. The purpose of these interfaces
> is to overcome the design limitations that the existing
> VTI devices have.
> 
> The main limitations that we see with the current VTI are the
> following:
> 
> - VTI interfaces are L3 tunnels with configurable endpoints.
>   For xfrm, the tunnel endpoints are already determined by the SA.
>   So the VTI tunnel endpoints must be either the same as on the
>   SA or wildcards. In case VTI tunnel endpoints are same as on
>   the SA, we get a one to one correlation between the SA and
>   the tunnel. So each SA needs its own tunnel interface.
> 
>   On the other hand, we can have only one VTI tunnel with
>   wildcard src/dst tunnel endpoints in the system because the
>   lookup is based on the tunnel endpoints. The existing tunnel
>   lookup won't work with multiple tunnels with wildcard
>   tunnel endpoints. Some use cases require more than one
>   VTI tunnel of this type, for example if somebody has multiple
>   namespaces and every namespace requires such a VTI.
> 
> - VTI needs separate interfaces for IPv4 and IPv6 tunnels.
>   So when routing to a VTI, we have to know to which address
>   family this traffic class is going to be encapsulated.
>   This is a limitation because it makes routing more complex
>   and it is not always possible to know what happens behind the
>   VTI, e.g. when the VTI is moved to some namespace.
> 
> - VTI works just with tunnel mode SAs. We need generic interfaces
>   that ensure transformation, regardless of the xfrm mode and
>   the encapsulated address family.
> 
> - VTI is configured with a combination of GRE keys and xfrm marks.
>   With this we have to deal with some extra cases in the generic
>   tunnel lookup because the GRE keys on the VTI are actually
>   not GRE keys, the GRE keys were just reused for something else.
>   Any extension to the VTI interfaces would add
>   even more complexity to the generic tunnel lookup.
> 
> To overcome this, we started with the following design goals:
> 
> - It should be possible to tunnel IPv4 and IPv6 through the same
>   interface.
> 
> - No limitation on xfrm mode (tunnel, transport and beet).
> 
> - Should be a generic virtual interface that ensures IPsec
>   transformation, no need to know what happens behind the
>   interface.
> 
> - Interfaces should be configured with a new key that must match a
>   new policy/SA lookup key.
> 
> - The lookup logic should stay in the xfrm codebase, no need to
>   change or extend generic routing and tunnel lookups.
> 
> - Should be possible to use IPsec hardware offloads of the underlying
>   interface.
> 
> Changes from v1:
> 
> - Document the limitations of VTI interfaces and the design of
>   the new xfrm interfaces more explicitly in the commit messages.
> 
> - No code changes.

I have not got any further comments, so applied to ipsec-next.
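The key-based design goal contrasts with VTI's endpoint-based lookup: if an interface is found only by a key that must also be present on the policy/SA, then any number of "wildcard" interfaces can coexist as long as their keys differ. A user-space model of that idea; `struct vif_model` and `vif_lookup()` are hypothetical names and not code from the patchset:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a virtual xfrm interface: only a lookup key, no
 * tunnel endpoints and no address family. */
struct vif_model {
	const char *name;
	uint32_t key;
};

/* Find the interface whose key matches the key carried by a policy/SA.
 * Endpoints play no part in the lookup. */
static const struct vif_model *vif_lookup(const struct vif_model *vifs,
					  size_t n, uint32_t sa_key)
{
	for (size_t i = 0; i < n; i++)
		if (vifs[i].key == sa_key)
			return &vifs[i];
	return NULL;
}
```

Two endpoint-less interfaces would collide under an endpoint-keyed lookup, but here they are distinguished purely by key.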

* Re: [PATCH RFC ipsec-next] xfrm: Extend the output_mark to support input direction and masking.
From: Steffen Klassert @ 2018-06-25 14:31 UTC
  To: netdev; +Cc: Tobias Brunner, Eyal Birger, Lorenzo Colitti
In-Reply-To: <20180615065514.bmy6tamr4fqivpyp@gauss3.secunet.de>

On Fri, Jun 15, 2018 at 08:55:14AM +0200, Steffen Klassert wrote:
> We already support setting an output mark at the xfrm_state,
> unfortunately this does not support the input direction and
> masking the marks that will be applied to the skb. This change
> adds support for applying a masked value in both directions.
> 
> The existing XFRMA_OUTPUT_MARK number is reused for this purpose
> and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.
> 
> An additional XFRMA_SET_MARK_MASK attribute is added for setting the
> mask. If the mask attribute is not provided, it is set to 0xffffffff,
> keeping the XFRMA_OUTPUT_MARK existing 'full mask' semantics.
> 
> Co-developed-by: Tobias Brunner <tobias@strongswan.org>
> Co-developed-by: Eyal Birger <eyal.birger@gmail.com>
> Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> Signed-off-by: Tobias Brunner <tobias@strongswan.org>
> Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>

This is now applied to ipsec-next.
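Masked mark semantics of this kind usually follow a read-modify-write rule: bits covered by the mask come from the configured value, and all other bits of the existing skb mark are preserved, so a mask of 0xffffffff reproduces the old full-overwrite behaviour of XFRMA_OUTPUT_MARK. A sketch of that rule; `apply_masked_mark()` is a hypothetical name, and the exact kernel helper is not shown in this message:

```c
#include <stdint.h>

/* Bits selected by `mask` come from `value`; all other bits keep their
 * current setting in `mark`. */
static uint32_t apply_masked_mark(uint32_t mark, uint32_t value, uint32_t mask)
{
	return (value & mask) | (mark & ~mask);
}
```

For example, with mark 0x12345678, value 0xab00 and mask 0xff00, only the second byte changes, giving 0x1234ab78.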

* [bpf-next PATCH 2/2] samples/bpf: xdp_rxq_info action XDP_TX must adjust MAC-addrs
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC
  To: netdev, Jesper Dangaard Brouer
  Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
	Alexei Starovoitov
In-Reply-To: <152993682254.8835.8864318933370018087.stgit@firesoul>

XDP_TX also requires changing the MAC addresses; otherwise some
hardware may drop the TX packet before it reaches the wire.  This
was observed with the mlx5 driver.

If xdp_rxq_info is run with --action XDP_TX, the swapmac
functionality is activated.  It can also be enabled manually via the
--swapmac command-line option, which is useful for measuring the
overhead of writing/updating packet data with other action types.
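The swap itself just exchanges the 6-byte destination MAC (bytes 0-5) with the 6-byte source MAC (bytes 6-11) at the start of the Ethernet header. The BPF helper in the patch does this with three 16-bit loads and stores; the user-space sketch below does the same exchange with memcpy() to sidestep alignment and aliasing concerns outside BPF. `swap_mac_bytes()` is a hypothetical name used for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Exchange destination (bytes 0-5) and source (bytes 6-11) MAC addresses
 * of an Ethernet frame in place; the EtherType at bytes 12-13 is untouched. */
static void swap_mac_bytes(uint8_t *frame)
{
	uint8_t tmp[6];

	memcpy(tmp, frame, 6);        /* save destination MAC */
	memcpy(frame, frame + 6, 6);  /* source MAC becomes destination */
	memcpy(frame + 6, tmp, 6);    /* saved destination becomes source */
}
```

Applying the swap twice restores the original frame, which makes it easy to sanity-check.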

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 samples/bpf/xdp_rxq_info_kern.c |   26 +++++++++++++++++++++++++-
 samples/bpf/xdp_rxq_info_user.c |   11 +++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/xdp_rxq_info_kern.c b/samples/bpf/xdp_rxq_info_kern.c
index 61af6210df2f..222a83eed1cb 100644
--- a/samples/bpf/xdp_rxq_info_kern.c
+++ b/samples/bpf/xdp_rxq_info_kern.c
@@ -21,6 +21,7 @@ struct config {
 enum cfg_options_flags {
 	NO_TOUCH = 0x0U,
 	READ_MEM = 0x1U,
+	SWAP_MAC = 0x2U,
 };
 struct bpf_map_def SEC("maps") config_map = {
 	.type		= BPF_MAP_TYPE_ARRAY,
@@ -52,6 +53,23 @@ struct bpf_map_def SEC("maps") rx_queue_index_map = {
 	.max_entries	= MAX_RXQs + 1,
 };
 
+static __always_inline
+void swap_src_dst_mac(void *data)
+{
+	unsigned short *p = data;
+	unsigned short dst[3];
+
+	dst[0] = p[0];
+	dst[1] = p[1];
+	dst[2] = p[2];
+	p[0] = p[3];
+	p[1] = p[4];
+	p[2] = p[5];
+	p[3] = dst[0];
+	p[4] = dst[1];
+	p[5] = dst[2];
+}
+
 SEC("xdp_prog0")
 int  xdp_prognum0(struct xdp_md *ctx)
 {
@@ -98,7 +116,7 @@ int  xdp_prognum0(struct xdp_md *ctx)
 		rxq_rec->issue++;
 
 	/* Default: Don't touch packet data, only count packets */
-	if (unlikely(config->options & READ_MEM)) {
+	if (unlikely(config->options & (READ_MEM|SWAP_MAC))) {
 		struct ethhdr *eth = data;
 
 		if (eth + 1 > data_end)
@@ -107,6 +125,12 @@ int  xdp_prognum0(struct xdp_md *ctx)
 		/* Avoid compiler removing this: Drop non 802.3 Ethertypes */
 		if (ntohs(eth->h_proto) < ETH_P_802_3_MIN)
 			return XDP_ABORTED;
+
+		/* XDP_TX requires changing MAC-addrs, else HW may drop.
+		 * Can also be enabled with --swapmac (for test purposes)
+		 */
+		if (unlikely(config->options & SWAP_MAC))
+			swap_src_dst_mac(data);
 	}
 
 	return config->action;
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index 435485d4f49e..248a7eab9531 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -51,6 +51,7 @@ static const struct option long_options[] = {
 	{"no-separators", no_argument,		NULL, 'z' },
 	{"action",	required_argument,	NULL, 'a' },
 	{"readmem", 	no_argument,		NULL, 'r' },
+	{"swapmac", 	no_argument,		NULL, 'm' },
 	{0, 0, NULL,  0 }
 };
 
@@ -72,6 +73,7 @@ struct config {
 enum cfg_options_flags {
 	NO_TOUCH = 0x0U,
 	READ_MEM = 0x1U,
+	SWAP_MAC = 0x2U,
 };
 #define XDP_ACTION_MAX (XDP_TX + 1)
 #define XDP_ACTION_MAX_STRLEN 11
@@ -119,6 +121,8 @@ static char* options2str(enum cfg_options_flags flag)
 {
 	if (flag == NO_TOUCH)
 		return "no_touch";
+	if (flag & SWAP_MAC)
+		return "swapmac";
 	if (flag & READ_MEM)
 		return "read";
 	fprintf(stderr, "ERR: Unknown config option flags");
@@ -517,6 +521,9 @@ int main(int argc, char **argv)
 		case 'r':
 			cfg_options |= READ_MEM;
 			break;
+		case 'm':
+			cfg_options |= SWAP_MAC;
+			break;
 		case 'h':
 		error:
 		default:
@@ -543,6 +550,10 @@ int main(int argc, char **argv)
 		}
 	}
 	cfg.action = action;
+
+	/* XDP_TX requires changing MAC-addrs, else HW may drop */
+	if (action == XDP_TX)
+		cfg_options |= SWAP_MAC;
 	cfg.options = cfg_options;
 
 	/* Trick to pretty printf with thousands separators use %' */


* [bpf-next PATCH 1/2] samples/bpf: extend xdp_rxq_info to read packet payload
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC
  To: netdev, Jesper Dangaard Brouer
  Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
	Alexei Starovoitov
In-Reply-To: <152993682254.8835.8864318933370018087.stgit@firesoul>

Reading the packet data has a cost that this test previously
ignored.  Add the option --readmem to enable reading part of
the packet data (the Ethernet header).
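To show what the --readmem path checks, here is a userspace sketch of the same logic, not the patch's code. eth_has_ethertype() and the SKETCH_* constants are hypothetical; SKETCH_ETH_P_802_3_MIN uses 0x0600, the value of the kernel's ETH_P_802_3_MIN. The header length is checked against data_end before any byte is read, mirroring the `eth + 1 > data_end` test that the BPF verifier requires before direct packet access, and the EtherType is read in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN        14     /* dst MAC + src MAC + EtherType */
#define SKETCH_ETH_P_802_3_MIN 0x0600 /* same value as ETH_P_802_3_MIN */

/* Returns 1 if the frame holds a full Ethernet header whose type field
 * is a real EtherType, 0 for a short frame or an 802.3 length field
 * (the cases where the sample returns XDP_ABORTED). Uses a length
 * difference instead of forming a pointer past data_end, which would
 * be undefined behaviour in ordinary C. */
static int eth_has_ethertype(const uint8_t *data, const uint8_t *data_end)
{
	uint16_t proto;

	if (data_end - data < SKETCH_ETH_HLEN)
		return 0;
	proto = (uint16_t)((data[12] << 8) | data[13]); /* big-endian */
	return proto >= SKETCH_ETH_P_802_3_MIN;
}
```

An IPv4 frame (EtherType 0x0800) passes; a type field below 0x0600 is an IEEE 802.3 length, which the sample treats as a reason to abort.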

This sample/tool helps us analyse an issue observed with an mlx5
NIC (ConnectX-5 Ex) and an Intel(R) Xeon(R) CPU E5-1650 v4.

When not touching the packet data (options:no_touch):

Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       14,465,157  0
XDP-RX CPU      1       14,464,728  0
XDP-RX CPU      2       14,465,283  0
XDP-RX CPU      3       14,465,282  0
XDP-RX CPU      4       14,464,159  0
XDP-RX CPU      5       14,465,379  0
XDP-RX CPU      total   86,789,992

When not touching the data, the CPUs have idle cycles.  When
reading the data, the CPUs are 100% busy in softirq.

When reading the packet data (options:read):

Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       9,620,639   0
XDP-RX CPU      1       9,489,843   0
XDP-RX CPU      2       9,407,854   0
XDP-RX CPU      3       9,422,289   0
XDP-RX CPU      4       9,321,959   0
XDP-RX CPU      5       9,395,242   0
XDP-RX CPU      total   56,657,828

The effect seen above is a result of cache misses occurring when
more RXQs are in use.  Based on perf-event observations, we
conclude that the CPU's DDIO (Data Direct I/O) chooses to deliver
packets into main memory instead of the L3 cache.  We also found
that this can be mitigated either by using fewer RXQs or by
reducing the NIC's RX-ring size.
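A back-of-the-envelope check of the numbers above, not part of the patch: the aggregate rate drops by roughly 35%, and since the CPUs are 100% busy in the read case, 56,657,828 pps over 6 CPUs leaves about 106 ns of CPU time per packet. The same conversion on the no_touch per-CPU rate gives about 69 ns, but that is only an upper bound on per-packet cost because those CPUs still had idle cycles. The helper names are hypothetical.

```c
/* Average time available per packet, in nanoseconds, at a given rate. */
static double ns_per_packet(double pps)
{
	return 1e9 / pps;
}

/* Relative throughput loss going from before_pps to after_pps, in percent. */
static double percent_drop(double before_pps, double after_pps)
{
	return 100.0 * (before_pps - after_pps) / before_pps;
}
```

For example, ns_per_packet(56657828.0 / 6) is about 105.9 and percent_drop(86789992.0, 56657828.0) is about 34.7.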

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 samples/bpf/xdp_rxq_info_kern.c |   19 +++++++++++++++++++
 samples/bpf/xdp_rxq_info_user.c |   34 ++++++++++++++++++++++++++++------
 2 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/samples/bpf/xdp_rxq_info_kern.c b/samples/bpf/xdp_rxq_info_kern.c
index 3fd209291653..61af6210df2f 100644
--- a/samples/bpf/xdp_rxq_info_kern.c
+++ b/samples/bpf/xdp_rxq_info_kern.c
@@ -4,6 +4,8 @@
  *  Example howto extract XDP RX-queue info
  */
 #include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/in.h>
 #include "bpf_helpers.h"
 
 /* Config setup from with userspace
@@ -14,6 +16,11 @@
 struct config {
 	__u32 action;
 	int ifindex;
+	__u32 options;
+};
+enum cfg_options_flags {
+	NO_TOUCH = 0x0U,
+	READ_MEM = 0x1U,
 };
 struct bpf_map_def SEC("maps") config_map = {
 	.type		= BPF_MAP_TYPE_ARRAY,
@@ -90,6 +97,18 @@ int  xdp_prognum0(struct xdp_md *ctx)
 	if (key == MAX_RXQs)
 		rxq_rec->issue++;
 
+	/* Default: Don't touch packet data, only count packets */
+	if (unlikely(config->options & READ_MEM)) {
+		struct ethhdr *eth = data;
+
+		if (eth + 1 > data_end)
+			return XDP_ABORTED;
+
+		/* Avoid compiler removing this: Drop non 802.3 Ethertypes */
+		if (ntohs(eth->h_proto) < ETH_P_802_3_MIN)
+			return XDP_ABORTED;
+	}
+
 	return config->action;
 }
 
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index e4e9ba52bff0..435485d4f49e 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -50,6 +50,7 @@ static const struct option long_options[] = {
 	{"sec",		required_argument,	NULL, 's' },
 	{"no-separators", no_argument,		NULL, 'z' },
 	{"action",	required_argument,	NULL, 'a' },
+	{"readmem", 	no_argument,		NULL, 'r' },
 	{0, 0, NULL,  0 }
 };
 
@@ -66,6 +67,11 @@ static void int_exit(int sig)
 struct config {
 	__u32 action;
 	int ifindex;
+	__u32 options;
+};
+enum cfg_options_flags {
+	NO_TOUCH = 0x0U,
+	READ_MEM = 0x1U,
 };
 #define XDP_ACTION_MAX (XDP_TX + 1)
 #define XDP_ACTION_MAX_STRLEN 11
@@ -109,6 +115,16 @@ static void list_xdp_actions(void)
 	printf("\n");
 }
 
+static char* options2str(enum cfg_options_flags flag)
+{
+	if (flag == NO_TOUCH)
+		return "no_touch";
+	if (flag & READ_MEM)
+		return "read";
+	fprintf(stderr, "ERR: Unknown config option flags");
+	exit(EXIT_FAIL);
+}
+
 static void usage(char *argv[])
 {
 	int i;
@@ -305,7 +321,7 @@ static __u64 calc_errs_pps(struct datarec *r,
 
 static void stats_print(struct stats_record *stats_rec,
 			struct stats_record *stats_prev,
-			int action)
+			int action, __u32 cfg_opt)
 {
 	unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
 	unsigned int nr_cpus = bpf_num_possible_cpus();
@@ -316,8 +332,8 @@ static void stats_print(struct stats_record *stats_rec,
 	int i;
 
 	/* Header */
-	printf("\nRunning XDP on dev:%s (ifindex:%d) action:%s\n",
-	       ifname, ifindex, action2str(action));
+	printf("\nRunning XDP on dev:%s (ifindex:%d) action:%s options:%s\n",
+	       ifname, ifindex, action2str(action), options2str(cfg_opt));
 
 	/* stats_global_map */
 	{
@@ -399,7 +415,7 @@ static inline void swap(struct stats_record **a, struct stats_record **b)
 	*b = tmp;
 }
 
-static void stats_poll(int interval, int action)
+static void stats_poll(int interval, int action, __u32 cfg_opt)
 {
 	struct stats_record *record, *prev;
 
@@ -410,7 +426,7 @@ static void stats_poll(int interval, int action)
 	while (1) {
 		swap(&prev, &record);
 		stats_collect(record);
-		stats_print(record, prev, action);
+		stats_print(record, prev, action, cfg_opt);
 		sleep(interval);
 	}
 
@@ -421,6 +437,7 @@ static void stats_poll(int interval, int action)
 
 int main(int argc, char **argv)
 {
+	__u32 cfg_options= NO_TOUCH ; /* Default: Don't touch packet memory */
 	struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
@@ -435,6 +452,7 @@ int main(int argc, char **argv)
 	int interval = 2;
 	__u32 key = 0;
 
+
 	char action_str_buf[XDP_ACTION_MAX_STRLEN + 1 /* for \0 */] = { 0 };
 	int action = XDP_PASS; /* Default action */
 	char *action_str = NULL;
@@ -496,6 +514,9 @@ int main(int argc, char **argv)
 			action_str = (char *)&action_str_buf;
 			strncpy(action_str, optarg, XDP_ACTION_MAX_STRLEN);
 			break;
+		case 'r':
+			cfg_options |= READ_MEM;
+			break;
 		case 'h':
 		error:
 		default:
@@ -522,6 +543,7 @@ int main(int argc, char **argv)
 		}
 	}
 	cfg.action = action;
+	cfg.options = cfg_options;
 
 	/* Trick to pretty printf with thousands separators use %' */
 	if (use_separators)
@@ -542,6 +564,6 @@ int main(int argc, char **argv)
 		return EXIT_FAIL_XDP;
 	}
 
-	stats_poll(interval, action);
+	stats_poll(interval, action, cfg_options);
 	return EXIT_OK;
 }


* [bpf-next PATCH 0/2] xdp/bpf: extend XDP samples/bpf xdp_rxq_info
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC
  To: netdev, Jesper Dangaard Brouer
  Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
	Alexei Starovoitov

While writing an article about XDP, the samples/bpf xdp_rxq_info
program was extended to cover some more use cases.

---

Jesper Dangaard Brouer (2):
      samples/bpf: extend xdp_rxq_info to read packet payload
      samples/bpf: xdp_rxq_info action XDP_TX must adjust MAC-addrs


 samples/bpf/xdp_rxq_info_kern.c |   43 +++++++++++++++++++++++++++++++++++++
 samples/bpf/xdp_rxq_info_user.c |   45 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 82 insertions(+), 6 deletions(-)


* [PATCH 2/4] net: lan78xx: Add support for VLAN filtering.
From: Dave Stevenson @ 2018-06-25 14:07 UTC
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>

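NETIF_F_HW_VLAN_CTAG_FILTER was partially implemented but not
advertised to Linux, and lan78xx_set_features() enabled the
hardware VLAN filter based on NETIF_F_HW_VLAN_CTAG_RX instead.

Complete the implementation: key RFE_CTL_VLAN_FILTER_ off
NETIF_F_HW_VLAN_CTAG_FILTER and advertise the feature when
DEFAULT_VLAN_FILTER_ENABLE is set.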
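The corrected mapping can be sketched in userspace as follows. The SKETCH_* bit values and update_rfe_ctl() are hypothetical stand-ins for the real NETIF_F_* flags, RFE_CTL_VLAN_FILTER_ and the code in lan78xx_set_features(); the point is only that the hardware filter bit follows the CTAG_FILTER feature and no longer depends on CTAG_RX.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real definitions
 * live in the kernel's netdev feature headers and in lan78xx.h. */
#define SKETCH_F_VLAN_CTAG_RX      (1ULL << 0)
#define SKETCH_F_VLAN_CTAG_FILTER  (1ULL << 1)
#define SKETCH_RFE_CTL_VLAN_FILTER (1U << 5)

/* Return the updated receive-filter control word: the VLAN filter bit
 * is set or cleared from the CTAG_FILTER feature; other bits are kept. */
static uint32_t update_rfe_ctl(uint64_t features, uint32_t rfe_ctl)
{
	if (features & SKETCH_F_VLAN_CTAG_FILTER)
		rfe_ctl |= SKETCH_RFE_CTL_VLAN_FILTER;
	else
		rfe_ctl &= ~SKETCH_RFE_CTL_VLAN_FILTER;
	return rfe_ctl;
}
```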

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 2f793d4..afe7fa3 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2363,7 +2363,7 @@ static int lan78xx_set_features(struct net_device *netdev,
 		pdata->rfe_ctl &= ~(RFE_CTL_ICMP_COE_ | RFE_CTL_IGMP_COE_);
 	}
 
-	if (features & NETIF_F_HW_VLAN_CTAG_RX)
+	if (features & NETIF_F_HW_VLAN_CTAG_FILTER)
 		pdata->rfe_ctl |= RFE_CTL_VLAN_FILTER_;
 	else
 		pdata->rfe_ctl &= ~RFE_CTL_VLAN_FILTER_;
@@ -2976,6 +2976,9 @@ static int lan78xx_bind(struct lan78xx_net *dev, struct usb_interface *intf)
 	if (DEFAULT_TSO_CSUM_ENABLE)
 		dev->net->features |= NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_SG;
 
+	if (DEFAULT_VLAN_FILTER_ENABLE)
+		dev->net->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+
 	dev->net->hw_features = dev->net->features;
 
 	ret = lan78xx_setup_irq_domain(dev);
-- 
2.7.4


* [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-25 13:41 UTC
  To: netdev, sowmini.varadhan
  Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar

The RDS core module creates rds_connections based on callbacks
from rds_loop_transport when sending/receiving packets to local
addresses.

These connections need to be cleaned up when they were created
from a netns other than init_net and that netns is deleted.

Add changes for rds_loop_transport that align with those from
commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management").
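The teardown pattern used by rds_loop_kill_conns() (move matching entries to a private list while holding the lock, then destroy them after dropping it) can be sketched in plain C11. Everything here is hypothetical: struct conn and net_id stand in for struct rds_loop_connection and its struct net, an atomic_flag spin lock stands in for spin_lock_irq(&loop_conns_lock), and unlike list_move_tail() the sketch does not preserve the order of the detached entries. Destroying outside the lock matches the existing comment in rds_loop_exit() about not calling conn_destroy with irqs off.

```c
#include <stdatomic.h>
#include <stddef.h>

struct conn {              /* hypothetical stand-in for rds_loop_connection */
	int net_id;        /* stands in for the owning struct net */
	struct conn *next;
};

static atomic_flag conns_lock = ATOMIC_FLAG_INIT;

static void conns_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&conns_lock))
		;          /* spin */
}

static void conns_lock_release(void)
{
	atomic_flag_clear(&conns_lock);
}

/* Unlink every connection owned by net_id under the lock, then call
 * destroy() (if non-NULL) on each one after the lock is released.
 * Returns the number of connections removed. */
static int kill_conns(struct conn **head, int net_id,
		      void (*destroy)(struct conn *))
{
	struct conn *victims = NULL, **pp;
	int n = 0;

	conns_lock_acquire();
	pp = head;
	while (*pp) {
		struct conn *c = *pp;

		if (c->net_id == net_id) {
			*pp = c->next;     /* unlink from the shared list */
			c->next = victims; /* push onto the private list */
			victims = c;
		} else {
			pp = &c->next;
		}
	}
	conns_lock_release();

	while (victims) {                  /* teardown outside the lock */
		struct conn *c = victims;

		victims = c->next;
		if (destroy)
			destroy(c);
		n++;
	}
	return n;
}
```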

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/rds/connection.c |   11 +++++++++-
 net/rds/loop.c       |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 net/rds/loop.h       |    2 +
 3 files changed, 68 insertions(+), 1 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index abef75d..cfb0595 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -659,11 +659,19 @@ static void rds_conn_info(struct socket *sock, unsigned int len,
 
 int rds_conn_init(void)
 {
+	int ret;
+
+	ret = rds_loop_net_init(); /* register pernet callback */
+	if (ret)
+		return ret;
+
 	rds_conn_slab = kmem_cache_create("rds_connection",
 					  sizeof(struct rds_connection),
 					  0, 0, NULL);
-	if (!rds_conn_slab)
+	if (!rds_conn_slab) {
+		rds_loop_net_exit();
 		return -ENOMEM;
+	}
 
 	rds_info_register_func(RDS_INFO_CONNECTIONS, rds_conn_info);
 	rds_info_register_func(RDS_INFO_SEND_MESSAGES,
@@ -676,6 +684,7 @@ int rds_conn_init(void)
 
 void rds_conn_exit(void)
 {
+	rds_loop_net_exit(); /* unregister pernet callback */
 	rds_loop_exit();
 
 	WARN_ON(!hlist_empty(rds_conn_hash));
diff --git a/net/rds/loop.c b/net/rds/loop.c
index dac6218..feea1f9 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -33,6 +33,8 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/in.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
 
 #include "rds_single_path.h"
 #include "rds.h"
@@ -40,6 +42,17 @@
 
 static DEFINE_SPINLOCK(loop_conns_lock);
 static LIST_HEAD(loop_conns);
+static atomic_t rds_loop_unloading = ATOMIC_INIT(0);
+
+static void rds_loop_set_unloading(void)
+{
+	atomic_set(&rds_loop_unloading, 1);
+}
+
+static bool rds_loop_is_unloading(struct rds_connection *conn)
+{
+	return atomic_read(&rds_loop_unloading) != 0;
+}
 
 /*
  * This 'loopback' transport is a special case for flows that originate
@@ -165,6 +178,8 @@ void rds_loop_exit(void)
 	struct rds_loop_connection *lc, *_lc;
 	LIST_HEAD(tmp_list);
 
+	rds_loop_set_unloading();
+	synchronize_rcu();
 	/* avoid calling conn_destroy with irqs off */
 	spin_lock_irq(&loop_conns_lock);
 	list_splice(&loop_conns, &tmp_list);
@@ -177,6 +192,46 @@ void rds_loop_exit(void)
 	}
 }
 
+static void rds_loop_kill_conns(struct net *net)
+{
+	struct rds_loop_connection *lc, *_lc;
+	LIST_HEAD(tmp_list);
+
+	spin_lock_irq(&loop_conns_lock);
+	list_for_each_entry_safe(lc, _lc, &loop_conns, loop_node)  {
+		struct net *c_net = read_pnet(&lc->conn->c_net);
+
+		if (net != c_net)
+			continue;
+		list_move_tail(&lc->loop_node, &tmp_list);
+	}
+	spin_unlock_irq(&loop_conns_lock);
+
+	list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) {
+		WARN_ON(lc->conn->c_passive);
+		rds_conn_destroy(lc->conn);
+	}
+}
+
+static void __net_exit rds_loop_exit_net(struct net *net)
+{
+	rds_loop_kill_conns(net);
+}
+
+static struct pernet_operations rds_loop_net_ops = {
+	.exit = rds_loop_exit_net,
+};
+
+int rds_loop_net_init(void)
+{
+	return register_pernet_device(&rds_loop_net_ops);
+}
+
+void rds_loop_net_exit(void)
+{
+	unregister_pernet_device(&rds_loop_net_ops);
+}
+
 /*
  * This is missing .xmit_* because loop doesn't go through generic
  * rds_send_xmit() and doesn't call rds_recv_incoming().  .listen_stop and
@@ -194,4 +249,5 @@ struct rds_transport rds_loop_transport = {
 	.inc_free		= rds_loop_inc_free,
 	.t_name			= "loopback",
 	.t_type			= RDS_TRANS_LOOP,
+	.t_unloading		= rds_loop_is_unloading,
 };
diff --git a/net/rds/loop.h b/net/rds/loop.h
index 469fa4b..bbc8cdd 100644
--- a/net/rds/loop.h
+++ b/net/rds/loop.h
@@ -5,6 +5,8 @@
 /* loop.c */
 extern struct rds_transport rds_loop_transport;
 
+int rds_loop_net_init(void);
+void rds_loop_net_exit(void);
 void rds_loop_exit(void);
 
 #endif
-- 
1.7.1


* [PATCH net-next 1/7] l2tp: remove pppol2tp_session_close()
From: Guillaume Nault @ 2018-06-25 14:07 UTC
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

l2tp_core.c verifies that ->session_close() is defined before calling
it. There's no need for a stub.
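The optional-callback convention the commit message relies on looks roughly like this. struct session, core_session_close() and count_close() are hypothetical stand-ins, not the l2tp code: the core checks the function pointer before calling it, so an owner with nothing to do can leave it NULL instead of installing an empty stub.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct l2tp_session. */
struct session {
	int closed_calls;
	void (*session_close)(struct session *s); /* optional, may be NULL */
};

/* Core-side close path: only invoke the hook if the owner set one. */
static void core_session_close(struct session *s)
{
	if (s->session_close)
		s->session_close(s);
}

/* Example hook that records how often it ran. */
static void count_close(struct session *s)
{
	s->closed_calls++;
}
```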

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_ppp.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 55188382845c..eea5d7844473 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -424,12 +424,6 @@ static void pppol2tp_put_sk(struct rcu_head *head)
 	sock_put(ps->__sk);
 }
 
-/* Called by l2tp_core when a session socket is being closed.
- */
-static void pppol2tp_session_close(struct l2tp_session *session)
-{
-}
-
 /* Really kill the session socket. (Called from sock_put() if
  * refcnt == 0.)
  */
@@ -573,7 +567,6 @@ static void pppol2tp_session_init(struct l2tp_session *session)
 	struct dst_entry *dst;
 
 	session->recv_skb = pppol2tp_recv;
-	session->session_close = pppol2tp_session_close;
 #if IS_ENABLED(CONFIG_L2TP_DEBUGFS)
 	session->show = pppol2tp_show;
 #endif
-- 
2.18.0


* [PATCH net-next 7/7] l2tp: make l2tp_xmit_core() return void
From: Guillaume Nault @ 2018-06-25 14:07 UTC
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

It always returns 0, and nobody reads the return value anyway.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 88c3001531b4..1ea285bad84b 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1007,8 +1007,8 @@ static int l2tp_build_l2tpv3_header(struct l2tp_session *session, void *buf)
 	return bufp - optr;
 }
 
-static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
-			  struct flowi *fl, size_t data_len)
+static void l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
+			   struct flowi *fl, size_t data_len)
 {
 	struct l2tp_tunnel *tunnel = session->tunnel;
 	unsigned int len = skb->len;
@@ -1050,8 +1050,6 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
 		atomic_long_inc(&tunnel->stats.tx_errors);
 		atomic_long_inc(&session->stats.tx_errors);
 	}
-
-	return 0;
 }
 
 /* If caller requires the skb to have a ppp header, the header must be
-- 
2.18.0


* [PATCH net-next 6/7] l2tp: avoid duplicate l2tp_pernet() calls
From: Guillaume Nault @ 2018-06-25 14:07 UTC
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

Replace 'l2tp_pernet(tunnel->l2tp_net)' with 'pn', which has been set
on the preceding line.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 96e31f2ae7cd..88c3001531b4 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -322,8 +322,7 @@ int l2tp_session_register(struct l2tp_session *session,
 
 	if (tunnel->version == L2TP_HDR_VER_3) {
 		pn = l2tp_pernet(tunnel->l2tp_net);
-		g_head = l2tp_session_id_hash_2(l2tp_pernet(tunnel->l2tp_net),
-						session->session_id);
+		g_head = l2tp_session_id_hash_2(pn, session->session_id);
 
 		spin_lock_bh(&pn->l2tp_session_hlist_lock);
 
-- 
2.18.0


* [PATCH net-next 5/7] l2tp: don't export l2tp_tunnel_closeall()
From: Guillaume Nault @ 2018-06-25 14:07 UTC
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

This function is only used in l2tp_core.c, so make it static and
drop its EXPORT_SYMBOL_GPL().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.c | 3 +--
 net/l2tp/l2tp_core.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 3adef4c35a3a..96e31f2ae7cd 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1192,7 +1192,7 @@ static void l2tp_tunnel_destruct(struct sock *sk)
 
 /* When the tunnel is closed, all the attached sessions need to go too.
  */
-void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
+static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
 {
 	int hash;
 	struct hlist_node *walk;
@@ -1241,7 +1241,6 @@ void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
 	}
 	write_unlock_bh(&tunnel->hlist_lock);
 }
-EXPORT_SYMBOL_GPL(l2tp_tunnel_closeall);
 
 /* Tunnel socket destroy hook for UDP encapsulation */
 static void l2tp_udp_encap_destroy(struct sock *sk)
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 0a6e582f84d3..a5c09d3a5698 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -219,7 +219,6 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id,
 int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 			 struct l2tp_tunnel_cfg *cfg);
 
-void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel);
 void l2tp_tunnel_delete(struct l2tp_tunnel *tunnel);
 struct l2tp_session *l2tp_session_create(int priv_size,
 					 struct l2tp_tunnel *tunnel,
-- 
2.18.0


* [PATCH net-next 3/7] l2tp: remove l2tp_tunnel_priv()
From: Guillaume Nault @ 2018-06-25 14:07 UTC
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

This function, and the associated .priv field, are unused.
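For context, the removed helper followed the trailing private-data idiom that l2tp_session_priv() still uses: extra bytes are allocated past the end of the structure and reached through an array member at its end. A hypothetical userspace sketch, using a C99 flexible array member (priv[]) rather than the GNU zero-length priv[0] form:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object with a trailing private area. */
struct obj {
	int id;
	uint8_t priv[];   /* private data allocated past the end of the struct */
};

/* Accessor in the style of l2tp_session_priv(). */
static void *obj_priv(struct obj *o)
{
	return &o->priv[0];
}

/* Allocate the object and priv_size zeroed bytes of private data in
 * one block. The caller must ensure anything stored in priv is
 * suitably aligned; uint8_t itself gives no extra alignment. */
static struct obj *obj_create(int id, size_t priv_size)
{
	struct obj *o = calloc(1, sizeof(*o) + priv_size);

	if (o)
		o->id = id;
	return o;
}
```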

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index b21c20a4e08f..15e1171ecf7b 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -187,8 +187,6 @@ struct l2tp_tunnel {
 						 * was created by userspace */
 
 	struct work_struct	del_work;
-
-	uint8_t			priv[0];	/* private data */
 };
 
 struct l2tp_nl_cmd_ops {
@@ -198,11 +196,6 @@ struct l2tp_nl_cmd_ops {
 	int (*session_delete)(struct l2tp_session *session);
 };
 
-static inline void *l2tp_tunnel_priv(struct l2tp_tunnel *tunnel)
-{
-	return &tunnel->priv[0];
-}
-
 static inline void *l2tp_session_priv(struct l2tp_session *session)
 {
 	return &session->priv[0];
-- 
2.18.0

