Netdev List
 help / color / mirror / Atom feed
* Re: [BUG BISECT] NFSv4 client fails on Flush Journal to Persistent Storage
From: Chuck Lever @ 2018-06-15 14:23 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Sudeep Holla, Trond Myklebust, Linux NFS Mailing List,
	Anna Schumaker, Bruce Fields, Jeff Layton, David S. Miller,
	netdev, open list, linux-samsung-soc@vger.kernel.org
In-Reply-To: <CAJKOXPfX0CytKcYDaDAYYuCQjk-mcGjFRHfZco-wPQsc4G1agA@mail.gmail.com>



> On Jun 15, 2018, at 10:07 AM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> 
> On Fri, Jun 15, 2018 at 2:53 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> Hi,
>> 
>> On Thu, Jun 7, 2018 at 12:19 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>>> Hi,
>>> 
>>> When booting my boards under recent linux-next, I see failures of systemd:
>>> 
>>> [FAILED] Failed to start Flush Journal to Persistent Storage.
>>> See 'systemctl status systemd-journal-flush.service' for details.
>>>         Starting Create Volatile Files and Directories...
>>> [**    ] A start job is running for Create V… [  223.209289] nfs:
>>> server 192.168.1.10 not responding, still trying
>>> [  223.209377] nfs: server 192.168.1.10 not responding, still trying
>>> 
>>> Effectively the boards fails to boot. Example is here:
>>> https://krzk.eu/#/builders/1/builds/2157
>>> 
>> 
>> I too encountered the same issue.
>> 
>>> This was bisected to:
>>> commit 37ac86c3a76c113619b7d9afe0251bbfc04cb80a
>>> Author: Chuck Lever <chuck.lever@oracle.com>
>>> Date:   Fri May 4 15:34:53 2018 -0400
>>> 
>>>    SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock
>>> 
>>>    alloc_slot is a transport-specific op, but initializing an rpc_rqst
>>>    is common to all transports. In addition, the only part of initial-
>>>    izing an rpc_rqst that needs serialization is getting a fresh XID.
>>> 
>>>    Move rpc_rqst initialization to common code in preparation for
>>>    adding a transport-specific alloc_slot to xprtrdma.
>>> 
>>>    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> 
>> 
>> Unfortunately, spent time to bisect independently without seeing this
>> report and got the same culprit.
>> 
>>> 
>>> Bisect log attached. Full configuration:
>>> 1. exynos_defconfig
>>> 2. ARMv7, octa-core, Exynos5422 and Exynos4412 (Odroid XU3, U3 and others)
>>> 3. NFSv4 client (from Raspberry Pi)
>>> 
>> 
>> Yes the issue is seen only with NFSv4 client and with latest systemd I think.
>> My Ubuntu 16.04(32bit FS) is  boots fine while 18.04 has the above issue.
>> Passing nfsv3 in kernel command line makes it work again.
> 
> Thanks for reply!
> 
> I test it on systemd versions 236 and 238... and it fails on both.
> However one board passes always - it is Odroid HC1 with same core
> configuration as described before. Probably there is some different SW
> package on it.
> 
>>> Let me know if you need any more information.
>>> 
>> 
>> Also I was observing this issue with Linus master branch from
>> the time the above patch was merged until today. The issue
>> is no longer seen since this morning however I just enabled lockdep
>> and got these messages.
> 
> All recent linux-next fail. Today's Linus' tree (4c5e8fc62d6a ("Merge
> tag 'linux-kselftest-4.18-rc1-2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest"))
> managed to get up on one board but stuck on different board with the
> same issue.
> 
> I am quite surprised that there is no response from the author of the
> commit and this was just moved from next (while failing) to Linus'
> tree... bringing the issue to mainline now.

Sorry. This morning is the first time I've seen this report, which was
not To: or Cc'd to me.

Since I don't have access to this kind of hardware, I will have to ask
for your help to perform basic troubleshooting.

Can we start by capturing the network traffic that occurs while you
reproduce the problem? Use tshark or tcpdump on your NFS server, filter
on the IP of the client, and send me (or the list) the raw pcap file.


--
Chuck Lever

^ permalink raw reply

* Re: [PATCH net 1/4] ipv6: Only emit append events for appended routes
From: David Ahern @ 2018-06-15 14:21 UTC (permalink / raw)
  To: Ido Schimmel, netdev; +Cc: davem, jiri, petrm, mlxsw
In-Reply-To: <20180615132338.14241-2-idosch@mellanox.com>

On 6/15/18 7:23 AM, Ido Schimmel wrote:
> Current code will emit an append event in the FIB notification chain for
> any route added with NLM_F_APPEND set, even if the route was not
> appended to any existing route.
> 
> This is inconsistent with IPv4 where such an event is only emitted when
> the new route is appended after an existing one.
> 
> Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
> handle these events.
> 
> Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  net/ipv6/ip6_fib.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [BUG BISECT] NFSv4 client fails on Flush Journal to Persistent Storage
From: Krzysztof Kozlowski @ 2018-06-15 14:10 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Trond Myklebust, linux-nfs, Anna Schumaker, J. Bruce Fields,
	Jeff Layton, David S. Miller, netdev, open list,
	linux-samsung-soc@vger.kernel.org
In-Reply-To: <CAJKOXPfX0CytKcYDaDAYYuCQjk-mcGjFRHfZco-wPQsc4G1agA@mail.gmail.com>

On Fri, Jun 15, 2018 at 4:07 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>
> I test it on systemd versions 236 and 238... and it fails on both.
> However one board passes always - it is Odroid HC1 with same core
> configuration as described before. Probably there is some different SW
> package on it.
>
>>> Let me know if you need any more information.
>>>
>>
>> Also I was observing this issue with Linus master branch from
>> the time the above patch was merged until today. The issue
>> is no longer seen since this morning however I just enabled lockdep
>> and got these messages.

About this lockdep error - I see it for very long time and it appears
always (see end of serial log here:
https://krzk.eu/#/builders/21/builds/707/steps/10/logs/serial0). It
started much before this issue came so I am not sure whether it is
related.

Best regards,
Krzysztof

^ permalink raw reply

* Re: [BUG BISECT] NFSv4 client fails on Flush Journal to Persistent Storage
From: Krzysztof Kozlowski @ 2018-06-15 14:07 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Trond Myklebust, linux-nfs, Anna Schumaker, J. Bruce Fields,
	Jeff Layton, David S. Miller, netdev, open list,
	linux-samsung-soc@vger.kernel.org
In-Reply-To: <CAPKp9uZ79ybkv7dMpcy2GQ9L+rWk+vdExQ48diWWQPOUdDQtSQ@mail.gmail.com>

On Fri, Jun 15, 2018 at 2:53 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
> Hi,
>
> On Thu, Jun 7, 2018 at 12:19 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>> Hi,
>>
>> When booting my boards under recent linux-next, I see failures of systemd:
>>
>> [FAILED] Failed to start Flush Journal to Persistent Storage.
>> See 'systemctl status systemd-journal-flush.service' for details.
>>          Starting Create Volatile Files and Directories...
>> [**    ] A start job is running for Create V… [  223.209289] nfs:
>> server 192.168.1.10 not responding, still trying
>> [  223.209377] nfs: server 192.168.1.10 not responding, still trying
>>
>> Effectively the boards fails to boot. Example is here:
>> https://krzk.eu/#/builders/1/builds/2157
>>
>
> I too encountered the same issue.
>
>> This was bisected to:
>> commit 37ac86c3a76c113619b7d9afe0251bbfc04cb80a
>> Author: Chuck Lever <chuck.lever@oracle.com>
>> Date:   Fri May 4 15:34:53 2018 -0400
>>
>>     SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock
>>
>>     alloc_slot is a transport-specific op, but initializing an rpc_rqst
>>     is common to all transports. In addition, the only part of initial-
>>     izing an rpc_rqst that needs serialization is getting a fresh XID.
>>
>>     Move rpc_rqst initialization to common code in preparation for
>>     adding a transport-specific alloc_slot to xprtrdma.
>>
>>     Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>     Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>
>
> Unfortunately, spent time to bisect independently without seeing this
> report and got the same culprit.
>
>>
>> Bisect log attached. Full configuration:
>> 1. exynos_defconfig
>> 2. ARMv7, octa-core, Exynos5422 and Exynos4412 (Odroid XU3, U3 and others)
>> 3. NFSv4 client (from Raspberry Pi)
>>
>
> Yes the issue is seen only with NFSv4 client and with latest systemd I think.
> My Ubuntu 16.04(32bit FS) is  boots fine while 18.04 has the above issue.
> Passing nfsv3 in kernel command line makes it work again.

Thanks for reply!

I test it on systemd versions 236 and 238... and it fails on both.
However one board passes always - it is Odroid HC1 with same core
configuration as described before. Probably there is some different SW
package on it.

>> Let me know if you need any more information.
>>
>
> Also I was observing this issue with Linus master branch from
> the time the above patch was merged until today. The issue
> is no longer seen since this morning however I just enabled lockdep
> and got these messages.

All recent linux-next fail. Today's Linus' tree (4c5e8fc62d6a ("Merge
tag 'linux-kselftest-4.18-rc1-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest"))
managed to get up on one board but stuck on different board with the
same issue.

I am quite surprised that there is no response from the author of the
commit and this was just moved from next (while failing) to Linus'
tree... bringing the issue to mainline now.

Best regards,
Krzysztof

^ permalink raw reply

* Re: [PATCH] bpf: attach type BPF_LIRC_MODE2 should not depend on CONFIG_CGROUP_BPF
From: Daniel Borkmann @ 2018-06-15 14:03 UTC (permalink / raw)
  To: Sean Young
  Cc: Y Song, Matthias Reichl, linux-media, LKML, Alexei Starovoitov,
	Mauro Carvalho Chehab, netdev, Devin Heitmueller, Quentin Monnet
In-Reply-To: <20180614184207.khwcmwmj4duous4c@gofer.mess.org>

Hi Sean,

On 06/14/2018 08:42 PM, Sean Young wrote:
> If the kernel is compiled with CONFIG_CGROUP_BPF not enabled, it is not
> possible to attach, detach or query IR BPF programs to /dev/lircN devices,
> making them impossible to use. For embedded devices, it should be possible
> to use IR decoding without cgroups or CONFIG_CGROUP_BPF enabled.
> 
> This change requires some refactoring, since bpf_prog_{attach,detach,query}
> functions are now always compiled, but their code paths for cgroups need
> moving out. Rather than a #ifdef CONFIG_CGROUP_BPF in kernel/bpf/syscall.c,
> moving them to kernel/bpf/cgroup.c does not require #ifdefs since that file
> is already conditionally compiled.
> 
> Signed-off-by: Sean Young <sean@mess.org>

Just two minor edits below from my side:

>  include/linux/bpf-cgroup.h |  31 +++++++++++
>  kernel/bpf/cgroup.c        | 110 +++++++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c       | 105 ++---------------------------------
>  3 files changed, 145 insertions(+), 101 deletions(-)
> 
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 975fb4cf1bb7..ee67cd35f426 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -188,12 +188,43 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
>  									      \
>  	__ret;								      \
>  })
> +int sockmap_get_from_fd(const union bpf_attr *attr, int type, bool attach);
> +int cgroup_bpf_prog_attach(const union bpf_attr *attr,
> +			   enum bpf_prog_type ptype);
> +int cgroup_bpf_prog_detach(const union bpf_attr *attr,
> +			   enum bpf_prog_type ptype);
> +int cgroup_bpf_prog_query(const union bpf_attr *attr,
> +			  union bpf_attr __user *uattr);
>  #else
>  
>  struct cgroup_bpf {};
>  static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
>  static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
>  
> +static inline int sockmap_get_from_fd(const union bpf_attr *attr,
> +				      int type, bool attach)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int cgroup_bpf_prog_attach(const union bpf_attr *attr,
> +					 enum bpf_prog_type ptype)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int cgroup_bpf_prog_detach(const union bpf_attr *attr,
> +					 enum bpf_prog_type ptype)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
> +					union bpf_attr __user *uattr)
> +{
> +	return -EINVAL;
> +}
> +
>  #define cgroup_bpf_enabled (0)
>  #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
>  #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index f7c00bd6f8e4..d6e18f9dc0c4 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -428,6 +428,116 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  	return ret;
>  }
>  
> +int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
> +				      enum bpf_attach_type attach_type)
> +{
> +	switch (prog->type) {
> +	case BPF_PROG_TYPE_CGROUP_SOCK:
> +	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
> +		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
> +	default:
> +		return 0;
> +	}
> +}

This one is rather BPF core, cgroup progs only happen to use it. Could
you either move it as static inline into include/linux/bpf.h or leave it
in kernel/bpf/syscall.c? In any case the #ifdef CONFIG_CGROUP_BPF wrapper
could probably be dropped as well from it.

> +int sockmap_get_from_fd(const union bpf_attr *attr, int type, bool attach)
> +{
> +	struct bpf_prog *prog = NULL;
> +	int ufd = attr->target_fd;
> +	struct bpf_map *map;
> +	struct fd f;
> +	int err;
> +
> +	f = fdget(ufd);
> +	map = __bpf_map_get(f);
> +	if (IS_ERR(map))
> +		return PTR_ERR(map);
> +
> +	if (attach) {
> +		prog = bpf_prog_get_type(attr->attach_bpf_fd, type);
> +		if (IS_ERR(prog)) {
> +			fdput(f);
> +			return PTR_ERR(prog);
> +		}
> +	}
> +
> +	err = sock_map_prog(map, prog, attr->attach_type);
> +	if (err) {
> +		fdput(f);
> +		if (prog)
> +			bpf_prog_put(prog);
> +		return err;
> +	}
> +
> +	fdput(f);
> +	return 0;
> +}

And this one should rather end up in kernel/bpf/sockmap.c to have it with
the rest of sockmap code. Moving into cgroup.c would rather be a in the
wrong place given what the code does.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: David Woodhouse @ 2018-06-15 13:44 UTC (permalink / raw)
  To: Eric Dumazet, Elena Reshetova, netdev
  Cc: Krzysztof Mazur, Kevin Darbyshire-Bryant, 3chas3, Mathias Kresin
In-Reply-To: <3cbab0cc-230d-6980-6748-67f777c8ad22@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3273 bytes --]



On Fri, 2018-06-15 at 06:39 -0700, Eric Dumazet wrote:
> 
> On 06/15/2018 06:27 AM, Eric Dumazet wrote:
> > 
> > 
> > 
> > On 06/15/2018 05:29 AM, David Woodhouse wrote:
> > > 
> > > On Fri, 2017-06-30 at 13:08 +0300, Elena Reshetova wrote:
> > > > 
> > > > diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
> > > > index c1da539..4d97a89 100644
> > > > --- a/include/linux/atmdev.h
> > > > +++ b/include/linux/atmdev.h
> > > > @@ -254,7 +254,7 @@ static inline void atm_return(struct atm_vcc *vcc,int truesize)
> > > >  
> > > >  static inline int atm_may_send(struct atm_vcc *vcc,unsigned int size)
> > > >  {
> > > > -       return (size + atomic_read(&sk_atm(vcc)->sk_wmem_alloc)) <
> > > > +       return (size + refcount_read(&sk_atm(vcc)->sk_wmem_alloc)) <
> > > >                sk_atm(vcc)->sk_sndbuf;
> > > >  }
> > > 
> > > Hm, this (commit 14afee4b6092fd) may have broken PPPoATM. I did spend a
> > > while staring hard at my own commit 9d02daf754238 which introduced
> > > pppoatm_may_send(), but it's actually atm_may_send() which is never
> > > allowing packets to be pushed.
> > > 
> > > Debugging (which is ongoing) has so far shown that we are accounting
> > > for a packet in pppoatm_send() which has skb->truesize==0x1c0, and
> > > later end up in pppoatm_pop()→atm_raw_pop() with skb->truesize==0x2c0.
> > > 
> > > This was always harmless before, but now it's a refcount_t it appears
> > > to underflow and go into its "screw you" mode and never let any more
> > > packets get sent.
> > > 
> > > I'm staring hard at the special case in pskb_expand_head() to *not*
> > > change skb->truesize under certain circumstances, and wondering if that
> > > (is, should be) covering the case of ATM skbs:
> > > 
> > >         /* It is not generally safe to change skb->truesize.
> > >          * For the moment, we really care of rx path, or
> > >          * when skb is orphaned (not attached to a socket).
> > >          */
> > >         if (!skb->sk || skb->destructor == sock_edemux)
> > >                 skb->truesize += size - osize;
> > > 
> > > Failing that, maybe we should copy the accounted value of skb->truesize 
> > > into the struct skb_atm_data in skb->cb at the time we add it to
> > > sk_wmem_alloc — and then subtract *that* value from sk_wmem_alloc in
> > > atm_raw_pop() instead of the then current value of skb->truesize.
> > > 
> > > Suggestions?
> > > 
> > Maybe ATM should make sure skb->sk is set ?

Yeah... I don't think we want sock_wfree() as a destructor, unless we
also fix up atm_pop_raw() not to do the same refcount_sub() and cope
with some other details, but it could probably be workable with
sufficient caffeine.


> > something like the following :
> > 
> Or simply use a new field in ATM_SKB(skb) to remember a stable
> truesize used in both sides (add/sub)

Right, that was my second suggestion ("copy the accounted value...").

It's a bit of a hack, and I think that actually *using* sock_wfree()
instead of what's currently in atm_pop_raw() would be the better
solution. Does anyone remember why we didn't do that in the first
place?


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply

* [PATCH net 2/2] l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
From: Guillaume Nault @ 2018-06-15 13:39 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529065935.git.g.nault@alphalink.fr>

pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
'session' may be an Ethernet pseudo-wire.

However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
assumes l2tp_session_priv() points to a pppol2tp_session structure. For
an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
structure instead, making pppol2tp_session_ioctl() access invalid
memory.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_ppp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index f429fed06a1e..55188382845c 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1201,7 +1201,7 @@ static int pppol2tp_tunnel_ioctl(struct l2tp_tunnel *tunnel,
 				l2tp_session_get(sock_net(sk), tunnel,
 						 stats.session_id);
 
-			if (session) {
+			if (session && session->pwtype == L2TP_PWTYPE_PPP) {
 				err = pppol2tp_session_ioctl(session, cmd,
 							     arg);
 				l2tp_session_dec_refcount(session);
-- 
2.17.1

^ permalink raw reply related

* [PATCH net 1/2] l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
From: Guillaume Nault @ 2018-06-15 13:39 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529065935.git.g.nault@alphalink.fr>

The /proc/net/pppol2tp handlers (pppol2tp_seq_*()) iterate over all
L2TPv2 tunnels, and rightfully expect that only PPP sessions can be
found there. However, l2tp_netlink accepts creating Ethernet sessions
regardless of the underlying tunnel version.

This confuses pppol2tp_seq_session_show(), which expects that
l2tp_session_priv() returns a pppol2tp_session structure. When the
session is an Ethernet pseudo-wire, a struct l2tp_eth_sess is returned
instead. This leads to invalid memory access when
pppol2tp_session_get_sock() later tries to dereference ps->sk.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_netlink.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 6616c9fd292f..5b9900889e31 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -553,6 +553,12 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf
 		goto out_tunnel;
 	}
 
+	/* L2TPv2 only accepts PPP pseudo-wires */
+	if (tunnel->version == 2 && cfg.pw_type != L2TP_PWTYPE_PPP) {
+		ret = -EPROTONOSUPPORT;
+		goto out_tunnel;
+	}
+
 	if (tunnel->version > 2) {
 		if (info->attrs[L2TP_ATTR_DATA_SEQ])
 			cfg.data_seq = nla_get_u8(info->attrs[L2TP_ATTR_DATA_SEQ]);
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: Eric Dumazet @ 2018-06-15 13:39 UTC (permalink / raw)
  To: Eric Dumazet, David Woodhouse, Elena Reshetova, netdev
  Cc: Krzysztof Mazur, Kevin Darbyshire-Bryant, chas, Mathias Kresin
In-Reply-To: <9805cd6e-1c21-f7b4-cc0e-d4a343b8e3d8@gmail.com>



On 06/15/2018 06:27 AM, Eric Dumazet wrote:
> 
> 
> On 06/15/2018 05:29 AM, David Woodhouse wrote:
>> On Fri, 2017-06-30 at 13:08 +0300, Elena Reshetova wrote:
>>> diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
>>> index c1da539..4d97a89 100644
>>> --- a/include/linux/atmdev.h
>>> +++ b/include/linux/atmdev.h
>>> @@ -254,7 +254,7 @@ static inline void atm_return(struct atm_vcc *vcc,int truesize)
>>>  
>>>  static inline int atm_may_send(struct atm_vcc *vcc,unsigned int size)
>>>  {
>>> -       return (size + atomic_read(&sk_atm(vcc)->sk_wmem_alloc)) <
>>> +       return (size + refcount_read(&sk_atm(vcc)->sk_wmem_alloc)) <
>>>                sk_atm(vcc)->sk_sndbuf;
>>>  }
>>
>>
>> Hm, this (commit 14afee4b6092fd) may have broken PPPoATM. I did spend a
>> while staring hard at my own commit 9d02daf754238 which introduced
>> pppoatm_may_send(), but it's actually atm_may_send() which is never
>> allowing packets to be pushed.
>>
>> Debugging (which is ongoing) has so far shown that we are accounting
>> for a packet in pppoatm_send() which has skb->truesize==0x1c0, and
>> later end up in pppoatm_pop()→atm_raw_pop() with skb->truesize==0x2c0.
>>
>> This was always harmless before, but now it's a refcount_t it appears
>> to underflow and go into its "screw you" mode and never let any more
>> packets get sent.
>>
>> I'm staring hard at the special case in pskb_expand_head() to *not*
>> change skb->truesize under certain circumstances, and wondering if that
>> (is, should be) covering the case of ATM skbs:
>>
>>         /* It is not generally safe to change skb->truesize.
>>          * For the moment, we really care of rx path, or
>>          * when skb is orphaned (not attached to a socket).
>>          */
>>         if (!skb->sk || skb->destructor == sock_edemux)
>>                 skb->truesize += size - osize;
>>
>> Failing that, maybe we should copy the accounted value of skb->truesize 
>> into the struct skb_atm_data in skb->cb at the time we add it to
>> sk_wmem_alloc — and then subtract *that* value from sk_wmem_alloc in
>> atm_raw_pop() instead of the then current value of skb->truesize.
>>
>> Suggestions?
>>
> 
> Maybe ATM should make sure skb->sk is set ?
> 
> something like the following :
> 

Or simply use a new field in ATM_SKB(skb) to remember a stable truesize used in both sides (add/sub)

^ permalink raw reply

* [PATCH net 0/2] l2tp: l2tp_ppp must ignore non-PPP sessions
From: Guillaume Nault @ 2018-06-15 13:39 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman

The original L2TP code was written for version 2 of the protocol, which
could only carry PPP sessions. Then L2TPv3 generalised the protocol so that
it could transport different kinds of pseudo-wires. But parts of the
l2tp_ppp module still break in presence of non-PPP sessions.

Assuming L2TPv2 tunnels can only transport PPP sessions is right, but
l2tp_netlink failed to ensure that (fixed in patch 1).
When retrieving a session from an arbitrary tunnel, l2tp_ppp needs to
filter out non-PPP sessions (last occurrence fixed in patch 2).


Guillaume Nault (2):
  l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
  l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()

 net/l2tp/l2tp_netlink.c | 6 ++++++
 net/l2tp/l2tp_ppp.c     | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.17.1

^ permalink raw reply

* Re: [PATCH] net_sched: blackhole: tell upper qdisc about dropped packets
From: Konstantin Khlebnikov @ 2018-06-15 13:21 UTC (permalink / raw)
  To: Eric Dumazet, netdev, David S. Miller
  Cc: Cong Wang, Jiri Pirko, Jamal Hadi Salim
In-Reply-To: <e654c245-7745-f49a-c995-7071e0e57153@gmail.com>

On 15.06.2018 16:13, Eric Dumazet wrote:
> 
> 
> On 06/15/2018 03:27 AM, Konstantin Khlebnikov wrote:
>> When blackhole is used on top of classful qdisc like hfsc it breaks
>> qlen and backlog counters because packets are disappear without notice.
>>
>> In HFSC non-zero qlen while all classes are inactive triggers warning:
>> WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
>> and schedules watchdog work endlessly.
>>
>> This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS,
>> this flag tells upper layer: this packet is gone and isn't queued.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---
>>   net/sched/sch_blackhole.c |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c
>> index c98a61e980ba..9c4c2bb547d7 100644
>> --- a/net/sched/sch_blackhole.c
>> +++ b/net/sched/sch_blackhole.c
>> @@ -21,7 +21,7 @@ static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>>   			     struct sk_buff **to_free)
>>   {
>>   	qdisc_drop(skb, sch, to_free);
>> -	return NET_XMIT_SUCCESS;
>> +	return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> 
> Why do not we use instead :
> 
> 	return qdisc_drop(skb, sch, to_free);
> 
> Although noop_enqueue() seems to use :
> 
> 	return NET_XMIT_CN;
> 
> Oh well.
> 
> 

I suppose "blackhole" should work like "successful" xmit, but counted as drop.

^ permalink raw reply

* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: Eric Dumazet @ 2018-06-15 13:27 UTC (permalink / raw)
  To: David Woodhouse, Elena Reshetova, netdev
  Cc: Krzysztof Mazur, Kevin Darbyshire-Bryant, chas, Mathias Kresin
In-Reply-To: <1529065762.27158.40.camel@infradead.org>



On 06/15/2018 05:29 AM, David Woodhouse wrote:
> On Fri, 2017-06-30 at 13:08 +0300, Elena Reshetova wrote:
>> diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
>> index c1da539..4d97a89 100644
>> --- a/include/linux/atmdev.h
>> +++ b/include/linux/atmdev.h
>> @@ -254,7 +254,7 @@ static inline void atm_return(struct atm_vcc *vcc,int truesize)
>>  
>>  static inline int atm_may_send(struct atm_vcc *vcc,unsigned int size)
>>  {
>> -       return (size + atomic_read(&sk_atm(vcc)->sk_wmem_alloc)) <
>> +       return (size + refcount_read(&sk_atm(vcc)->sk_wmem_alloc)) <
>>                sk_atm(vcc)->sk_sndbuf;
>>  }
> 
> 
> Hm, this (commit 14afee4b6092fd) may have broken PPPoATM. I did spend a
> while staring hard at my own commit 9d02daf754238 which introduced
> pppoatm_may_send(), but it's actually atm_may_send() which is never
> allowing packets to be pushed.
> 
> Debugging (which is ongoing) has so far shown that we are accounting
> for a packet in pppoatm_send() which has skb->truesize==0x1c0, and
> later end up in pppoatm_pop()→atm_raw_pop() with skb->truesize==0x2c0.
> 
> This was always harmless before, but now it's a refcount_t it appears
> to underflow and go into its "screw you" mode and never let any more
> packets get sent.
> 
> I'm staring hard at the special case in pskb_expand_head() to *not*
> change skb->truesize under certain circumstances, and wondering if that
> (is, should be) covering the case of ATM skbs:
> 
>         /* It is not generally safe to change skb->truesize.
>          * For the moment, we really care of rx path, or
>          * when skb is orphaned (not attached to a socket).
>          */
>         if (!skb->sk || skb->destructor == sock_edemux)
>                 skb->truesize += size - osize;
> 
> Failing that, maybe we should copy the accounted value of skb->truesize 
> into the struct skb_atm_data in skb->cb at the time we add it to
> sk_wmem_alloc — and then subtract *that* value from sk_wmem_alloc in
> atm_raw_pop() instead of the then current value of skb->truesize.
> 
> Suggestions?
> 

Maybe ATM should make sure skb->sk is set ?

something like the following :

diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index 21d9d341a6199255a017437954e4b688f1ba5bfd..4863d02939a26ce0a5816da86779336255d550bd 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -350,7 +350,7 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
                return 1;
        }
 
-       refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
+       skb_set_owner_w(skb, &sk_atm(ATM_SKB(skb)->vcc));
        ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
        pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
                 skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);

Or something more sophisticated, but this might need more thinking

diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index 21d9d341a6199255a017437954e4b688f1ba5bfd..80902a0b56a1d15d2851a07f067490d99136d214 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -350,7 +350,10 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
                return 1;
        }
 
-       refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
+       if (!skb->sk)
+               skb_set_owner_w(skb, &sk_atm(ATM_SKB(skb)->vcc));
+-      else
+               refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
        ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
        pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
                 skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);

^ permalink raw reply related

* [PATCH net 4/4] mlxsw: spectrum_switchdev: Fix port_vlan refcounting
From: Ido Schimmel @ 2018-06-15 13:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180615132338.14241-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

Switchdev notifications for addition of SWITCHDEV_OBJ_ID_PORT_VLAN are
distributed not only on clean addition, but also when flags on an
existing VLAN are changed. mlxsw_sp_bridge_port_vlan_add() calls
mlxsw_sp_port_vlan_get() to get at the port_vlan in question, which
implicitly references the object. This then leads to discrepancies in
reference counting when the VLAN is removed. spectrum.c warns about the
problem when the module is removed:

[13578.493090] WARNING: CPU: 0 PID: 2454 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2973 mlxsw_sp_port_remove+0xfd/0x110 [mlxsw_spectrum]
[...]
[13578.627106] Call Trace:
[13578.629617]  mlxsw_sp_fini+0x2a/0xe0 [mlxsw_spectrum]
[13578.634748]  mlxsw_core_bus_device_unregister+0x3e/0x130 [mlxsw_core]
[13578.641290]  mlxsw_pci_remove+0x13/0x40 [mlxsw_pci]
[13578.646238]  pci_device_remove+0x31/0xb0
[13578.650244]  device_release_driver_internal+0x14f/0x220
[13578.655562]  driver_detach+0x32/0x70
[13578.659183]  bus_remove_driver+0x47/0xa0
[13578.663134]  pci_unregister_driver+0x1e/0x80
[13578.667486]  mlxsw_sp_module_exit+0xc/0x3fa [mlxsw_spectrum]
[13578.673207]  __x64_sys_delete_module+0x13b/0x1e0
[13578.677888]  ? exit_to_usermode_loop+0x78/0x80
[13578.682374]  do_syscall_64+0x39/0xe0
[13578.685976]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix by putting the port_vlan when mlxsw_sp_port_vlan_bridge_join()
determines it's a flag-only change.

Fixes: b3529af6bb0d ("spectrum: Reference count VLAN entries")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index e97652c40d13..eea5666a86b2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1018,8 +1018,10 @@ mlxsw_sp_port_vlan_bridge_join(struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan,
 	int err;
 
 	/* No need to continue if only VLAN flags were changed */
-	if (mlxsw_sp_port_vlan->bridge_port)
+	if (mlxsw_sp_port_vlan->bridge_port) {
+		mlxsw_sp_port_vlan_put(mlxsw_sp_port_vlan);
 		return 0;
+	}
 
 	err = mlxsw_sp_port_vlan_fid_join(mlxsw_sp_port_vlan, bridge_port);
 	if (err)
-- 
2.14.4

^ permalink raw reply related

* [PATCH net 3/4] mlxsw: spectrum_router: Align with new route replace logic
From: Ido Schimmel @ 2018-06-15 13:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180615132338.14241-1-idosch@mellanox.com>

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route replace logic so that the
first matching route (i.e., same metric) is replaced.

Have mlxsw replace the first matching route as well.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c   | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index c8956ab224ea..6aaaf3d9ba31 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -4756,12 +4756,6 @@ static void mlxsw_sp_rt6_destroy(struct mlxsw_sp_rt6 *mlxsw_sp_rt6)
 	kfree(mlxsw_sp_rt6);
 }
 
-static bool mlxsw_sp_fib6_rt_can_mp(const struct fib6_info *rt)
-{
-	/* RTF_CACHE routes are ignored */
-	return (rt->fib6_flags & (RTF_GATEWAY | RTF_ADDRCONF)) == RTF_GATEWAY;
-}
-
 static struct fib6_info *
 mlxsw_sp_fib6_entry_rt(const struct mlxsw_sp_fib6_entry *fib6_entry)
 {
@@ -5169,7 +5163,7 @@ static struct mlxsw_sp_fib6_entry *
 mlxsw_sp_fib6_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
 			      const struct fib6_info *nrt, bool replace)
 {
-	struct mlxsw_sp_fib6_entry *fib6_entry, *fallback = NULL;
+	struct mlxsw_sp_fib6_entry *fib6_entry;
 
 	list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
 		struct fib6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
@@ -5178,18 +5172,13 @@ mlxsw_sp_fib6_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
 			continue;
 		if (rt->fib6_table->tb6_id != nrt->fib6_table->tb6_id)
 			break;
-		if (replace && rt->fib6_metric == nrt->fib6_metric) {
-			if (mlxsw_sp_fib6_rt_can_mp(rt) ==
-			    mlxsw_sp_fib6_rt_can_mp(nrt))
-				return fib6_entry;
-			if (mlxsw_sp_fib6_rt_can_mp(nrt))
-				fallback = fallback ?: fib6_entry;
-		}
+		if (replace && rt->fib6_metric == nrt->fib6_metric)
+			return fib6_entry;
 		if (rt->fib6_metric > nrt->fib6_metric)
-			return fallback ?: fib6_entry;
+			return fib6_entry;
 	}
 
-	return fallback;
+	return NULL;
 }
 
 static int
-- 
2.14.4

^ permalink raw reply related

* [PATCH net 2/4] mlxsw: spectrum_router: Allow appending to dev-only routes
From: Ido Schimmel @ 2018-06-15 13:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180615132338.14241-1-idosch@mellanox.com>

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route append logic so that
dev-only routes can be appended and not only gatewayed routes.

Align mlxsw with the new behaviour.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 27 +++++++++++++++-------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 77b2adb29341..c8956ab224ea 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -4771,11 +4771,11 @@ mlxsw_sp_fib6_entry_rt(const struct mlxsw_sp_fib6_entry *fib6_entry)
 
 static struct mlxsw_sp_fib6_entry *
 mlxsw_sp_fib6_node_mp_entry_find(const struct mlxsw_sp_fib_node *fib_node,
-				 const struct fib6_info *nrt, bool replace)
+				 const struct fib6_info *nrt, bool append)
 {
 	struct mlxsw_sp_fib6_entry *fib6_entry;
 
-	if (!mlxsw_sp_fib6_rt_can_mp(nrt) || replace)
+	if (!append)
 		return NULL;
 
 	list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
@@ -4790,8 +4790,7 @@ mlxsw_sp_fib6_node_mp_entry_find(const struct mlxsw_sp_fib_node *fib_node,
 			break;
 		if (rt->fib6_metric < nrt->fib6_metric)
 			continue;
-		if (rt->fib6_metric == nrt->fib6_metric &&
-		    mlxsw_sp_fib6_rt_can_mp(rt))
+		if (rt->fib6_metric == nrt->fib6_metric)
 			return fib6_entry;
 		if (rt->fib6_metric > nrt->fib6_metric)
 			break;
@@ -5316,7 +5315,8 @@ static void mlxsw_sp_fib6_entry_replace(struct mlxsw_sp *mlxsw_sp,
 }
 
 static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
-				    struct fib6_info *rt, bool replace)
+				    struct fib6_info *rt, bool replace,
+				    bool append)
 {
 	struct mlxsw_sp_fib6_entry *fib6_entry;
 	struct mlxsw_sp_fib_node *fib_node;
@@ -5342,7 +5342,7 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
 	/* Before creating a new entry, try to append route to an existing
 	 * multipath entry.
 	 */
-	fib6_entry = mlxsw_sp_fib6_node_mp_entry_find(fib_node, rt, replace);
+	fib6_entry = mlxsw_sp_fib6_node_mp_entry_find(fib_node, rt, append);
 	if (fib6_entry) {
 		err = mlxsw_sp_fib6_entry_nexthop_add(mlxsw_sp, fib6_entry, rt);
 		if (err)
@@ -5350,6 +5350,14 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
 		return 0;
 	}
 
+	/* We received an append event, yet did not find any route to
+	 * append to.
+	 */
+	if (WARN_ON(append)) {
+		err = -EINVAL;
+		goto err_fib6_entry_append;
+	}
+
 	fib6_entry = mlxsw_sp_fib6_entry_create(mlxsw_sp, fib_node, rt);
 	if (IS_ERR(fib6_entry)) {
 		err = PTR_ERR(fib6_entry);
@@ -5367,6 +5375,7 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
 err_fib6_node_entry_link:
 	mlxsw_sp_fib6_entry_destroy(mlxsw_sp, fib6_entry);
 err_fib6_entry_create:
+err_fib6_entry_append:
 err_fib6_entry_nexthop_add:
 	mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
 	return err;
@@ -5717,7 +5726,7 @@ static void mlxsw_sp_router_fib6_event_work(struct work_struct *work)
 	struct mlxsw_sp_fib_event_work *fib_work =
 		container_of(work, struct mlxsw_sp_fib_event_work, work);
 	struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
-	bool replace;
+	bool replace, append;
 	int err;
 
 	rtnl_lock();
@@ -5728,8 +5737,10 @@ static void mlxsw_sp_router_fib6_event_work(struct work_struct *work)
 	case FIB_EVENT_ENTRY_APPEND: /* fall through */
 	case FIB_EVENT_ENTRY_ADD:
 		replace = fib_work->event == FIB_EVENT_ENTRY_REPLACE;
+		append = fib_work->event == FIB_EVENT_ENTRY_APPEND;
 		err = mlxsw_sp_router_fib6_add(mlxsw_sp,
-					       fib_work->fen6_info.rt, replace);
+					       fib_work->fen6_info.rt, replace,
+					       append);
 		if (err)
 			mlxsw_sp_router_fib_abort(mlxsw_sp);
 		mlxsw_sp_rt6_release(fib_work->fen6_info.rt);
-- 
2.14.4

^ permalink raw reply related

* [PATCH net 1/4] ipv6: Only emit append events for appended routes
From: Ido Schimmel @ 2018-06-15 13:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180615132338.14241-1-idosch@mellanox.com>

Current code will emit an append event in the FIB notification chain for
any route added with NLM_F_APPEND set, even if the route was not
appended to any existing route.

This is inconsistent with IPv4 where such an event is only emitted when
the new route is appended after an existing one.

Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
handle these events.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/ipv6/ip6_fib.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 7aa4c41a3bd9..39d1d487eca2 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -934,6 +934,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 {
 	struct fib6_info *leaf = rcu_dereference_protected(fn->leaf,
 				    lockdep_is_held(&rt->fib6_table->tb6_lock));
+	enum fib_event_type event = FIB_EVENT_ENTRY_ADD;
 	struct fib6_info *iter = NULL, *match = NULL;
 	struct fib6_info __rcu **ins;
 	int replace = (info->nlh &&
@@ -1013,6 +1014,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 				       "Can not append to a REJECT route");
 			return -EINVAL;
 		}
+		event = FIB_EVENT_ENTRY_APPEND;
 		rt->fib6_nsiblings = match->fib6_nsiblings;
 		list_add_tail(&rt->fib6_siblings, &match->fib6_siblings);
 		match->fib6_nsiblings++;
@@ -1034,15 +1036,12 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 	 *	insert node
 	 */
 	if (!replace) {
-		enum fib_event_type event;
-
 		if (!add)
 			pr_warn("NLM_F_CREATE should be set when creating new route\n");
 
 add:
 		nlflags |= NLM_F_CREATE;
 
-		event = append ? FIB_EVENT_ENTRY_APPEND : FIB_EVENT_ENTRY_ADD;
 		err = call_fib6_entry_notifiers(info->nl_net, event, rt,
 						extack);
 		if (err)
-- 
2.14.4

^ permalink raw reply related

* [PATCH net 0/4] mlxsw: IPv6 and reference counting fixes
From: Ido Schimmel @ 2018-06-15 13:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, dsahern, mlxsw, Ido Schimmel

Hi,

The first three patches fix a mismatch between the new IPv6 behavior
introduced in commit f34436a43092 ("net/ipv6: Simplify route replace and
appending into multipath route") and mlxsw. The patches allow the driver
to support multipathing in IPv6 overlays with GRE tunnel devices. A
selftest will be submitted when net-next opens.

The last patch fixes a reference count problem of the port_vlan struct.
I plan to simplify the code in net-next, so that reference counting is
not necessary anymore.

Ido Schimmel (3):
  ipv6: Only emit append events for appended routes
  mlxsw: spectrum_router: Allow appending to dev-only routes
  mlxsw: spectrum_router: Align with new route replace logic

Petr Machata (1):
  mlxsw: spectrum_switchdev: Fix port_vlan refcounting

 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 48 +++++++++++-----------
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |  4 +-
 net/ipv6/ip6_fib.c                                 |  5 +--
 3 files changed, 29 insertions(+), 28 deletions(-)

-- 
2.14.4

^ permalink raw reply

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Daniel Borkmann @ 2018-06-15 13:22 UTC (permalink / raw)
  To: Steffen Klassert, David Miller; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <20180615061740.eqgr2iv722oydheb@gauss3.secunet.de>

Hi Steffen,

On 06/15/2018 08:17 AM, Steffen Klassert wrote:
> On Thu, Jun 14, 2018 at 10:18:31AM -0700, David Miller wrote:
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Thu, 14 Jun 2018 16:19:34 +0200
>>
>>> This patchset proposes a new fast forwarding path infrastructure
>>> that combines the GRO/GSO and the flowtable infrastructures. The
>>> idea is to add a hook at the GRO layer that is invoked before the
>>> standard GRO protocol offloads. This allows us to build custom
>>> packet chains that we can quickly pass in one go to the neighbour
>>> layer to define fast forwarding path for flows.
>>
>> We have full, complete, customizability of the packet path via XDP
>> and eBPF.
>>
>> XDP and eBPF supports everything necessary to accomplish that,
>> there are implementations of forwarding implementations in
>> the tree and elsewhere.
>>
>> And most importantly, XDP and eBPF are optimized in drivers and
>> offloaded to hardware.
>>
>> There really is no need for something like what you are proposing.
> 
> I started with this last year because I wanted to improve
> the IPsec (and UDP) forwarding path. Batching packets
> at layer2  and send them directly to the output path
> seemed to be a good method to improve this.
> 
> In particular, we need to do only one IPsec lookup
> for the whole packet chain. So it relaxes the pain
> from reomoving the IPsec flowcache a bit. It can be
> only a first step, but we need some improvements here
> as people start to complain about that.

But did you also experiment with XDP on this? Would be curious about
the numbers. You'd get implicit batching for the forwarding via devmap
as well if you're required to flush it out via different device with
XDP_REDIRECT; otherwise XDP_TX of course. Given we have recently
integrated helpers for XDP to do a FIB and neighbor lookup from the
kernel tables, where it's thus shared and integrated with the rest of
the stack and tooling, it would be awesome to get to the same point
with xfrm as well. Eyal recently did a start on that for xfrm for tc
progs; would be nice to have integration on XDP as well, potentially
it might also result in a bigger plus on the forwarding numbers.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH] net_sched: blackhole: tell upper qdisc about dropped packets
From: Eric Dumazet @ 2018-06-15 13:13 UTC (permalink / raw)
  To: Konstantin Khlebnikov, netdev, David S. Miller
  Cc: Cong Wang, Jiri Pirko, Jamal Hadi Salim
In-Reply-To: <152905845136.88583.6197221349040585517.stgit@buzz>



On 06/15/2018 03:27 AM, Konstantin Khlebnikov wrote:
> When blackhole is used on top of classful qdisc like hfsc it breaks
> qlen and backlog counters because packets are disappear without notice.
> 
> In HFSC non-zero qlen while all classes are inactive triggers warning:
> WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
> and schedules watchdog work endlessly.
> 
> This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS,
> this flag tells upper layer: this packet is gone and isn't queued.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  net/sched/sch_blackhole.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c
> index c98a61e980ba..9c4c2bb547d7 100644
> --- a/net/sched/sch_blackhole.c
> +++ b/net/sched/sch_blackhole.c
> @@ -21,7 +21,7 @@ static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  			     struct sk_buff **to_free)
>  {
>  	qdisc_drop(skb, sch, to_free);
> -	return NET_XMIT_SUCCESS;
> +	return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;

Why do not we use instead :

	return qdisc_drop(skb, sch, to_free);

Although noop_enqueue() seems to use :

	return NET_XMIT_CN;

Oh well.

^ permalink raw reply

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Eric Dumazet @ 2018-06-15 13:01 UTC (permalink / raw)
  To: Steffen Klassert, Eric Dumazet; +Cc: Pablo Neira Ayuso, netfilter-devel, netdev
In-Reply-To: <20180615060307.fwwc2vnb5rpc464p@gauss3.secunet.de>



On 06/14/2018 11:03 PM, Steffen Klassert wrote:
> On Thu, Jun 14, 2018 at 08:57:20AM -0700, Eric Dumazet wrote:
>>

>> Saving cpu cycles on moderate load is not okay if added complexity
>> slows down the DDOS (or stress) by 10 % :/
> 
> Why 10%?
> 

GRO adds a ~6 % cost on UDP receive path at this moment, depending on the state
of GRO engine (number of packets in the napi->gro_list)

Adding yet another conditions and icache pressure might raise the cost to 10%,
but we do not know because the numbers presented in this RFC do not include that.

(Early demux is also adding extra costs for UDP on 'non connected sockets' BTW)

Most linux hosts are not routers, but end hosts, lets not forget this...

^ permalink raw reply

* Re: [BUG BISECT] NFSv4 client fails on Flush Journal to Persistent Storage
From: Sudeep Holla @ 2018-06-15 12:53 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Trond Myklebust, linux-nfs
  Cc: Anna Schumaker, J. Bruce Fields, Jeff Layton, David S. Miller,
	netdev, open list, linux-samsung-soc@vger.kernel.org,
	Sudeep Holla
In-Reply-To: <CAJKOXPd2rntNOpU1quR9Zm_J22+=pEaj4ZTC_tdZ0zcRYUciFg@mail.gmail.com>

Hi,

On Thu, Jun 7, 2018 at 12:19 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> Hi,
>
> When booting my boards under recent linux-next, I see failures of systemd:
>
> [FAILED] Failed to start Flush Journal to Persistent Storage.
> See 'systemctl status systemd-journal-flush.service' for details.
>          Starting Create Volatile Files and Directories...
> [**    ] A start job is running for Create V… [  223.209289] nfs:
> server 192.168.1.10 not responding, still trying
> [  223.209377] nfs: server 192.168.1.10 not responding, still trying
>
> Effectively the boards fails to boot. Example is here:
> https://krzk.eu/#/builders/1/builds/2157
>

I too encountered the same issue.

> This was bisected to:
> commit 37ac86c3a76c113619b7d9afe0251bbfc04cb80a
> Author: Chuck Lever <chuck.lever@oracle.com>
> Date:   Fri May 4 15:34:53 2018 -0400
>
>     SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock
>
>     alloc_slot is a transport-specific op, but initializing an rpc_rqst
>     is common to all transports. In addition, the only part of initial-
>     izing an rpc_rqst that needs serialization is getting a fresh XID.
>
>     Move rpc_rqst initialization to common code in preparation for
>     adding a transport-specific alloc_slot to xprtrdma.
>
>     Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>     Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>

Unfortunately, spent time to bisect independently without seeing this
report and got the same culprit.

>
> Bisect log attached. Full configuration:
> 1. exynos_defconfig
> 2. ARMv7, octa-core, Exynos5422 and Exynos4412 (Odroid XU3, U3 and others)
> 3. NFSv4 client (from Raspberry Pi)
>

Yes the issue is seen only with NFSv4 client and with latest systemd I think.
My Ubuntu 16.04(32bit FS) is  boots fine while 18.04 has the above issue.
Passing nfsv3 in kernel command line makes it work again.

> Let me know if you need any more information.
>

Also I was observing this issue with Linus master branch from
the time the above patch was merged until today. The issue
is no longer seen since this morning however I just enabled lockdep
and got these messages.

---->8

DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
WARNING: CPU: 2 PID: 74 at kernel/locking/rwsem.c:217
up_read_non_owner+0x78/0x90
Modules linked in:
CPU: 2 PID: 74 Comm: kworker/2:1 Not tainted 4.17.0-10597-g4c5e8fc62d6a #40
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno
Development Platform, BIOS EDK II Jun 14 2018
Workqueue: nfsiod rpc_async_release
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : up_read_non_owner+0x78/0x90
lr : up_read_non_owner+0x78/0x90
Call trace:
 up_read_non_owner+0x78/0x90
 nfs_async_unlink_release+0x2c/0x88
 rpc_free_task+0x28/0x48
 rpc_async_release+0x10/0x18
 process_one_work+0x29c/0x7a0
 worker_thread+0x40/0x458
 kthread+0x12c/0x130
 ret_from_fork+0x10/0x18
irq event stamp: 64737
hardirqs last  enabled at (64737): _raw_spin_unlock_irq+0x2c/0x60
hardirqs last disabled at (64736): _raw_spin_lock_irq+0x1c/0x60
softirqs last  enabled at (64728): srcu_invoke_callbacks+0xec/0x1a8
softirqs last disabled at (64724): srcu_invoke_callbacks+0xec/0x1a8


DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
WARNING: CPU: 2 PID: 1517 at kernel/locking/rwsem.c:217
up_read_non_owner+0x78/0x90
Modules linked in:
CPU: 2 PID: 1517 Comm: kworker/2:2 Not tainted
4.17.0-10597-g4c5e8fc62d6a-dirty #42
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno
Development Platform, BIOS EDK II Jun 14 2018
Workqueue: nfsiod rpc_async_release
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : up_read_non_owner+0x78/0x90
lr : up_read_non_owner+0x78/0x90
Call trace:
 up_read_non_owner+0x78/0x90
 nfs_async_unlink_release+0x2c/0x88
 rpc_free_task+0x28/0x48
 rpc_async_release+0x10/0x18
 process_one_work+0x258/0x400
 worker_thread+0x40/0x448
 kthread+0x11c/0x120
 ret_from_fork+0x10/0x18
irq event stamp: 69059
hardirqs last  enabled at (69059): _raw_spin_unlock_irq+0x2c/0x60
hardirqs last disabled at (69058): _raw_spin_lock_irq+0x1c/0x60
softirqs last  enabled at (68194): css_release_work_fn+0x188/0x1c0
softirqs last disabled at (68192): css_release_work_fn+0x170/0x1c0

Regards,
Sudeep

^ permalink raw reply

* Re: [PATCH bpf-net] selftests/bpf: delete xfrm tunnel when test exits.
From: Daniel Borkmann @ 2018-06-15 12:52 UTC (permalink / raw)
  To: William Tu, Eyal Birger; +Cc: Linux Kernel Network Developers, Anders Roxell
In-Reply-To: <CALDO+SaBVSXg8rpatDvan57noBsZgRoRy8wY6EHNqdvpKuxQHA@mail.gmail.com>

On 06/15/2018 02:43 PM, William Tu wrote:
> On Thu, Jun 14, 2018 at 10:24 PM, Eyal Birger <eyal.birger@gmail.com> wrote:
>>
>>
>>> On 14 Jun 2018, at 15:01, William Tu <u9012063@gmail.com> wrote:
>>>
>>> Make the printting of bpf xfrm tunnel better and
>>> cleanup xfrm state and policy when xfrm test finishes.
>>
>> Yeah the ‘tee’ was useful when developing the test - I could see what’s going on :)
>>
>> Now that it’s in ‘selftests’ it’s definitely better without it.
>>
>> Thanks for the cleanup!
>> Eyal.
> 
> Hi Eyal,
> Thanks for double check!
> 
> Hi Daniel and Martin,
> Sorry for the confusing "bpf-net". It should be "net"

Yep, took it into bpf, PR to David will come later today.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH bpf-net] selftests/bpf: delete xfrm tunnel when test exits.
From: William Tu @ 2018-06-15 12:43 UTC (permalink / raw)
  To: Eyal Birger; +Cc: Linux Kernel Network Developers, Anders Roxell
In-Reply-To: <F41E9F5B-D170-4BAE-B797-D915F6C24227@gmail.com>

On Thu, Jun 14, 2018 at 10:24 PM, Eyal Birger <eyal.birger@gmail.com> wrote:
>
>
>> On 14 Jun 2018, at 15:01, William Tu <u9012063@gmail.com> wrote:
>>
>> Make the printting of bpf xfrm tunnel better and
>> cleanup xfrm state and policy when xfrm test finishes.
>
> Yeah the ‘tee’ was useful when developing the test - I could see what’s going on :)
>
> Now that it’s in ‘selftests’ it’s definitely better without it.
>
> Thanks for the cleanup!
> Eyal.

Hi Eyal,
Thanks for double check!

Hi Daniel and Martin,
Sorry for the confusing "bpf-net". It should be "net"

Thanks
William

^ permalink raw reply

* pull-request: mac80211 2018-06-15
From: Johannes Berg @ 2018-06-15 12:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-wireless

Hi Dave,

We have just a handful of pretty simple fixes for this round
of updates for the current cycle.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit 7f6afc338405628ae32826ce82827c3f19bf3d15:

  Merge branch 'l2tp-fixes' (2018-06-14 17:10:19 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git tags/mac80211-for-davem-2018-06-15

for you to fetch changes up to bf2b61a6838f19cbc33f6732715012c483fa3795:

  cfg80211: fix rcu in cfg80211_unregister_wdev (2018-06-15 13:05:14 +0200)

----------------------------------------------------------------
A handful of fixes:
 * missing RCU grace period enforcement led to drivers freeing
   data structures before; fix from Dedy Lansky.
 * hwsim module init error paths were messed up; fixed it myself
   after a report from Colin King (who had sent a partial patch)
 * kernel-doc tag errors; fix from Luca Coelho
 * initialize the on-stack sinfo data structure when getting
   station information; fix from Sven Eckelmann
 * TXQ state dumping is now done from init, and when TXQs aren't
   initialized yet at that point, bad things happen, move the
   initialization; fix from Toke Høiland-Jørgensen.

----------------------------------------------------------------
Dedy Lansky (1):
      cfg80211: fix rcu in cfg80211_unregister_wdev

Johannes Berg (1):
      mac80211_hwsim: fix module init error paths

Luca Coelho (1):
      nl80211: fix some kernel doc tag mistakes

Sven Eckelmann (1):
      cfg80211: initialize sinfo in cfg80211_get_station

Toke Høiland-Jørgensen (1):
      mac80211: Move up init of TXQs

 drivers/net/wireless/mac80211_hwsim.c | 11 +++++++++--
 include/uapi/linux/nl80211.h          | 28 ++++++++++++++--------------
 net/mac80211/main.c                   | 12 ++++++------
 net/wireless/core.c                   |  1 +
 net/wireless/util.c                   |  2 ++
 5 files changed, 32 insertions(+), 22 deletions(-)

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-15 12:31 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization
In-Reply-To: <20180615113242.799974f6.cohuck@redhat.com>

On Fri, Jun 15, 2018 at 11:32:42AM +0200, Cornelia Huck wrote:
> On Fri, 15 Jun 2018 05:34:24 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Jun 14, 2018 at 12:02:31PM +0200, Cornelia Huck wrote:
> 
> > > > > I am not all that familiar with how Qemu manages network devices. If we can
> > > > > do all the
> > > > > required management of the primary/standby devices within Qemu, that is
> > > > > definitely a better
> > > > > approach without upper layer involvement.    
> > > > 
> > > > Right. I would imagine in the extreme case the upper layer doesn't
> > > > have to be involved at all if QEMU manages all hot plug/unplug logic.
> > > > The management tool can supply passthrough device and virtio with the
> > > > same group UUID, QEMU auto-manages the presence of the primary, and
> > > > hot plug the device as needed before or after the migration.  
> > > 
> > > I do not really see how you can manage that kind of stuff in QEMU only.  
> > 
> > So right now failover is limited to pci passthrough devices only.
> > The idea is to realize the vfio device but not expose it
> > to guest. Have a separate command to expose it to guest.
> > Hotunplug would also hide it from guest but not unrealize it.
> 
> So, this would not be real hot(un)plug, but 'hide it from the guest',
> right? The concept of "we have it realized in QEMU, but the guest can't
> discover and use it" should be translatable to non-pci as well (at
> least for ccw).
> 
> > 
> > This will help ensure that e.g. on migration failure we can
> > re-expose the device without risk of running out of resources.
> 
> Makes sense.
> 
> Should that 'hidden' state be visible/settable from outside as well
> (e.g. via a property)? I guess yes, so that management software has a
> chance to see whether a device is visible.

Might be handy for debug, but note that since QEMU manages this
state it's transient: can change at any time, so it's kind
of hard for management to rely on it.

> Settable may be useful if we
> find another use case for hiding realized devices.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox