Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH RFC 4/9] veth: Use NAPI for XDP
From: Jesper Dangaard Brouer @ 2018-05-01  7:50 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: brouer, netdev, Toshiaki Makita
In-Reply-To: <20180424143923.26519-5-toshiaki.makita1@gmail.com>

On Tue, 24 Apr 2018 23:39:18 +0900
Toshiaki Makita <toshiaki.makita1@gmail.com> wrote:

> +static int veth_xdp_enqueue(struct veth_priv *priv, void *ptr)
> +{
> +	if (unlikely(ptr_ring_produce(&priv->xdp_ring, ptr)))
> +		return -ENOSPC;
> +
> +	return 0;
> +}

Here we have a lock per (enqueued) packet.  I'm working on changing the
ndo_xdp_xmit API to allow bulking.  And the tun driver have exact same
issue/need.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: IP_PKTINFO broken by acf568ee859f0 (xfrm: Reinject transport-mode packets through tasklet)
From: Steffen Klassert @ 2018-05-01  7:51 UTC (permalink / raw)
  To: Maxime Bizon; +Cc: Herbert Xu, netdev
In-Reply-To: <1524863813.15003.174.camel@sakura.staff.proxad.net>

On Fri, Apr 27, 2018 at 11:16:53PM +0200, Maxime Bizon wrote:
> 
> Hello Herbert,
> 
> That patch just went into stable 4.14 and is causing a regression on my
> setup.
> 
> Basically, IP_PKTINFO does not work anymore on transport-mode packets,
> because skb->cb is now used to store the finish callback.
> 
> Was that expected or is it an unforeseen side effect ?

This should be fixed by:

commit 9a3fb9fb84cc ("xfrm: Fix transport mode skb control buffer usage.")

^ permalink raw reply

* Re: [PATCH] change the comment of vti6_ioctl
From: Steffen Klassert @ 2018-05-01  8:00 UTC (permalink / raw)
  To: David Miller; +Cc: sunlw.fnst, netdev, herbert
In-Reply-To: <20180430.115731.1213104451620131850.davem@davemloft.net>

On Mon, Apr 30, 2018 at 11:57:31AM -0400, David Miller wrote:
> From: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
> Date: Sun, 29 Apr 2018 15:05:52 +0800
> 
> > The comment of vti6_ioctl() is wrong. which use vti6_tnl_ioctl
> > instead of vti6_ioctl.
> > 
> > Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
> 
> Please CC: the IPSEC maintainers on future patch submissions to IPSEC
> files, as per the top level MAINTAINERS file.
> 
> Steffen, please queue this up, thank you.

Now applied to ipsec-next, thanks everyone!

^ permalink raw reply

* Re: [PATCH RFC 4/9] veth: Use NAPI for XDP
From: Toshiaki Makita @ 2018-05-01  8:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Toshiaki Makita, netdev
In-Reply-To: <20180501095008.207c354f@brouer.com>

On 2018/05/01 16:50, Jesper Dangaard Brouer wrote:
> On Tue, 24 Apr 2018 23:39:18 +0900
> Toshiaki Makita <toshiaki.makita1@gmail.com> wrote:
> 
>> +static int veth_xdp_enqueue(struct veth_priv *priv, void *ptr)
>> +{
>> +	if (unlikely(ptr_ring_produce(&priv->xdp_ring, ptr)))
>> +		return -ENOSPC;
>> +
>> +	return 0;
>> +}
> 
> Here we have a lock per (enqueued) packet.  I'm working on changing the
> ndo_xdp_xmit API to allow bulking.  And the tun driver have exact same
> issue/need.

Probably I should have noted in commitlog, but this per-packet lock is
removed in patch 9.
I'm curious about if any change is needed by your new API.

-- 
Toshiaki Makita

^ permalink raw reply

* Re: [PATCH RFC 6/9] veth: Add ndo_xdp_xmit
From: Jesper Dangaard Brouer @ 2018-05-01  8:14 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Toshiaki Makita, netdev, Tariq Toukan, brouer, Daniel Borkmann,
	Alexei Starovoitov, Eran Ben Elisha
In-Reply-To: <6ed7c0c5-d6cb-7870-38da-4bb49707a63c@lab.ntt.co.jp>

On Tue, 1 May 2018 10:02:01 +0900
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:

> On 2018/05/01 2:27, Jesper Dangaard Brouer wrote:
> > On Thu, 26 Apr 2018 19:52:40 +0900
> > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> >   
> >> On 2018/04/26 5:24, Jesper Dangaard Brouer wrote:  
> >>> On Tue, 24 Apr 2018 23:39:20 +0900
> >>> Toshiaki Makita <toshiaki.makita1@gmail.com> wrote:
> >>>     
> >>>> +static int veth_xdp_xmit(struct net_device *dev, struct xdp_frame *frame)
> >>>> +{
> >>>> +	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
> >>>> +	int headroom = frame->data - (void *)frame;
> >>>> +	struct net_device *rcv;
> >>>> +	int err = 0;
> >>>> +
> >>>> +	rcv = rcu_dereference(priv->peer);
> >>>> +	if (unlikely(!rcv))
> >>>> +		return -ENXIO;
> >>>> +
> >>>> +	rcv_priv = netdev_priv(rcv);
> >>>> +	/* xdp_ring is initialized on receive side? */
> >>>> +	if (rcu_access_pointer(rcv_priv->xdp_prog)) {
> >>>> +		err = xdp_ok_fwd_dev(rcv, frame->len);
> >>>> +		if (unlikely(err))
> >>>> +			return err;
> >>>> +
> >>>> +		err = veth_xdp_enqueue(rcv_priv, veth_xdp_to_ptr(frame));
> >>>> +	} else {
> >>>> +		struct sk_buff *skb;
> >>>> +
> >>>> +		skb = veth_build_skb(frame, headroom, frame->len, 0);
> >>>> +		if (unlikely(!skb))
> >>>> +			return -ENOMEM;
> >>>> +
> >>>> +		/* Get page ref in case skb is dropped in netif_rx.
> >>>> +		 * The caller is responsible for freeing the page on error.
> >>>> +		 */
> >>>> +		get_page(virt_to_page(frame->data));    
> >>>
> >>> I'm not sure you can make this assumption, that xdp_frames coming from
> >>> another device driver uses a refcnt based memory model. But maybe I'm
> >>> confused, as this looks like an SKB receive path, but in the
> >>> ndo_xdp_xmit().    
> >>
> >> I find this path similar to cpumap, which creates skb from redirected
> >> xdp frame. Once it is converted to skb, skb head is freed by
> >> page_frag_free, so anyway I needed to get the refcount here regardless
> >> of memory model.  
> > 
> > Yes I know, I wrote cpumap ;-)
> > 
> > First of all, I don't want to see such xdp_frame to SKB conversion code
> > in every driver.  Because that increase the chances of errors.  And
> > when looking at the details, then it seems that you have made the
> > mistake of making it possible to leak xdp_frame info to the SKB (which
> > cpumap takes into account).  
> 
> Do you mean leaving xdp_frame in skb->head is leaking something? how?

Like commit 97e19cce05e5 ("bpf: reserve xdp_frame size in xdp headroom")
and commit 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data
on page reuse").

But this time, the concern is a bpf_prog attached at TC/bpf_cls level, that can 
that can adjust head via bpf_skb_change_head (for XDP it is
bpf_xdp_adjust_head) into the area used by xdp_frame.  As described in
https://git.kernel.org/davem/net-next/c/6dfb970d3dbd in is not super
critical at the moment, as this _currently_ runs as CAP_SYS_ADMIN, but
we would like to move towards CAP_NET_ADMIN.

> > Second, I think the refcnt scheme here is wrong. The xdp_frame should
> > be "owned" by XDP and have the proper refcnt to deliver it directly to
> > the network stack.
> > 
> > Third, if we choose that we want a fallback, in-case XDP is not enabled
> > on egress dev (but it have an ndo_xdp_xmit), then it should be placed
> > in the generic/core code.  E.g. __bpf_tx_xdp_map() could look at the
> > return code from dev->netdev_ops->ndo_xdp() and create an SKB.  (Hint,
> > this would make it easy to implement TX bulking towards the dev).  
> 
> Right, this is a much cleaner way.
>
> Although I feel like we should add this fallback for veth because it
> requires something which is different from other drivers (enabling XDP
> on the peer device of the egress device), 

(This is why I Cc'ed Tariq...)

This is actually a general problem with the xdp "xmit" side (and not
specific to veth driver). The problem exists for other drivers as well.

The problem is that a driver can implement ndo_xdp_xmit(), but the
driver might not have allocated the necessary XDP TX-queue resources
yet (or it might not be possible due to system resource limits).

The current "hack" is to load an XDP prog on the egress device, and
then assume that this cause the driver to also allocate the XDP
ndo_xdo_xmit side HW resources.  This is IMHO a wrong assumption.

We need a more generic way to test if a net_device is "ready/enabled"
for handling xdp_frames via ndo_xdp_xmit.  And Tariq had some ideas on
how to implement this...

My opinion is that it is a waste of (HW/mem) resources to always
allocate resources for ndo_xdp_xmit when loading an XDP program.
Because what if my use-cases are XDP_DROP DDoS filter, or CPUMAP
redirect load-balancer, then I don't want/need ndo_xdp_xmit.  E.g.
today using ixgbe on machines with more than 96 CPUs, will fail due to
limited TX queue resources. Thus, blocking the mentioned use-cases.

> I'll drop the part for now. It should not be resolved in the driver
> code.

Thank you.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH net-next 0/2] mlxsw: Reject unsupported FIB configurations
From: Ido Schimmel @ 2018-05-01  8:16 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, dsahern, mlxsw, Ido Schimmel

Recently it became possible for listeners of the FIB notification chain
to veto operations such as addition of routes and rules.

Adjust the mlxsw driver to take advantage of it and return an error for
unsupported FIB rules and for routes configured after the abort
mechanism was triggered (due to exceeded resources for example).

Ido Schimmel (2):
  mlxsw: spectrum_router: Return an error for non-default FIB rules
  mlxsw: spectrum_router: Return an error for routes added after abort

 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

-- 
2.14.3

^ permalink raw reply

* [PATCH net-next 1/2] mlxsw: spectrum_router: Return an error for non-default FIB rules
From: Ido Schimmel @ 2018-05-01  8:16 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180501081639.29162-1-idosch@mellanox.com>

Since commit 9776d32537d2 ("net: Move call_fib_rule_notifiers up in
fib_nl_newrule") it is possible to forbid the installation of
unsupported FIB rules.

Have mlxsw return an error for non-default FIB rules in addition to the
existing extack message.

Example:
# ip rule add from 198.51.100.1 table 10
Error: mlxsw_spectrum: FIB rules not supported.

Note that offload is only aborted when non-default FIB rules are already
installed and merely replayed during module initialization.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 8e4edb634b11..baea97560029 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -5899,7 +5899,7 @@ static int mlxsw_sp_router_fib_rule_event(unsigned long event,
 	}
 
 	if (err < 0)
-		NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported. Aborting offload");
+		NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported");
 
 	return err;
 }
@@ -5926,8 +5926,8 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 	case FIB_EVENT_RULE_DEL:
 		err = mlxsw_sp_router_fib_rule_event(event, info,
 						     router->mlxsw_sp);
-		if (!err)
-			return NOTIFY_DONE;
+		if (!err || info->extack)
+			return notifier_from_errno(err);
 	}
 
 	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 2/2] mlxsw: spectrum_router: Return an error for routes added after abort
From: Ido Schimmel @ 2018-05-01  8:16 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180501081639.29162-1-idosch@mellanox.com>

We currently do not perform accounting in the driver and thus can't
reject routes before resources are exceeded.

However, in order to make users aware of the fact that routes are no
longer offloaded we can return an error for routes configured after the
abort mechanism was triggered.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index baea97560029..c9fce669cee4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -5928,6 +5928,13 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 						     router->mlxsw_sp);
 		if (!err || info->extack)
 			return notifier_from_errno(err);
+		break;
+	case FIB_EVENT_ENTRY_ADD:
+		if (router->aborted) {
+			NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
+			return notifier_from_errno(-EINVAL);
+		}
+		break;
 	}
 
 	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH RFC 4/9] veth: Use NAPI for XDP
From: Jesper Dangaard Brouer @ 2018-05-01  8:43 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: Toshiaki Makita, netdev, brouer
In-Reply-To: <79ff8b87-2ee2-ce60-36f3-6f3f79cb3272@lab.ntt.co.jp>

On Tue, 1 May 2018 17:02:34 +0900
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:

> On 2018/05/01 16:50, Jesper Dangaard Brouer wrote:
> > On Tue, 24 Apr 2018 23:39:18 +0900
> > Toshiaki Makita <toshiaki.makita1@gmail.com> wrote:
> >   
> >> +static int veth_xdp_enqueue(struct veth_priv *priv, void *ptr)
> >> +{
> >> +	if (unlikely(ptr_ring_produce(&priv->xdp_ring, ptr)))
> >> +		return -ENOSPC;
> >> +
> >> +	return 0;
> >> +}  
> > 
> > Here we have a lock per (enqueued) packet.  I'm working on changing the
> > ndo_xdp_xmit API to allow bulking.  And the tun driver have exact same
> > issue/need.  
> 
> Probably I should have noted in commitlog, but this per-packet lock is
> removed in patch 9.
> I'm curious about if any change is needed by your new API.

Again, I'm just moving this into the generic code, to avoid having to
implement this for every driver.

Plus, with CONFIG_RETPOLINE we have the advantage of only calling
ndo_xdp_xmit once (indirect function pointer call).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: KMSAN: uninit-value in _decode_session6
From: syzbot @ 2018-05-01  9:12 UTC (permalink / raw)
  To: davem, herbert, kuznet, linux-kernel, netdev, steffen.klassert,
	syzkaller-bugs, yoshfuji
In-Reply-To: <000000000000311cdd0569510cc7@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    d2d741e5d189 kmsan: add initialization for shmem pages
git tree:       https://github.com/google/kmsan.git/master
console output: https://syzkaller.appspot.com/x/log.txt?id=6550343064223744
kernel config:   
https://syzkaller.appspot.com/x/.config?id=328654897048964367
dashboard link: https://syzkaller.appspot.com/bug?extid=2974b85346f85b586f4d
compiler:       clang version 7.0.0 (trunk 329391)
syzkaller  
repro:https://syzkaller.appspot.com/x/repro.syz?id=5023772637659136
C reproducer:   https://syzkaller.appspot.com/x/repro.c?id=5102535626981376

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2974b85346f85b586f4d@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
==================================================================
BUG: KMSAN: uninit-value in _decode_session6+0x6d2/0x16e0  
net/ipv6/xfrm6_policy.c:151
CPU: 0 PID: 4529 Comm: syz-executor165 Not tainted 4.16.0+ #87
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
  _decode_session6+0x6d2/0x16e0 net/ipv6/xfrm6_policy.c:151
  __xfrm_decode_session+0x151/0x200 net/xfrm/xfrm_policy.c:2368
  xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
  icmpv6_route_lookup net/ipv6/icmp.c:372 [inline]
  icmp6_send+0x2bf7/0x3730 net/ipv6/icmp.c:551
  icmpv6_send+0xe0/0x110 net/ipv6/ip6_icmp.c:43
  ip6_link_failure+0x8f/0x580 net/ipv6/route.c:2034
  dst_link_failure include/net/dst.h:426 [inline]
  ip6_tnl_xmit+0x1423/0x3af0 net/ipv6/ip6_tunnel.c:1215
  ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1367 [inline]
  ip6_tnl_start_xmit+0x1cc0/0x1ef0 net/ipv6/ip6_tunnel.c:1390
  __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
  netdev_start_xmit include/linux/netdevice.h:4075 [inline]
  xmit_one net/core/dev.c:3026 [inline]
  dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
  __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
  neigh_direct_output+0x42/0x50 net/core/neighbour.c:1390
  neigh_output include/net/neighbour.h:482 [inline]
  ip6_finish_output2+0x1d01/0x2130 net/ipv6/ip6_output.c:120
  ip6_finish_output+0xae9/0xba0 net/ipv6/ip6_output.c:154
  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
  ip6_output+0x597/0x6c0 net/ipv6/ip6_output.c:171
  dst_output include/net/dst.h:443 [inline]
  ip6_local_out+0x15e/0x1d0 net/ipv6/output_core.c:176
  ip6_send_skb net/ipv6/ip6_output.c:1682 [inline]
  ip6_push_pending_frames+0x218/0x4d0 net/ipv6/ip6_output.c:1702
  rawv6_push_pending_frames net/ipv6/raw.c:616 [inline]
  rawv6_sendmsg+0x4235/0x4fb0 net/ipv6/raw.c:935
  inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  sock_write_iter+0x3b9/0x470 net/socket.c:909
  call_write_iter include/linux/fs.h:1782 [inline]
  new_sync_write fs/read_write.c:469 [inline]
  __vfs_write+0x7fb/0x9f0 fs/read_write.c:482
  vfs_write+0x463/0x8d0 fs/read_write.c:544
  SYSC_write+0x172/0x360 fs/read_write.c:589
  SyS_write+0x55/0x80 fs/read_write.c:581
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x4418b9
RSP: 002b:00007ffece331e68 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004418b9
RDX: 000000000000036b RSI: 0000000020000240 RDI: 0000000000000004
RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004025b0
R13: 0000000000402640 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
  slab_post_alloc_hook mm/slab.h:445 [inline]
  slab_alloc_node mm/slub.c:2737 [inline]
  __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  pskb_expand_head+0x21d/0x1a70 net/core/skbuff.c:1458
  __pskb_pull_tail+0x1d7/0x2300 net/core/skbuff.c:1878
  pskb_may_pull include/linux/skbuff.h:2112 [inline]
  ip6_tnl_parse_tlv_enc_lim+0x7f5/0xa90 net/ipv6/ip6_tunnel.c:411
  ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline]
  ip6_tnl_start_xmit+0x911/0x1ef0 net/ipv6/ip6_tunnel.c:1390
  __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
  netdev_start_xmit include/linux/netdevice.h:4075 [inline]
  xmit_one net/core/dev.c:3026 [inline]
  dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
  __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
  neigh_direct_output+0x42/0x50 net/core/neighbour.c:1390
  neigh_output include/net/neighbour.h:482 [inline]
  ip6_finish_output2+0x1d01/0x2130 net/ipv6/ip6_output.c:120
  ip6_finish_output+0xae9/0xba0 net/ipv6/ip6_output.c:154
  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
  ip6_output+0x597/0x6c0 net/ipv6/ip6_output.c:171
  dst_output include/net/dst.h:443 [inline]
  ip6_local_out+0x15e/0x1d0 net/ipv6/output_core.c:176
  ip6_send_skb net/ipv6/ip6_output.c:1682 [inline]
  ip6_push_pending_frames+0x218/0x4d0 net/ipv6/ip6_output.c:1702
  rawv6_push_pending_frames net/ipv6/raw.c:616 [inline]
  rawv6_sendmsg+0x4235/0x4fb0 net/ipv6/raw.c:935
  inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  sock_write_iter+0x3b9/0x470 net/socket.c:909
  call_write_iter include/linux/fs.h:1782 [inline]
  new_sync_write fs/read_write.c:469 [inline]
  __vfs_write+0x7fb/0x9f0 fs/read_write.c:482
  vfs_write+0x463/0x8d0 fs/read_write.c:544
  SYSC_write+0x172/0x360 fs/read_write.c:589
  SyS_write+0x55/0x80 fs/read_write.c:581
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
==================================================================

^ permalink raw reply

* [PATCH] wireless: ipw2200:  fix spelling mistake: "functionalitis" -> "functionalities"
From: Colin King @ 2018-05-01  9:36 UTC (permalink / raw)
  To: Stanislav Yakovlev, Kalle Valo, David S . Miller, linux-wireless,
	netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in module parameter description text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/intel/ipw2x00/ipw2200.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2200.c b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
index 87a5e414c2f7..ba3fb1d2ddb4 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
@@ -12012,7 +12012,7 @@ MODULE_PARM_DESC(rtap_iface, "create the rtap interface (1 - create, default 0)"
 
 #ifdef CONFIG_IPW2200_QOS
 module_param(qos_enable, int, 0444);
-MODULE_PARM_DESC(qos_enable, "enable all QoS functionalitis");
+MODULE_PARM_DESC(qos_enable, "enable all QoS functionalities");
 
 module_param(qos_burst_enable, int, 0444);
 MODULE_PARM_DESC(qos_burst_enable, "enable QoS burst mode");
-- 
2.17.0

^ permalink raw reply related

* Re: [Xen-devel] [PATCH 4/6] xen-netfront: add range check for Tx response id
From: Wei Liu @ 2018-05-01 10:05 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Juergen Gross, open list:NETWORKING DRIVERS, stable,
	open list, Boris Ostrovsky, Wei Liu
In-Reply-To: <960c6d6300fd3450ae9fb1de1c412bef7dbae992.1525122026.git-series.marmarek@invisiblethingslab.com>

On Mon, Apr 30, 2018 at 11:01:48PM +0200, Marek Marczykowski-Górecki wrote:
> Tx response ID is fetched from shared page, so make sure it is sane
> before using it as an array index.
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
>  drivers/net/xen-netfront.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 934b8a4..55c9b25 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -394,6 +394,7 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
>  				continue;
>  
>  			id  = txrsp.id;
> +			BUG_ON(id >= NET_TX_RING_SIZE);

It is better to use ARRAY_SIZE here.

Wei.

^ permalink raw reply

* Re: [PATCH net-next] udp: Complement partial checksum for GSO packet
From: Willem de Bruijn @ 2018-05-01 10:06 UTC (permalink / raw)
  To: Sean Tranchetti
  Cc: Willem de Bruijn, David Miller, Network Development,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <1525132862-7672-1-git-send-email-stranche@codeaurora.org>

On Tue, May 1, 2018 at 2:01 AM, Sean Tranchetti <stranche@codeaurora.org> wrote:
> Using the udp_v4_check() function to calculate the pseudo header
> for the newly segmented UDP packets results in assigning the complement
> of the value to the UDP header checksum field.
>
> Always undo the complement the partial checksum value in order to
> match the case where GSO is not used on the UDP transmit path.
>
> Fixes: ee80d1ebe5ba ("udp: add udp gso")
> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> ---
>  net/ipv4/udp_offload.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index f78fb36..0062570 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -223,6 +223,7 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>                         csum_replace2(&uh->check, htons(mss),
>                                       htons(seg->len - hdrlen - sizeof(*uh)));
>
> +               uh->check = ~uh->check;
>                 seg->destructor = sock_wfree;
>                 seg->sk = sk;
>                 sum_truesize += seg->truesize;

Thanks, this looks plausible. I may have been mistaken by evaluation
showing packets arriving at the peer with the correct checksum before.

On retesting, packets arrive with correct checksum in both cases, and
even when setting purposely bad uh->check values.

Perhaps the two platforms I tested on (bnx2x, mlx4) ignore the pseudo
header and recompute from scratch if CHECKSUM_PARTIAL is set for
udp. Still having a look..

^ permalink raw reply

* Re: [PATCH net-next v6] Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen @ 2018-05-01 11:36 UTC (permalink / raw)
  To: Cong Wang, Dave Taht; +Cc: Linux Kernel Network Developers, Cake List
In-Reply-To: <CAM_iQpXTYJ7nkFW=H_ZudUksP1UHT=cyLPv6M-XjekfeuzYbXg@mail.gmail.com>

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Mon, Apr 30, 2018 at 2:27 PM, Dave Taht <dave.taht@gmail.com> wrote:
>> On Mon, Apr 30, 2018 at 2:21 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Sun, Apr 29, 2018 at 2:34 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>>> sch_cake targets the home router use case and is intended to squeeze the
>>>> most bandwidth and latency out of even the slowest ISP links and routers,
>>>> while presenting an API simple enough that even an ISP can configure it.
>>>>
>>>> Example of use on a cable ISP uplink:
>>>>
>>>> tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter
>>>>
>>>> To shape a cable download link (ifb and tc-mirred setup elided)
>>>>
>>>> tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash
>>>>
>>>> CAKE is filled with:
>>>>
>>>> * A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
>>>>   derived Flow Queuing system, which autoconfigures based on the bandwidth.
>>>> * A novel "triple-isolate" mode (the default) which balances per-host
>>>>   and per-flow FQ even through NAT.
>>>> * An deficit based shaper, that can also be used in an unlimited mode.
>>>> * 8 way set associative hashing to reduce flow collisions to a minimum.
>>>> * A reasonable interpretation of various diffserv latency/loss tradeoffs.
>>>> * Support for zeroing diffserv markings for entering and exiting traffic.
>>>> * Support for interacting well with Docsis 3.0 shaper framing.
>>>> * Extensive support for DSL framing types.
>>>> * Support for ack filtering.
>>>
>>> Why this TCP ACK filtering has to be built into CAKE qdisc rather than
>>> an independent TC filter? Why other qdisc's can't use it?
>>
>> I actually have a tc - bpf based ack filter, during the development of
>> cake's ack-thinner, that I should submit one of these days. It
>> proved to be of limited use.
>
> Yeah.
>
>>
>> Probably the biggest mistake we made is by calling this cake feature a
>> filter. It isn't.
>
>
> It inspects the payload of each packet and drops packets, therefore it
> is a filter by definition, no matter how you name it.

Well, sure, but the distinguishing feature is that it is a *stateful*
filter, whose reaction depends on the current state of the queue.

>> Maybe we should have called it a "thinner" or something like that? In
>> order to properly "thin" or "reduce" an ack stream
>> you have to have a queue to look at and some related state. TC filters
>> do not operate on queues, qdiscs do. Thus the "ack-filter" here is
>> deeply embedded into cake's flow isolation and queue structures.
>
> TC filters are installed on qdiscs and in the beginning qdiscs were
> queues,for example, pfifo. We already have flow-based filters too
> (cls_flower),so we can make them work together, although probably it
> is not straight forward.

Yeah, but filters don't have any visibility into the queue. You could
conceivably amend the TC filter API to allow this, but it would be quite
intrusive, especially for complicated qdiscs.

I'm not opposed to seeing an API change and generalising the CAKE ACK
filter based on it. But that is certainly out of scope for this
submission...

-Toke

^ permalink raw reply

* RE: [PATCH] tipc: fix a potential missing-check bug
From: Jon Maloy @ 2018-05-01 12:01 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Kangjie Lu, Ying Xue, David S. Miller,
	open list:TIPC NETWORK LAYER, open list:TIPC NETWORK LAYER,
	open list
In-Reply-To: <1525148761-11091-1-git-send-email-wang6495@umn.edu>



> -----Original Message-----
> From: Wenwen Wang [mailto:wang6495@umn.edu]
> Sent: Tuesday, May 01, 2018 00:26
> To: Wenwen Wang <wang6495@umn.edu>
> Cc: Kangjie Lu <kjlu@umn.edu>; Jon Maloy <jon.maloy@ericsson.com>; Ying
> Xue <ying.xue@windriver.com>; David S. Miller <davem@davemloft.net>;
> open list:TIPC NETWORK LAYER <netdev@vger.kernel.org>; open list:TIPC
> NETWORK LAYER <tipc-discussion@lists.sourceforge.net>; open list <linux-
> kernel@vger.kernel.org>
> Subject: [PATCH] tipc: fix a potential missing-check bug
> 
> In tipc_link_xmit(), the member field "len" of l->backlog[imp] must be less
> than the member field "limit" of l->backlog[imp] when imp is equal to
> TIPC_SYSTEM_IMPORTANCE. Otherwise, an error code, i.e., -ENOBUFS, is
> returned. This is enforced by the security check. However, at the end of
> tipc_link_xmit(), the length of "list" is added to l->backlog[imp].len without
> any further check. This can potentially cause unexpected values for
> l->backlog[imp].len. If imp is equal to TIPC_SYSTEM_IMPORTANCE and the
> original value of l->backlog[imp].len is less than l->backlog[imp].limit, after
> this addition, l->backlog[imp] could be larger than
> l->backlog[imp].limit. 

It can, but only once. That is the intention with allowing oversubscription. This is expected and permitted.
At next sending attempt, if the send queue has not been reduced in the meantime, the link will be reset, as intended.

> That means the security check can potentially be
> bypassed,  especially when an adversary can control the length of "list".

The length of 'list' is entirely controlled by TIPC itself, either by the socket layer (where length  always is 1 for this type of messages) or
 name_dist, In the latter case the length is also 1, except at first link setup, when there guaranteed is no congestion anyway.

I appreciate your interest, but this patch is not needed.

BR
///jon

> 
> This patch performs such a check after the modification to
> l->backlog[imp].len (if imp is TIPC_SYSTEM_IMPORTANCE) to avoid such
> security issues. An error code will be returned if an unexpected value of
> l->backlog[imp].len is generated.
> 
> Signed-off-by: Wenwen Wang <wang6495@umn.edu>
> ---
>  net/tipc/link.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c index 695acb7..62972fa 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -948,6 +948,11 @@ int tipc_link_xmit(struct tipc_link *l, struct
> sk_buff_head *list,
>  			continue;
>  		}
>  		l->backlog[imp].len += skb_queue_len(list);
> +		if (imp == TIPC_SYSTEM_IMPORTANCE &&
> +		    l->backlog[imp].len >= l->backlog[imp].limit) {
> +			pr_warn("%s<%s>, link overflow", link_rst_msg, l-
> >name);
> +			return -ENOBUFS;
> +		}
>  		skb_queue_splice_tail_init(list, backlogq);
>  	}
>  	l->snd_nxt = seqno;
> --
> 2.7.4

^ permalink raw reply

* [PATCH iproute2 2/2] arpd: remove pthread dependency
From: Baruch Siach @ 2018-05-01 12:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Baruch Siach
In-Reply-To: <1918c6139ac3ee9affe8e62817f30b110ab235c7.1525178588.git.baruch@tkos.co.il>

Explicit link with pthread is not needed when linking dynamically. Even
static link with recent libdb does not pull in the code that uses
pthread. Finally, the configure check introduced in commit a25df4887d7
(configure: Check for Berkeley DB for arpd compilation) does not add
-lpthread to its link command.

This change allows arpd build with toolchains that do not provide
threads support.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 misc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/Makefile b/misc/Makefile
index 34ef6b21b4ed..b2dd6b26e2dc 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -25,7 +25,7 @@ rtacct: rtacct.c
 	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o rtacct rtacct.c $(LDLIBS) -lm
 
 arpd: arpd.c
-	$(QUIET_CC)$(CC) $(CFLAGS) -I$(DBM_INCLUDE) $(LDFLAGS) -o arpd arpd.c $(LDLIBS) -ldb -lpthread
+	$(QUIET_CC)$(CC) $(CFLAGS) -I$(DBM_INCLUDE) $(LDFLAGS) -o arpd arpd.c $(LDLIBS) -ldb
 
 ssfilter.c: ssfilter.y
 	$(QUIET_YACC)bison ssfilter.y -o ssfilter.c
-- 
2.17.0

^ permalink raw reply related

* [PATCH iproute2 1/2] README: update libdb build dependency information
From: Baruch Siach @ 2018-05-01 12:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Baruch Siach

Debian does not distribute libdb4.x-dev for quite some time now. Current
stable carries libdb5.3-dev. Update the wording accordingly.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 README | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README b/README
index f66fd5faf4cf..bc82187cf018 100644
--- a/README
+++ b/README
@@ -16,8 +16,8 @@ How to compile this.
 --------------------
 1. libdbm
 
-arpd needs to have the db4 development libraries. For Debian
-users this is the package with a name like libdb4.x-dev.
+arpd needs to have the berkeleydb development libraries. For Debian
+users this is the package with a name like libdbX.X-dev.
 DBM_INCLUDE points to the directory with db_185.h which
 is the include file used by arpd to get to the old format Berkeley
 database routines.  Often this is in the db-devel package.
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v2 1/2] dt-bindings: net: meson-dwmac: new compatible name for AXG SoC
From: Rob Herring @ 2018-05-01 12:55 UTC (permalink / raw)
  To: Yixun Lan
  Cc: David S. Miller, netdev, Kevin Hilman, Carlo Caione,
	Jerome Brunet, Martin Blumenstingl, linux-amlogic,
	linux-arm-kernel, linux-kernel, devicetree
In-Reply-To: <20180428102111.18384-2-yixun.lan@amlogic.com>

On Sat, Apr 28, 2018 at 10:21:10AM +0000, Yixun Lan wrote:
> We need to introduce a new compatible name for the Meson-AXG SoC
> in order to support the RMII 100M ethernet PHY, since the PRG_ETH0
> register of the dwmac glue layer is changed from previous old SoC.
> 
> Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
> ---
>  Documentation/devicetree/bindings/net/meson-dwmac.txt | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* [PATCH iproute2-master] iproute: Parse last nexthop in a multipath route
From: Ido Schimmel @ 2018-05-01 13:16 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern, mlxsw, Ido Schimmel

Continue parsing a multipath payload as long as another nexthop can fit
in the payload.

# ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1

Before:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1

After:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1
        nexthop dev dummy1 weight 1

Fixes: f48e14880a0e ("iproute: refactor multipath print")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 ip/iproute.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 44351bc51b4b..56dd9f25e38e 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -650,7 +650,7 @@ static void print_rta_multipath(FILE *fp, const struct rtmsg *r,
 	int len = RTA_PAYLOAD(rta);
 	int first = 1;
 
-	while (len > sizeof(*nh)) {
+	while (len >= sizeof(*nh)) {
 		struct rtattr *tb[RTA_MAX + 1];
 
 		if (nh->rtnh_len > len)
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH v2 2/2] net: stmmac: dwmac-meson: extend phy mode setting
From: Martin Blumenstingl @ 2018-05-01 13:20 UTC (permalink / raw)
  To: Yixun Lan
  Cc: David S. Miller, netdev, Kevin Hilman, Carlo Caione, Rob Herring,
	Jerome Brunet, linux-amlogic, linux-arm-kernel, linux-kernel
In-Reply-To: <20180428102111.18384-3-yixun.lan@amlogic.com>

Hello Yixun,

On Sat, Apr 28, 2018 at 12:21 PM, Yixun Lan <yixun.lan@amlogic.com> wrote:
>   In the Meson-AXG SoC, the phy mode setting of PRG_ETH0 in the glue layer
> is extended from bit[0] to bit[2:0].
>   There is no problem if we configure it to the RGMII 1000M PHY mode,
> since the register setting is coincidentally compatible with previous one,
> but for the RMII 100M PHY mode, the configuration need to be changed to
> value - b100.
>   This patch was verified with a RTL8201F 100M ethernet PHY.
>
> Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
> ---
>  .../ethernet/stmicro/stmmac/dwmac-meson8b.c   | 120 +++++++++++++++---
>  1 file changed, 104 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> index 7cb794094a70..4ff231df7322 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> @@ -18,6 +18,7 @@
>  #include <linux/io.h>
>  #include <linux/ioport.h>
>  #include <linux/module.h>
> +#include <linux/of_device.h>
>  #include <linux/of_net.h>
>  #include <linux/mfd/syscon.h>
>  #include <linux/platform_device.h>
> @@ -29,6 +30,10 @@
>
>  #define PRG_ETH0_RGMII_MODE            BIT(0)
>
> +#define PRG_ETH0_EXT_PHY_MODE_MASK     GENMASK(2, 0)
> +#define PRG_ETH0_EXT_RGMII_MODE                1
> +#define PRG_ETH0_EXT_RMII_MODE         4
> +
>  /* mux to choose between fclk_div2 (bit unset) and mpll2 (bit set) */
>  #define PRG_ETH0_CLK_M250_SEL_SHIFT    4
>  #define PRG_ETH0_CLK_M250_SEL_MASK     GENMASK(4, 4)
> @@ -47,12 +52,20 @@
>
>  #define MUX_CLK_NUM_PARENTS            2
>
> +struct meson8b_dwmac;
> +
> +struct meson8b_dwmac_data {
> +       int (*set_phy_mode)(struct meson8b_dwmac *dwmac);
> +};
> +
>  struct meson8b_dwmac {
> -       struct device           *dev;
> -       void __iomem            *regs;
> -       phy_interface_t         phy_mode;
> -       struct clk              *rgmii_tx_clk;
> -       u32                     tx_delay_ns;
> +       struct device                   *dev;
> +       void __iomem                    *regs;
> +
> +       const struct meson8b_dwmac_data *data;
> +       phy_interface_t                 phy_mode;
> +       struct clk                      *rgmii_tx_clk;
> +       u32                             tx_delay_ns;
>  };
>
>  struct meson8b_dwmac_clk_configs {
> @@ -171,6 +184,59 @@ static int meson8b_init_rgmii_tx_clk(struct meson8b_dwmac *dwmac)
>         return 0;
>  }
>
> +static int meson8b_set_phy_mode(struct meson8b_dwmac *dwmac)
> +{
> +       switch (dwmac->phy_mode) {
> +       case PHY_INTERFACE_MODE_RGMII:
> +       case PHY_INTERFACE_MODE_RGMII_RXID:
> +       case PHY_INTERFACE_MODE_RGMII_ID:
> +       case PHY_INTERFACE_MODE_RGMII_TXID:
> +               /* enable RGMII mode */
> +               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
> +                                       PRG_ETH0_RGMII_MODE,
> +                                       PRG_ETH0_RGMII_MODE);
> +               break;
> +       case PHY_INTERFACE_MODE_RMII:
> +               /* disable RGMII mode -> enables RMII mode */
> +               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
> +                                       PRG_ETH0_RGMII_MODE, 0);
> +               break;
> +       default:
> +               dev_err(dwmac->dev, "fail to set phy-mode %s\n",
> +                       phy_modes(dwmac->phy_mode));
> +               return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
> +static int meson_axg_set_phy_mode(struct meson8b_dwmac *dwmac)
> +{
> +       switch (dwmac->phy_mode) {
> +       case PHY_INTERFACE_MODE_RGMII:
> +       case PHY_INTERFACE_MODE_RGMII_RXID:
> +       case PHY_INTERFACE_MODE_RGMII_ID:
> +       case PHY_INTERFACE_MODE_RGMII_TXID:
> +               /* enable RGMII mode */
> +               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
> +                                       PRG_ETH0_EXT_PHY_MODE_MASK,
> +                                       PRG_ETH0_EXT_RGMII_MODE);
> +               break;
> +       case PHY_INTERFACE_MODE_RMII:
> +               /* disable RGMII mode -> enables RMII mode */
if you have to re-send it for whatever reason:
maybe you could remove the comments from meson_axg_set_phy_mode. the
"older" register layout requires un-setting RGMII mode to enable RMII
mode. however, for AXG there seem to be two dedicated values (1 and 4)
for each mode

apart from that:
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>

> +               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
> +                                       PRG_ETH0_EXT_PHY_MODE_MASK,
> +                                       PRG_ETH0_EXT_RMII_MODE);
> +               break;
> +       default:
> +               dev_err(dwmac->dev, "fail to set phy-mode %s\n",
> +                       phy_modes(dwmac->phy_mode));
> +               return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
>  static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
>  {
>         int ret;
> @@ -188,10 +254,6 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
>
>         case PHY_INTERFACE_MODE_RGMII_ID:
>         case PHY_INTERFACE_MODE_RGMII_TXID:
> -               /* enable RGMII mode */
> -               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
> -                                       PRG_ETH0_RGMII_MODE);
> -
>                 /* only relevant for RMII mode -> disable in RGMII mode */
>                 meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
>                                         PRG_ETH0_INVERTED_RMII_CLK, 0);
> @@ -224,10 +286,6 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
>                 break;
>
>         case PHY_INTERFACE_MODE_RMII:
> -               /* disable RGMII mode -> enables RMII mode */
> -               meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
> -                                       0);
> -
>                 /* invert internal clk_rmii_i to generate 25/2.5 tx_rx_clk */
>                 meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
>                                         PRG_ETH0_INVERTED_RMII_CLK,
> @@ -274,6 +332,11 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
>                 goto err_remove_config_dt;
>         }
>
> +       dwmac->data = (const struct meson8b_dwmac_data *)
> +               of_device_get_match_data(&pdev->dev);
> +       if (!dwmac->data)
> +               return -EINVAL;
> +
>         res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>         dwmac->regs = devm_ioremap_resource(&pdev->dev, res);
>         if (IS_ERR(dwmac->regs)) {
> @@ -298,6 +361,10 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
>         if (ret)
>                 goto err_remove_config_dt;
>
> +       ret = dwmac->data->set_phy_mode(dwmac);
> +       if (ret)
> +               goto err_remove_config_dt;
> +
>         ret = meson8b_init_prg_eth(dwmac);
>         if (ret)
>                 goto err_remove_config_dt;
> @@ -316,10 +383,31 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
>         return ret;
>  }
>
> +static const struct meson8b_dwmac_data meson8b_dwmac_data = {
> +       .set_phy_mode = meson8b_set_phy_mode,
> +};
> +
> +static const struct meson8b_dwmac_data meson_axg_dwmac_data = {
> +       .set_phy_mode = meson_axg_set_phy_mode,
> +};
> +
>  static const struct of_device_id meson8b_dwmac_match[] = {
> -       { .compatible = "amlogic,meson8b-dwmac" },
> -       { .compatible = "amlogic,meson8m2-dwmac" },
> -       { .compatible = "amlogic,meson-gxbb-dwmac" },
> +       {
> +               .compatible = "amlogic,meson8b-dwmac",
> +               .data = &meson8b_dwmac_data,
> +       },
> +       {
> +               .compatible = "amlogic,meson8m2-dwmac",
> +               .data = &meson8b_dwmac_data,
> +       },
> +       {
> +               .compatible = "amlogic,meson-gxbb-dwmac",
> +               .data = &meson8b_dwmac_data,
> +       },
> +       {
> +               .compatible = "amlogic,meson-axg-dwmac",
> +               .data = &meson_axg_dwmac_data,
> +       },
>         { }
>  };
>  MODULE_DEVICE_TABLE(of, meson8b_dwmac_match);
> --
> 2.17.0
>

^ permalink raw reply

* Re: [PATCH] ipv6: Allow non-gateway ECMP for IPv6
From: Ido Schimmel @ 2018-05-01 13:20 UTC (permalink / raw)
  To: David Ahern
  Cc: Thomas Winter, netdev, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI
In-Reply-To: <89497565-f1e6-a916-70d3-dfc7efa7a7e4@gmail.com>

On Mon, Apr 30, 2018 at 08:59:10PM -0600, David Ahern wrote:
> On 4/30/18 3:15 PM, Thomas Winter wrote:
> > It is valid to have static routes where the nexthop
> > is an interface not an address such as tunnels.
> > For IPv4 it was possible to use ECMP on these routes
> > but not for IPv6.
> > 
> > Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > ---
> >  include/net/ip6_route.h | 3 +--
> >  net/ipv6/ip6_fib.c      | 3 ---
> >  2 files changed, 1 insertion(+), 5 deletions(-)
> > 
> 
> Interesting. Existing code inserts the dev nexthop as a separate route.
> 
> Change looks good to me.
> 
> Acked-by: David Ahern <dsahern@gmail.com>

Thanks for the Cc, David. I'll need to adjust mlxsw to support this.
Specifically, mlxsw_sp_fib6_rt_can_mp().

BTW, I hit this bug while looking into this:
https://patchwork.ozlabs.org/patch/907050/

^ permalink raw reply

* Re: [net-next 0/9][pull request] 40GbE Intel Wired LAN Driver Updates 2018-04-30
From: David Miller @ 2018-05-01 13:38 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 30 Apr 2018 10:00:50 -0700

> This series contains updates to i40e and i40evf only.
> 
> Jia-Ju Bai replaces an instance of GFP_ATOMIC to GFP_KERNEL, since
> i40evf is not in atomic context when i40evf_add_vlan() is called.
> 
> Jake cleans up function header comments to ensure that the function
> parameter comments actually match the function parameters.  Fixed a
> possible overflow error in the PTP clock code.  Fixed warnings regarding
> restricted __be32 type usage.
> 
> Mariusz fixes the reading of the LLDP configuration, which moves from
> using relative values to calculating the absolute address.
> 
> Jakub adds a check for 10G LR mode for i40e.
> 
> Paweł fixes an issue, where changing the MTU would turn on TSO, GSO and
> GRO.
> 
> Alex fixes a couple of issues with the UDP tunnel filter configuration.
> First being that the tunnels did not have mutual exclusion in place to
> prevent a race condition between a user request to add/remove a port and
> an update.  The second issue was we were deleting filters that were not
> associated with the actual filter we wanted to delete.
> 
> Harshitha ensures that the queue map sent by the VF is taken into
> account when enabling/disabling queues in the VF VSI.
> 
> The following are changes since commit 76c2a96d42ca3bdac12c463ff27fec3bb2982e3f:
>   liquidio: fix spelling mistake: "mac_tx_multi_collison" -> "mac_tx_multi_collision"
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Pulled, thanks Jeff.

^ permalink raw reply

* Re: [PATCH  2/8] dt-bindings: stm32-dwmac: add support of MPU families
From: Rob Herring @ 2018-05-01 13:58 UTC (permalink / raw)
  To: Christophe Roullier
  Cc: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro,
	devicetree, linux-arm-kernel, netdev
In-Reply-To: <1524582120-4451-3-git-send-email-christophe.roullier@st.com>

On Tue, Apr 24, 2018 at 05:01:54PM +0200, Christophe Roullier wrote:
> Add description for Ethernet MPU families fields
> 
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
>  Documentation/devicetree/bindings/net/stm32-dwmac.txt | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/stm32-dwmac.txt b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> index 489dbcb..e9d1c4a 100644
> --- a/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> @@ -6,14 +6,26 @@ Please see stmmac.txt for the other unchanged properties.
>  The device node has following properties.
>  
>  Required properties:
> -- compatible:  Should be "st,stm32-dwmac" to select glue, and
> +- compatible:  For MCU family should be "st,stm32-dwmac" to select glue, and
>  	       "snps,dwmac-3.50a" to select IP version.
> +	       For MPU family should be "st,stm32mp1-dwmac" to select
> +	       glue, and "snps,dwmac-4.20a" to select IP version.
>  - clocks: Must contain a phandle for each entry in clock-names.
>  - clock-names: Should be "stmmaceth" for the host clock.
>  	       Should be "mac-clk-tx" for the MAC TX clock.
>  	       Should be "mac-clk-rx" for the MAC RX clock.
> +	       For MPU family "ethstp" for power mode clock.
> +	       For MPU family need also "syscfg-clk" for SYSCFG clock.

These are in addition or instead of the first 3 clocks.

> +- interrupt-names: Should contain a list of interrupt names corresponding to
> +           the interrupts in the interrupts property, if available.

You need to list the names. Seems unrelated to MPU support.

>  - st,syscon : Should be phandle/offset pair. The phandle to the syscon node which
> -	      encompases the glue register, and the offset of the control register.
> +	       encompases the glue register, and the offset of the control register.
> +
> +Optional properties:
> +- clock-names:     For MPU family "mac-clk-ck" for PHY without quartz

The clock is always connected whether you use it or not, right? So it 
shouldn't be optional based on use.

> +- st,int-phyclk :  valid only where PHY do not have quartz and need to be clock
> +	           by RCC

Boolean?


> +
>  Example:
>  
>  	ethernet@40028000 {
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH  8/8] dt-bindings: stm32: add compatible for syscon
From: Rob Herring @ 2018-05-01 14:01 UTC (permalink / raw)
  To: Christophe Roullier
  Cc: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro,
	devicetree, linux-arm-kernel, netdev
In-Reply-To: <1524582120-4451-9-git-send-email-christophe.roullier@st.com>

On Tue, Apr 24, 2018 at 05:02:00PM +0200, Christophe Roullier wrote:
> This patch describes syscon DT bindings.
> 
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
>  Documentation/devicetree/bindings/arm/stm32.txt | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/arm/stm32.txt b/Documentation/devicetree/bindings/arm/stm32.txt
> index 6808ed9..a871a78 100644
> --- a/Documentation/devicetree/bindings/arm/stm32.txt
> +++ b/Documentation/devicetree/bindings/arm/stm32.txt
> @@ -8,3 +8,10 @@ using one of the following compatible strings:
>    st,stm32f746
>    st,stm32h743
>    st,stm32mp157
> +
> +Required nodes:
> +
> +- syscon: some subnode of the STM32 SoC node must be a
> +  system controller node pointing to the control registers,
> +  with the compatible string set to one of these tuples:
> +  "st,stm32-syscfg", "syscon"

This should be a separate file.

I'd guess the syscfg registers differ from SoC to SoC, so you need more 
specific compatible strings.

Rob

^ permalink raw reply

* [PATCH V3 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: Soheil Hassas Yeganeh @ 2018-05-01 14:11 UTC (permalink / raw)
  To: davem, netdev; +Cc: ycheng, ncardwell, edumazet, willemb, Soheil Hassas Yeganeh

From: Soheil Hassas Yeganeh <soheil@google.com>

Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read from the socket.

The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.

Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.

Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.

With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.

V3 change-log:
	As suggested by David Miller, added loads with barrier
	to check whether we have multiple threads calling
	recvmsg in parallel. When that happens we lock the
	socket to calculate inq.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
---
 include/linux/tcp.h      |  2 +-
 include/uapi/linux/tcp.h |  3 +++
 net/ipv4/tcp.c           | 43 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 20585d5c4e1c3..807776928cb86 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -228,7 +228,7 @@ struct tcp_sock {
 		unused:2;
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		thin_lto    : 1,/* Use linear timeouts for thin streams */
-		unused1	    : 1,
+		recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
 		repair      : 1,
 		frto        : 1;/* F-RTO (RFC5682) activated in CA_Loss */
 	u8	repair_queue;
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index e9e8373b34b9d..29eb659aa77a1 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -123,6 +123,9 @@ enum {
 #define TCP_FASTOPEN_KEY	33	/* Set the key for Fast Open (cookie) */
 #define TCP_FASTOPEN_NO_COOKIE	34	/* Enable TFO without a TFO cookie */
 #define TCP_ZEROCOPY_RECEIVE	35
+#define TCP_INQ			36	/* Notify bytes available to read as a cmsg on read */
+
+#define TCP_CM_INQ		TCP_INQ
 
 struct tcp_repair_opt {
 	__u32	opt_code;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4028ddd14dd5a..ca7365db59dff 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1889,6 +1889,22 @@ static void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
 	}
 }
 
+static inline int tcp_inq_hint(struct sock *sk)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	u32 copied_seq = READ_ONCE(tp->copied_seq);
+	u32 rcv_nxt = READ_ONCE(tp->rcv_nxt);
+	int inq;
+
+	inq = rcv_nxt - copied_seq;
+	if (unlikely(inq < 0 || copied_seq != READ_ONCE(tp->copied_seq))) {
+		lock_sock(sk);
+		inq = tp->rcv_nxt - tp->copied_seq;
+		release_sock(sk);
+	}
+	return inq;
+}
+
 /*
  *	This routine copies from a sock struct into the user buffer.
  *
@@ -1905,13 +1921,14 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	u32 peek_seq;
 	u32 *seq;
 	unsigned long used;
-	int err;
+	int err, inq;
 	int target;		/* Read at least this many bytes */
 	long timeo;
 	struct sk_buff *skb, *last;
 	u32 urg_hole = 0;
 	struct scm_timestamping tss;
 	bool has_tss = false;
+	bool has_cmsg;
 
 	if (unlikely(flags & MSG_ERRQUEUE))
 		return inet_recv_error(sk, msg, len, addr_len);
@@ -1926,6 +1943,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	if (sk->sk_state == TCP_LISTEN)
 		goto out;
 
+	has_cmsg = tp->recvmsg_inq;
 	timeo = sock_rcvtimeo(sk, nonblock);
 
 	/* Urgent data needs to be handled specially. */
@@ -2112,6 +2130,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		if (TCP_SKB_CB(skb)->has_rxtstamp) {
 			tcp_update_recv_tstamps(skb, &tss);
 			has_tss = true;
+			has_cmsg = true;
 		}
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			goto found_fin_ok;
@@ -2131,13 +2150,20 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	 * on connected socket. I was just happy when found this 8) --ANK
 	 */
 
-	if (has_tss)
-		tcp_recv_timestamp(msg, sk, &tss);
-
 	/* Clean up data we have read: This will do ACK frames. */
 	tcp_cleanup_rbuf(sk, copied);
 
 	release_sock(sk);
+
+	if (has_cmsg) {
+		if (has_tss)
+			tcp_recv_timestamp(msg, sk, &tss);
+		if (tp->recvmsg_inq) {
+			inq = tcp_inq_hint(sk);
+			put_cmsg(msg, SOL_TCP, TCP_CM_INQ, sizeof(inq), &inq);
+		}
+	}
+
 	return copied;
 
 out:
@@ -3006,6 +3032,12 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		tp->notsent_lowat = val;
 		sk->sk_write_space(sk);
 		break;
+	case TCP_INQ:
+		if (val > 1 || val < 0)
+			err = -EINVAL;
+		else
+			tp->recvmsg_inq = val;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -3431,6 +3463,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_NOTSENT_LOWAT:
 		val = tp->notsent_lowat;
 		break;
+	case TCP_INQ:
+		val = tp->recvmsg_inq;
+		break;
 	case TCP_SAVE_SYN:
 		val = tp->save_syn;
 		break;
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox