Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH][net-next] vxlan: reduce dirty cache line in vxlan_find_mac
From: Li RongQing @ 2018-08-19  3:36 UTC (permalink / raw)
  To: netdev

vxlan_find_mac() unconditionally set f->used for every packet,
this cause a cache miss for every packet, since remote, hlist
and used of vxlan_fdb share the same cacheline.

With this change f->used is set only if not equal to jiffies
This gives up to 5% speed-up with small packets.

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 drivers/net/vxlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ababba37d735..e5d236595206 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -464,7 +464,7 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
 	struct vxlan_fdb *f;
 
 	f = __vxlan_find_mac(vxlan, mac, vni);
-	if (f)
+	if (f && f->used != jiffies)
 		f->used = jiffies;
 
 	return f;
-- 
2.16.2

^ permalink raw reply related

* Re: [RFC 0/1] Appletalk AARP probe broken by receipt of own broadcasts.
From: Craig McGeachie @ 2018-08-19  2:09 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David S. Miller, netdev, Craig McGeachie
In-Reply-To: <20180819013244.GA8950@lunn.ch>

On 19/08/18 13:32, Andrew Lunn wrote:
> On Sun, Aug 19, 2018 at 01:07:38PM +1200, Craig McGeachie wrote:
>> I'm hoping I can find someone able and willing to test this patch. That
>> requires someone still using netatalk 2.2.x with DDP, or some other DDP
>> userspace application. This feels like a longshot.
>>
>> When netatalk 2.2.x starts up with DDP and sets the Appletalk node
>> address, the kernel AARP code sends a probe packet for the address. It
>> then receives its own probe packet and interprets that as some other
>> node also trying to claim the address. It increments the address, tries
>> again, and fails again ad nausium. Eventually the kernel module gives up
>> and returns to netatalk which terminates with an error that it cannot
>> get a node address.
> 
> Hi Craig
> 
> What Ethernet device are you seeing this problem with?
> 
> I'm not sure an Ethernet device should receive its own broadcasts.
> This might be a driver bug, not an AARP bug.
> 
>       Andrew
> 

I run inside Virtualbox with the Realtek PCIe GBE Family Controller.

Assuming I'm reading /sys/class/net/enp0s3/driver correctly, it's using 
the e1000 driver.

However, it might not be the ethernet driver's fault. I've been a bit 
loose with terminology. Appletalk AARP probe packets aren't ethernet 
broadcasts as such; they're multicast packets, via the psnap driver, to 
hardware address 09:00:07:ff:ff:ff.

Cheers,
Craig.

^ permalink raw reply

* Re: [RFC 0/1] Appletalk AARP probe broken by receipt of own broadcasts.
From: Andrew Lunn @ 2018-08-19  1:32 UTC (permalink / raw)
  To: Craig McGeachie; +Cc: David S. Miller, netdev, Craig McGeachie
In-Reply-To: <20180819010739.26975-1-slapdau@gmail.com>

On Sun, Aug 19, 2018 at 01:07:38PM +1200, Craig McGeachie wrote:
> I'm hoping I can find someone able and willing to test this patch. That
> requires someone still using netatalk 2.2.x with DDP, or some other DDP
> userspace application. This feels like a longshot.
> 
> When netatalk 2.2.x starts up with DDP and sets the Appletalk node
> address, the kernel AARP code sends a probe packet for the address. It
> then receives its own probe packet and interprets that as some other
> node also trying to claim the address. It increments the address, tries
> again, and fails again ad nausium. Eventually the kernel module gives up
> and returns to netatalk which terminates with an error that it cannot
> get a node address.

Hi Craig

What Ethernet device are you seeing this problem with?

I'm not sure an Ethernet device should receive its own broadcasts.
This might be a driver bug, not an AARP bug.

     Andrew

^ permalink raw reply

* [RFC 1/1] appletalk: ignore aarp probe broadcasts that loopback.
From: Craig McGeachie @ 2018-08-19  1:07 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Craig McGeachie, Craig McGeachie
In-Reply-To: <20180819010739.26975-1-slapdau@gmail.com>

AARP probe packets are broadcast to dynamically discover if an Appletalk
node address is in use. The definition of AARP requires interpreting the
receipt of probe requests for the address being tested as the address being
in use, and then trying the next address. However, the node receives its own
Ethertalk broadcast, and incorrectly interprets that as another node trying
to claim the address.

Signed-off-by: Craig McGeachie <slapdau@gmail.com>
---
 net/appletalk/aarp.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/appletalk/aarp.c b/net/appletalk/aarp.c
index 49a16cee..f966cc01 100644
--- a/net/appletalk/aarp.c
+++ b/net/appletalk/aarp.c
@@ -757,6 +757,10 @@ static int aarp_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (!ifa)
 		goto out1;

+	/* Ignore packets from myself. */
+	if (ether_addr_equal(ea->hw_src, dev->dev_addr))
+		goto out1;
+
 	if (ifa->status & ATIF_PROBE &&
 	    ifa->address.s_node == ea->pa_dst_node &&
 	    ifa->address.s_net == ea->pa_dst_net) {
-- 
2.17.1

^ permalink raw reply related

* [RFC 0/1] Appletalk AARP probe broken by receipt of own broadcasts.
From: Craig McGeachie @ 2018-08-19  1:07 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Craig McGeachie, Craig McGeachie

I'm hoping I can find someone able and willing to test this patch. That
requires someone still using netatalk 2.2.x with DDP, or some other DDP
userspace application. This feels like a longshot.

When netatalk 2.2.x starts up with DDP and sets the Appletalk node
address, the kernel AARP code sends a probe packet for the address. It
then receives its own probe packet and interprets that as some other
node also trying to claim the address. It increments the address, tries
again, and fails again ad nausium. Eventually the kernel module gives up
and returns to netatalk which terminates with an error that it cannot
get a node address.

Well, most of the time. There seems to be some sort of race condition
where occasionally a self collision won't happen. Restart netatalk
enough times and it will probably work.

The device Ethernet MAC address is copied into the AARP packet, so the
fix is to disregard all received packets that have a sender address that
matches the device hardware address. This is more than just probe
packets, but there is no legitimate situation where an Appletalk node
sends AARP packets to itself.

Craig McGeachie (1):
  appletalk: ignore aarp probe broadcasts that loopback.

 net/appletalk/aarp.c | 4 ++++
 1 file changed, 4 insertions(+)

-- 
2.17.1

^ permalink raw reply

* financial support
From: Xiang, Feifei @ 2018-08-19  0:00 UTC (permalink / raw)




-- 
donation of $3 to you, contact hams.fly@hotmail.com to verefly

^ permalink raw reply

* Re: [PATCH ethtool] ethtool: document WoL filters option also in help message
From: John W. Linville @ 2018-08-18 21:35 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Michal Kubecek, netdev
In-Reply-To: <430765a1-adde-a5bc-5c96-213e85d5c093@gmail.com>

On Fri, Aug 17, 2018 at 10:51:31AM -0700, Florian Fainelli wrote:
> On 08/17/2018 06:21 AM, Michal Kubecek wrote:
> > Commit eff0bb337223 ("ethtool: Add support for WAKE_FILTER (WoL using
> > filters)") added option "f" for wake on lan and documented it in man page
> > but not in the output of "ethtool --help".
> > 
> > Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> 
> Acked-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Thanks Michal!

Yes, good eye!

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH v2] net: phy: add tracepoints
From: David Miller @ 2018-08-18 21:01 UTC (permalink / raw)
  To: tobias; +Cc: netdev, andrew, f.fainelli, rostedt, mingo, linux-kernel
In-Reply-To: <20180816152452.7834-1-tobias@waldekranz.com>

From: Tobias Waldekranz <tobias@waldekranz.com>
Date: Thu, 16 Aug 2018 17:24:48 +0200

> Two tracepoints for now:
> 
>    * `phy_interrupt` Pretty self-explanatory.
> 
>    * `phy_state_change` Whenever the PHY's state machine is run, trace
>      the old and the new state.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
> v1 -> v2:
>    * Actually include the entire patch, based on current net-next.
>    * Pretty print the PHY's state according to Steven's suggestion.

Please resubmit this when the net-next tree opens back up.

Thank you.

^ permalink raw reply

* KASAN: use-after-free Read in tipc_group_fill_sock_diag
From: syzbot @ 2018-08-19  0:10 UTC (permalink / raw)
  To: davem, jon.maloy, linux-kernel, netdev, syzkaller-bugs,
	tipc-discussion, ying.xue

Hello,

syzbot found the following crash on:

HEAD commit:    a18d783fedfe Merge tag 'driver-core-4.19-rc1' of git://git..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14bcc936400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7eeabb17332898c0
dashboard link: https://syzkaller.appspot.com/bug?extid=b9c8f3ab2994b7cd1625
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1374ec4a400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
==================================================================
BUG: KASAN: use-after-free in tipc_group_fill_sock_diag+0x7b9/0x84b  
net/tipc/group.c:921
Read of size 4 at addr ffff8801ceb5565c by task syz-executor0/4838

CPU: 1 PID: 4838 Comm: syz-executor0 Not tainted 4.18.0+ #196
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
  tipc_group_fill_sock_diag+0x7b9/0x84b net/tipc/group.c:921
  tipc_sk_fill_sock_diag+0x9f8/0xdb0 net/tipc/socket.c:3322
  __tipc_add_sock_diag+0x22f/0x360 net/tipc/diag.c:62
  tipc_nl_sk_walk+0x68d/0xd30 net/tipc/socket.c:3249
  tipc_diag_dump+0x24/0x30 net/tipc/diag.c:73
  netlink_dump+0x519/0xd50 net/netlink/af_netlink.c:2233
  __netlink_dump_start+0x4f1/0x6f0 net/netlink/af_netlink.c:2329
  netlink_dump_start include/linux/netlink.h:213 [inline]
  tipc_sock_diag_handler_dump+0x234/0x340 net/tipc/diag.c:89
  __sock_diag_cmd net/core/sock_diag.c:232 [inline]
  sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
  netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
  sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
  netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
  netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
  netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:631
  ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
  __sys_sendmsg+0x11d/0x290 net/socket.c:2152
  __do_sys_sendmsg net/socket.c:2161 [inline]
  __se_sys_sendmsg net/socket.c:2159 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457089
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3a5d506c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f3a5d5076d4 RCX: 0000000000457089
RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000006
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d4088 R14: 00000000004c8ab0 R15: 0000000000000000

Allocated by task 4838:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
  kmalloc include/linux/slab.h:513 [inline]
  kzalloc include/linux/slab.h:707 [inline]
  tipc_group_create+0x155/0xa70 net/tipc/group.c:171
  tipc_sk_join net/tipc/socket.c:2766 [inline]
  tipc_setsockopt+0x2d1/0xd70 net/tipc/socket.c:2881
  __sys_setsockopt+0x1c5/0x3b0 net/socket.c:1900
  __do_sys_setsockopt net/socket.c:1911 [inline]
  __se_sys_setsockopt net/socket.c:1908 [inline]
  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1908
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4837:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x260 mm/slab.c:3813
  tipc_group_delete+0x2e5/0x3f0 net/tipc/group.c:227
  tipc_sk_leave+0x113/0x220 net/tipc/socket.c:2800
  tipc_release+0x14e/0x12b0 net/tipc/socket.c:574
  __sock_release+0xd7/0x250 net/socket.c:579
  sock_close+0x19/0x20 net/socket.c:1139
  __fput+0x39b/0x860 fs/file_table.c:251
  ____fput+0x15/0x20 fs/file_table.c:282
  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:193 [inline]
  exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ceb55600
  which belongs to the cache kmalloc-192 of size 192
The buggy address is located 92 bytes inside of
  192-byte region [ffff8801ceb55600, ffff8801ceb556c0)
The buggy address belongs to the page:
page:ffffea00073ad540 count:1 mapcount:0 mapping:ffff8801dac00040 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea00073a5088 ffffea00073ae5c8 ffff8801dac00040
raw: 0000000000000000 ffff8801ceb55000 0000000100000010 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801ceb55500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8801ceb55580: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801ceb55600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                     ^
  ffff8801ceb55680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
  ffff8801ceb55700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH]ipv6: multicast: In mld_send_cr function moving read lock to second for loop
From: David Miller @ 2018-08-18 20:58 UTC (permalink / raw)
  To: guru2018; +Cc: netdev, kuznet, yoshfuji
In-Reply-To: <CAHSpA5-ivVaQnQU7_ikt=BV-EAwFJFaA6ErpQ1Jn1HmPmXZapg@mail.gmail.com>

From: Guruswamy Basavaiah <guru2018@gmail.com>
Date: Fri, 17 Aug 2018 18:01:41 +0530

> @@ -1860,7 +1860,6 @@ static void mld_send_cr(struct inet6_dev *idev)
>      struct sk_buff *skb = NULL;
>      int type, dtype;
> 
> -    read_lock_bh(&idev->lock);
>      spin_lock(&idev->mc_lock);
> 
>      /* deleted MCA's */

This will lead to deadlocks, idev->mc_lock must be taken with _bh().

I have zero confidence in this change, did you do any stress testing
with lockdep enabled?  It would have caught this quickly.

^ permalink raw reply

* Re: how to (cross)connect two (physical) eth ports for ping test?
From: Willy Tarreau @ 2018-08-18 20:45 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Robert P. J. Day, Linux kernel netdev mailing list
In-Reply-To: <20180818191025.GA11187@lunn.ch>

On Sat, Aug 18, 2018 at 09:10:25PM +0200, Andrew Lunn wrote:
> On Sat, Aug 18, 2018 at 01:39:50PM -0400, Robert P. J. Day wrote:
> > 
> >   (i'm sure this has been explained many times before, so a link
> > covering this will almost certainly do just fine.)
> > 
> >   i want to loop one physical ethernet port into another, and just
> > ping the daylights from one to the other for stress testing. my fedora
> > laptop doesn't actually have two unused ethernet ports, so i just want
> > to emulate this by slapping a couple startech USB/net adapters into
> > two empty USB ports, setting this up, then doing it all over again
> > monday morning on the actual target system, which does have multiple
> > ethernet ports.
> > 
> >   so if someone can point me to the recipe, that would be great and
> > you can stop reading.
> > 
> >   as far as my tentative solution goes, i assume i need to put at
> > least one of the physical ports in a network namespace via "ip netns",
> > then ping from the netns to the root namespace. or, going one step
> > further, perhaps putting both interfaces into two new namespaces, and
> > setting up forwarding.
> 
> Namespaces is a good solution. Something like this should work:
> 
> ip netns add namespace1
> ip netns add namespace2
> 
> ip link set eth1 netns namespace1
> ip link set eth2 netns namespace2
> 
> ip netns exec namespace1 \
>         ip addr add 10.42.42.42/24 dev eth1
> 
> ip netns exec namespace1 \
>         ip link set eth1 up
> 
> ip netns exec namespace2 \
>         ip addr add 10.42.42.24/24 dev eth2
> 
> ip netns exec namespace2 \
>         ip link set eth2 up
> 
> ip netns exec namespace1 \
>         ping 10.42.42.24
> 
> You might also want to consider iperf3 for stress testing, depending
> on the sort of stress you need.

FWIW I have a setup somewhere involving ip rule + ip route which achieves
the same without involving namespaces. It's a bit hackish but sometimes
convenient. I can dig if someone is interested.

Regards,
Willy

^ permalink raw reply

* Re: [bpf-next RFC 2/3] flow_dissector: implements eBPF parser
From: Willem de Bruijn @ 2018-08-18 19:49 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Petar Penkov, Network Development, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman, Petar Penkov,
	Willem de Bruijn
In-Reply-To: <CALx6S34gBQbpN1rjsC7jWYFMpzW-T8EUnX-82um7fDiFHLyysQ@mail.gmail.com>

On Sat, Aug 18, 2018 at 11:56 AM Tom Herbert <tom@herbertland.com> wrote:
>
> On Thu, Aug 16, 2018 at 9:44 AM, Petar Penkov <peterpenkov96@gmail.com> wrote:
> > From: Petar Penkov <ppenkov@google.com>
> >
> > This eBPF program extracts basic/control/ip address/ports keys from
> > incoming packets. It supports recursive parsing for IP
> > encapsulation, MPLS, GUE, and VLAN, along with IPv4/IPv6 and extension
> > headers. This program is meant to show how flow dissection and key
> > extraction can be done in eBPF.
> >
> > It is initially meant to be used for demonstration rather than as a
> > complete replacement of the existing flow dissector.
> >
> > This includes parsing of GUE and MPLS payload, which cannot be done
> > in production in general, as GUE tunnels and MPLS payloads cannot
> > unambiguously be detected in general.
> >
> > In closed environments, however, it can be enabled. Another example
> > where the programmability of BPF aids flow dissection.

> > +static __always_inline int write_ports(struct __sk_buff *skb, __u8 proto)
> > +{
> > +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> > +       struct flow_dissector_key_ports ports;
> > +
> > +       /* The supported protocols always start with the ports */
> > +       if (bpf_skb_load_bytes(skb, cb->nhoff, &ports, sizeof(ports)))
> > +               return BPF_DROP;
> > +
> > +       if (proto == IPPROTO_UDP && ports.dst == bpf_htons(GUE_PORT)) {
> > +               /* GUE encapsulation */
> > +               cb->nhoff += sizeof(struct udphdr);
> > +               bpf_tail_call(skb, &jmp_table, GUE);
> > +               return BPF_DROP;
>
> It's a nice sentiment to support GUE, but this really isn't the right
> way to do it.

Yes, this was just for demonstration purposes. The same for
unconditionally parsing MPLS payload as IP.

Though note the point in the commit message that within a closed
network with fixed reserved GUE ports, a custom BPF program
like this could be sufficient. That's true not only for UDP tunnels.

> What would be much better is a means to generically
> support all the various UDP encapsulations like GUE, VXLAN, Geneve,
> GRE/UDP, MPLS/UDP, etc. I think there's two ways to do that:
>
> 1) A UDP socket lookup that returns an encapsulation socket containing
> a flow dissector function that can be called. This is the safest
> method because of the UDP are reserved numbers problem. I implement
> this in kernel flow dissector, not upstreamed though.

Yes, similar to udp_gro_receive. Socket lookup is not free, however,
and this is a relatively rarely used feature.

I want to move the one in udp_gro_receive behind a static key.
udp_encap_needed_key is the likely target. Then the same can
eventually be done for flow dissection inside UDP tunnels.

> 2) Create a lookup table based on destination port that returns the
> flow dissector function to call. This doesn't have the socket lookup
> so it isn't quite as robust as the socket lookup. But, at least it's a
> generic interface and programmable so it might be appropriate in the
> BPF flow dissector case.

Option 1 sounds preferable to me.

^ permalink raw reply

* Re: how to (cross)connect two (physical) eth ports for ping test?
From: Andrew Lunn @ 2018-08-18 19:10 UTC (permalink / raw)
  To: Robert P. J. Day; +Cc: Linux kernel netdev mailing list
In-Reply-To: <alpine.LFD.2.21.1808181332210.7716@localhost.localdomain>

On Sat, Aug 18, 2018 at 01:39:50PM -0400, Robert P. J. Day wrote:
> 
>   (i'm sure this has been explained many times before, so a link
> covering this will almost certainly do just fine.)
> 
>   i want to loop one physical ethernet port into another, and just
> ping the daylights from one to the other for stress testing. my fedora
> laptop doesn't actually have two unused ethernet ports, so i just want
> to emulate this by slapping a couple startech USB/net adapters into
> two empty USB ports, setting this up, then doing it all over again
> monday morning on the actual target system, which does have multiple
> ethernet ports.
> 
>   so if someone can point me to the recipe, that would be great and
> you can stop reading.
> 
>   as far as my tentative solution goes, i assume i need to put at
> least one of the physical ports in a network namespace via "ip netns",
> then ping from the netns to the root namespace. or, going one step
> further, perhaps putting both interfaces into two new namespaces, and
> setting up forwarding.

Namespaces is a good solution. Something like this should work:

ip netns add namespace1
ip netns add namespace2

ip link set eth1 netns namespace1
ip link set eth2 netns namespace2

ip netns exec namespace1 \
        ip addr add 10.42.42.42/24 dev eth1

ip netns exec namespace1 \
        ip link set eth1 up

ip netns exec namespace2 \
        ip addr add 10.42.42.24/24 dev eth2

ip netns exec namespace2 \
        ip link set eth2 up

ip netns exec namespace1 \
        ping 10.42.42.24

You might also want to consider iperf3 for stress testing, depending
on the sort of stress you need.

   Andrew

^ permalink raw reply

* how to (cross)connect two (physical) eth ports for ping test?
From: Robert P. J. Day @ 2018-08-18 17:39 UTC (permalink / raw)
  To: Linux kernel netdev mailing list

  (i'm sure this has been explained many times before, so a link
covering this will almost certainly do just fine.)

  i want to loop one physical ethernet port into another, and just
ping the daylights from one to the other for stress testing. my fedora
laptop doesn't actually have two unused ethernet ports, so i just want
to emulate this by slapping a couple startech USB/net adapters into
two empty USB ports, setting this up, then doing it all over again
monday morning on the actual target system, which does have multiple
ethernet ports.

  so if someone can point me to the recipe, that would be great and
you can stop reading.

  as far as my tentative solution goes, i assume i need to put at
least one of the physical ports in a network namespace via "ip netns",
then ping from the netns to the root namespace. or, going one step
further, perhaps putting both interfaces into two new namespaces, and
setting up forwarding.

  anyway, a recipe for this would be just ducky. thank you kindly.

rday

-- 

========================================================================
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                  http://crashcourse.ca/dokuwiki

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================

^ permalink raw reply

* Re: [PATCH] ip6_vti: simplify stats handling in vti6_xmit
From: David Miller @ 2018-08-18 20:47 UTC (permalink / raw)
  To: yanhaishuang; +Cc: steffen.klassert, kuznet, netdev, linux-kernel
In-Reply-To: <1534603428-30425-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 18 Aug 2018 22:43:48 +0800

> Same as ip_vti, use iptunnel_xmit_stats to updates stats in tunnel xmit
> code path.
> 
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: nixge: Add support for 64-bit platforms
From: David Miller @ 2018-08-18 17:04 UTC (permalink / raw)
  To: mdf; +Cc: keescook, netdev, alex.williams, moritz.fischer, f.fainelli
In-Reply-To: <20180816190706.11334-1-mdf@kernel.org>

From: Moritz Fischer <mdf@kernel.org>
Date: Thu, 16 Aug 2018 12:07:06 -0700

> Add support for 64-bit platforms to driver.
> 
> The hardware only supports 32-bit register accesses
> so the accesses need to be split up into two writes
> when setting the current and tail descriptor values.
> 
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Moritz Fischer <mdf@kernel.org>

Please resubmit when the net-next tree opens back up.

Thank you.

^ permalink raw reply

* Re: pull-request: bpf 2018-08-18
From: David Miller @ 2018-08-18 17:03 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20180817232920.8608-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat, 18 Aug 2018 01:29:20 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
 ...
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks.

^ permalink raw reply

* Re: [PATCH 00/15] Netfilter/IPVS fixes for net
From: David Miller @ 2018-08-18 17:01 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20180817193850.2796-1-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri, 17 Aug 2018 21:38:35 +0200

> The following patchset contains Netfilter/IPVS fixes for your net tree:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks.

^ permalink raw reply

* Re: [bpf-next RFC 2/3] flow_dissector: implements eBPF parser
From: Tom Herbert @ 2018-08-18 15:50 UTC (permalink / raw)
  To: Petar Penkov
  Cc: Linux Kernel Network Developers, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, Simon Horman, Petar Penkov,
	Willem de Bruijn
In-Reply-To: <20180816164423.14368-3-peterpenkov96@gmail.com>

On Thu, Aug 16, 2018 at 9:44 AM, Petar Penkov <peterpenkov96@gmail.com> wrote:
> From: Petar Penkov <ppenkov@google.com>
>
> This eBPF program extracts basic/control/ip address/ports keys from
> incoming packets. It supports recursive parsing for IP
> encapsulation, MPLS, GUE, and VLAN, along with IPv4/IPv6 and extension
> headers. This program is meant to show how flow dissection and key
> extraction can be done in eBPF.
>
> It is initially meant to be used for demonstration rather than as a
> complete replacement of the existing flow dissector.
>
> This includes parsing of GUE and MPLS payload, which cannot be done
> in production in general, as GUE tunnels and MPLS payloads cannot
> unambiguously be detected in general.
>
> In closed environments, however, it can be enabled. Another example
> where the programmability of BPF aids flow dissection.
>
> Link: http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  tools/testing/selftests/bpf/Makefile   |   2 +-
>  tools/testing/selftests/bpf/bpf_flow.c | 542 +++++++++++++++++++++++++
>  2 files changed, 543 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/bpf_flow.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index fff7fb1285fc..e65f50f9185e 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -35,7 +35,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
>         test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
>         test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
>         get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
> -       test_skb_cgroup_id_kern.o
> +       test_skb_cgroup_id_kern.o bpf_flow.o
>
>  # Order correspond to 'make run_tests' order
>  TEST_PROGS := test_kmod.sh \
> diff --git a/tools/testing/selftests/bpf/bpf_flow.c b/tools/testing/selftests/bpf/bpf_flow.c
> new file mode 100644
> index 000000000000..9c11c644b713
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/bpf_flow.c
> @@ -0,0 +1,542 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <stddef.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <linux/pkt_cls.h>
> +#include <linux/bpf.h>
> +#include <linux/in.h>
> +#include <linux/if_ether.h>
> +#include <linux/icmp.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include <linux/tcp.h>
> +#include <linux/udp.h>
> +#include <linux/if_packet.h>
> +#include <sys/socket.h>
> +#include <linux/if_tunnel.h>
> +#include <linux/mpls.h>
> +#include "bpf_helpers.h"
> +#include "bpf_endian.h"
> +
> +int _version SEC("version") = 1;
> +#define PROG(F) SEC(#F) int bpf_func_##F
> +
> +/* These are the identifiers of the BPF programs that will be used in tail
> + * calls. Name is limited to 16 characters, with the terminating character and
> + * bpf_func_ above, we have only 6 to work with, anything after will be cropped.
> + */
> +enum {
> +       IP,
> +       IPV6,
> +       IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
> +       IPV6FR, /* Fragmentation IPv6 Extension Header */
> +       MPLS,
> +       VLAN,
> +       GUE,
> +};
> +
> +#define IP_MF          0x2000
> +#define IP_OFFSET      0x1FFF
> +#define IP6_MF         0x0001
> +#define IP6_OFFSET     0xFFF8
> +
> +struct vlan_hdr {
> +       __be16 h_vlan_TCI;
> +       __be16 h_vlan_encapsulated_proto;
> +};
> +
> +struct gre_hdr {
> +       __be16 flags;
> +       __be16 proto;
> +};
> +
> +#define GUE_PORT 6080
> +/* Taken from include/net/gue.h. Move that to uapi, instead? */
> +struct guehdr {
> +       union {
> +               struct {
> +#if defined(__LITTLE_ENDIAN_BITFIELD)
> +                       __u8    hlen:5,
> +                               control:1,
> +                               version:2;
> +#elif defined (__BIG_ENDIAN_BITFIELD)
> +                       __u8    version:2,
> +                               control:1,
> +                               hlen:5;
> +#else
> +#error  "Please fix <asm/byteorder.h>"
> +#endif
> +                       __u8    proto_ctype;
> +                       __be16  flags;
> +               };
> +               __be32  word;
> +       };
> +};
> +
> +enum flow_dissector_key_id {
> +       FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
> +       FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
> +       FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
> +       FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
> +       FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
> +       FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
> +       FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
> +       FLOW_DISSECTOR_KEY_TIPC, /* struct flow_dissector_key_tipc */
> +       FLOW_DISSECTOR_KEY_ARP, /* struct flow_dissector_key_arp */
> +       FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
> +       FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_flow_tags */
> +       FLOW_DISSECTOR_KEY_GRE_KEYID, /* struct flow_dissector_key_keyid */
> +       FLOW_DISSECTOR_KEY_MPLS_ENTROPY, /* struct flow_dissector_key_keyid */
> +       FLOW_DISSECTOR_KEY_ENC_KEYID, /* struct flow_dissector_key_keyid */
> +       FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
> +       FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
> +       FLOW_DISSECTOR_KEY_ENC_CONTROL, /* struct flow_dissector_key_control */
> +       FLOW_DISSECTOR_KEY_ENC_PORTS, /* struct flow_dissector_key_ports */
> +       FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */
> +       FLOW_DISSECTOR_KEY_TCP, /* struct flow_dissector_key_tcp */
> +       FLOW_DISSECTOR_KEY_IP, /* struct flow_dissector_key_ip */
> +       FLOW_DISSECTOR_KEY_CVLAN, /* struct flow_dissector_key_flow_vlan */
> +
> +       FLOW_DISSECTOR_KEY_MAX,
> +};
> +
> +struct flow_dissector_key_control {
> +       __u16   thoff;
> +       __u16   addr_type;
> +       __u32   flags;
> +};
> +
> +#define FLOW_DIS_IS_FRAGMENT   (1 << 0)
> +#define FLOW_DIS_FIRST_FRAG    (1 << 1)
> +#define FLOW_DIS_ENCAPSULATION (1 << 2)
> +
> +struct flow_dissector_key_basic {
> +       __be16  n_proto;
> +       __u8    ip_proto;
> +       __u8    padding;
> +};
> +
> +struct flow_dissector_key_ipv4_addrs {
> +       __be32 src;
> +       __be32 dst;
> +};
> +
> +struct flow_dissector_key_ipv6_addrs {
> +       struct in6_addr src;
> +       struct in6_addr dst;
> +};
> +
> +struct flow_dissector_key_addrs {
> +       union {
> +               struct flow_dissector_key_ipv4_addrs v4addrs;
> +               struct flow_dissector_key_ipv6_addrs v6addrs;
> +       };
> +};
> +
> +struct flow_dissector_key_ports {
> +       union {
> +               __be32 ports;
> +               struct {
> +                       __be16 src;
> +                       __be16 dst;
> +               };
> +       };
> +};
> +
> +struct bpf_map_def SEC("maps") jmp_table = {
> +       .type = BPF_MAP_TYPE_PROG_ARRAY,
> +       .key_size = sizeof(__u32),
> +       .value_size = sizeof(__u32),
> +       .max_entries = 8
> +};
> +
> +struct bpf_dissect_cb {
> +       __u16 nhoff;
> +       __u16 flags;
> +};
> +
> +/* Dispatches on ETHERTYPE */
> +static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
> +{
> +       switch (proto) {
> +       case bpf_htons(ETH_P_IP):
> +               bpf_tail_call(skb, &jmp_table, IP);
> +               break;
> +       case bpf_htons(ETH_P_IPV6):
> +               bpf_tail_call(skb, &jmp_table, IPV6);
> +               break;
> +       case bpf_htons(ETH_P_MPLS_MC):
> +       case bpf_htons(ETH_P_MPLS_UC):
> +               bpf_tail_call(skb, &jmp_table, MPLS);
> +               break;
> +       case bpf_htons(ETH_P_8021Q):
> +       case bpf_htons(ETH_P_8021AD):
> +               bpf_tail_call(skb, &jmp_table, VLAN);
> +               break;
> +       default:
> +               /* Protocol not supported */
> +               return BPF_DROP;
> +       }
> +
> +       return BPF_DROP;
> +}
> +
> +static __always_inline int write_ports(struct __sk_buff *skb, __u8 proto)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct flow_dissector_key_ports ports;
> +
> +       /* The supported protocols always start with the ports */
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &ports, sizeof(ports)))
> +               return BPF_DROP;
> +
> +       if (proto == IPPROTO_UDP && ports.dst == bpf_htons(GUE_PORT)) {
> +               /* GUE encapsulation */
> +               cb->nhoff += sizeof(struct udphdr);
> +               bpf_tail_call(skb, &jmp_table, GUE);
> +               return BPF_DROP;

It's a nice sentiment to support GUE, but this really isn't the right
way to do it. What would be much better is a means to generically
support all the various UDP encapsulations like GUE, VXLAN, Geneve,
GRE/UDP, MPLS/UDP, etc. I think there's two ways to do that:

1) A UDP socket lookup that returns an encapsulation socket containing
a flow dissector function that can be called. This is the safest
method because of the UDP are reserved numbers problem. I implement
this in kernel flow dissector, not upstreamed though.
2) Create a lookup table based on destination port that returns the
flow dissector function to call. This doesn't have the socket lookup
so it isn't quite as robust as the socket lookup. But, at least it's a
generic interface and programmable so it might be appropriate in the
BPF flow dissector case.

Tom

> +       }
> +
> +       if (bpf_flow_dissector_write_keys(skb, &ports, sizeof(ports),
> +                                         FLOW_DISSECTOR_KEY_PORTS))
> +               return BPF_DROP;
> +
> +       return BPF_OK;
> +}
> +
> +SEC("dissect")
> +int dissect(struct __sk_buff *skb)
> +{
> +       if (!skb->vlan_present)
> +               return parse_eth_proto(skb, skb->protocol);
> +       else
> +               return parse_eth_proto(skb, skb->vlan_proto);
> +}
> +
> +/* Parses on IPPROTO_* */
> +static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       __u8 *data_end = (__u8 *)(long)skb->data_end;
> +       __u8 *data = (__u8 *)(long)skb->data;
> +       __u32 data_len = data_end - data;
> +       struct gre_hdr gre;
> +       struct ethhdr eth;
> +       struct tcphdr tcp;
> +
> +       switch (proto) {
> +       case IPPROTO_ICMP:
> +               if (cb->nhoff + sizeof(struct icmphdr) > data_len)
> +                       return BPF_DROP;
> +               return BPF_OK;
> +       case IPPROTO_IPIP:
> +               cb->flags |= FLOW_DIS_ENCAPSULATION;
> +               bpf_tail_call(skb, &jmp_table, IP);
> +               break;
> +       case IPPROTO_IPV6:
> +               cb->flags |= FLOW_DIS_ENCAPSULATION;
> +               bpf_tail_call(skb, &jmp_table, IPV6);
> +               break;
> +       case IPPROTO_GRE:
> +               if (bpf_skb_load_bytes(skb, cb->nhoff, &gre, sizeof(gre)))
> +                       return BPF_DROP;
> +
> +               if (bpf_htons(gre.flags & GRE_VERSION))
> +                       /* Only inspect standard GRE packets with version 0 */
> +                       return BPF_OK;
> +
> +               cb->nhoff += sizeof(gre); /* Step over GRE Flags and Protocol */
> +               if (GRE_IS_CSUM(gre.flags))
> +                       cb->nhoff += 4; /* Step over chksum and Padding */
> +               if (GRE_IS_KEY(gre.flags))
> +                       cb->nhoff += 4; /* Step over key */
> +               if (GRE_IS_SEQ(gre.flags))
> +                       cb->nhoff += 4; /* Step over sequence number */
> +
> +               cb->flags |= FLOW_DIS_ENCAPSULATION;
> +
> +               if (gre.proto == bpf_htons(ETH_P_TEB)) {
> +                       if (bpf_skb_load_bytes(skb, cb->nhoff, &eth,
> +                                              sizeof(eth)))
> +                               return BPF_DROP;
> +
> +                       cb->nhoff += sizeof(eth);
> +
> +                       return parse_eth_proto(skb, eth.h_proto);
> +               } else {
> +                       return parse_eth_proto(skb, gre.proto);
> +               }
> +
> +       case IPPROTO_TCP:
> +               if (cb->nhoff + sizeof(struct tcphdr) > data_len)
> +                       return BPF_DROP;
> +
> +               if (bpf_skb_load_bytes(skb, cb->nhoff, &tcp, sizeof(tcp)))
> +                       return BPF_DROP;
> +
> +               if (tcp.doff < 5)
> +                       return BPF_DROP;
> +
> +               if (cb->nhoff + (tcp.doff << 2) > data_len)
> +                       return BPF_DROP;
> +
> +               return write_ports(skb, proto);
> +       case IPPROTO_UDP:
> +       case IPPROTO_UDPLITE:
> +               if (cb->nhoff + sizeof(struct udphdr) > data_len)
> +                       return BPF_DROP;
> +
> +               return write_ports(skb, proto);
> +       default:
> +               return BPF_DROP;
> +       }
> +
> +       return BPF_DROP;
> +}
> +
> +static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct flow_dissector_key_control control;
> +       struct flow_dissector_key_basic basic;
> +
> +       switch (nexthdr) {
> +       case IPPROTO_HOPOPTS:
> +       case IPPROTO_DSTOPTS:
> +               bpf_tail_call(skb, &jmp_table, IPV6OP);
> +               break;
> +       case IPPROTO_FRAGMENT:
> +               bpf_tail_call(skb, &jmp_table, IPV6FR);
> +               break;
> +       default:
> +               control.thoff = cb->nhoff;
> +               control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
> +               control.flags = cb->flags;
> +               if (bpf_flow_dissector_write_keys(skb, &control,
> +                                                 sizeof(control),
> +                                                 FLOW_DISSECTOR_KEY_CONTROL))
> +                       return BPF_DROP;
> +
> +               memset(&basic, 0, sizeof(basic));
> +               basic.n_proto = bpf_htons(ETH_P_IPV6);
> +               basic.ip_proto = nexthdr;
> +               if (bpf_flow_dissector_write_keys(skb, &basic, sizeof(basic),
> +                                             FLOW_DISSECTOR_KEY_BASIC))
> +                       return BPF_DROP;
> +
> +               return parse_ip_proto(skb, nexthdr);
> +       }
> +
> +       return BPF_DROP;
> +}
> +
> +PROG(IP)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       __u8 *data_end = (__u8 *)(long)skb->data_end;
> +       struct flow_dissector_key_control control;
> +       struct flow_dissector_key_addrs addrs;
> +       struct flow_dissector_key_basic basic;
> +       __u8 *data = (__u8 *)(long)skb->data;
> +       __u32 data_len = data_end - data;
> +       bool done = false;
> +       struct iphdr iph;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &iph, sizeof(iph)))
> +               return BPF_DROP;
> +
> +       /* IP header cannot be smaller than 20 bytes */
> +       if (iph.ihl < 5)
> +               return BPF_DROP;
> +
> +       addrs.v4addrs.src = iph.saddr;
> +       addrs.v4addrs.dst = iph.daddr;
> +       if (bpf_flow_dissector_write_keys(skb, &addrs, sizeof(addrs.v4addrs),
> +                                     FLOW_DISSECTOR_KEY_IPV4_ADDRS))
> +               return BPF_DROP;
> +
> +       cb->nhoff += iph.ihl << 2;
> +       if (cb->nhoff > data_len)
> +               return BPF_DROP;
> +
> +       if (iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
> +               cb->flags |= FLOW_DIS_IS_FRAGMENT;
> +               if (iph.frag_off & bpf_htons(IP_OFFSET))
> +                       /* From second fragment on, packets do not have headers
> +                        * we can parse.
> +                        */
> +                       done = true;
> +               else
> +                       cb->flags |= FLOW_DIS_FIRST_FRAG;
> +       }
> +
> +
> +       control.thoff = cb->nhoff;
> +       control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
> +       control.flags = cb->flags;
> +       if (bpf_flow_dissector_write_keys(skb, &control, sizeof(control),
> +                                         FLOW_DISSECTOR_KEY_CONTROL))
> +               return BPF_DROP;
> +
> +       memset(&basic, 0, sizeof(basic));
> +       basic.n_proto = bpf_htons(ETH_P_IP);
> +       basic.ip_proto = iph.protocol;
> +       if (bpf_flow_dissector_write_keys(skb, &basic, sizeof(basic),
> +                                     FLOW_DISSECTOR_KEY_BASIC))
> +               return BPF_DROP;
> +
> +       if (done)
> +               return BPF_OK;
> +
> +       return parse_ip_proto(skb, iph.protocol);
> +}
> +
> +PROG(IPV6)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct flow_dissector_key_addrs addrs;
> +       struct ipv6hdr ip6h;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &ip6h, sizeof(ip6h)))
> +               return BPF_DROP;
> +
> +       addrs.v6addrs.src = ip6h.saddr;
> +       addrs.v6addrs.dst = ip6h.daddr;
> +       if (bpf_flow_dissector_write_keys(skb, &addrs, sizeof(addrs.v6addrs),
> +                                     FLOW_DISSECTOR_KEY_IPV6_ADDRS))
> +               return BPF_DROP;
> +
> +       cb->nhoff += sizeof(struct ipv6hdr);
> +
> +       return parse_ipv6_proto(skb, ip6h.nexthdr);
> +}
> +
> +PROG(IPV6OP)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       __u8 proto;
> +       __u8 hlen;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &proto, sizeof(proto)))
> +               return BPF_DROP;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff + sizeof(proto), &hlen,
> +                              sizeof(hlen)))
> +               return BPF_DROP;
> +       /* hlen is in 8-octects and does not include the first 8 bytes
> +        * of the header
> +        */
> +       cb->nhoff += (1 + hlen) << 3;
> +
> +       return parse_ipv6_proto(skb, proto);
> +}
> +
> +PROG(IPV6FR)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       __be16 frag_off;
> +       __u8 proto;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &proto, sizeof(proto)))
> +               return BPF_DROP;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff + 2, &frag_off, sizeof(frag_off)))
> +               return BPF_DROP;
> +
> +       cb->nhoff += 8;
> +       cb->flags |= FLOW_DIS_IS_FRAGMENT;
> +       if (!(frag_off & bpf_htons(IP6_OFFSET)))
> +               cb->flags |= FLOW_DIS_FIRST_FRAG;
> +
> +       return parse_ipv6_proto(skb, proto);
> +}
> +
> +PROG(MPLS)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct mpls_label mpls;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &mpls, sizeof(mpls)))
> +               return BPF_DROP;
> +
> +       cb->nhoff += sizeof(mpls);
> +
> +       if (mpls.entry & MPLS_LS_S_MASK) {
> +               /* This is the last MPLS header. The network layer packet always
> +                * follows the MPLS header. Peek forward and dispatch based on
> +                * that.
> +                */
> +               __u8 version;
> +
> +               if (bpf_skb_load_bytes(skb, cb->nhoff, &version,
> +                                      sizeof(version)))
> +                       return BPF_DROP;
> +
> +               /* IP version is always the first 4 bits of the header */
> +               switch (version & 0xF0) {
> +               case 4:
> +                       bpf_tail_call(skb, &jmp_table, IP);
> +                       break;
> +               case 6:
> +                       bpf_tail_call(skb, &jmp_table, IPV6);
> +                       break;
> +               default:
> +                       return BPF_DROP;
> +               }
> +       } else {
> +               bpf_tail_call(skb, &jmp_table, MPLS);
> +       }
> +
> +       return BPF_DROP;
> +}
> +
> +PROG(VLAN)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct vlan_hdr vlan;
> +       __be16 proto;
> +
> +       /* Peek back to see if single or double-tagging */
> +       if (bpf_skb_load_bytes(skb, cb->nhoff - sizeof(proto), &proto,
> +                              sizeof(proto)))
> +               return BPF_DROP;
> +
> +       /* Account for double-tagging */
> +       if (proto == bpf_htons(ETH_P_8021AD)) {
> +               if (bpf_skb_load_bytes(skb, cb->nhoff, &vlan, sizeof(vlan)))
> +                       return BPF_DROP;
> +
> +               if (vlan.h_vlan_encapsulated_proto != bpf_htons(ETH_P_8021Q))
> +                       return BPF_DROP;
> +
> +               cb->nhoff += sizeof(vlan);
> +       }
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &vlan, sizeof(vlan)))
> +               return BPF_DROP;
> +
> +       cb->nhoff += sizeof(vlan);
> +       /* Only allow 8021AD + 8021Q double tagging and no triple tagging.*/
> +       if (vlan.h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021AD) ||
> +           vlan.h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021Q))
> +               return BPF_DROP;
> +
> +       return parse_eth_proto(skb, vlan.h_vlan_encapsulated_proto);
> +}
> +
> +PROG(GUE)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct guehdr gue;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &gue, sizeof(gue)))
> +               return BPF_DROP;
> +
> +       cb->nhoff += sizeof(gue);
> +       cb->nhoff += gue.hlen << 2;
> +
> +       cb->flags |= FLOW_DIS_ENCAPSULATION;
> +       return parse_ip_proto(skb, gue.proto_ctype);
> +}
> +
> +char __license[] SEC("license") = "GPL";
> --
> 2.18.0.865.gffc8e1a3cd6-goog
>

^ permalink raw reply

* Re: [PATCH] wireless: Use dma_zalloc_coherent instead of dma_alloc_coherent + memset
From: Kalle Valo @ 2018-08-18 18:31 UTC (permalink / raw)
  To: zhong jiang; +Cc: davem, linux-kernel, netdev
In-Reply-To: <87pnyfttop.fsf@kamboji.qca.qualcomm.com>

Kalle Valo <kvalo@codeaurora.org> writes:

> zhong jiang <zhongjiang@huawei.com> writes:
>
>> dma_zalloc_coherent has implemented the dma_alloc_coherent() + memset (),
>> We prefer to dma_zalloc_coherent instead of open-codeing.
>>
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  drivers/net/wireless/ath/wcn36xx/dxe.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> The correct prefix is "wcn36xx: ", not "wireless:". I can fix it this
> time.
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#commit_title_is_wrong

Actually please resend this patch and CC linux-wireless so that
patchwork sees this.

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#who_to_address

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH] wireless: Use dma_zalloc_coherent instead of dma_alloc_coherent + memset
From: Kalle Valo @ 2018-08-18 18:29 UTC (permalink / raw)
  To: zhong jiang; +Cc: davem, linux-kernel, netdev
In-Reply-To: <1534604707-10874-1-git-send-email-zhongjiang@huawei.com>

zhong jiang <zhongjiang@huawei.com> writes:

> dma_zalloc_coherent has implemented the dma_alloc_coherent() + memset (),
> We prefer to dma_zalloc_coherent instead of open-codeing.
>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  drivers/net/wireless/ath/wcn36xx/dxe.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)

The correct prefix is "wcn36xx: ", not "wireless:". I can fix it this
time.

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#commit_title_is_wrong

-- 
Kalle Valo

^ permalink raw reply

* [PATCH] ethernet: Use dma_zalloc_coherent to replace dma_alloc_coherent + memset
From: zhong jiang @ 2018-08-18 14:48 UTC (permalink / raw)
  To: davem; +Cc: jeffrey.t.kirsher, netdev, linux-kernel

dma_zalloc_coherent has implemented the dma_alloc_coherent() + memset (),
We prefer to dma_zalloc_coherent instead of open-codeing.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/ethernet/intel/ixgb/ixgb_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index 43664ad..d3e72d0 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -771,14 +771,13 @@ static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev,
 	rxdr->size = rxdr->count * sizeof(struct ixgb_rx_desc);
 	rxdr->size = ALIGN(rxdr->size, 4096);
 
-	rxdr->desc = dma_alloc_coherent(&pdev->dev, rxdr->size, &rxdr->dma,
-					GFP_KERNEL);
+	rxdr->desc = dma_zalloc_coherent(&pdev->dev, rxdr->size, &rxdr->dma,
+					 GFP_KERNEL);
 
 	if (!rxdr->desc) {
 		vfree(rxdr->buffer_info);
 		return -ENOMEM;
 	}
-	memset(rxdr->desc, 0, rxdr->size);
 
 	rxdr->next_to_clean = 0;
 	rxdr->next_to_use = 0;
-- 
1.7.12.4

^ permalink raw reply related

* WARNING: refcount bug in igmp_start_timer
From: syzbot @ 2018-08-18 16:37 UTC (permalink / raw)
  To: davem, kuznet, linux-kernel, netdev, syzkaller-bugs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    edb0a2000936 Merge tag 'arm64-fixes' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1749ebce400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4fd89f99c889a184
dashboard link: https://syzkaller.appspot.com/bug?extid=e28037ac1c96d2a86e89
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e28037ac1c96d2a86e89@syzkaller.appspotmail.com

binder: 17097:17099 Acquire 1 refcount change on invalid ref 0 ret -22
binder: 17097:17099 BC_REQUEST_DEATH_NOTIFICATION invalid ref 0
------------[ cut here ]------------
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 0 PID: 17119 at lib/refcount.c:153  
refcount_inc_checked+0x5d/0x70 lib/refcount.c:153
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 17119 Comm: syz-executor2 Not tainted 4.18.0+ #194
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
binder: 17097:17122 unknown command 1074553618
  panic+0x238/0x4e7 kernel/panic.c:184
  __warn.cold.8+0x163/0x1ba kernel/panic.c:536
  report_bug+0x252/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:993
RIP: 0010:refcount_inc_checked+0x5d/0x70 lib/refcount.c:153
Code: 1d 7d f5 24 05 31 ff 89 de e8 7f be 1b fe 84 db 75 df e8 a6 bd 1b fe  
48 c7 c7 00 72 3a 87 c6 05 5d f5 24 05 01 e8 43 4e e6 fd <0f> 0b eb c3 0f  
1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
binder: 17097:17122 ioctl c0306201 20000040 returned -22
RSP: 0018:ffff88019521ebd8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000200d000
RDX: 0000000000003667 RSI: ffffffff8163ac21 RDI: ffff88019521e8c8
RBP: ffff88019521ebe0 R08: ffff880199d10080 R09: 0000000000000002
R10: ffff880199d10080 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000008 R14: ffff8801d7aaa2c0 R15: dffffc0000000000
  igmp_start_timer+0xaf/0xe0 net/ipv4/igmp.c:217
  igmp_mod_timer net/ipv4/igmp.c:255 [inline]
  igmp_heard_query net/ipv4/igmp.c:1027 [inline]
  igmp_rcv+0x1920/0x3060 net/ipv4/igmp.c:1062
  ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215
  NF_HOOK include/linux/netfilter.h:287 [inline]
  ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256
  dst_input include/net/dst.h:450 [inline]
  ip_rcv_finish+0x1f9/0x300 net/ipv4/ip_input.c:415
  NF_HOOK include/linux/netfilter.h:287 [inline]
  ip_rcv+0xed/0x610 net/ipv4/ip_input.c:524
  __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4892
  __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5002
  netif_receive_skb_internal+0x12e/0x680 net/core/dev.c:5105
  netif_receive_skb+0xbf/0x420 net/core/dev.c:5178
  tun_rx_batched.isra.56+0x4ba/0x8c0 drivers/net/tun.c:1572
  tun_get_user+0x2ac9/0x42c0 drivers/net/tun.c:1982
  tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2010
  call_write_iter include/linux/fs.h:1808 [inline]
  do_iter_readv_writev+0x8b0/0xa80 fs/read_write.c:680
  do_iter_write+0x185/0x5f0 fs/read_write.c:959
  vfs_writev+0x1f1/0x360 fs/read_write.c:1004
  do_writev+0x11a/0x310 fs/read_write.c:1039
  __do_sys_writev fs/read_write.c:1112 [inline]
  __se_sys_writev fs/read_write.c:1109 [inline]
  __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x456f41
Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b6 fb ff c3 48  
83 ec 08 e8 da 2c 00 00 48 89 04 24 b8 14 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 23 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007f1641b03ba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 000000000000002a RCX: 0000000000456f41
RDX: 0000000000000001 RSI: 00007f1641b03bf0 RDI: 00000000000000f0
RBP: 0000000020000640 R08: 00000000000000f0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00000000004d6410 R14: 00000000004c9a96 R15: 0000000000000000
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* [PATCH] wireless: Use dma_zalloc_coherent instead of dma_alloc_coherent + memset
From: zhong jiang @ 2018-08-18 15:05 UTC (permalink / raw)
  To: kvalo, davem; +Cc: linux-kernel, netdev

dma_zalloc_coherent has implemented the dma_alloc_coherent() + memset (),
We prefer to dma_zalloc_coherent instead of open-codeing.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/wireless/ath/wcn36xx/dxe.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 06cfe8d..e66ddaa 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -174,13 +174,11 @@ static int wcn36xx_dxe_init_descs(struct device *dev, struct wcn36xx_dxe_ch *wcn
 	int i;
 
 	size = wcn_ch->desc_num * sizeof(struct wcn36xx_dxe_desc);
-	wcn_ch->cpu_addr = dma_alloc_coherent(dev, size, &wcn_ch->dma_addr,
-					      GFP_KERNEL);
+	wcn_ch->cpu_addr = dma_zalloc_coherent(dev, size, &wcn_ch->dma_addr,
+					       GFP_KERNEL);
 	if (!wcn_ch->cpu_addr)
 		return -ENOMEM;
 
-	memset(wcn_ch->cpu_addr, 0, size);
-
 	cur_dxe = (struct wcn36xx_dxe_desc *)wcn_ch->cpu_addr;
 	cur_ctl = wcn_ch->head_blk_ctl;
 
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH] ip6_vti: simplify stats handling in vti6_xmit
From: Haishuang Yan @ 2018-08-18 14:43 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller, Alexey Kuznetsov
  Cc: netdev, linux-kernel, Haishuang Yan

Same as ip_vti, use iptunnel_xmit_stats to updates stats in tunnel xmit
code path.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv6/ip6_vti.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index c72ae3a..65d4a80 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -503,17 +503,9 @@ static bool vti6_state_check(const struct xfrm_state *x,
 	skb->dev = skb_dst(skb)->dev;
 
 	err = dst_output(t->net, skb->sk, skb);
-	if (net_xmit_eval(err) == 0) {
-		struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
-
-		u64_stats_update_begin(&tstats->syncp);
-		tstats->tx_bytes += pkt_len;
-		tstats->tx_packets++;
-		u64_stats_update_end(&tstats->syncp);
-	} else {
-		stats->tx_errors++;
-		stats->tx_aborted_errors++;
-	}
+	if (net_xmit_eval(err) == 0)
+		err = pkt_len;
+	iptunnel_xmit_stats(dev, err);
 
 	return 0;
 tx_err_link_failure:
-- 
1.8.3.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox