Netdev List

* Re: [PATCH]ipv6: multicast: In mld_send_cr function moving read lock to second for loop
From: David Miller @ 2018-08-18 20:58 UTC
  To: guru2018; +Cc: netdev, kuznet, yoshfuji
In-Reply-To: <CAHSpA5-ivVaQnQU7_ikt=BV-EAwFJFaA6ErpQ1Jn1HmPmXZapg@mail.gmail.com>

From: Guruswamy Basavaiah <guru2018@gmail.com>
Date: Fri, 17 Aug 2018 18:01:41 +0530

> @@ -1860,7 +1860,6 @@ static void mld_send_cr(struct inet6_dev *idev)
>      struct sk_buff *skb = NULL;
>      int type, dtype;
> 
> -    read_lock_bh(&idev->lock);
>      spin_lock(&idev->mc_lock);
> 
>      /* deleted MCA's */

This will lead to deadlocks. idev->mc_lock must be taken with _bh(), and
without the enclosing read_lock_bh() the spin_lock() no longer runs with
BHs disabled.

I have zero confidence in this change, did you do any stress testing
with lockdep enabled?  It would have caught this quickly.

* Re: how to (cross)connect two (physical) eth ports for ping test?
From: Willy Tarreau @ 2018-08-18 20:45 UTC
  To: Andrew Lunn; +Cc: Robert P. J. Day, Linux kernel netdev mailing list
In-Reply-To: <20180818191025.GA11187@lunn.ch>

On Sat, Aug 18, 2018 at 09:10:25PM +0200, Andrew Lunn wrote:
> On Sat, Aug 18, 2018 at 01:39:50PM -0400, Robert P. J. Day wrote:
> > 
> >   (i'm sure this has been explained many times before, so a link
> > covering this will almost certainly do just fine.)
> > 
> >   i want to loop one physical ethernet port into another, and just
> > ping the daylights from one to the other for stress testing. my fedora
> > laptop doesn't actually have two unused ethernet ports, so i just want
> > to emulate this by slapping a couple startech USB/net adapters into
> > two empty USB ports, setting this up, then doing it all over again
> > monday morning on the actual target system, which does have multiple
> > ethernet ports.
> > 
> >   so if someone can point me to the recipe, that would be great and
> > you can stop reading.
> > 
> >   as far as my tentative solution goes, i assume i need to put at
> > least one of the physical ports in a network namespace via "ip netns",
> > then ping from the netns to the root namespace. or, going one step
> > further, perhaps putting both interfaces into two new namespaces, and
> > setting up forwarding.
> 
> Namespaces are a good solution. Something like this should work:
> 
> ip netns add namespace1
> ip netns add namespace2
> 
> ip link set eth1 netns namespace1
> ip link set eth2 netns namespace2
> 
> ip netns exec namespace1 \
>         ip addr add 10.42.42.42/24 dev eth1
> 
> ip netns exec namespace1 \
>         ip link set eth1 up
> 
> ip netns exec namespace2 \
>         ip addr add 10.42.42.24/24 dev eth2
> 
> ip netns exec namespace2 \
>         ip link set eth2 up
> 
> ip netns exec namespace1 \
>         ping 10.42.42.24
> 
> You might also want to consider iperf3 for stress testing, depending
> on the sort of stress you need.

FWIW I have a setup somewhere involving ip rule + ip route which achieves
the same without involving namespaces. It's a bit hackish but sometimes
convenient. I can dig it up if anyone is interested.

Regards,
Willy

* Re: [bpf-next RFC 2/3] flow_dissector: implements eBPF parser
From: Willem de Bruijn @ 2018-08-18 19:49 UTC
  To: Tom Herbert
  Cc: Petar Penkov, Network Development, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman, Petar Penkov,
	Willem de Bruijn
In-Reply-To: <CALx6S34gBQbpN1rjsC7jWYFMpzW-T8EUnX-82um7fDiFHLyysQ@mail.gmail.com>

On Sat, Aug 18, 2018 at 11:56 AM Tom Herbert <tom@herbertland.com> wrote:
>
> On Thu, Aug 16, 2018 at 9:44 AM, Petar Penkov <peterpenkov96@gmail.com> wrote:
> > From: Petar Penkov <ppenkov@google.com>
> >
> > This eBPF program extracts basic/control/ip address/ports keys from
> > incoming packets. It supports recursive parsing for IP
> > encapsulation, MPLS, GUE, and VLAN, along with IPv4/IPv6 and extension
> > headers. This program is meant to show how flow dissection and key
> > extraction can be done in eBPF.
> >
> > It is initially meant to be used for demonstration rather than as a
> > complete replacement of the existing flow dissector.
> >
> > This includes parsing of GUE and MPLS payloads, which cannot be done
> > in production in general, as GUE tunnels and MPLS payloads cannot
> > be detected unambiguously.
> >
> > In closed environments, however, it can be enabled. This is another
> > example where the programmability of BPF aids flow dissection.

> > +static __always_inline int write_ports(struct __sk_buff *skb, __u8 proto)
> > +{
> > +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> > +       struct flow_dissector_key_ports ports;
> > +
> > +       /* The supported protocols always start with the ports */
> > +       if (bpf_skb_load_bytes(skb, cb->nhoff, &ports, sizeof(ports)))
> > +               return BPF_DROP;
> > +
> > +       if (proto == IPPROTO_UDP && ports.dst == bpf_htons(GUE_PORT)) {
> > +               /* GUE encapsulation */
> > +               cb->nhoff += sizeof(struct udphdr);
> > +               bpf_tail_call(skb, &jmp_table, GUE);
> > +               return BPF_DROP;
>
> It's a nice sentiment to support GUE, but this really isn't the right
> way to do it.

Yes, this was just for demonstration purposes. The same goes for
unconditionally parsing the MPLS payload as IP.
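Unconditional MPLS payload parsing is a heuristic: after the bottom-of-stack label, the first nibble is assumed to be an IP version. A user-space sketch of that peek follows; the function and enum names are hypothetical. Two details matter. The S bit is defined on the host-order label stack entry (MPLS_LS_S_MASK is 0x00000100 in <linux/mpls.h>), so a raw network-order word needs bpf_htonl() on the mask, or an explicit byte assembly as done here. And the version must be taken with a shift, since masking with 0xF0 leaves 0x40 or 0x60, which never equals 4 or 6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bottom-of-stack bit of a host-order MPLS label stack entry
 * (same value as MPLS_LS_S_MASK in <linux/mpls.h>). */
#define LSE_S_MASK 0x00000100u

enum payload_guess { GUESS_MORE_LABELS, GUESS_IPV4, GUESS_IPV6, GUESS_UNKNOWN };

/* Hypothetical helper: classify what follows the label stack entry
 * that starts at pkt[off]. */
static enum payload_guess mpls_guess(const uint8_t *pkt, size_t len, size_t off)
{
	uint32_t entry;
	uint8_t version;

	assert(pkt != NULL);
	/* Need the 4-byte entry plus at least one payload byte. */
	if (off + 4 >= len)
		return GUESS_UNKNOWN;

	/* Assemble the entry from network byte order so the host-order
	 * mask can be applied directly. */
	entry = (uint32_t)pkt[off] << 24 | (uint32_t)pkt[off + 1] << 16 |
		(uint32_t)pkt[off + 2] << 8 | pkt[off + 3];
	if (!(entry & LSE_S_MASK))
		return GUESS_MORE_LABELS;

	/* The IP version is the high nibble: shift, don't just mask. */
	version = pkt[off + 4] >> 4;
	switch (version) {
	case 4:
		return GUESS_IPV4;
	case 6:
		return GUESS_IPV6;
	default:
		return GUESS_UNKNOWN;
	}
}
```

A payload starting with 0x4a, for example an Ethernet pseudowire without a control word whose destination MAC begins with 0x4, is reported as GUESS_IPV4. That is the ambiguity the commit message refers to, and the reason RFC 4385 gives the pseudowire control word a first nibble of 0.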

Though note the point in the commit message that within a closed
network with fixed reserved GUE ports, a custom BPF program
like this could be sufficient. That's true not only for UDP tunnels.
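The fixed-port check in write_ports() boils down to comparing a network-order field read from the packet against a byte-swapped constant. The user-space sketch below shows that comparison for a closed network that has agreed on a small set of GUE ports. The port list and function name are hypothetical, 6080 is simply the value the RFC uses, and memcpy() plays the role of bpf_skb_load_bytes().

```c
#include <arpa/inet.h> /* htons() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUE_PORT 6080 /* value used by the RFC selftest */

/* First four bytes of a UDP header, kept in network byte order,
 * like struct flow_dissector_key_ports. */
struct ports_be {
	uint16_t src;
	uint16_t dst;
};

/* Hypothetical: ports a closed network has reserved for GUE. */
static const uint16_t gue_ports[] = { GUE_PORT };

static bool is_gue(const uint8_t *udp, size_t len)
{
	struct ports_be p;
	size_t i;

	assert(udp != NULL);
	if (len < sizeof(p))
		return false;
	memcpy(&p, udp, sizeof(p));

	/* Byte-swap the constant rather than the packet field, as
	 * ports.dst == bpf_htons(GUE_PORT) does. */
	for (i = 0; i < sizeof(gue_ports) / sizeof(gue_ports[0]); i++)
		if (p.dst == htons(gue_ports[i]))
			return true;
	return false;
}
```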

> What would be much better is a means to generically
> support all the various UDP encapsulations like GUE, VXLAN, Geneve,
> GRE/UDP, MPLS/UDP, etc. I think there are two ways to do that:
>
> 1) A UDP socket lookup that returns an encapsulation socket containing
> a flow dissector function that can be called. This is the safest
> method because UDP port numbers are not reserved numbers, so a port
> match alone does not prove the payload is a given encapsulation. I
> implemented this in the kernel flow dissector, though it is not upstreamed.

Yes, similar to udp_gro_receive. Socket lookup is not free, however,
and this is a relatively rarely used feature.

I want to move the one in udp_gro_receive behind a static key.
udp_encap_needed_key is the likely target. Then the same can
eventually be done for flow dissection inside UDP tunnels.
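Tom's option 1, combined with the gating described here, could look roughly like the sketch below. The kernel would use a static key (udp_encap_needed_key is the candidate named above), which patches the branch at runtime; user space has no such mechanism, so a plain counter stands in for it. The struct, table, and function names are hypothetical, the lookup is a linear scan over a tiny IPv4-only table rather than the kernel's UDP socket hash, and the dissector callback is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dissector callback: returns the number of
 * encapsulation bytes to skip, or -1 if not parseable. */
typedef int (*encap_dissect_fn)(const uint8_t *payload, size_t len);

/* Hypothetical stand-in for a bound encapsulation socket. */
struct encap_sock {
	uint32_t laddr; /* bound local IPv4 address, 0 = wildcard */
	uint16_t port;  /* bound local UDP port, host byte order here */
	encap_dissect_fn dissect;
};

#define MAX_ENCAP_SOCKS 8
static struct encap_sock socks[MAX_ENCAP_SOCKS];
/* Stand-in for a static key: number of registered encap sockets. */
static unsigned int encap_needed;

static int register_encap_sock(uint32_t laddr, uint16_t port,
			       encap_dissect_fn fn)
{
	assert(fn != NULL);
	if (encap_needed == MAX_ENCAP_SOCKS)
		return -1;
	socks[encap_needed].laddr = laddr;
	socks[encap_needed].port = port;
	socks[encap_needed].dissect = fn;
	encap_needed++;
	return 0;
}

/* Returns the matching socket's dissector, or NULL. The lookup is
 * skipped entirely while no encapsulation socket exists. */
static encap_dissect_fn encap_lookup(uint32_t daddr, uint16_t dport)
{
	unsigned int i;

	if (!encap_needed) /* fast path: feature unused */
		return NULL;
	for (i = 0; i < encap_needed; i++)
		if (socks[i].port == dport &&
		    (socks[i].laddr == 0 || socks[i].laddr == daddr))
			return socks[i].dissect;
	return NULL;
}

static int gue_dissect(const uint8_t *payload, size_t len)
{
	(void)payload;
	return len >= 4 ? 4 : -1; /* 4-byte GUE base header */
}
```

Matching the bound local address as well as the port means a packet to port 6080 on an address with no GUE socket is not treated as GUE, which is the robustness a port-only table gives up.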

> 2) Create a lookup table based on destination port that returns the
> flow dissector function to call. Without the socket lookup it isn't
> quite as robust, but at least it's a generic, programmable interface,
> so it might be appropriate in the BPF flow dissector case.

Option 1 sounds preferable to me.

* Re: how to (cross)connect two (physical) eth ports for ping test?
From: Andrew Lunn @ 2018-08-18 19:10 UTC
  To: Robert P. J. Day; +Cc: Linux kernel netdev mailing list
In-Reply-To: <alpine.LFD.2.21.1808181332210.7716@localhost.localdomain>

On Sat, Aug 18, 2018 at 01:39:50PM -0400, Robert P. J. Day wrote:
> 
>   (i'm sure this has been explained many times before, so a link
> covering this will almost certainly do just fine.)
> 
>   i want to loop one physical ethernet port into another, and just
> ping the daylights from one to the other for stress testing. my fedora
> laptop doesn't actually have two unused ethernet ports, so i just want
> to emulate this by slapping a couple startech USB/net adapters into
> two empty USB ports, setting this up, then doing it all over again
> monday morning on the actual target system, which does have multiple
> ethernet ports.
> 
>   so if someone can point me to the recipe, that would be great and
> you can stop reading.
> 
>   as far as my tentative solution goes, i assume i need to put at
> least one of the physical ports in a network namespace via "ip netns",
> then ping from the netns to the root namespace. or, going one step
> further, perhaps putting both interfaces into two new namespaces, and
> setting up forwarding.

Namespaces are a good solution. Something like this should work:

ip netns add namespace1
ip netns add namespace2

ip link set eth1 netns namespace1
ip link set eth2 netns namespace2

ip netns exec namespace1 \
        ip addr add 10.42.42.42/24 dev eth1

ip netns exec namespace1 \
        ip link set eth1 up

ip netns exec namespace2 \
        ip addr add 10.42.42.24/24 dev eth2

ip netns exec namespace2 \
        ip link set eth2 up

ip netns exec namespace1 \
        ping 10.42.42.24

You might also want to consider iperf3 for stress testing, depending
on the sort of stress you need.

   Andrew

* how to (cross)connect two (physical) eth ports for ping test?
From: Robert P. J. Day @ 2018-08-18 17:39 UTC
  To: Linux kernel netdev mailing list

  (i'm sure this has been explained many times before, so a link
covering this will almost certainly do just fine.)

  i want to loop one physical ethernet port into another, and just
ping the daylights from one to the other for stress testing. my fedora
laptop doesn't actually have two unused ethernet ports, so i just want
to emulate this by slapping a couple startech USB/net adapters into
two empty USB ports, setting this up, then doing it all over again
monday morning on the actual target system, which does have multiple
ethernet ports.

  so if someone can point me to the recipe, that would be great and
you can stop reading.

  as far as my tentative solution goes, i assume i need to put at
least one of the physical ports in a network namespace via "ip netns",
then ping from the netns to the root namespace. or, going one step
further, perhaps putting both interfaces into two new namespaces, and
setting up forwarding.

  anyway, a recipe for this would be just ducky. thank you kindly.

rday

-- 

========================================================================
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                  http://crashcourse.ca/dokuwiki

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================

* Re: [PATCH] ip6_vti: simplify stats handling in vti6_xmit
From: David Miller @ 2018-08-18 20:47 UTC
  To: yanhaishuang; +Cc: steffen.klassert, kuznet, netdev, linux-kernel
In-Reply-To: <1534603428-30425-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 18 Aug 2018 22:43:48 +0800

> Same as ip_vti, use iptunnel_xmit_stats to update stats in the tunnel xmit
> code path.
> 
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

Applied, thanks.

* Re: [PATCH] net: nixge: Add support for 64-bit platforms
From: David Miller @ 2018-08-18 17:04 UTC
  To: mdf; +Cc: keescook, netdev, alex.williams, moritz.fischer, f.fainelli
In-Reply-To: <20180816190706.11334-1-mdf@kernel.org>

From: Moritz Fischer <mdf@kernel.org>
Date: Thu, 16 Aug 2018 12:07:06 -0700

> Add support for 64-bit platforms to the driver.
> 
> The hardware only supports 32-bit register accesses
> so the accesses need to be split up into two writes
> when setting the current and tail descriptor values.
> 
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Moritz Fischer <mdf@kernel.org>

Please resubmit when the net-next tree opens back up.

Thank you.
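The register split described in the patch can be sketched as follows. The register names and offsets are hypothetical rather than taken from the nixge driver, and a plain array stands in for the MMIO window (a driver would use writel() and the kernel's lower_32_bits()/upper_32_bits() helpers). Writing the low half first is an assumption; the required order, if any, depends on the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map: 32-bit registers, with each 64-bit
 * value split into LSB/MSB halves at consecutive offsets. */
#define REG_CURDESC_LSB  0x08
#define REG_CURDESC_MSB  0x0c
#define REG_TAILDESC_LSB 0x10
#define REG_TAILDESC_MSB 0x14

static uint32_t regs[0x20 / 4]; /* stand-in for the MMIO window */

static void reg_write32(size_t off, uint32_t val)
{
	assert(off % 4 == 0 && off / 4 < sizeof(regs) / sizeof(regs[0]));
	regs[off / 4] = val; /* a driver would call writel() here */
}

/* The hardware only accepts 32-bit accesses, so a 64-bit
 * descriptor address is written as two halves. */
static void write_desc_reg(size_t lsb_off, uint64_t addr)
{
	reg_write32(lsb_off, (uint32_t)addr);             /* low 32 bits */
	reg_write32(lsb_off + 4, (uint32_t)(addr >> 32)); /* high 32 bits */
}
```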

* Re: pull-request: bpf 2018-08-18
From: David Miller @ 2018-08-18 17:03 UTC
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20180817232920.8608-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat, 18 Aug 2018 01:29:20 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
 ...
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks.

* Re: [PATCH 00/15] Netfilter/IPVS fixes for net
From: David Miller @ 2018-08-18 17:01 UTC
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20180817193850.2796-1-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri, 17 Aug 2018 21:38:35 +0200

> The following patchset contains Netfilter/IPVS fixes for your net tree:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks.

* Re: [bpf-next RFC 2/3] flow_dissector: implements eBPF parser
From: Tom Herbert @ 2018-08-18 15:50 UTC
  To: Petar Penkov
  Cc: Linux Kernel Network Developers, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, Simon Horman, Petar Penkov,
	Willem de Bruijn
In-Reply-To: <20180816164423.14368-3-peterpenkov96@gmail.com>

On Thu, Aug 16, 2018 at 9:44 AM, Petar Penkov <peterpenkov96@gmail.com> wrote:
> From: Petar Penkov <ppenkov@google.com>
>
> This eBPF program extracts basic/control/ip address/ports keys from
> incoming packets. It supports recursive parsing for IP
> encapsulation, MPLS, GUE, and VLAN, along with IPv4/IPv6 and extension
> headers. This program is meant to show how flow dissection and key
> extraction can be done in eBPF.
>
> It is initially meant to be used for demonstration rather than as a
> complete replacement of the existing flow dissector.
>
> This includes parsing of GUE and MPLS payloads, which cannot be done
> in production in general, as GUE tunnels and MPLS payloads cannot
> be detected unambiguously.
>
> In closed environments, however, it can be enabled. This is another
> example where the programmability of BPF aids flow dissection.
>
> Link: http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  tools/testing/selftests/bpf/Makefile   |   2 +-
>  tools/testing/selftests/bpf/bpf_flow.c | 542 +++++++++++++++++++++++++
>  2 files changed, 543 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/bpf_flow.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index fff7fb1285fc..e65f50f9185e 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -35,7 +35,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
>         test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
>         test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
>         get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
> -       test_skb_cgroup_id_kern.o
> +       test_skb_cgroup_id_kern.o bpf_flow.o
>
>  # Order correspond to 'make run_tests' order
>  TEST_PROGS := test_kmod.sh \
> diff --git a/tools/testing/selftests/bpf/bpf_flow.c b/tools/testing/selftests/bpf/bpf_flow.c
> new file mode 100644
> index 000000000000..9c11c644b713
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/bpf_flow.c
> @@ -0,0 +1,542 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <stddef.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <linux/pkt_cls.h>
> +#include <linux/bpf.h>
> +#include <linux/in.h>
> +#include <linux/if_ether.h>
> +#include <linux/icmp.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include <linux/tcp.h>
> +#include <linux/udp.h>
> +#include <linux/if_packet.h>
> +#include <sys/socket.h>
> +#include <linux/if_tunnel.h>
> +#include <linux/mpls.h>
> +#include "bpf_helpers.h"
> +#include "bpf_endian.h"
> +
> +int _version SEC("version") = 1;
> +#define PROG(F) SEC(#F) int bpf_func_##F
> +
> +/* These are the identifiers of the BPF programs that will be used in tail
> + * calls. Name is limited to 16 characters, with the terminating character and
> + * bpf_func_ above, we have only 6 to work with, anything after will be cropped.
> + */
> +enum {
> +       IP,
> +       IPV6,
> +       IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
> +       IPV6FR, /* Fragmentation IPv6 Extension Header */
> +       MPLS,
> +       VLAN,
> +       GUE,
> +};
> +
> +#define IP_MF          0x2000
> +#define IP_OFFSET      0x1FFF
> +#define IP6_MF         0x0001
> +#define IP6_OFFSET     0xFFF8
> +
> +struct vlan_hdr {
> +       __be16 h_vlan_TCI;
> +       __be16 h_vlan_encapsulated_proto;
> +};
> +
> +struct gre_hdr {
> +       __be16 flags;
> +       __be16 proto;
> +};
> +
> +#define GUE_PORT 6080
> +/* Taken from include/net/gue.h. Move that to uapi, instead? */
> +struct guehdr {
> +       union {
> +               struct {
> +#if defined(__LITTLE_ENDIAN_BITFIELD)
> +                       __u8    hlen:5,
> +                               control:1,
> +                               version:2;
> +#elif defined (__BIG_ENDIAN_BITFIELD)
> +                       __u8    version:2,
> +                               control:1,
> +                               hlen:5;
> +#else
> +#error  "Please fix <asm/byteorder.h>"
> +#endif
> +                       __u8    proto_ctype;
> +                       __be16  flags;
> +               };
> +               __be32  word;
> +       };
> +};
> +
> +enum flow_dissector_key_id {
> +       FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
> +       FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
> +       FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
> +       FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
> +       FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
> +       FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
> +       FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
> +       FLOW_DISSECTOR_KEY_TIPC, /* struct flow_dissector_key_tipc */
> +       FLOW_DISSECTOR_KEY_ARP, /* struct flow_dissector_key_arp */
> +       FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
> +       FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_flow_tags */
> +       FLOW_DISSECTOR_KEY_GRE_KEYID, /* struct flow_dissector_key_keyid */
> +       FLOW_DISSECTOR_KEY_MPLS_ENTROPY, /* struct flow_dissector_key_keyid */
> +       FLOW_DISSECTOR_KEY_ENC_KEYID, /* struct flow_dissector_key_keyid */
> +       FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
> +       FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
> +       FLOW_DISSECTOR_KEY_ENC_CONTROL, /* struct flow_dissector_key_control */
> +       FLOW_DISSECTOR_KEY_ENC_PORTS, /* struct flow_dissector_key_ports */
> +       FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */
> +       FLOW_DISSECTOR_KEY_TCP, /* struct flow_dissector_key_tcp */
> +       FLOW_DISSECTOR_KEY_IP, /* struct flow_dissector_key_ip */
> +       FLOW_DISSECTOR_KEY_CVLAN, /* struct flow_dissector_key_flow_vlan */
> +
> +       FLOW_DISSECTOR_KEY_MAX,
> +};
> +
> +struct flow_dissector_key_control {
> +       __u16   thoff;
> +       __u16   addr_type;
> +       __u32   flags;
> +};
> +
> +#define FLOW_DIS_IS_FRAGMENT   (1 << 0)
> +#define FLOW_DIS_FIRST_FRAG    (1 << 1)
> +#define FLOW_DIS_ENCAPSULATION (1 << 2)
> +
> +struct flow_dissector_key_basic {
> +       __be16  n_proto;
> +       __u8    ip_proto;
> +       __u8    padding;
> +};
> +
> +struct flow_dissector_key_ipv4_addrs {
> +       __be32 src;
> +       __be32 dst;
> +};
> +
> +struct flow_dissector_key_ipv6_addrs {
> +       struct in6_addr src;
> +       struct in6_addr dst;
> +};
> +
> +struct flow_dissector_key_addrs {
> +       union {
> +               struct flow_dissector_key_ipv4_addrs v4addrs;
> +               struct flow_dissector_key_ipv6_addrs v6addrs;
> +       };
> +};
> +
> +struct flow_dissector_key_ports {
> +       union {
> +               __be32 ports;
> +               struct {
> +                       __be16 src;
> +                       __be16 dst;
> +               };
> +       };
> +};
> +
> +struct bpf_map_def SEC("maps") jmp_table = {
> +       .type = BPF_MAP_TYPE_PROG_ARRAY,
> +       .key_size = sizeof(__u32),
> +       .value_size = sizeof(__u32),
> +       .max_entries = 8
> +};
> +
> +struct bpf_dissect_cb {
> +       __u16 nhoff;
> +       __u16 flags;
> +};
> +
> +/* Dispatches on ETHERTYPE */
> +static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
> +{
> +       switch (proto) {
> +       case bpf_htons(ETH_P_IP):
> +               bpf_tail_call(skb, &jmp_table, IP);
> +               break;
> +       case bpf_htons(ETH_P_IPV6):
> +               bpf_tail_call(skb, &jmp_table, IPV6);
> +               break;
> +       case bpf_htons(ETH_P_MPLS_MC):
> +       case bpf_htons(ETH_P_MPLS_UC):
> +               bpf_tail_call(skb, &jmp_table, MPLS);
> +               break;
> +       case bpf_htons(ETH_P_8021Q):
> +       case bpf_htons(ETH_P_8021AD):
> +               bpf_tail_call(skb, &jmp_table, VLAN);
> +               break;
> +       default:
> +               /* Protocol not supported */
> +               return BPF_DROP;
> +       }
> +
> +       return BPF_DROP;
> +}
> +
> +static __always_inline int write_ports(struct __sk_buff *skb, __u8 proto)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct flow_dissector_key_ports ports;
> +
> +       /* The supported protocols always start with the ports */
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &ports, sizeof(ports)))
> +               return BPF_DROP;
> +
> +       if (proto == IPPROTO_UDP && ports.dst == bpf_htons(GUE_PORT)) {
> +               /* GUE encapsulation */
> +               cb->nhoff += sizeof(struct udphdr);
> +               bpf_tail_call(skb, &jmp_table, GUE);
> +               return BPF_DROP;

It's a nice sentiment to support GUE, but this really isn't the right
way to do it. What would be much better is a means to generically
support all the various UDP encapsulations like GUE, VXLAN, Geneve,
GRE/UDP, MPLS/UDP, etc. I think there are two ways to do that:

1) A UDP socket lookup that returns an encapsulation socket containing
a flow dissector function that can be called. This is the safest
method because UDP port numbers are not reserved numbers, so a port
match alone does not prove the payload is a given encapsulation. I
implemented this in the kernel flow dissector, though it is not upstreamed.
2) Create a lookup table based on destination port that returns the
flow dissector function to call. Without the socket lookup it isn't
quite as robust, but at least it's a generic, programmable interface,
so it might be appropriate in the BPF flow dissector case.
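Option 2 above can be sketched in plain C as a port-keyed table that selects a dissector. This is only a sketch: the function names and the two-entry program list are hypothetical, and the dissectors merely report the fixed base-header length (4 bytes for GUE, 8 for VXLAN). In BPF the same shape would plausibly be a map from port to an index into a BPF_MAP_TYPE_PROG_ARRAY such as jmp_table, followed by bpf_tail_call(); that mapping is an assumption, not something the RFC implements. Port 4789 is the IANA-assigned VXLAN port; 6080 is just the value the RFC uses for GUE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-encapsulation dissector: returns bytes of
 * encapsulation header to skip, or -1 on error. */
typedef int (*udp_encap_dissect_fn)(const uint8_t *payload, size_t len);

static int gue_dissect(const uint8_t *p, size_t len)
{
	(void)p;
	return len >= 4 ? 4 : -1; /* 4-byte GUE base header */
}

static int vxlan_dissect(const uint8_t *p, size_t len)
{
	(void)p;
	return len >= 8 ? 8 : -1; /* 8-byte VXLAN header */
}

/* Index 0 means "no dissector". The port table is the programmable
 * part: entries can be changed at run time. */
static udp_encap_dissect_fn progs[] = { NULL, gue_dissect, vxlan_dissect };
static uint8_t port_to_prog[65536]; /* zero-initialized: no entries */

static void set_port(uint16_t dport, uint8_t prog)
{
	assert(prog < sizeof(progs) / sizeof(progs[0]));
	port_to_prog[dport] = prog;
}

static udp_encap_dissect_fn lookup_by_port(uint16_t dport)
{
	return progs[port_to_prog[dport]];
}
```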

Tom

> +       }
> +
> +       if (bpf_flow_dissector_write_keys(skb, &ports, sizeof(ports),
> +                                         FLOW_DISSECTOR_KEY_PORTS))
> +               return BPF_DROP;
> +
> +       return BPF_OK;
> +}
> +
> +SEC("dissect")
> +int dissect(struct __sk_buff *skb)
> +{
> +       if (!skb->vlan_present)
> +               return parse_eth_proto(skb, skb->protocol);
> +       else
> +               return parse_eth_proto(skb, skb->vlan_proto);
> +}
> +
> +/* Parses on IPPROTO_* */
> +static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       __u8 *data_end = (__u8 *)(long)skb->data_end;
> +       __u8 *data = (__u8 *)(long)skb->data;
> +       __u32 data_len = data_end - data;
> +       struct gre_hdr gre;
> +       struct ethhdr eth;
> +       struct tcphdr tcp;
> +
> +       switch (proto) {
> +       case IPPROTO_ICMP:
> +               if (cb->nhoff + sizeof(struct icmphdr) > data_len)
> +                       return BPF_DROP;
> +               return BPF_OK;
> +       case IPPROTO_IPIP:
> +               cb->flags |= FLOW_DIS_ENCAPSULATION;
> +               bpf_tail_call(skb, &jmp_table, IP);
> +               break;
> +       case IPPROTO_IPV6:
> +               cb->flags |= FLOW_DIS_ENCAPSULATION;
> +               bpf_tail_call(skb, &jmp_table, IPV6);
> +               break;
> +       case IPPROTO_GRE:
> +               if (bpf_skb_load_bytes(skb, cb->nhoff, &gre, sizeof(gre)))
> +                       return BPF_DROP;
> +
> +               if (bpf_htons(gre.flags & GRE_VERSION))
> +                       /* Only inspect standard GRE packets with version 0 */
> +                       return BPF_OK;
> +
> +               cb->nhoff += sizeof(gre); /* Step over GRE Flags and Protocol */
> +               if (GRE_IS_CSUM(gre.flags))
> +                       cb->nhoff += 4; /* Step over chksum and Padding */
> +               if (GRE_IS_KEY(gre.flags))
> +                       cb->nhoff += 4; /* Step over key */
> +               if (GRE_IS_SEQ(gre.flags))
> +                       cb->nhoff += 4; /* Step over sequence number */
> +
> +               cb->flags |= FLOW_DIS_ENCAPSULATION;
> +
> +               if (gre.proto == bpf_htons(ETH_P_TEB)) {
> +                       if (bpf_skb_load_bytes(skb, cb->nhoff, &eth,
> +                                              sizeof(eth)))
> +                               return BPF_DROP;
> +
> +                       cb->nhoff += sizeof(eth);
> +
> +                       return parse_eth_proto(skb, eth.h_proto);
> +               } else {
> +                       return parse_eth_proto(skb, gre.proto);
> +               }
> +
> +       case IPPROTO_TCP:
> +               if (cb->nhoff + sizeof(struct tcphdr) > data_len)
> +                       return BPF_DROP;
> +
> +               if (bpf_skb_load_bytes(skb, cb->nhoff, &tcp, sizeof(tcp)))
> +                       return BPF_DROP;
> +
> +               if (tcp.doff < 5)
> +                       return BPF_DROP;
> +
> +               if (cb->nhoff + (tcp.doff << 2) > data_len)
> +                       return BPF_DROP;
> +
> +               return write_ports(skb, proto);
> +       case IPPROTO_UDP:
> +       case IPPROTO_UDPLITE:
> +               if (cb->nhoff + sizeof(struct udphdr) > data_len)
> +                       return BPF_DROP;
> +
> +               return write_ports(skb, proto);
> +       default:
> +               return BPF_DROP;
> +       }
> +
> +       return BPF_DROP;
> +}
> +
> +static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct flow_dissector_key_control control;
> +       struct flow_dissector_key_basic basic;
> +
> +       switch (nexthdr) {
> +       case IPPROTO_HOPOPTS:
> +       case IPPROTO_DSTOPTS:
> +               bpf_tail_call(skb, &jmp_table, IPV6OP);
> +               break;
> +       case IPPROTO_FRAGMENT:
> +               bpf_tail_call(skb, &jmp_table, IPV6FR);
> +               break;
> +       default:
> +               control.thoff = cb->nhoff;
> +               control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
> +               control.flags = cb->flags;
> +               if (bpf_flow_dissector_write_keys(skb, &control,
> +                                                 sizeof(control),
> +                                                 FLOW_DISSECTOR_KEY_CONTROL))
> +                       return BPF_DROP;
> +
> +               memset(&basic, 0, sizeof(basic));
> +               basic.n_proto = bpf_htons(ETH_P_IPV6);
> +               basic.ip_proto = nexthdr;
> +               if (bpf_flow_dissector_write_keys(skb, &basic, sizeof(basic),
> +                                             FLOW_DISSECTOR_KEY_BASIC))
> +                       return BPF_DROP;
> +
> +               return parse_ip_proto(skb, nexthdr);
> +       }
> +
> +       return BPF_DROP;
> +}
> +
> +PROG(IP)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       __u8 *data_end = (__u8 *)(long)skb->data_end;
> +       struct flow_dissector_key_control control;
> +       struct flow_dissector_key_addrs addrs;
> +       struct flow_dissector_key_basic basic;
> +       __u8 *data = (__u8 *)(long)skb->data;
> +       __u32 data_len = data_end - data;
> +       bool done = false;
> +       struct iphdr iph;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &iph, sizeof(iph)))
> +               return BPF_DROP;
> +
> +       /* IP header cannot be smaller than 20 bytes */
> +       if (iph.ihl < 5)
> +               return BPF_DROP;
> +
> +       addrs.v4addrs.src = iph.saddr;
> +       addrs.v4addrs.dst = iph.daddr;
> +       if (bpf_flow_dissector_write_keys(skb, &addrs, sizeof(addrs.v4addrs),
> +                                     FLOW_DISSECTOR_KEY_IPV4_ADDRS))
> +               return BPF_DROP;
> +
> +       cb->nhoff += iph.ihl << 2;
> +       if (cb->nhoff > data_len)
> +               return BPF_DROP;
> +
> +       if (iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
> +               cb->flags |= FLOW_DIS_IS_FRAGMENT;
> +               if (iph.frag_off & bpf_htons(IP_OFFSET))
> +                       /* From second fragment on, packets do not have headers
> +                        * we can parse.
> +                        */
> +                       done = true;
> +               else
> +                       cb->flags |= FLOW_DIS_FIRST_FRAG;
> +       }
> +
> +
> +       control.thoff = cb->nhoff;
> +       control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
> +       control.flags = cb->flags;
> +       if (bpf_flow_dissector_write_keys(skb, &control, sizeof(control),
> +                                         FLOW_DISSECTOR_KEY_CONTROL))
> +               return BPF_DROP;
> +
> +       memset(&basic, 0, sizeof(basic));
> +       basic.n_proto = bpf_htons(ETH_P_IP);
> +       basic.ip_proto = iph.protocol;
> +       if (bpf_flow_dissector_write_keys(skb, &basic, sizeof(basic),
> +                                     FLOW_DISSECTOR_KEY_BASIC))
> +               return BPF_DROP;
> +
> +       if (done)
> +               return BPF_OK;
> +
> +       return parse_ip_proto(skb, iph.protocol);
> +}
> +
> +PROG(IPV6)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct flow_dissector_key_addrs addrs;
> +       struct ipv6hdr ip6h;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &ip6h, sizeof(ip6h)))
> +               return BPF_DROP;
> +
> +       addrs.v6addrs.src = ip6h.saddr;
> +       addrs.v6addrs.dst = ip6h.daddr;
> +       if (bpf_flow_dissector_write_keys(skb, &addrs, sizeof(addrs.v6addrs),
> +                                     FLOW_DISSECTOR_KEY_IPV6_ADDRS))
> +               return BPF_DROP;
> +
> +       cb->nhoff += sizeof(struct ipv6hdr);
> +
> +       return parse_ipv6_proto(skb, ip6h.nexthdr);
> +}
> +
> +PROG(IPV6OP)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       __u8 proto;
> +       __u8 hlen;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &proto, sizeof(proto)))
> +               return BPF_DROP;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff + sizeof(proto), &hlen,
> +                              sizeof(hlen)))
> +               return BPF_DROP;
> +       /* hlen is in 8-octet units and does not include the first 8 bytes
> +        * of the header
> +        */
> +       cb->nhoff += (1 + hlen) << 3;
> +
> +       return parse_ipv6_proto(skb, proto);
> +}
> +
> +PROG(IPV6FR)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       __be16 frag_off;
> +       __u8 proto;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &proto, sizeof(proto)))
> +               return BPF_DROP;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff + 2, &frag_off, sizeof(frag_off)))
> +               return BPF_DROP;
> +
> +       cb->nhoff += 8;
> +       cb->flags |= FLOW_DIS_IS_FRAGMENT;
> +       if (!(frag_off & bpf_htons(IP6_OFFSET)))
> +               cb->flags |= FLOW_DIS_FIRST_FRAG;
> +
> +       return parse_ipv6_proto(skb, proto);
> +}
> +
> +PROG(MPLS)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct mpls_label mpls;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &mpls, sizeof(mpls)))
> +               return BPF_DROP;
> +
> +       cb->nhoff += sizeof(mpls);
> +
> +       if (mpls.entry & MPLS_LS_S_MASK) {
> +               /* This is the last MPLS header. The network layer packet always
> +                * follows the MPLS header. Peek forward and dispatch based on
> +                * that.
> +                */
> +               __u8 version;
> +
> +               if (bpf_skb_load_bytes(skb, cb->nhoff, &version,
> +                                      sizeof(version)))
> +                       return BPF_DROP;
> +
> +               /* IP version is always the first 4 bits of the header */
> +               switch (version >> 4) {
> +               case 4:
> +                       bpf_tail_call(skb, &jmp_table, IP);
> +                       break;
> +               case 6:
> +                       bpf_tail_call(skb, &jmp_table, IPV6);
> +                       break;
> +               default:
> +                       return BPF_DROP;
> +               }
> +       } else {
> +               bpf_tail_call(skb, &jmp_table, MPLS);
> +       }
> +
> +       return BPF_DROP;
> +}
> +
> +PROG(VLAN)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct vlan_hdr vlan;
> +       __be16 proto;
> +
> +       /* Peek back to see if single or double-tagging */
> +       if (bpf_skb_load_bytes(skb, cb->nhoff - sizeof(proto), &proto,
> +                              sizeof(proto)))
> +               return BPF_DROP;
> +
> +       /* Account for double-tagging */
> +       if (proto == bpf_htons(ETH_P_8021AD)) {
> +               if (bpf_skb_load_bytes(skb, cb->nhoff, &vlan, sizeof(vlan)))
> +                       return BPF_DROP;
> +
> +               if (vlan.h_vlan_encapsulated_proto != bpf_htons(ETH_P_8021Q))
> +                       return BPF_DROP;
> +
> +               cb->nhoff += sizeof(vlan);
> +       }
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &vlan, sizeof(vlan)))
> +               return BPF_DROP;
> +
> +       cb->nhoff += sizeof(vlan);
> +       /* Only allow 8021AD + 8021Q double tagging and no triple tagging. */
> +       if (vlan.h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021AD) ||
> +           vlan.h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021Q))
> +               return BPF_DROP;
> +
> +       return parse_eth_proto(skb, vlan.h_vlan_encapsulated_proto);
> +}
> +
> +PROG(GUE)(struct __sk_buff *skb)
> +{
> +       struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> +       struct guehdr gue;
> +
> +       if (bpf_skb_load_bytes(skb, cb->nhoff, &gue, sizeof(gue)))
> +               return BPF_DROP;
> +
> +       cb->nhoff += sizeof(gue);
> +       cb->nhoff += gue.hlen << 2;
> +
> +       cb->flags |= FLOW_DIS_ENCAPSULATION;
> +       return parse_ip_proto(skb, gue.proto_ctype);
> +}
> +
> +char __license[] SEC("license") = "GPL";
> --
> 2.18.0.865.gffc8e1a3cd6-goog
>
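The offset arithmetic in the quoted dissector programs is easy to get subtly wrong, so here is a small userspace C sketch of it. It is not part of the patch: the helper names and MODEL_IP6_OFFSET (assumed to mirror the kernel's IP6_OFFSET, 0xFFF8) are illustrative. It covers the IPv6 extension-header length in 8-octet units, the IP version taken from the high nibble (a shift is needed, since masking the first byte with 0xF0 alone yields 0x40 or 0x60, which never matches "case 4" or "case 6"), the GUE option length in 32-bit words, and the first-fragment test.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's IP6_OFFSET: the fragment offset bits. */
#define MODEL_IP6_OFFSET 0xFFF8

/* Hop-by-hop, routing and destination-options headers store their length
 * in 8-octet units, excluding the first 8 octets. */
static unsigned int ipv6_opt_hdr_bytes(uint8_t hdr_ext_len)
{
	return ((unsigned int)hdr_ext_len + 1) << 3;
}

/* The IP version is the high nibble of the first header byte. */
static unsigned int ip_version(uint8_t first_byte)
{
	return first_byte >> 4;
}

/* GUE's hlen counts 32-bit words of optional fields that follow the
 * fixed 4-byte header. */
static unsigned int gue_optional_bytes(uint8_t hlen)
{
	return (unsigned int)hlen << 2;
}

/* frag_off_be is the 16-bit field exactly as read from the packet
 * (network byte order); a zero offset marks the first fragment. */
static bool ipv6_is_first_frag(uint16_t frag_off_be)
{
	return (ntohs(frag_off_be) & MODEL_IP6_OFFSET) == 0;
}
```

Comparing against htons(IP6_OFFSET), as the patch does, is equivalent to converting the field with ntohs() first; the sketch uses the latter only because it reads more naturally in userspace.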

^ permalink raw reply

* Re: [PATCH] wireless: Use dma_zalloc_coherent instead of dma_alloc_coherent + memset
From: Kalle Valo @ 2018-08-18 18:31 UTC (permalink / raw)
  To: zhong jiang; +Cc: davem, linux-kernel, netdev
In-Reply-To: <87pnyfttop.fsf@kamboji.qca.qualcomm.com>

Kalle Valo <kvalo@codeaurora.org> writes:

> zhong jiang <zhongjiang@huawei.com> writes:
>
>> dma_zalloc_coherent() already combines dma_alloc_coherent() and memset(),
>> so prefer it over open-coding the two calls.
>>
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  drivers/net/wireless/ath/wcn36xx/dxe.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> The correct prefix is "wcn36xx: ", not "wireless:". I can fix it this
> time.
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#commit_title_is_wrong

Actually please resend this patch and CC linux-wireless so that
patchwork sees this.

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#who_to_address

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH] wireless: Use dma_zalloc_coherent instead of dma_alloc_coherent + memset
From: Kalle Valo @ 2018-08-18 18:29 UTC (permalink / raw)
  To: zhong jiang; +Cc: davem, linux-kernel, netdev
In-Reply-To: <1534604707-10874-1-git-send-email-zhongjiang@huawei.com>

zhong jiang <zhongjiang@huawei.com> writes:

> dma_zalloc_coherent() already combines dma_alloc_coherent() and memset(),
> so prefer it over open-coding the two calls.
>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  drivers/net/wireless/ath/wcn36xx/dxe.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)

The correct prefix is "wcn36xx: ", not "wireless:". I can fix it this
time.

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#commit_title_is_wrong

-- 
Kalle Valo

^ permalink raw reply

* [PATCH] ethernet: Use dma_zalloc_coherent to replace dma_alloc_coherent + memset
From: zhong jiang @ 2018-08-18 14:48 UTC (permalink / raw)
  To: davem; +Cc: jeffrey.t.kirsher, netdev, linux-kernel

dma_zalloc_coherent() already combines dma_alloc_coherent() and memset(),
so prefer it over open-coding the two calls.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/ethernet/intel/ixgb/ixgb_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index 43664ad..d3e72d0 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -771,14 +771,13 @@ static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev,
 	rxdr->size = rxdr->count * sizeof(struct ixgb_rx_desc);
 	rxdr->size = ALIGN(rxdr->size, 4096);
 
-	rxdr->desc = dma_alloc_coherent(&pdev->dev, rxdr->size, &rxdr->dma,
-					GFP_KERNEL);
+	rxdr->desc = dma_zalloc_coherent(&pdev->dev, rxdr->size, &rxdr->dma,
+					 GFP_KERNEL);
 
 	if (!rxdr->desc) {
 		vfree(rxdr->buffer_info);
 		return -ENOMEM;
 	}
-	memset(rxdr->desc, 0, rxdr->size);
 
 	rxdr->next_to_clean = 0;
 	rxdr->next_to_use = 0;
-- 
1.7.12.4
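dma_zalloc_coherent() only exists in the kernel, so the equivalence the commit message relies on is illustrated here with a userspace analogy. This is an assumption for illustration only: malloc()/calloc() stand in for the DMA allocator, and both helper names are hypothetical. The open-coded version zeroes the buffer only after the NULL check, exactly as ixgb did, so folding the memset into the allocator does not change behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Open-coded pattern the patch removes: allocate, check, then zero. */
static void *alloc_then_memset(size_t size)
{
	void *p = malloc(size);

	if (!p)
		return NULL;
	memset(p, 0, size);
	return p;
}

/* Folded helper in the spirit of dma_zalloc_coherent(): one call returns
 * zeroed memory, so no caller can forget the memset. */
static void *zalloc_buf(size_t size)
{
	return calloc(1, size);
}
```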

^ permalink raw reply related

* WARNING: refcount bug in igmp_start_timer
From: syzbot @ 2018-08-18 16:37 UTC (permalink / raw)
  To: davem, kuznet, linux-kernel, netdev, syzkaller-bugs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    edb0a2000936 Merge tag 'arm64-fixes' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1749ebce400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4fd89f99c889a184
dashboard link: https://syzkaller.appspot.com/bug?extid=e28037ac1c96d2a86e89
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e28037ac1c96d2a86e89@syzkaller.appspotmail.com

binder: 17097:17099 Acquire 1 refcount change on invalid ref 0 ret -22
binder: 17097:17099 BC_REQUEST_DEATH_NOTIFICATION invalid ref 0
------------[ cut here ]------------
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 0 PID: 17119 at lib/refcount.c:153 refcount_inc_checked+0x5d/0x70 lib/refcount.c:153
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 17119 Comm: syz-executor2 Not tainted 4.18.0+ #194
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
binder: 17097:17122 unknown command 1074553618
  panic+0x238/0x4e7 kernel/panic.c:184
  __warn.cold.8+0x163/0x1ba kernel/panic.c:536
  report_bug+0x252/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:993
RIP: 0010:refcount_inc_checked+0x5d/0x70 lib/refcount.c:153
Code: 1d 7d f5 24 05 31 ff 89 de e8 7f be 1b fe 84 db 75 df e8 a6 bd 1b fe  
48 c7 c7 00 72 3a 87 c6 05 5d f5 24 05 01 e8 43 4e e6 fd <0f> 0b eb c3 0f  
1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
binder: 17097:17122 ioctl c0306201 20000040 returned -22
RSP: 0018:ffff88019521ebd8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000200d000
RDX: 0000000000003667 RSI: ffffffff8163ac21 RDI: ffff88019521e8c8
RBP: ffff88019521ebe0 R08: ffff880199d10080 R09: 0000000000000002
R10: ffff880199d10080 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000008 R14: ffff8801d7aaa2c0 R15: dffffc0000000000
  igmp_start_timer+0xaf/0xe0 net/ipv4/igmp.c:217
  igmp_mod_timer net/ipv4/igmp.c:255 [inline]
  igmp_heard_query net/ipv4/igmp.c:1027 [inline]
  igmp_rcv+0x1920/0x3060 net/ipv4/igmp.c:1062
  ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215
  NF_HOOK include/linux/netfilter.h:287 [inline]
  ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256
  dst_input include/net/dst.h:450 [inline]
  ip_rcv_finish+0x1f9/0x300 net/ipv4/ip_input.c:415
  NF_HOOK include/linux/netfilter.h:287 [inline]
  ip_rcv+0xed/0x610 net/ipv4/ip_input.c:524
  __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4892
  __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5002
  netif_receive_skb_internal+0x12e/0x680 net/core/dev.c:5105
  netif_receive_skb+0xbf/0x420 net/core/dev.c:5178
  tun_rx_batched.isra.56+0x4ba/0x8c0 drivers/net/tun.c:1572
  tun_get_user+0x2ac9/0x42c0 drivers/net/tun.c:1982
  tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2010
  call_write_iter include/linux/fs.h:1808 [inline]
  do_iter_readv_writev+0x8b0/0xa80 fs/read_write.c:680
  do_iter_write+0x185/0x5f0 fs/read_write.c:959
  vfs_writev+0x1f1/0x360 fs/read_write.c:1004
  do_writev+0x11a/0x310 fs/read_write.c:1039
  __do_sys_writev fs/read_write.c:1112 [inline]
  __se_sys_writev fs/read_write.c:1109 [inline]
  __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x456f41
Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b6 fb ff c3 48  
83 ec 08 e8 da 2c 00 00 48 89 04 24 b8 14 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 23 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007f1641b03ba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 000000000000002a RCX: 0000000000456f41
RDX: 0000000000000001 RSI: 00007f1641b03bf0 RDI: 00000000000000f0
RBP: 0000000020000640 R08: 00000000000000f0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00000000004d6410 R14: 00000000004c9a96 R15: 0000000000000000
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
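For readers unfamiliar with this warning: refcount_t refuses to increment a counter that has already dropped to 0, because that normally means the object has been freed. In the trace above it is the increment reached from igmp_start_timer() that hit a zero count. A simplified userspace model of the check follows; the struct and function names are hypothetical, and this is not the lib/refcount.c code itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

struct model_refcount {
	unsigned int refs;
};

/* Increment unless the count is already 0. A count stuck at UINT_MAX is
 * left saturated rather than allowed to wrap back to 0. */
static bool model_inc_not_zero(struct model_refcount *r)
{
	if (r->refs == 0)
		return false;
	if (r->refs != UINT_MAX)
		r->refs++;
	return true;
}

/* Unconditional increment: taking a reference on a dead object is a bug,
 * so report it instead of silently resurrecting the object. */
static void model_inc(struct model_refcount *r)
{
	if (!model_inc_not_zero(r))
		fprintf(stderr, "refcount_t: increment on 0; use-after-free.\n");
}
```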

^ permalink raw reply

* [PATCH] wireless: Use dma_zalloc_coherent instead of dma_alloc_coherent + memset
From: zhong jiang @ 2018-08-18 15:05 UTC (permalink / raw)
  To: kvalo, davem; +Cc: linux-kernel, netdev

dma_zalloc_coherent() already combines dma_alloc_coherent() and memset(),
so prefer it over open-coding the two calls.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/wireless/ath/wcn36xx/dxe.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 06cfe8d..e66ddaa 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -174,13 +174,11 @@ static int wcn36xx_dxe_init_descs(struct device *dev, struct wcn36xx_dxe_ch *wcn
 	int i;
 
 	size = wcn_ch->desc_num * sizeof(struct wcn36xx_dxe_desc);
-	wcn_ch->cpu_addr = dma_alloc_coherent(dev, size, &wcn_ch->dma_addr,
-					      GFP_KERNEL);
+	wcn_ch->cpu_addr = dma_zalloc_coherent(dev, size, &wcn_ch->dma_addr,
+					       GFP_KERNEL);
 	if (!wcn_ch->cpu_addr)
 		return -ENOMEM;
 
-	memset(wcn_ch->cpu_addr, 0, size);
-
 	cur_dxe = (struct wcn36xx_dxe_desc *)wcn_ch->cpu_addr;
 	cur_ctl = wcn_ch->head_blk_ctl;
 
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH] ip6_vti: simplify stats handling in vti6_xmit
From: Haishuang Yan @ 2018-08-18 14:43 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller, Alexey Kuznetsov
  Cc: netdev, linux-kernel, Haishuang Yan

As in ip_vti, use iptunnel_xmit_stats() to update the stats in the tunnel
xmit code path.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv6/ip6_vti.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index c72ae3a..65d4a80 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -503,17 +503,9 @@ static bool vti6_state_check(const struct xfrm_state *x,
 	skb->dev = skb_dst(skb)->dev;
 
 	err = dst_output(t->net, skb->sk, skb);
-	if (net_xmit_eval(err) == 0) {
-		struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
-
-		u64_stats_update_begin(&tstats->syncp);
-		tstats->tx_bytes += pkt_len;
-		tstats->tx_packets++;
-		u64_stats_update_end(&tstats->syncp);
-	} else {
-		stats->tx_errors++;
-		stats->tx_aborted_errors++;
-	}
+	if (net_xmit_eval(err) == 0)
+		err = pkt_len;
+	iptunnel_xmit_stats(dev, err);
 
 	return 0;
 tx_err_link_failure:
-- 
1.8.3.1
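To make the simplification easier to review, here is a userspace model of the resulting logic. It is a sketch, not kernel code: the struct, the function names and MODEL_NET_XMIT_CN are hypothetical stand-ins, and the helper's behaviour is as I understand iptunnel_xmit_stats() in include/net/ip_tunnels.h around v4.18. A positive length counts as a transmitted packet, a negative errno as an aborted transmit, and zero as a drop. net_xmit_eval() maps congestion notification (NET_XMIT_CN) to success.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NET_XMIT_CN 2 /* assumed to match NET_XMIT_CN */

struct tunnel_stats {
	uint64_t tx_bytes;
	uint64_t tx_packets;
	uint64_t tx_errors;
	uint64_t tx_aborted_errors;
	uint64_t tx_dropped;
};

/* Model of iptunnel_xmit_stats(dev, pkt_len). */
static void xmit_stats_model(struct tunnel_stats *s, int pkt_len)
{
	if (pkt_len > 0) {
		s->tx_bytes += (uint64_t)pkt_len;
		s->tx_packets++;
	} else if (pkt_len < 0) {
		s->tx_errors++;
		s->tx_aborted_errors++;
	} else {
		s->tx_dropped++;
	}
}

/* Model of net_xmit_eval(): congestion notification counts as success. */
static int net_xmit_eval_model(int err)
{
	return err == MODEL_NET_XMIT_CN ? 0 : err;
}

/* The tail of vti6_xmit() after the patch. */
static void vti6_tail_model(struct tunnel_stats *s, int err, int pkt_len)
{
	if (net_xmit_eval_model(err) == 0)
		err = pkt_len;
	xmit_stats_model(s, err);
}
```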

^ permalink raw reply related

* [PATCHv2 1/2] ethernet: declance:  Use NULL to compare with pointer-typed value rather than 0
From: zhong jiang @ 2018-08-18  6:32 UTC (permalink / raw)
  To: davem; +Cc: vz, slemieux.tyco, keescook, netdev, linux-kernel
In-Reply-To: <1534573949-17548-1-git-send-email-zhongjiang@huawei.com>

Pointer-typed values should be compared with NULL rather than 0. The
issue was detected with the help of Coccinelle.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/ethernet/amd/declance.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/declance.c b/drivers/net/ethernet/amd/declance.c
index 116997a..c636f02 100644
--- a/drivers/net/ethernet/amd/declance.c
+++ b/drivers/net/ethernet/amd/declance.c
@@ -606,8 +606,7 @@ static int lance_rx(struct net_device *dev)
 		} else {
 			len = (*rds_ptr(rd, mblength, lp->type) & 0xfff) - 4;
 			skb = netdev_alloc_skb(dev, len + 2);
-
-			if (skb == 0) {
+			if (!skb) {
 				dev->stats.rx_dropped++;
 				*rds_ptr(rd, mblength, lp->type) = 0;
 				*rds_ptr(rd, rmd1, lp->type) =
-- 
1.7.12.4

^ permalink raw reply related

* RE: [PATCH net-next v1] net/tls: Add support for async decryption of tls records
From: Vakul Garg @ 2018-08-18  5:55 UTC (permalink / raw)
  To: Dave Watson
  Cc: netdev@vger.kernel.org, borisp@mellanox.com, aviadye@mellanox.com,
	davem@davemloft.net
In-Reply-To: <20180817221238.b4napcwedbwup22q@davejwatson-mba.local.dhcp.thefacebook.com>



> -----Original Message-----
> From: Dave Watson <davejwatson@fb.com>
> Sent: Saturday, August 18, 2018 3:43 AM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: netdev@vger.kernel.org; borisp@mellanox.com;
> aviadye@mellanox.com; davem@davemloft.net
> Subject: Re: [PATCH net-next v1] net/tls: Add support for async decryption of tls records
> 
> On 08/16/18 08:49 PM, Vakul Garg wrote:
> > Changes since RFC version:
> > 	1) Improved commit message.
> > 	2) Fixed dequeued record offset handling because of which few of
> > 	   tls selftests 'recv_partial, recv_peek, recv_peek_multiple' were failing.
> 
> Thanks! Commit message much more clear, tests work great for me also,
> only minor comments on clarity
> 
> > -			if (tls_sw_advance_skb(sk, skb, chunk)) {
> > +			if (async) {
> > +				/* Finished with current record, pick up next */
> > +				ctx->recv_pkt = NULL;
> > +				__strp_unpause(&ctx->strp);
> > +				goto mark_eor_chk_ctrl;
> 
> Control flow is a little hard to follow here, maybe just pass an async flag to
> tls_sw_advance_skb?  It already does strp_unpause and recv_pkt = NULL.
> 

I improved it, but in a slightly different way; please see v2.
As net-next is closed right now, I will send the patch to you privately
and post it to the list later, once David gives the green light.
Is that OK?


> > +			} else if (tls_sw_advance_skb(sk, skb, chunk)) {
> >  				/* Return full control message to
> >  				 * userspace before trying to parse
> >  				 * another message type
> >  				 */
> > +mark_eor_chk_ctrl:
> >  				msg->msg_flags |= MSG_EOR;
> >  				if (control != TLS_RECORD_TYPE_DATA)
> >  					goto recv_end;
> > +			} else {
> > +				break;
> 
> I don't see the need for the else { break; }, isn't this already covered by
> while(len); below as before?
 
When tls_sw_advance_skb() returns false, we certainly cannot continue
in the loop, so breaking here avoids evaluating the 'if' checks and the
while (len) condition further down.

^ permalink raw reply

* Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library
From: Ard Biesheuvel @ 2018-08-18  8:13 UTC (permalink / raw)
  To: D. J. Bernstein
  Cc: Eric Biggers, Jason A. Donenfeld, Eric Biggers,
	Linux Crypto Mailing List, LKML, Netdev, David Miller,
	Andrew Lutomirski, Greg Kroah-Hartman, Samuel Neves, Tanja Lange,
	Jean-Philippe Aumasson, Karthikeyan Bhargavan
In-Reply-To: <20180817073120.12640.qmail@cr.yp.to>



> On 17 Aug 2018, at 10:31, D. J. Bernstein <djb@cr.yp.to> wrote:
> 
> Eric Biggers writes:
>> If (more likely) you're talking about things like "use this NEON implementation
>> on Cortex-A7 but this other NEON implementation on Cortex-A53", it's up to the
>> developers and community to test different CPUs and make appropriate decisions,
>> and yes it can be very useful to have external benchmarks like SUPERCOP to refer
>> to, and I appreciate your work in that area.
> 
> You seem to be talking about a process that selects (e.g.) ChaCha20
> implementations as follows: manually inspect benchmarks of various
> implementations on various CPUs, manually write code to map CPUs to
> implementations, manually update the code as necessary for new CPUs, and
> of course manually do the same for every other primitive that can see
> differences between microarchitectures (which isn't something weird---
> it's the normal situation after enough optimization effort).
> 
> This is quite a bit of manual work, so the kernel often doesn't do it,
> so we end up with unhappy people talking about performance regressions.
> 
> For comparison, imagine one simple central piece of code in the kernel
> to automatically do the following:
> 
>   When a CPU core is booted:
>     For each primitive:
>       Benchmark all implementations of the primitive on the core.
>       Select the fastest for subsequent use on the core.
> 
> If this is a general-purpose mechanism (as in SUPERCOP, NaCl, and
> libpqcrypto) rather than something ad-hoc (as in raid6), then there's no
> manual work per primitive, and no work per implementation. Each CPU, old
> or new, automatically obtains the fastest available code for that CPU.
> 
> The only cost is a moment of benchmarking at boot time. _If_ this is a
> noticeable cost then there are many ways to speed it up: for example,
> automatically copy the results across identical cores, automatically
> copy the results across boots if the cores are unchanged, automatically
> copy results from a central database indexed by CPU identifiers, etc.
> The SUPERCOP database is evolving towards enabling this type of sharing.
> 

‘Fastest’ does not imply ‘preferred’. For instance, the generic table-based AES implementation, which thrashes the cache, may be fast, but it may put a disproportionate load on, e.g., a hyperthreading system, and, as you have pointed out yourself, its timing is data-dependent as well. Then there is power consumption: bit-sliced NEON AES may be faster, but it does a lot more work, and does it on the SIMD unit, which could otherwise potentially be turned off entirely. Only implementations based on hardware instructions can generally be assumed to be optimal in every sense, and there is no real point in benchmarking those against pure software implementations.

Then there are accelerators: the kernel’s crypto API seamlessly supports crypto peripherals, which may be slower or faster, may have more or fewer queues than there are CPUs, and may offer additional benefits such as protected AES keys.

In the Linux kernel, we generally try to stay away from policy decisions and instead offer the controls that let userland take charge. Individual algorithm implementations in the modularized crypto code can be blacklisted if desired, and beyond that, we simply try to offer functionality that covers the common case.
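Dan's boot-time selection loop is simple to express. The userspace sketch below is illustrative only: the primitive (a byte sum), its two implementations and select_fastest() are all hypothetical, and clock() stands in for whatever timer a kernel would use. It checks that every candidate agrees with the reference before timing it, and it deliberately ignores the power, SIMD-context and timing side-channel concerns raised above, which is exactly why "fastest" alone is not a sufficient policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*sum_fn)(const uint8_t *buf, size_t len);

/* Two interchangeable implementations of one toy primitive. */
static uint32_t sum_bytewise(const uint8_t *buf, size_t len)
{
	uint32_t s = 0;

	for (size_t i = 0; i < len; i++)
		s += buf[i];
	return s;
}

static uint32_t sum_unrolled(const uint8_t *buf, size_t len)
{
	uint32_t a = 0, b = 0, c = 0, d = 0;
	size_t i = 0;

	for (; i + 4 <= len; i += 4) {
		a += buf[i];
		b += buf[i + 1];
		c += buf[i + 2];
		d += buf[i + 3];
	}
	for (; i < len; i++)
		a += buf[i];
	return a + b + c + d;
}

static const sum_fn impls[] = { sum_bytewise, sum_unrolled };
#define NIMPLS (sizeof(impls) / sizeof(impls[0]))

/* Time every implementation on the same input and return the index of
 * the fastest. The volatile sink keeps the calls from being optimised out. */
static size_t select_fastest(const uint8_t *buf, size_t len, int rounds)
{
	static volatile uint32_t sink;
	clock_t best = 0;
	size_t best_idx = 0;

	for (size_t k = 0; k < NIMPLS; k++) {
		/* Never select an implementation that gives wrong answers. */
		assert(impls[k](buf, len) == impls[0](buf, len));

		clock_t t0 = clock();
		for (int r = 0; r < rounds; r++)
			sink = impls[k](buf, len);
		clock_t dt = clock() - t0;

		if (k == 0 || dt < best) {
			best = dt;
			best_idx = k;
		}
	}
	return best_idx;
}
```

A real mechanism would run this once per core at boot and store the chosen function pointer per primitive; the caching ideas Dan lists (copying results across identical cores or boots) would replace the timing loop, not the correctness check.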

>> A lot of code can be shared, but in practice different environments have
>> different constraints, and kernel programming in particular has some distinct
>> differences from userspace programming.  For example, you cannot just use the
>> FPU (including SSE, AVX, NEON, etc.) registers whenever you want to, since on
>> most architectures they can't be used in some contexts such as hardirq context,
>> and even when they *can* be used you have to run special code before and after
>> which does things like saving all the FPU registers to the task_struct,
>> disabling preemption, and/or enabling the FPU.
> 
> Is there some reason that each implementor is being pestered to handle
> all this? Detecting FPU usage is a simple static-analysis exercise, and
> the rest sounds like straightforward boilerplate that should be handled
> centrally.
> 

Detecting it is easy, but that does not mean you can use SIMD in any context, and whether a given function may ever be called from such a context cannot be decided by static analysis. There are also performance and latency concerns that need to be taken into account.

In the kernel, we simply cannot write our algorithms as if our code were the only thing running on the system.

>> But disabling preemption for
>> long periods of time hurts responsiveness, so it's also desirable to yield the
>> processor occasionally, which means that assembly implementations should be
>> incremental rather than having a single entry point that does everything.
> 
> Doing this rewrite automatically is a bit more of a code-analysis
> challenge, but the alternative approach of doing it by hand is insanely
> error-prone. See, e.g., https://eprint.iacr.org/2017/891.
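The incremental structure Eric describes can be sketched in userspace as an init/update/final interface driven in bounded chunks. Assumptions: a Fletcher-16 checksum stands in for a real cipher or hash, and yield_point is a hypothetical hook marking where kernel code would end its FPU/SIMD section and possibly reschedule before starting the next chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Streaming state for the toy primitive (Fletcher-16). */
struct toy_ctx {
	uint16_t a, b;
};

static void toy_init(struct toy_ctx *c)
{
	c->a = 0;
	c->b = 0;
}

static void toy_update(struct toy_ctx *c, const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		c->a = (uint16_t)((c->a + p[i]) % 255);
		c->b = (uint16_t)((c->b + c->a) % 255);
	}
}

static uint16_t toy_final(const struct toy_ctx *c)
{
	return (uint16_t)((c->b << 8) | c->a);
}

/* Process a long buffer in bounded chunks; all state lives in the context,
 * so pausing between chunks does not change the result. */
static uint16_t toy_chunked(const uint8_t *p, size_t n, size_t chunk,
			    void (*yield_point)(void))
{
	struct toy_ctx c;

	assert(chunk > 0);
	toy_init(&c);
	while (n) {
		size_t todo = n < chunk ? n : chunk;

		toy_update(&c, p, todo);
		p += todo;
		n -= todo;
		if (n && yield_point)
			yield_point();
	}
	return toy_final(&c);
}
```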
> 
>> Many people may have contributed to SUPERCOP already, but that doesn't mean
>> there aren't things you could do to make it more appealing to contributors and
>> more of a community project,
> 
> The logic in this sentence is impeccable, and is already illustrated by
> many SUPERCOP improvements through the years from an increasing number
> of contributors, as summarized in the 87 release announcements so far on
> the relevant public mailing list, which you're welcome to study in
> detail along with the 400 megabytes of current code and as many previous
> versions as you're interested in. That's also the mailing list where
> people are told to send patches, as you'll see if you RTFM.
> 
>> So Linux distributions may not want to take on the legal risk of
>> distributing it
> 
> This is a puzzling comment. A moment ago we were talking about the
> possibility of useful sharing of (e.g.) ChaCha20 implementations between
> SUPERCOP and the Linux kernel, avoiding pointless fracturing of the
> community's development process for these implementations. This doesn't
> mean that the kernel should be grabbing implementations willy-nilly from
> SUPERCOP---surely the kernel should be doing security audits, and the
> kernel already has various coding requirements, and the kernel requires
> GPL compatibility, while putting any of these requirements into SUPERCOP
> would be counterproductive.
> 
> If you mean having the entire SUPERCOP benchmarking package distributed
> through Linux distributions, I have no idea what your motivation is or
> how this is supposed to be connected to anything else we're discussing.
> Obviously SUPERCOP's broad code-inclusion policies make this idea a
> non-starter.
> 
>> nor may companies want to take on the risk of contributing.
> 
> RTFM. People who submit code are authorizing public redistribution for
> benchmarking. It's up to them to decide if they want to allow more.
> 
> ---Dan

^ permalink raw reply

* Re: [PATCH 0/2] ethernet: Use NULL to compare with pointer-typed value rather than 0
From: zhong jiang @ 2018-08-18  6:45 UTC (permalink / raw)
  To: davem; +Cc: vz, slemieux.tyco, keescook, netdev, linux-kernel
In-Reply-To: <1534573773-17358-1-git-send-email-zhongjiang@huawei.com>

Please ignore this patchset; the title should have been [PATCHv2] rather than [PATCH].
On 2018/8/18 14:29, zhong jiang wrote:
> v1->v2:
>  - Following Vladimir's suggestion, use the common !ptr form for NULL comparisons.
>
> zhong jiang (2):
>   ethernet: declance:  Use NULL to compare with pointer-typed value
>     rather than 0
>   ethernet: lpc_eth: Use NULL to compare with pointer-typed value
>     rather than 0
>
>  drivers/net/ethernet/amd/declance.c | 3 +--
>  drivers/net/ethernet/nxp/lpc_eth.c  | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)
>

^ permalink raw reply

* [PATCHv2 2/2] ethernet: lpc_eth: Use NULL to compare with pointer-typed value rather than 0
From: zhong jiang @ 2018-08-18  6:32 UTC (permalink / raw)
  To: davem; +Cc: vz, slemieux.tyco, keescook, netdev, linux-kernel
In-Reply-To: <1534573949-17548-1-git-send-email-zhongjiang@huawei.com>

Pointer-typed values should be compared with NULL rather than 0.
The issue was detected with the help of Coccinelle.

Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/ethernet/nxp/lpc_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
index 08381ef..1c41b07 100644
--- a/drivers/net/ethernet/nxp/lpc_eth.c
+++ b/drivers/net/ethernet/nxp/lpc_eth.c
@@ -1350,7 +1350,7 @@ static int lpc_eth_drv_probe(struct platform_device *pdev)
 				"IRAM not big enough for net buffers, using SDRAM instead.\n");
 	}
 
-	if (pldat->dma_buff_base_v == 0) {
+	if (!pldat->dma_buff_base_v) {
 		ret = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
 		if (ret)
 			goto err_out_free_irq;
-- 
1.7.12.4

^ permalink raw reply related

* [PATCHv2 0/2] ethernet: Use NULL to compare with pointer-typed value rather than 0
From: zhong jiang @ 2018-08-18  6:32 UTC (permalink / raw)
  To: davem; +Cc: vz, slemieux.tyco, keescook, netdev, linux-kernel

v1->v2:
 - Following Vladimir's suggestion, use the common !ptr form for NULL comparisons.

zhong jiang (2):
  ethernet: declance:  Use NULL to compare with pointer-typed value
    rather than 0
  ethernet: lpc_eth: Use NULL to compare with pointer-typed value
    rather than 0

 drivers/net/ethernet/amd/declance.c | 3 +--
 drivers/net/ethernet/nxp/lpc_eth.c  | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

-- 
1.7.12.4

^ permalink raw reply

* [PATCH 2/2] ethernet: lpc_eth: Use NULL to compare with pointer-typed value rather than 0
From: zhong jiang @ 2018-08-18  6:29 UTC (permalink / raw)
  To: davem; +Cc: vz, slemieux.tyco, keescook, netdev, linux-kernel
In-Reply-To: <1534573773-17358-1-git-send-email-zhongjiang@huawei.com>

Pointer-typed values should be compared with NULL rather than 0.
The issue was detected with the help of Coccinelle.

Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/ethernet/nxp/lpc_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
index 08381ef..1c41b07 100644
--- a/drivers/net/ethernet/nxp/lpc_eth.c
+++ b/drivers/net/ethernet/nxp/lpc_eth.c
@@ -1350,7 +1350,7 @@ static int lpc_eth_drv_probe(struct platform_device *pdev)
 				"IRAM not big enough for net buffers, using SDRAM instead.\n");
 	}
 
-	if (pldat->dma_buff_base_v == 0) {
+	if (!pldat->dma_buff_base_v) {
 		ret = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
 		if (ret)
 			goto err_out_free_irq;
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH 1/2] ethernet: declance:  Use NULL to compare with pointer-typed value rather than 0
From: zhong jiang @ 2018-08-18  6:29 UTC (permalink / raw)
  To: davem; +Cc: vz, slemieux.tyco, keescook, netdev, linux-kernel
In-Reply-To: <1534573773-17358-1-git-send-email-zhongjiang@huawei.com>

Pointer-typed values should be compared with NULL rather than 0. The
issue was detected with the help of Coccinelle.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/ethernet/amd/declance.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/declance.c b/drivers/net/ethernet/amd/declance.c
index 116997a..c636f02 100644
--- a/drivers/net/ethernet/amd/declance.c
+++ b/drivers/net/ethernet/amd/declance.c
@@ -606,8 +606,7 @@ static int lance_rx(struct net_device *dev)
 		} else {
 			len = (*rds_ptr(rd, mblength, lp->type) & 0xfff) - 4;
 			skb = netdev_alloc_skb(dev, len + 2);
-
-			if (skb == 0) {
+			if (!skb) {
 				dev->stats.rx_dropped++;
 				*rds_ptr(rd, mblength, lp->type) = 0;
 				*rds_ptr(rd, rmd1, lp->type) =
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH 0/2] ethernet: Use NULL to compare with pointer-typed value rather than 0
From: zhong jiang @ 2018-08-18  6:29 UTC (permalink / raw)
  To: davem; +Cc: vz, slemieux.tyco, keescook, netdev, linux-kernel

v1->v2:
 - Following Vladimir's suggestion, use the common !ptr form for NULL comparisons.

zhong jiang (2):
  ethernet: declance:  Use NULL to compare with pointer-typed value
    rather than 0
  ethernet: lpc_eth: Use NULL to compare with pointer-typed value
    rather than 0

 drivers/net/ethernet/amd/declance.c | 3 +--
 drivers/net/ethernet/nxp/lpc_eth.c  | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

-- 
1.7.12.4

^ permalink raw reply
