Netdev List
 help / color / mirror / Atom feed
* [PATCH bpf-next v5 03/10] bpf/verifier: refine retval R0 state for bpf_get_stack helper
From: Yonghong Song @ 2018-04-23 17:54 UTC (permalink / raw)
  To: ast, daniel, netdev, ecree; +Cc: kernel-team
In-Reply-To: <20180423175417.4104464-1-yhs@fb.com>

The special property of return values for helpers bpf_get_stack
and bpf_probe_read_str are captured in verifier.
Both helpers return a negative error code or
a length, which is equal to or smaller than the buffer
size argument. This additional information in the
verifier can avoid the condition such as "retval > bufsize"
in the bpf program. For example, for the code blow,
    usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
    if (usize < 0 || usize > max_len)
        return 0;
The verifier may have the following errors:
    52: (85) call bpf_get_stack#65
     R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
     R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
     R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
     R9_w=inv800 R10=fp0,call_-1
    53: (bf) r8 = r0
    54: (bf) r1 = r8
    55: (67) r1 <<= 32
    56: (bf) r2 = r1
    57: (77) r2 >>= 32
    58: (25) if r2 > 0x31f goto pc+33
     R0=inv(id=0) R1=inv(id=0,smax_value=9223372032559808512,
                         umax_value=18446744069414584320,
                         var_off=(0x0; 0xffffffff00000000))
     R2=inv(id=0,umax_value=799,var_off=(0x0; 0x3ff))
     R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
     R8=inv(id=0) R9=inv800 R10=fp0,call_-1
    59: (1f) r9 -= r8
    60: (c7) r1 s>>= 32
    61: (bf) r2 = r7
    62: (0f) r2 += r1
    math between map_value pointer and register with unbounded
    min value is not allowed
The failure is due to llvm compiler optimization where register "r2",
which is a copy of "r1", is tested for condition while later on "r1"
is used for map_ptr operation. The verifier is not able to track such
inst sequence effectively.

Without the "usize > max_len" condition, there is no llvm optimization
and the below generated code passed verifier:
    52: (85) call bpf_get_stack#65
     R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
     R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
     R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
     R9_w=inv800 R10=fp0,call_-1
    53: (b7) r1 = 0
    54: (bf) r8 = r0
    55: (67) r8 <<= 32
    56: (c7) r8 s>>= 32
    57: (6d) if r1 s> r8 goto pc+24
     R0=inv(id=0,umax_value=800) R1=inv0 R6=ctx(id=0,off=0,imm=0)
     R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
     R8=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R9=inv800
     R10=fp0,call_-1
    58: (bf) r2 = r7
    59: (0f) r2 += r8
    60: (1f) r9 -= r8
    61: (bf) r1 = r6

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/verifier.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index aba9425..d00bf53 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -164,6 +164,8 @@ struct bpf_call_arg_meta {
 	bool pkt_access;
 	int regno;
 	int access_size;
+	s64 msize_smax_value;
+	u64 msize_umax_value;
 };
 
 static DEFINE_MUTEX(bpf_verifier_lock);
@@ -1994,6 +1996,12 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 	} else if (arg_type_is_mem_size(arg_type)) {
 		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 
+		/* remember the mem_size which may be used later
+		 * to refine return values.
+		 */
+		meta->msize_smax_value = reg->smax_value;
+		meta->msize_umax_value = reg->umax_value;
+
 		/* The register is SCALAR_VALUE; the access check
 		 * happens using its boundaries.
 		 */
@@ -2333,6 +2341,21 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 	return 0;
 }
 
+static void do_refine_retval_range(struct bpf_reg_state *regs, int ret_type,
+				   int func_id,
+				   struct bpf_call_arg_meta *meta)
+{
+	struct bpf_reg_state *ret_reg = &regs[BPF_REG_0];
+
+	if (ret_type != RET_INTEGER ||
+	    (func_id != BPF_FUNC_get_stack &&
+	     func_id != BPF_FUNC_probe_read_str))
+		return;
+
+	ret_reg->smax_value = meta->msize_smax_value;
+	ret_reg->umax_value = meta->msize_umax_value;
+}
+
 static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 {
 	const struct bpf_func_proto *fn = NULL;
@@ -2456,6 +2479,8 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		return -EINVAL;
 	}
 
+	do_refine_retval_range(regs, fn->ret_type, func_id, &meta);
+
 	err = check_map_func_compatibility(env, meta.map_ptr, func_id);
 	if (err)
 		return err;
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next v5 04/10] bpf: remove never-hit branches in verifier adjust_scalar_min_max_vals
From: Yonghong Song @ 2018-04-23 17:54 UTC (permalink / raw)
  To: ast, daniel, netdev, ecree; +Cc: kernel-team
In-Reply-To: <20180423175417.4104464-1-yhs@fb.com>

In verifier function adjust_scalar_min_max_vals,
when src_known is false and the opcode is BPF_LSH/BPF_RSH,
early return will happen in the function. So remove
the branch in handling BPF_LSH/BPF_RSH when src_known is false.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/verifier.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d00bf53..1bbb43d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2932,10 +2932,7 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
 			dst_reg->umin_value <<= umin_val;
 			dst_reg->umax_value <<= umax_val;
 		}
-		if (src_known)
-			dst_reg->var_off = tnum_lshift(dst_reg->var_off, umin_val);
-		else
-			dst_reg->var_off = tnum_lshift(tnum_unknown, umin_val);
+		dst_reg->var_off = tnum_lshift(dst_reg->var_off, umin_val);
 		/* We may learn something more from the var_off */
 		__update_reg_bounds(dst_reg);
 		break;
@@ -2963,11 +2960,7 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
 		 */
 		dst_reg->smin_value = S64_MIN;
 		dst_reg->smax_value = S64_MAX;
-		if (src_known)
-			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
-						       umin_val);
-		else
-			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
+		dst_reg->var_off = tnum_rshift(dst_reg->var_off, umin_val);
 		dst_reg->umin_value >>= umax_val;
 		dst_reg->umax_value >>= umin_val;
 		/* We may learn something more from the var_off */
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next v5 01/10] bpf: change prototype for stack_map_get_build_id_offset
From: Yonghong Song @ 2018-04-23 17:54 UTC (permalink / raw)
  To: ast, daniel, netdev, ecree; +Cc: kernel-team
In-Reply-To: <20180423175417.4104464-1-yhs@fb.com>

This patch didn't incur functionality change. The function prototype
got changed so that the same function can be reused later.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/stackmap.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 57eeb12..04f6ec1 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -262,16 +262,11 @@ static int stack_map_get_build_id(struct vm_area_struct *vma,
 	return ret;
 }
 
-static void stack_map_get_build_id_offset(struct bpf_map *map,
-					  struct stack_map_bucket *bucket,
+static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 					  u64 *ips, u32 trace_nr, bool user)
 {
 	int i;
 	struct vm_area_struct *vma;
-	struct bpf_stack_build_id *id_offs;
-
-	bucket->nr = trace_nr;
-	id_offs = (struct bpf_stack_build_id *)bucket->data;
 
 	/*
 	 * We cannot do up_read() in nmi context, so build_id lookup is
@@ -361,8 +356,10 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 			pcpu_freelist_pop(&smap->freelist);
 		if (unlikely(!new_bucket))
 			return -ENOMEM;
-		stack_map_get_build_id_offset(map, new_bucket, ips,
-					      trace_nr, user);
+		new_bucket->nr = trace_nr;
+		stack_map_get_build_id_offset(
+			(struct bpf_stack_build_id *)new_bucket->data,
+			ips, trace_nr, user);
 		trace_len = trace_nr * sizeof(struct bpf_stack_build_id);
 		if (hash_matches && bucket->nr == trace_nr &&
 		    memcmp(bucket->data, new_bucket->data, trace_len) == 0) {
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH 0/3] Introduce LSM-hook for socketpair(2)
From: Serge E. Hallyn @ 2018-04-23 17:52 UTC (permalink / raw)
  To: David Herrmann
  Cc: linux-kernel, James Morris, Paul Moore, teg, Stephen Smalley,
	selinux, linux-security-module, Eric Paris, serge, davem, netdev
In-Reply-To: <20180423133015.5455-1-dh.herrmann@gmail.com>

Quoting David Herrmann (dh.herrmann@gmail.com):
> Hi
> 
> This series adds a new LSM hook for the socketpair(2) syscall. The idea
> is to allow SO_PEERSEC to be called on AF_UNIX sockets created via
> socketpair(2), and return the same information as if you emulated
> socketpair(2) via a temporary listener socket. Right now SO_PEERSEC
> will return the unlabeled credentials for a socketpair, rather than the
> actual credentials of the creating process.
> 
> A simple call to:
> 
>     socketpair(AF_UNIX, SOCK_STREAM, 0, out);
> 
> can be emulated via a temporary listener socket bound to a unique,
> random name in the abstract namespace. By connecting to this listener
> socket, accept(2) will return the second part of the pair. If
> SO_PEERSEC is queried on these, the correct credentials of the creating
> process are returned. A simple comparison between the behavior of
> SO_PEERSEC on socketpair(2) and an emulated socketpair is included in
> the dbus-broker test-suite [1].
> 
> This patch series tries to close this gap and makes both behave the
> same. A new LSM-hook is added which allows LSMs to cache the correct
> peer information on newly created socket-pairs.
> 
> Apart from fixing this behavioral difference, the dbus-broker project
> actually needs to query the credentials of socketpairs, and currently
> must resort to racy procfs(2) queries to get the LSM credentials of its
> controller socket. Several parts of the dbus-broker project allow you
> to pass in a socket during execve(2), which will be used by the child
> process to accept control-commands from its parent. One natural way to
> create this communication channel is to use socketpair(2). However,
> right now SO_PEERSEC does not return any useful information, hence, the
> child-process would need other means to retrieve this information. By
> avoiding socketpair(2) and using the hacky-emulated version, this is not
> an issue.
> 
> There was a previous discussion on this matter [2] roughly a year ago.
> Back then there was the suspicion that proper SO_PEERSEC would confuse
> applications. However, we could not find any evidence backing this
> suspicion. Furthermore, we now actually see the contrary. Lack of
> SO_PEERSEC makes it a hassle to use socketpairs with LSM credentials.
> Hence, we propose to implement full SO_PEERSEC for socketpairs.
> 
> This series only adds SELinux backends, since that is what we need for
> RHEL. I will gladly extend the other LSMs if needed.
> 
> Thanks
> David
> 
> [1] https://github.com/bus1/dbus-broker/blob/master/src/util/test-peersec.c
> [2] https://www.spinics.net/lists/selinux/msg22674.html
> 
> David Herrmann (3):
>   security: add hook for socketpair(AF_UNIX, ...)
>   net/unix: hook unix_socketpair() into LSM
>   selinux: provide unix_stream_socketpair callback
> 
>  include/linux/lsm_hooks.h |  8 ++++++++
>  include/linux/security.h  |  7 +++++++
>  net/unix/af_unix.c        |  5 +++++
>  security/security.c       |  6 ++++++
>  security/selinux/hooks.c  | 14 ++++++++++++++
>  5 files changed, 40 insertions(+)

Makes sense to me, thanks.

Acked-by: Serge Hallyn <serge@hallyn.com>

^ permalink raw reply

* Re: [PATCH net-next 0/6] net: Extend availability of PHY statistics
From: Florian Fainelli @ 2018-04-23 17:46 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Andrew Lunn, Vivien Didelot, cphealy
In-Reply-To: <1524336953-26108-1-git-send-email-f.fainelli@gmail.com>

On 04/21/2018 11:55 AM, Florian Fainelli wrote:
> Hi all,
> 
> This patch series adds support for retrieving PHY statistics with DSA switches
> when the CPU port uses a PHY to PHY connection (as opposed to MAC to MAC).
> To get there a number of things are done:
> 
> - first we move the code dealing with PHY statistics outside of net/core/ethtool.c
>   and create helper functions since the same code will be reused
> - then we allow network device drivers to provide an ethtool_get_phy_stats callback
>   when the standard PHY library helpers are not suitable
> - we update the DSA functions dealing with ethtool operations to get passed a
>   stringset instead of assuming ETH_SS_STATS like they currently do
> - then we provide a set of standard helpers within DSA as a framework and add
>   the plumbing to allow retrieving the PHY statistics of the CPU port(s)
> - finally plug support for retrieving such PHY statistics with the b53 driver

David, there will be a v2 coming not just to fix the kbuild reported
issues, but also another one reported privately, thank you.

> 
> Florian Fainelli (6):
>   net: Move PHY statistics code into PHY library helpers
>   net: Allow network devices to have PHY statistics
>   net: dsa: Pass stringset to ethtool operations
>   net: dsa: Add helper function to obtain PHY device of a given port
>   net: dsa: Allow providing PHY statistics from CPU port
>   net: dsa: b53: Add support for reading PHY statistics
> 
>  drivers/net/dsa/b53/b53_common.c       | 66 ++++++++++++++++++++++---
>  drivers/net/dsa/b53/b53_priv.h         |  6 ++-
>  drivers/net/dsa/bcm_sf2.c              |  1 +
>  drivers/net/dsa/dsa_loop.c             | 11 ++++-
>  drivers/net/dsa/lan9303-core.c         | 11 ++++-
>  drivers/net/dsa/microchip/ksz_common.c | 11 ++++-
>  drivers/net/dsa/mt7530.c               | 11 ++++-
>  drivers/net/dsa/mv88e6xxx/chip.c       | 10 +++-
>  drivers/net/dsa/qca8k.c                | 10 +++-
>  drivers/net/phy/phy.c                  | 48 ++++++++++++++++++
>  include/linux/ethtool.h                |  5 ++
>  include/linux/phy.h                    |  4 ++
>  include/net/dsa.h                      | 12 ++++-
>  net/core/ethtool.c                     | 61 ++++++++---------------
>  net/dsa/master.c                       | 41 +++++++++++++---
>  net/dsa/port.c                         | 90 +++++++++++++++++++++++++++++-----
>  net/dsa/slave.c                        |  5 +-
>  17 files changed, 320 insertions(+), 83 deletions(-)
> 


-- 
Florian

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-04-23 17:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jiri Pirko, Sridhar Samudrala, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh
In-Reply-To: <20180423202204-mutt-send-email-mst@kernel.org>

On Mon, 23 Apr 2018 20:24:56 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:
> > > >
> > > >I will NAK patches to change to common code for netvsc especially the
> > > >three device model.  MS worked hard with distro vendors to support transparent
> > > >mode, ans we really can't have a new model; or do backport.
> > > >
> > > >Plus, DPDK is now dependent on existing model.    
> > > 
> > > Sorry, but nobody here cares about dpdk or other similar oddities.  
> > 
> > The network device model is a userspace API, and DPDK is a userspace application.  
> 
> It is userspace but are you sure dpdk is actually poking at netdevs?
> AFAIK it's normally banging device registers directly.
> 
> > You can't go breaking userspace even if you don't like the application.  
> 
> Could you please explain how is the proposed patchset breaking
> userspace? Ignoring DPDK for now, I don't think it changes the userspace
> API at all.
> 

The DPDK has a device driver vdev_netvsc which scans the Linux network devices
to look for Linux netvsc device and the paired VF device and setup the
DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
and sets up TAP support over the Linux netvsc device as well as the Mellanox
VF device.

So it depends on existing 2 device model. You can't go to a 3 device model
or start hiding devices from userspace.

Also, I am working on associating netvsc and VF device based on serial number
rather than MAC address. The serial number is how Windows works now, and it makes
sense for Linux and Windows to use the same mechanism if possible.

^ permalink raw reply

* Re: [nf-next] netfilter: extend SRH match to support matching previous, next and last SID
From: Pablo Neira Ayuso @ 2018-04-23 17:30 UTC (permalink / raw)
  To: Ahmed Abdelsalam
  Cc: fw, davem, dav.lebrun, linux-kernel, netfilter-devel, coreteam,
	netdev
In-Reply-To: <1524480503-1883-2-git-send-email-amsalam20@gmail.com>

On Mon, Apr 23, 2018 at 05:48:22AM -0500, Ahmed Abdelsalam wrote:
> IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed by
> SR encapsulated packet. Each SID is encoded as an IPv6 prefix.
> 
> When a Firewall receives an SR encapsulated packet, it should be able to
> identify which node previously processed the packet (previous SID), which
> node is going to process the packet next (next SID), and which node is the
> last to process the packet (last SID) which represent the final destination
> of the packet in case of inline SR mode.
> 
> An example use-case of using these features could be SID list that includes
> two firewalls. When the second firewall receives a packet, it can check
> whether the packet has been processed by the first firewall or not. Based on
> that check, it decides to apply all rules, apply just subset of the rules,
> or totally skip all rules and forward the packet to the next SID.
> 
> This patch extends SRH match to support matching previous SID, next SID, and
> last SID.
> 
> Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
> ---
>  include/uapi/linux/netfilter_ipv6/ip6t_srh.h | 22 +++++++++++++--
>  net/ipv6/netfilter/ip6t_srh.c                | 41 +++++++++++++++++++++++++++-
>  2 files changed, 60 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
> index f3cc0ef..9808382 100644
> --- a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
> +++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
> @@ -17,7 +17,10 @@
>  #define IP6T_SRH_LAST_GT        0x0100
>  #define IP6T_SRH_LAST_LT        0x0200
>  #define IP6T_SRH_TAG            0x0400
> -#define IP6T_SRH_MASK           0x07FF
> +#define IP6T_SRH_PSID           0x0800
> +#define IP6T_SRH_NSID           0x1000
> +#define IP6T_SRH_LSID           0x2000
> +#define IP6T_SRH_MASK           0x3FFF
>  
>  /* Values for "mt_invflags" field in struct ip6t_srh */
>  #define IP6T_SRH_INV_NEXTHDR    0x0001
> @@ -31,7 +34,10 @@
>  #define IP6T_SRH_INV_LAST_GT    0x0100
>  #define IP6T_SRH_INV_LAST_LT    0x0200
>  #define IP6T_SRH_INV_TAG        0x0400
> -#define IP6T_SRH_INV_MASK       0x07FF
> +#define IP6T_SRH_INV_PSID       0x0800
> +#define IP6T_SRH_INV_NSID       0x1000
> +#define IP6T_SRH_INV_LSID       0x2000
> +#define IP6T_SRH_INV_MASK       0x3FFF
>  
>  /**
>   *      struct ip6t_srh - SRH match options
> @@ -40,6 +46,12 @@
>   *      @ segs_left: Segments left field of SRH
>   *      @ last_entry: Last entry field of SRH
>   *      @ tag: Tag field of SRH
> + *      @ psid_addr: Address of previous SID in SRH SID list
> + *      @ nsid_addr: Address of NEXT SID in SRH SID list
> + *      @ lsid_addr: Address of LAST SID in SRH SID list
> + *      @ psid_msk: Mask of previous SID in SRH SID list
> + *      @ nsid_msk: Mask of next SID in SRH SID list
> + *      @ lsid_msk: MAsk of last SID in SRH SID list
>   *      @ mt_flags: match options
>   *      @ mt_invflags: Invert the sense of match options
>   */
> @@ -50,6 +62,12 @@ struct ip6t_srh {
>  	__u8                    segs_left;
>  	__u8                    last_entry;
>  	__u16                   tag;
> +	struct in6_addr		psid_addr;
> +	struct in6_addr		nsid_addr;
> +	struct in6_addr		lsid_addr;
> +	struct in6_addr		psid_msk;
> +	struct in6_addr		nsid_msk;
> +	struct in6_addr		lsid_msk;

This is changing something exposed through UAPI, so you will need a
new revision for this.

>  	__u16                   mt_flags;
>  	__u16                   mt_invflags;
>  };
> diff --git a/net/ipv6/netfilter/ip6t_srh.c b/net/ipv6/netfilter/ip6t_srh.c
> index 33719d5..2b5cc73 100644
> --- a/net/ipv6/netfilter/ip6t_srh.c
> +++ b/net/ipv6/netfilter/ip6t_srh.c
> @@ -30,7 +30,9 @@ static bool srh_mt6(const struct sk_buff *skb, struct xt_action_param *par)
>  	const struct ip6t_srh *srhinfo = par->matchinfo;
>  	struct ipv6_sr_hdr *srh;
>  	struct ipv6_sr_hdr _srh;
> -	int hdrlen, srhoff = 0;
> +	int hdrlen, psidoff, nsidoff, lsidoff, srhoff = 0;
> +	struct in6_addr *psid, *nsid, *lsid;
> +	struct in6_addr _psid, _nsid, _lsid;

Could you rearrange variable definitions? ie. longest line first, eg.

	int hdrlen, psidoff, nsidoff, lsidoff, srhoff = 0;
  	const struct ip6t_srh *srhinfo = par->matchinfo;
	struct in6_addr *psid, *nsid, *lsid;
  	struct ipv6_sr_hdr *srh;
  	struct ipv6_sr_hdr _srh;

Thanks.

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-04-23 17:25 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Sridhar Samudrala, mst, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh
In-Reply-To: <20180423100406.71b95f74@xeon-e3>

Mon, Apr 23, 2018 at 07:04:06PM CEST, stephen@networkplumber.org wrote:
>On Fri, 20 Apr 2018 18:00:58 +0200
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Fri, Apr 20, 2018 at 05:28:02PM CEST, stephen@networkplumber.org wrote:
>> >On Thu, 19 Apr 2018 18:42:04 -0700
>> >Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>> >  
>> >> Use the registration/notification framework supported by the generic
>> >> failover infrastructure.
>> >> 
>> >> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>  
>> >
>> >Do what you want to other devices but leave netvsc alone.
>> >Adding these failover ops does not reduce the code size, and really is
>> >no benefit.  The netvsc device driver needs to be backported to several
>> >other distributions and doing this makes that harder.  
>> 
>> We should not care about the backport burden when we are trying to make
>> things right. And things are not right. The current netvsc approach is
>> just plain wrong shortcut. It should have been done in a generic way
>> from the very beginning. We are just trying to fix this situation.
>> 
>> Moreover, I believe that part of the fix is to convert netvsc to 3
>> netdev solution too. 2 netdev model is wrong.
>> 
>> 
>> >
>> >I will NAK patches to change to common code for netvsc especially the
>> >three device model.  MS worked hard with distro vendors to support transparent
>> >mode, ans we really can't have a new model; or do backport.
>> >
>> >Plus, DPDK is now dependent on existing model.  
>> 
>> Sorry, but nobody here cares about dpdk or other similar oddities.
>
>The network device model is a userspace API, and DPDK is a userspace application.
>You can't go breaking userspace even if you don't like the application.

I don't understand how you can break anything by exposing
just-another-netdevice. If you break it, it is already broken...

And how you can break anything in userspace by doing refactoring inside
the kernel is puzzling me even more...

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-04-23 17:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jiri Pirko, Sridhar Samudrala, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh
In-Reply-To: <20180423100406.71b95f74@xeon-e3>

On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:
> > >
> > >I will NAK patches to change to common code for netvsc especially the
> > >three device model.  MS worked hard with distro vendors to support transparent
> > >mode, ans we really can't have a new model; or do backport.
> > >
> > >Plus, DPDK is now dependent on existing model.  
> > 
> > Sorry, but nobody here cares about dpdk or other similar oddities.
> 
> The network device model is a userspace API, and DPDK is a userspace application.

It is userspace but are you sure dpdk is actually poking at netdevs?
AFAIK it's normally banging device registers directly.

> You can't go breaking userspace even if you don't like the application.

Could you please explain how is the proposed patchset breaking
userspace? Ignoring DPDK for now, I don't think it changes the userspace
API at all.

-- 
MST

^ permalink raw reply

* Re: [PATCH v7 net-next 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-04-23 17:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, jiri, kubakici, netdev,
	virtualization, loseweigh, davem
In-Reply-To: <20180422200259-mutt-send-email-mst@kernel.org>

On 4/22/2018 10:06 AM, Michael S. Tsirkin wrote:
> On Thu, Apr 19, 2018 at 06:42:02PM -0700, Sridhar Samudrala wrote:
>> +#if IS_ENABLED(CONFIG_NET_FAILOVER)
>> +
>> +int failover_create(struct net_device *standby_dev,
>> +		    struct failover **pfailover);
> Should we rename all these structs net_failover?
> It's possible to extend the concept to storage I think.

We could, the only downside is that the names become longer. i think we need
to change the filenames and the function names also to be consistent.


>
>> +void failover_destroy(struct failover *failover);
>> +
>> +int failover_register(struct net_device *standby_dev, struct failover_ops *ops,
>> +		      struct failover **pfailover);
>> +void failover_unregister(struct failover *failover);
>> +
>> +int failover_slave_unregister(struct net_device *slave_dev);
>> +
>> +#else
>> +
>> +static inline
>> +int failover_create(struct net_device *standby_dev,
>> +		    struct failover **pfailover);
>> +{
>> +	return 0;
> Does this make callers do something sane?
> Shouldn't these return an error?

Yes. i think i should return -EOPNOTSUPP here, so that we fail
when CONFIG_NET_FAILOVER is not enabled and the virtio-net driver is trying
to create a failover device.


>> +}
>> +
>> +static inline
>> +void failover_destroy(struct failover *failover)
>> +{
>> +}
>> +
>> +static inline
>> +int failover_register(struct net_device *standby_dev, struct failover_ops *ops,
>> +		      struct pfailover **pfailover);
>> +{
>> +	return 0;
>> +}
> struct pfailover seems like a typo.

yes. will also change the return to -EOPNOTSUPP

>
>> +
>> +static inline
>> +void failover_unregister(struct failover *failover)
>> +{
>> +}
>> +
>> +static inline
>> +int failover_slave_unregister(struct net_device *slave_dev)
>> +{
>> +	return 0;
>> +}
> Does anyone test return value of unregister?
> should this be void?

yes. can be changed to void.


>
>> +
>> +#endif
>> +
>> +#endif /* _NET_FAILOVER_H */

^ permalink raw reply

* [PATCH bpf] bpf: disable and restore preemption in __BPF_PROG_RUN_ARRAY
From: Roman Gushchin @ 2018-04-23 17:09 UTC (permalink / raw)
  To: netdev; +Cc: kernel-team, Roman Gushchin, Alexei Starovoitov, Daniel Borkmann

Running bpf programs requires disabled preemption,
however at least some* of the BPF_PROG_RUN_ARRAY users
do not follow this rule.

To fix this bug, and also to make it not happen in the future,
let's add explicit preemption disabling/re-enabling
to the __BPF_PROG_RUN_ARRAY code.

* for example:
 [   17.624472] RIP: 0010:__cgroup_bpf_run_filter_sk+0x1c4/0x1d0
 ...
 [   17.640890]  inet6_create+0x3eb/0x520
 [   17.641405]  __sock_create+0x242/0x340
 [   17.641939]  __sys_socket+0x57/0xe0
 [   17.642370]  ? trace_hardirqs_off_thunk+0x1a/0x1c
 [   17.642944]  SyS_socket+0xa/0x10
 [   17.643357]  do_syscall_64+0x79/0x220
 [   17.643879]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 486e65e3db26..dc586cc64bc2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -351,6 +351,7 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
 		struct bpf_prog **_prog, *__prog;	\
 		struct bpf_prog_array *_array;		\
 		u32 _ret = 1;				\
+		preempt_disable();			\
 		rcu_read_lock();			\
 		_array = rcu_dereference(array);	\
 		if (unlikely(check_non_null && !_array))\
@@ -362,6 +363,7 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
 		}					\
 _out:							\
 		rcu_read_unlock();			\
+		preempt_enable_no_resched();		\
 		_ret;					\
 	 })
 
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH 0/3] Introduce LSM-hook for socketpair(2)
From: Casey Schaufler @ 2018-04-23 17:04 UTC (permalink / raw)
  To: David Herrmann, linux-kernel
  Cc: James Morris, Paul Moore, teg, Stephen Smalley, selinux,
	linux-security-module, Eric Paris, serge, davem, netdev,
	Casey Schaufler
In-Reply-To: <20180423133015.5455-1-dh.herrmann@gmail.com>

On 4/23/2018 6:30 AM, David Herrmann wrote:
> Hi
>
> This series adds a new LSM hook for the socketpair(2) syscall. The idea
> is to allow SO_PEERSEC to be called on AF_UNIX sockets created via
> socketpair(2), and return the same information as if you emulated
> socketpair(2) via a temporary listener socket. Right now SO_PEERSEC
> will return the unlabeled credentials for a socketpair, rather than the
> actual credentials of the creating process.
>
> ...
>
> This series only adds SELinux backends, since that is what we need for
> RHEL. I will gladly extend the other LSMs if needed.

I would be very happy to see a proposed patch for Smack. It shouldn't
be much different from the SELinux version, with the exception that it
will use pointers to smk_known structures instead of secids. It would be
a big help, as someone just threw a whole new species of scorpion into
this pit.

>
> Thanks
> David
>
> [1] https://github.com/bus1/dbus-broker/blob/master/src/util/test-peersec.c
> [2] https://www.spinics.net/lists/selinux/msg22674.html
>
> David Herrmann (3):
>   security: add hook for socketpair(AF_UNIX, ...)
>   net/unix: hook unix_socketpair() into LSM
>   selinux: provide unix_stream_socketpair callback
>
>  include/linux/lsm_hooks.h |  8 ++++++++
>  include/linux/security.h  |  7 +++++++
>  net/unix/af_unix.c        |  5 +++++
>  security/security.c       |  6 ++++++
>  security/selinux/hooks.c  | 14 ++++++++++++++
>  5 files changed, 40 insertions(+)
>

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-04-23 17:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Sridhar Samudrala, mst, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh
In-Reply-To: <20180420160058.GB2150@nanopsycho.orion>

On Fri, 20 Apr 2018 18:00:58 +0200
Jiri Pirko <jiri@resnulli.us> wrote:

> Fri, Apr 20, 2018 at 05:28:02PM CEST, stephen@networkplumber.org wrote:
> >On Thu, 19 Apr 2018 18:42:04 -0700
> >Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >  
> >> Use the registration/notification framework supported by the generic
> >> failover infrastructure.
> >> 
> >> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>  
> >
> >Do what you want to other devices but leave netvsc alone.
> >Adding these failover ops does not reduce the code size, and really is
> >no benefit.  The netvsc device driver needs to be backported to several
> >other distributions and doing this makes that harder.  
> 
> We should not care about the backport burden when we are trying to make
> things right. And things are not right. The current netvsc approach is
> just plain wrong shortcut. It should have been done in a generic way
> from the very beginning. We are just trying to fix this situation.
> 
> Moreover, I believe that part of the fix is to convert netvsc to 3
> netdev solution too. 2 netdev model is wrong.
> 
> 
> >
> >I will NAK patches to change to common code for netvsc especially the
> >three device model.  MS worked hard with distro vendors to support transparent
> >mode, ans we really can't have a new model; or do backport.
> >
> >Plus, DPDK is now dependent on existing model.  
> 
> Sorry, but nobody here cares about dpdk or other similar oddities.

The network device model is a userspace API, and DPDK is a userspace application.
You can't go breaking userspace even if you don't like the application.

^ permalink raw reply

* Re: Bluetooth/lock_sock: false positive "WARNING: possible recursive locking detected"
From: Marcel Holtmann @ 2018-04-23 17:01 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Johan Hedberg, BlueZ development, ML netdev
In-Reply-To: <695d5172-e7f8-9e44-b6ab-0ce1a9473f34@suse.cz>

Hi Jiri,

>> [ 2891.586061] ============================================
>> [ 2891.586063] WARNING: possible recursive locking detected
>> [ 2891.586065] 4.16.2-10.ge881e16-default #1 Not tainted
>> [ 2891.586067] --------------------------------------------
>> [ 2891.586068] kworker/u9:3/873 is trying to acquire lock:
>> [ 2891.586070]  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<000000007b85e829>] bt_accept_enqueue+0x29/0x90 [bluetooth]
>> [ 2891.586086]
>>               but task is already holding lock:
>> [ 2891.586088]  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<0000000042f0b4a5>] l2cap_sock_new_connection_cb+0x18/0xa0 [bluetooth]
>> [ 2891.586109]
>>               other info that might help us debug this:
>> [ 2891.586111]  Possible unsafe locking scenario:
>> 
>> [ 2891.586115]        CPU0
>> [ 2891.586116]        ----
>> [ 2891.586117]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
>> [ 2891.586120]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
>> [ 2891.586122]
>>                *** DEADLOCK ***
>> 
>> [ 2891.586125]  May be due to missing lock nesting notation
>> 
>> [ 2891.586127] 5 locks held by kworker/u9:3/873:
>> [ 2891.586128]  #0:  ((wq_completion)"%s"hdev->name#2){+.+.}, at: [<000000004aa1a273>] process_one_work+0x1e3/0x6a0
>> [ 2891.586135]  #1:  ((work_completion)(&hdev->rx_work)){+.+.}, at: [<000000004aa1a273>] process_one_work+0x1e3/0x6a0
>> [ 2891.586140]  #2:  (&conn->chan_lock){+.+.}, at: [<00000000fbad6c82>] l2cap_connect+0x88/0x540 [bluetooth]
>> [ 2891.586155]  #3:  (&chan->lock/2){+.+.}, at: [<000000007c38e27e>] l2cap_connect+0xa0/0x540 [bluetooth]
>> [ 2891.586170]  #4:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<0000000042f0b4a5>] l2cap_sock_new_connection_cb+0x18/0xa0 [bluetooth]
>> [ 2891.586183]
>>               stack backtrace:
>> [ 2891.586187] CPU: 2 PID: 873 Comm: kworker/u9:3 Not tainted 4.16.2-10.ge881e16-default #1 openSUSE Tumbleweed (unreleased)
>> [ 2891.586189] Hardware name: Dell Inc. Latitude 7280/0KK5D1, BIOS 1.9.3 03/09/2018
>> [ 2891.586200] Workqueue: hci0 hci_rx_work [bluetooth]
>> [ 2891.586202] Call Trace:
>> [ 2891.586207]  dump_stack+0x85/0xc5
>> [ 2891.586211]  __lock_acquire+0x6b4/0x1370
>> [ 2891.586221]  lock_acquire+0x9f/0x210
>> [ 2891.586237]  lock_sock_nested+0x5a/0x80
>> [ 2891.586256]  bt_accept_enqueue+0x29/0x90 [bluetooth]
>> [ 2891.586268]  l2cap_sock_new_connection_cb+0x5d/0xa0 [bluetooth]
>> [ 2891.586280]  l2cap_connect+0x126/0x540 [bluetooth]
>> [ 2891.586315]  l2cap_sig_channel+0x443/0x13b0 [bluetooth]
>> [ 2891.586330]  l2cap_recv_frame+0x1a4/0x300 [bluetooth]
>> [ 2891.586341]  hci_rx_work+0x1c8/0x5c0 [bluetooth]
>> [ 2891.586345]  process_one_work+0x269/0x6a0
>> [ 2891.586350]  worker_thread+0x2b/0x3d0
>> [ 2891.586356]  kthread+0x113/0x130
>> [ 2891.586363]  ret_from_fork+0x24/0x50
>> [ 4954.622809] e1000e: eth0 NIC Link is Down
>> [ 4955.299532] PM: suspend entry (deep)
>> [ 4955.299538] PM: Syncing filesystems ... done.
> 
> This is:
>  lock_sock(sk);       in bt_accept_enqueue
> nested in
>  lock_sock(parent);   in l2cap_sock_new_connection_cb
> 
> So this looks like a false positive to me. So I believe this is a fix:
> 
> --- a/net/bluetooth/l2cap_sock.c
> +++ b/net/bluetooth/l2cap_sock.c
> @@ -1232,7 +1232,7 @@ static struct l2cap_chan
> *l2cap_sock_new_connection_cb(struct l2cap_chan *chan)
> {
>        struct sock *sk, *parent = chan->data;
> 
> -       lock_sock(parent);
> +       lock_sock_nested(parent, L2CAP_NESTING_PARENT);
> 
>        /* Check for backlog size */
>        if (sk_acceptq_is_full(parent)) {
> 

can you send a patch for it?

Regards

Marcel

^ permalink raw reply

* KMSAN: uninit-value in _copy_to_iter (2)
From: syzbot @ 2018-04-23 16:56 UTC (permalink / raw)
  To: avagin, davem, dingtianhong, edumazet, elena.reshetova,
	linux-kernel, matthew, mingo, netdev, pabeni, syzkaller-bugs,
	viro, willemb

Hello,

syzbot hit the following crash on  
https://github.com/google/kmsan.git/master commit
d2d741e5d1898dfde1a75ea3d29a9a3e2edf0617 (Sun Apr 22 15:05:22 2018 +0000)
kmsan: add initialization for shmem pages
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=87cfa083e727a224754b

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=6616554548494336
Kernel config: https://syzkaller.appspot.com/x/.config?id=328654897048964367
compiler: clang version 7.0.0 (trunk 329391)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

==================================================================
BUG: KMSAN: uninit-value in copyout lib/iov_iter.c:140 [inline]
BUG: KMSAN: uninit-value in _copy_to_iter+0x1bb3/0x28f0 lib/iov_iter.c:571
CPU: 0 PID: 7670 Comm: syz-executor7 Not tainted 4.16.0+ #86
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
  kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157
  kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
  copyout lib/iov_iter.c:140 [inline]
  _copy_to_iter+0x1bb3/0x28f0 lib/iov_iter.c:571
  copy_to_iter include/linux/uio.h:106 [inline]
  skb_copy_datagram_iter+0x443/0xf70 net/core/datagram.c:431
  skb_copy_datagram_msg include/linux/skbuff.h:3264 [inline]
  netlink_recvmsg+0x6f1/0x1900 net/netlink/af_netlink.c:1958
  sock_recvmsg_nosec net/socket.c:803 [inline]
  sock_recvmsg+0x1d0/0x230 net/socket.c:810
  ___sys_recvmsg+0x3fb/0x810 net/socket.c:2205
  __sys_recvmmsg+0x54e/0xdb0 net/socket.c:2313
  SYSC_recvmmsg+0x29b/0x3e0 net/socket.c:2394
  SyS_recvmmsg+0x76/0xa0 net/socket.c:2378
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x455389
RSP: 002b:00007f0281d3dc68 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007f0281d3e6d4 RCX: 0000000000455389
RDX: 0000000000000003 RSI: 0000000020001f80 RDI: 0000000000000014
RBP: 000000000072bea0 R08: 0000000020002040 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000049e R14: 00000000006f9f70 R15: 0000000000000000

Uninit was stored to memory at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
  kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
  kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
  kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:526
  __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:477
  __nla_put lib/nlattr.c:569 [inline]
  nla_put+0x276/0x340 lib/nlattr.c:627
  copy_to_user_policy_type net/xfrm/xfrm_user.c:1678 [inline]
  build_acquire net/xfrm/xfrm_user.c:2850 [inline]
  xfrm_send_acquire+0x1068/0x1690 net/xfrm/xfrm_user.c:2873
  km_query net/xfrm/xfrm_state.c:1953 [inline]
  xfrm_state_find+0x3ad8/0x4f40 net/xfrm/xfrm_state.c:1021
  xfrm_tmpl_resolve_one net/xfrm/xfrm_policy.c:1393 [inline]
  xfrm_tmpl_resolve net/xfrm/xfrm_policy.c:1437 [inline]
  xfrm_resolve_and_create_bundle+0xc31/0x5270 net/xfrm/xfrm_policy.c:1833
  xfrm_lookup+0x606/0x39d0 net/xfrm/xfrm_policy.c:2163
  xfrm_lookup_route+0xfa/0x360 net/xfrm/xfrm_policy.c:2283
  ip6_dst_lookup_flow+0x221/0x270 net/ipv6/ip6_output.c:1099
  ip6_datagram_dst_update+0x93a/0x1470 net/ipv6/datagram.c:91
  __ip6_datagram_connect+0x14f6/0x1a20 net/ipv6/datagram.c:257
  ip6_datagram_connect net/ipv6/datagram.c:280 [inline]
  ip6_datagram_connect_v6_only+0x104/0x180 net/ipv6/datagram.c:292
  inet_dgram_connect+0x2e8/0x4d0 net/ipv4/af_inet.c:542
  SYSC_connect+0x41a/0x510 net/socket.c:1639
  SyS_connect+0x54/0x80 net/socket.c:1620
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Local variable description: ----upt.i.i@xfrm_send_acquire
Variable was created at:
  xfrm_send_acquire+0x73/0x1690 net/xfrm/xfrm_user.c:2864
  km_query net/xfrm/xfrm_state.c:1953 [inline]
  xfrm_state_find+0x3ad8/0x4f40 net/xfrm/xfrm_state.c:1021

Byte 200 of 207 is uninitialized
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
==================================================================
CPU: 1 PID: 7675 Comm: syz-executor3 Not tainted 4.16.0+ #86


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: [PATCH 3/3] selinux: provide unix_stream_socketpair callback
From: Stephen Smalley @ 2018-04-23 16:48 UTC (permalink / raw)
  To: David Herrmann, linux-kernel
  Cc: James Morris, Paul Moore, teg, selinux, linux-security-module,
	Eric Paris, serge, davem, netdev
In-Reply-To: <20180423133015.5455-4-dh.herrmann@gmail.com>

On 04/23/2018 09:30 AM, David Herrmann wrote:
> Make sure to implement the new unix_stream_socketpair callback so the
> SO_PEERSEC call on socketpair(2)s will return correct information.
> 
> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>

Acked-by: Stephen Smalley <sds@tycho.nsa.gov>

> ---
>  security/selinux/hooks.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 4cafe6a19167..828881d9a41d 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4905,6 +4905,18 @@ static int selinux_socket_unix_stream_connect(struct sock *sock,
>  	return 0;
>  }
>  
> +static int selinux_socket_unix_stream_socketpair(struct sock *socka,
> +						 struct sock *sockb)
> +{
> +	struct sk_security_struct *sksec_a = socka->sk_security;
> +	struct sk_security_struct *sksec_b = sockb->sk_security;
> +
> +	sksec_a->peer_sid = sksec_b->sid;
> +	sksec_b->peer_sid = sksec_a->sid;
> +
> +	return 0;
> +}
> +
>  static int selinux_socket_unix_may_send(struct socket *sock,
>  					struct socket *other)
>  {
> @@ -6995,6 +7007,8 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(inode_getsecctx, selinux_inode_getsecctx),
>  
>  	LSM_HOOK_INIT(unix_stream_connect, selinux_socket_unix_stream_connect),
> +	LSM_HOOK_INIT(unix_stream_socketpair,
> +			selinux_socket_unix_stream_socketpair),
>  	LSM_HOOK_INIT(unix_may_send, selinux_socket_unix_may_send),
>  
>  	LSM_HOOK_INIT(socket_create, selinux_socket_create),
> 

^ permalink raw reply

* [PATCH net 3/3] amd-xgbe: Only use the SFP supported transceiver signals
From: Tom Lendacky @ 2018-04-23 16:43 UTC (permalink / raw)
  To: netdev; +Cc: David Miller
In-Reply-To: <20180423164258.18740.98574.stgit@tlendack-t1.amdoffice.net>

The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.)
that it supports.  Update the driver to include checking the eeprom data
when deciding whether to use a transceiver signal.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c |   71 +++++++++++++++++++++------
 1 file changed, 54 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index b48efc0..aac8843 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -253,6 +253,10 @@ enum xgbe_sfp_speed {
 #define XGBE_SFP_BASE_VENDOR_SN			4
 #define XGBE_SFP_BASE_VENDOR_SN_LEN		16
 
+#define XGBE_SFP_EXTD_OPT1			1
+#define XGBE_SFP_EXTD_OPT1_RX_LOS		BIT(1)
+#define XGBE_SFP_EXTD_OPT1_TX_FAULT		BIT(3)
+
 #define XGBE_SFP_EXTD_DIAG			28
 #define XGBE_SFP_EXTD_DIAG_ADDR_CHANGE		BIT(2)
 
@@ -332,6 +336,7 @@ struct xgbe_phy_data {
 
 	unsigned int sfp_gpio_address;
 	unsigned int sfp_gpio_mask;
+	unsigned int sfp_gpio_inputs;
 	unsigned int sfp_gpio_rx_los;
 	unsigned int sfp_gpio_tx_fault;
 	unsigned int sfp_gpio_mod_absent;
@@ -986,6 +991,49 @@ static void xgbe_phy_sfp_external_phy(struct xgbe_prv_data *pdata)
 	phy_data->sfp_phy_avail = 1;
 }
 
+static bool xgbe_phy_check_sfp_rx_los(struct xgbe_phy_data *phy_data)
+{
+	u8 *sfp_extd = phy_data->sfp_eeprom.extd;
+
+	if (!(sfp_extd[XGBE_SFP_EXTD_OPT1] & XGBE_SFP_EXTD_OPT1_RX_LOS))
+		return false;
+
+	if (phy_data->sfp_gpio_mask & XGBE_GPIO_NO_RX_LOS)
+		return false;
+
+	if (phy_data->sfp_gpio_inputs & (1 << phy_data->sfp_gpio_rx_los))
+		return true;
+
+	return false;
+}
+
+static bool xgbe_phy_check_sfp_tx_fault(struct xgbe_phy_data *phy_data)
+{
+	u8 *sfp_extd = phy_data->sfp_eeprom.extd;
+
+	if (!(sfp_extd[XGBE_SFP_EXTD_OPT1] & XGBE_SFP_EXTD_OPT1_TX_FAULT))
+		return false;
+
+	if (phy_data->sfp_gpio_mask & XGBE_GPIO_NO_TX_FAULT)
+		return false;
+
+	if (phy_data->sfp_gpio_inputs & (1 << phy_data->sfp_gpio_tx_fault))
+		return true;
+
+	return false;
+}
+
+static bool xgbe_phy_check_sfp_mod_absent(struct xgbe_phy_data *phy_data)
+{
+	if (phy_data->sfp_gpio_mask & XGBE_GPIO_NO_MOD_ABSENT)
+		return false;
+
+	if (phy_data->sfp_gpio_inputs & (1 << phy_data->sfp_gpio_mod_absent))
+		return true;
+
+	return false;
+}
+
 static bool xgbe_phy_belfuse_parse_quirks(struct xgbe_prv_data *pdata)
 {
 	struct xgbe_phy_data *phy_data = pdata->phy_data;
@@ -1031,6 +1079,10 @@ static void xgbe_phy_sfp_parse_eeprom(struct xgbe_prv_data *pdata)
 	if (sfp_base[XGBE_SFP_BASE_EXT_ID] != XGBE_SFP_EXT_ID_SFP)
 		return;
 
+	/* Update transceiver signals (eeprom extd/options) */
+	phy_data->sfp_tx_fault = xgbe_phy_check_sfp_tx_fault(phy_data);
+	phy_data->sfp_rx_los = xgbe_phy_check_sfp_rx_los(phy_data);
+
 	if (xgbe_phy_sfp_parse_quirks(pdata))
 		return;
 
@@ -1196,7 +1248,6 @@ static int xgbe_phy_sfp_read_eeprom(struct xgbe_prv_data *pdata)
 static void xgbe_phy_sfp_signals(struct xgbe_prv_data *pdata)
 {
 	struct xgbe_phy_data *phy_data = pdata->phy_data;
-	unsigned int gpio_input;
 	u8 gpio_reg, gpio_ports[2];
 	int ret;
 
@@ -1211,23 +1262,9 @@ static void xgbe_phy_sfp_signals(struct xgbe_prv_data *pdata)
 		return;
 	}
 
-	gpio_input = (gpio_ports[1] << 8) | gpio_ports[0];
-
-	if (phy_data->sfp_gpio_mask & XGBE_GPIO_NO_MOD_ABSENT) {
-		/* No GPIO, just assume the module is present for now */
-		phy_data->sfp_mod_absent = 0;
-	} else {
-		if (!(gpio_input & (1 << phy_data->sfp_gpio_mod_absent)))
-			phy_data->sfp_mod_absent = 0;
-	}
-
-	if (!(phy_data->sfp_gpio_mask & XGBE_GPIO_NO_RX_LOS) &&
-	    (gpio_input & (1 << phy_data->sfp_gpio_rx_los)))
-		phy_data->sfp_rx_los = 1;
+	phy_data->sfp_gpio_inputs = (gpio_ports[1] << 8) | gpio_ports[0];
 
-	if (!(phy_data->sfp_gpio_mask & XGBE_GPIO_NO_TX_FAULT) &&
-	    (gpio_input & (1 << phy_data->sfp_gpio_tx_fault)))
-		phy_data->sfp_tx_fault = 1;
+	phy_data->sfp_mod_absent = xgbe_phy_check_sfp_mod_absent(phy_data);
 }
 
 static void xgbe_phy_sfp_mod_absent(struct xgbe_prv_data *pdata)

^ permalink raw reply related

* [PATCH net 2/3] amd-xgbe: Improve KR auto-negotiation and training
From: Tom Lendacky @ 2018-04-23 16:43 UTC (permalink / raw)
  To: netdev; +Cc: David Miller
In-Reply-To: <20180423164258.18740.98574.stgit@tlendack-t1.amdoffice.net>

Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks
to improve the ability to successfully complete Clause 73 AN when running
at 10gbps.  Hardware can sometimes have issues with CDR lock when the
AN DME page exchange is being performed.

The AN and KR training hooks are used as follows:
- The pre AN hook is used to disable CDR tracking in the PHY so that the
  DME page exchange can be successfully and consistently completed.
- The post KR training hook is used to re-enable the CDR tracking so that
  KR training can successfully complete.
- The post AN hook is used to check for an unsuccessful AN which will
  increase a CDR tracking enablement delay (up to a maximum value).

Add two debugfs entries to allow control over use of the CDR tracking
workaround.  The debugfs entries allow the CDR tracking workaround to
be disabled and determine whether to re-enable CDR tracking before or
after link training has been initiated.

Also, with these changes the receiver reset cycle that is performed during
the link status check can be performed less often.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-common.h  |    8 ++
 drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c |   16 +++
 drivers/net/ethernet/amd/xgbe/xgbe-main.c    |    1 
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c    |    8 +-
 drivers/net/ethernet/amd/xgbe/xgbe-pci.c     |    2 
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c  |  125 ++++++++++++++++++++++++++
 drivers/net/ethernet/amd/xgbe/xgbe.h         |    4 +
 7 files changed, 160 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-common.h b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
index 7ea72ef..d272dc6 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
@@ -1321,6 +1321,10 @@
 #define MDIO_VEND2_AN_STAT		0x8002
 #endif
 
+#ifndef MDIO_VEND2_PMA_CDR_CONTROL
+#define MDIO_VEND2_PMA_CDR_CONTROL	0x8056
+#endif
+
 #ifndef MDIO_CTRL1_SPEED1G
 #define MDIO_CTRL1_SPEED1G		(MDIO_CTRL1_SPEED10G & ~BMCR_SPEED100)
 #endif
@@ -1369,6 +1373,10 @@
 #define XGBE_AN_CL37_TX_CONFIG_MASK	0x08
 #define XGBE_AN_CL37_MII_CTRL_8BIT	0x0100
 
+#define XGBE_PMA_CDR_TRACK_EN_MASK	0x01
+#define XGBE_PMA_CDR_TRACK_EN_OFF	0x00
+#define XGBE_PMA_CDR_TRACK_EN_ON	0x01
+
 /* Bit setting and getting macros
  *  The get macro will extract the current bit field value from within
  *  the variable
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c b/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c
index 7d128be..b911439 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c
@@ -519,6 +519,22 @@ void xgbe_debugfs_init(struct xgbe_prv_data *pdata)
 				   "debugfs_create_file failed\n");
 	}
 
+	if (pdata->vdata->an_cdr_workaround) {
+		pfile = debugfs_create_bool("an_cdr_workaround", 0600,
+					    pdata->xgbe_debugfs,
+					    &pdata->debugfs_an_cdr_workaround);
+		if (!pfile)
+			netdev_err(pdata->netdev,
+				   "debugfs_create_bool failed\n");
+
+		pfile = debugfs_create_bool("an_cdr_track_early", 0600,
+					    pdata->xgbe_debugfs,
+					    &pdata->debugfs_an_cdr_track_early);
+		if (!pfile)
+			netdev_err(pdata->netdev,
+				   "debugfs_create_bool failed\n");
+	}
+
 	kfree(buf);
 }
 
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index 795e556..441d0973 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -349,6 +349,7 @@ int xgbe_config_netdev(struct xgbe_prv_data *pdata)
 	XGMAC_SET_BITS(pdata->rss_options, MAC_RSSCR, UDP4TE, 1);
 
 	/* Call MDIO/PHY initialization routine */
+	pdata->debugfs_an_cdr_workaround = pdata->vdata->an_cdr_workaround;
 	ret = pdata->phy_if.phy_init(pdata);
 	if (ret)
 		return ret;
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
index e3d361e..1b45cd7 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
@@ -432,6 +432,8 @@ static void xgbe_an73_disable(struct xgbe_prv_data *pdata)
 	xgbe_an73_set(pdata, false, false);
 	xgbe_an73_disable_interrupts(pdata);
 
+	pdata->an_start = 0;
+
 	netif_dbg(pdata, link, pdata->netdev, "CL73 AN disabled\n");
 }
 
@@ -511,11 +513,11 @@ static enum xgbe_an xgbe_an73_tx_training(struct xgbe_prv_data *pdata,
 		XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL,
 			    reg);
 
-		if (pdata->phy_if.phy_impl.kr_training_post)
-			pdata->phy_if.phy_impl.kr_training_post(pdata);
-
 		netif_dbg(pdata, link, pdata->netdev,
 			  "KR training initiated\n");
+
+		if (pdata->phy_if.phy_impl.kr_training_post)
+			pdata->phy_if.phy_impl.kr_training_post(pdata);
 	}
 
 	return XGBE_AN_PAGE_RECEIVED;
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
index eb23f9b..82d1f41 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
@@ -456,6 +456,7 @@ static int xgbe_pci_resume(struct pci_dev *pdev)
 	.irq_reissue_support		= 1,
 	.tx_desc_prefetch		= 5,
 	.rx_desc_prefetch		= 5,
+	.an_cdr_workaround		= 1,
 };
 
 static const struct xgbe_version_data xgbe_v2b = {
@@ -470,6 +471,7 @@ static int xgbe_pci_resume(struct pci_dev *pdev)
 	.irq_reissue_support		= 1,
 	.tx_desc_prefetch		= 5,
 	.rx_desc_prefetch		= 5,
+	.an_cdr_workaround		= 1,
 };
 
 static const struct pci_device_id xgbe_pci_table[] = {
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index 3304a29..b48efc0 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -147,6 +147,14 @@
 /* Rate-change complete wait/retry count */
 #define XGBE_RATECHANGE_COUNT		500
 
+/* CDR delay values for KR support (in usec) */
+#define XGBE_CDR_DELAY_INIT		10000
+#define XGBE_CDR_DELAY_INC		10000
+#define XGBE_CDR_DELAY_MAX		100000
+
+/* RRC frequency during link status check */
+#define XGBE_RRC_FREQUENCY		10
+
 enum xgbe_port_mode {
 	XGBE_PORT_MODE_RSVD = 0,
 	XGBE_PORT_MODE_BACKPLANE,
@@ -355,6 +363,10 @@ struct xgbe_phy_data {
 	unsigned int redrv_addr;
 	unsigned int redrv_lane;
 	unsigned int redrv_model;
+
+	/* KR AN support */
+	unsigned int phy_cdr_notrack;
+	unsigned int phy_cdr_delay;
 };
 
 /* I2C, MDIO and GPIO lines are muxed, so only one device at a time */
@@ -2361,7 +2373,7 @@ static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart)
 		return 1;
 
 	/* No link, attempt a receiver reset cycle */
-	if (phy_data->rrc_count++) {
+	if (phy_data->rrc_count++ > XGBE_RRC_FREQUENCY) {
 		phy_data->rrc_count = 0;
 		xgbe_phy_rrc(pdata);
 	}
@@ -2669,6 +2681,103 @@ static bool xgbe_phy_port_enabled(struct xgbe_prv_data *pdata)
 	return true;
 }
 
+static void xgbe_phy_cdr_track(struct xgbe_prv_data *pdata)
+{
+	struct xgbe_phy_data *phy_data = pdata->phy_data;
+
+	if (!pdata->debugfs_an_cdr_workaround)
+		return;
+
+	if (!phy_data->phy_cdr_notrack)
+		return;
+
+	usleep_range(phy_data->phy_cdr_delay,
+		     phy_data->phy_cdr_delay + 500);
+
+	XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_VEND2_PMA_CDR_CONTROL,
+			 XGBE_PMA_CDR_TRACK_EN_MASK,
+			 XGBE_PMA_CDR_TRACK_EN_ON);
+
+	phy_data->phy_cdr_notrack = 0;
+}
+
+static void xgbe_phy_cdr_notrack(struct xgbe_prv_data *pdata)
+{
+	struct xgbe_phy_data *phy_data = pdata->phy_data;
+
+	if (!pdata->debugfs_an_cdr_workaround)
+		return;
+
+	if (phy_data->phy_cdr_notrack)
+		return;
+
+	XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_VEND2_PMA_CDR_CONTROL,
+			 XGBE_PMA_CDR_TRACK_EN_MASK,
+			 XGBE_PMA_CDR_TRACK_EN_OFF);
+
+	xgbe_phy_rrc(pdata);
+
+	phy_data->phy_cdr_notrack = 1;
+}
+
+static void xgbe_phy_kr_training_post(struct xgbe_prv_data *pdata)
+{
+	if (!pdata->debugfs_an_cdr_track_early)
+		xgbe_phy_cdr_track(pdata);
+}
+
+static void xgbe_phy_kr_training_pre(struct xgbe_prv_data *pdata)
+{
+	if (pdata->debugfs_an_cdr_track_early)
+		xgbe_phy_cdr_track(pdata);
+}
+
+static void xgbe_phy_an_post(struct xgbe_prv_data *pdata)
+{
+	struct xgbe_phy_data *phy_data = pdata->phy_data;
+
+	switch (pdata->an_mode) {
+	case XGBE_AN_MODE_CL73:
+	case XGBE_AN_MODE_CL73_REDRV:
+		if (phy_data->cur_mode != XGBE_MODE_KR)
+			break;
+
+		xgbe_phy_cdr_track(pdata);
+
+		switch (pdata->an_result) {
+		case XGBE_AN_READY:
+		case XGBE_AN_COMPLETE:
+			break;
+		default:
+			if (phy_data->phy_cdr_delay < XGBE_CDR_DELAY_MAX)
+				phy_data->phy_cdr_delay += XGBE_CDR_DELAY_INC;
+			else
+				phy_data->phy_cdr_delay = XGBE_CDR_DELAY_INIT;
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static void xgbe_phy_an_pre(struct xgbe_prv_data *pdata)
+{
+	struct xgbe_phy_data *phy_data = pdata->phy_data;
+
+	switch (pdata->an_mode) {
+	case XGBE_AN_MODE_CL73:
+	case XGBE_AN_MODE_CL73_REDRV:
+		if (phy_data->cur_mode != XGBE_MODE_KR)
+			break;
+
+		xgbe_phy_cdr_notrack(pdata);
+		break;
+	default:
+		break;
+	}
+}
+
 static void xgbe_phy_stop(struct xgbe_prv_data *pdata)
 {
 	struct xgbe_phy_data *phy_data = pdata->phy_data;
@@ -2680,6 +2789,9 @@ static void xgbe_phy_stop(struct xgbe_prv_data *pdata)
 	xgbe_phy_sfp_reset(phy_data);
 	xgbe_phy_sfp_mod_absent(pdata);
 
+	/* Reset CDR support */
+	xgbe_phy_cdr_track(pdata);
+
 	/* Power off the PHY */
 	xgbe_phy_power_off(pdata);
 
@@ -2712,6 +2824,9 @@ static int xgbe_phy_start(struct xgbe_prv_data *pdata)
 	/* Start in highest supported mode */
 	xgbe_phy_set_mode(pdata, phy_data->start_mode);
 
+	/* Reset CDR support */
+	xgbe_phy_cdr_track(pdata);
+
 	/* After starting the I2C controller, we can check for an SFP */
 	switch (phy_data->port_mode) {
 	case XGBE_PORT_MODE_SFP:
@@ -3019,6 +3134,8 @@ static int xgbe_phy_init(struct xgbe_prv_data *pdata)
 		}
 	}
 
+	phy_data->phy_cdr_delay = XGBE_CDR_DELAY_INIT;
+
 	/* Register for driving external PHYs */
 	mii = devm_mdiobus_alloc(pdata->dev);
 	if (!mii) {
@@ -3071,4 +3188,10 @@ void xgbe_init_function_ptrs_phy_v2(struct xgbe_phy_if *phy_if)
 	phy_impl->an_advertising	= xgbe_phy_an_advertising;
 
 	phy_impl->an_outcome		= xgbe_phy_an_outcome;
+
+	phy_impl->an_pre		= xgbe_phy_an_pre;
+	phy_impl->an_post		= xgbe_phy_an_post;
+
+	phy_impl->kr_training_pre	= xgbe_phy_kr_training_pre;
+	phy_impl->kr_training_post	= xgbe_phy_kr_training_post;
 }
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h b/drivers/net/ethernet/amd/xgbe/xgbe.h
index fa0b51e..95d4b56 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe.h
@@ -994,6 +994,7 @@ struct xgbe_version_data {
 	unsigned int irq_reissue_support;
 	unsigned int tx_desc_prefetch;
 	unsigned int rx_desc_prefetch;
+	unsigned int an_cdr_workaround;
 };
 
 struct xgbe_vxlan_data {
@@ -1262,6 +1263,9 @@ struct xgbe_prv_data {
 	unsigned int debugfs_xprop_reg;
 
 	unsigned int debugfs_xi2c_reg;
+
+	bool debugfs_an_cdr_workaround;
+	bool debugfs_an_cdr_track_early;
 };
 
 /* Function prototypes*/

^ permalink raw reply related

* [PATCH net 1/3] amd-xgbe: Add pre/post auto-negotiation phy hooks
From: Tom Lendacky @ 2018-04-23 16:43 UTC (permalink / raw)
  To: netdev; +Cc: David Miller
In-Reply-To: <20180423164258.18740.98574.stgit@tlendack-t1.amdoffice.net>

Add hooks to the driver auto-negotiation (AN) flow to allow the different
phy implementations to perform any steps necessary to improve AN.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c |   16 ++++++++++++++--
 drivers/net/ethernet/amd/xgbe/xgbe.h      |    5 +++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
index 072b9f6..e3d361e 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
@@ -437,6 +437,9 @@ static void xgbe_an73_disable(struct xgbe_prv_data *pdata)
 
 static void xgbe_an_restart(struct xgbe_prv_data *pdata)
 {
+	if (pdata->phy_if.phy_impl.an_pre)
+		pdata->phy_if.phy_impl.an_pre(pdata);
+
 	switch (pdata->an_mode) {
 	case XGBE_AN_MODE_CL73:
 	case XGBE_AN_MODE_CL73_REDRV:
@@ -453,6 +456,9 @@ static void xgbe_an_restart(struct xgbe_prv_data *pdata)
 
 static void xgbe_an_disable(struct xgbe_prv_data *pdata)
 {
+	if (pdata->phy_if.phy_impl.an_post)
+		pdata->phy_if.phy_impl.an_post(pdata);
+
 	switch (pdata->an_mode) {
 	case XGBE_AN_MODE_CL73:
 	case XGBE_AN_MODE_CL73_REDRV:
@@ -637,11 +643,11 @@ static enum xgbe_an xgbe_an73_incompat_link(struct xgbe_prv_data *pdata)
 			return XGBE_AN_NO_LINK;
 	}
 
-	xgbe_an73_disable(pdata);
+	xgbe_an_disable(pdata);
 
 	xgbe_switch_mode(pdata);
 
-	xgbe_an73_restart(pdata);
+	xgbe_an_restart(pdata);
 
 	return XGBE_AN_INCOMPAT_LINK;
 }
@@ -820,6 +826,9 @@ static void xgbe_an37_state_machine(struct xgbe_prv_data *pdata)
 		pdata->an_result = pdata->an_state;
 		pdata->an_state = XGBE_AN_READY;
 
+		if (pdata->phy_if.phy_impl.an_post)
+			pdata->phy_if.phy_impl.an_post(pdata);
+
 		netif_dbg(pdata, link, pdata->netdev, "CL37 AN result: %s\n",
 			  xgbe_state_as_string(pdata->an_result));
 	}
@@ -903,6 +912,9 @@ static void xgbe_an73_state_machine(struct xgbe_prv_data *pdata)
 		pdata->kx_state = XGBE_RX_BPA;
 		pdata->an_start = 0;
 
+		if (pdata->phy_if.phy_impl.an_post)
+			pdata->phy_if.phy_impl.an_post(pdata);
+
 		netif_dbg(pdata, link, pdata->netdev, "CL73 AN result: %s\n",
 			  xgbe_state_as_string(pdata->an_result));
 	}
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h b/drivers/net/ethernet/amd/xgbe/xgbe.h
index ad102c8..fa0b51e 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe.h
@@ -833,6 +833,7 @@ struct xgbe_hw_if {
 /* This structure represents implementation specific routines for an
  * implementation of a PHY. All routines are required unless noted below.
  *   Optional routines:
+ *     an_pre, an_post
  *     kr_training_pre, kr_training_post
  */
 struct xgbe_phy_impl_if {
@@ -875,6 +876,10 @@ struct xgbe_phy_impl_if {
 	/* Process results of auto-negotiation */
 	enum xgbe_mode (*an_outcome)(struct xgbe_prv_data *);
 
+	/* Pre/Post auto-negotiation support */
+	void (*an_pre)(struct xgbe_prv_data *);
+	void (*an_post)(struct xgbe_prv_data *);
+
 	/* Pre/Post KR training enablement support */
 	void (*kr_training_pre)(struct xgbe_prv_data *);
 	void (*kr_training_post)(struct xgbe_prv_data *);

^ permalink raw reply related

* [PATCH net 0/3] amd-xgbe: AMD XGBE driver fixes 2018-04-23
From: Tom Lendacky @ 2018-04-23 16:42 UTC (permalink / raw)
  To: netdev; +Cc: David Miller

This patch series addresses some issues in the AMD XGBE driver.

The following fixes are included in this driver update series:

- Improve KR auto-negotiation and training (2 patches)
  - Add pre and post auto-negotiation hooks
  - Use the pre and post auto-negotiation hooks to disable CDR tracking
    during auto-negotiation page exchange in KR mode
- Check for SFP tranceiver signal support and only use the signal if the
  SFP indicates that it is supported

This patch series is based on net.

---

Please queue this patch series up to stable releases 4.14 and above.

Tom Lendacky (3):
      amd-xgbe: Add pre/post auto-negotiation phy hooks
      amd-xgbe: Improve KR auto-negotiation and training
      amd-xgbe: Only use the SFP supported transceiver signals


 drivers/net/ethernet/amd/xgbe/xgbe-common.h  |    8 +
 drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c |   16 ++
 drivers/net/ethernet/amd/xgbe/xgbe-main.c    |    1 
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c    |   24 +++
 drivers/net/ethernet/amd/xgbe/xgbe-pci.c     |    2 
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c  |  196 ++++++++++++++++++++++++--
 drivers/net/ethernet/amd/xgbe/xgbe.h         |    9 +
 7 files changed, 233 insertions(+), 23 deletions(-)

-- 
Tom Lendacky

^ permalink raw reply

* Re: [PATCH bpf-next v3 4/9] bpf/verifier: improve register value range tracking with ARSH
From: Yonghong Song @ 2018-04-23 16:19 UTC (permalink / raw)
  To: Edward Cree, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <adef0d25-578c-6eb1-7dea-8c81ba6647e0@solarflare.com>



On 4/23/18 5:25 AM, Edward Cree wrote:
> On 20/04/18 23:18, Yonghong Song wrote:
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 3c8bb92..01c215d 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>>   		/* We may learn something more from the var_off */
>>   		__update_reg_bounds(dst_reg);
>>   		break;
>> +	case BPF_ARSH:
>> +		if (umax_val >= insn_bitness) {
>> +			/* Shifts greater than 31 or 63 are undefined.
>> +			 * This includes shifts by a negative number.
>> +			 */
>> +			mark_reg_unknown(env, regs, insn->dst_reg);
>> +			break;
>> +		}
>> +		if (dst_reg->smin_value < 0)
>> +			dst_reg->smin_value >>= umin_val;
>> +		else
>> +			dst_reg->smin_value >>= umax_val;
>> +		if (dst_reg->smax_value < 0)
>> +			dst_reg->smax_value >>= umax_val;
>> +		else
>> +			dst_reg->smax_value >>= umin_val;
>> +		if (src_known)
>> +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
>> +						       umin_val);
> tnum_rshift is an unsigned shift, it won't do what you want here.
> I think you could write a tnum_arshift that looks something like this
>   (UNTESTED!):
> 
>      struct tnum tnum_arshift(struct tnum a, u8 shift)
>      {
>          return TNUM(((s64)a.value) >> shift, ((s64)a.mask) >> shift);
>      }
> Theory: if value sign bit is 1 then number is known negative so populate
>   upper bits with known 1s.  If mask sign bit is 1 then number might be
>   negative so populate upper bits with unknown.  Otherwise, number is
>   known positive so populate upper bits with known 0s.

Right, my last version just used this tnum_arshift :-).


>> +		else
>> +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
> Applying the above here, tnum_arshift(tnum_unknown, ...) would always just
>   return tnum_unknown, so just do "dst_reg->var_off = tnum_unknown;".
> The reason for the corresponding logic in the BPF_RSH case is that a right
>   logical shift _always_ populates upper bits with zeroes.
> In any case these 'else' branches are currently never taken because they
>   fall foul of the check Alexei added just before the switch,
>      if (!src_known &&
>          opcode != BPF_ADD && opcode != BPF_SUB && opcode != BPF_AND) {
>          __mark_reg_unknown(dst_reg);
>          return 0;
>      }
> So I can guarantee you haven't tested this code :-)

I just noticed this last night and removed the !src_known branch all 
together here and from LSH/RSH.

> 
>> +		dst_reg->umin_value >>= umax_val;
>> +		dst_reg->umax_value >>= umin_val;
> FWIW I think the way to handle umin/umax here is to blow them away and
>   just rely on inferring new ubounds from the sbounds (i.e. the inverse of
>   what we do just above in case BPF_RSH) since BPF_ARSH is essentially an
>   operation on the signed value.  I don't think there is a need to support
>   cases where the unsigned bounds of a signed shift of a value that may
>   cross the sign boundary at (1<<63) are needed to verify a program.
> (Unlike in the unsigned shift case, it is at least _possible_ for there to
>   be information from the ubounds that we can't get from the sbounds - but
>   it's a contrived case that isn't likely to be useful in real programs.)

This makes sense and will make code simpler and easy to understand.
Will make the change.

Thanks!

> 
> -Ed
>> +		/* We may learn something more from the var_off */
>> +		__update_reg_bounds(dst_reg);
>> +		break;
>>   	default:
>>   		mark_reg_unknown(env, regs, insn->dst_reg);
>>   		break;

^ permalink raw reply

* Re: [PATCH bpf-next 02/15] xsk: add user memory registration support sockopt
From: Michael S. Tsirkin @ 2018-04-23 16:18 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, alexander.h.duyck, alexander.duyck,
	john.fastabend, ast, brouer, willemdebruijn.kernel, daniel,
	netdev, Björn Töpel, michael.lundkvist,
	jesse.brandeburg, anjali.singhai, qi.z.zhang
In-Reply-To: <20180423135619.7179-3-bjorn.topel@gmail.com>

On Mon, Apr 23, 2018 at 03:56:06PM +0200, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> In this commit the base structure of the AF_XDP address family is set
> up. Further, we introduce the abilty register a window of user memory
> to the kernel via the XDP_UMEM_REG setsockopt syscall. The memory
> window is viewed by an AF_XDP socket as a set of equally large
> frames. After a user memory registration all frames are "owned" by the
> user application, and not the kernel.
> 
> Co-authored-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  include/uapi/linux/if_xdp.h |  34 +++++++
>  net/Makefile                |   1 +
>  net/xdp/Makefile            |   2 +
>  net/xdp/xdp_umem.c          | 237 ++++++++++++++++++++++++++++++++++++++++++++
>  net/xdp/xdp_umem.h          |  42 ++++++++
>  net/xdp/xdp_umem_props.h    |  23 +++++
>  net/xdp/xsk.c               | 223 +++++++++++++++++++++++++++++++++++++++++
>  7 files changed, 562 insertions(+)
>  create mode 100644 include/uapi/linux/if_xdp.h
>  create mode 100644 net/xdp/Makefile
>  create mode 100644 net/xdp/xdp_umem.c
>  create mode 100644 net/xdp/xdp_umem.h
>  create mode 100644 net/xdp/xdp_umem_props.h
>  create mode 100644 net/xdp/xsk.c
> 
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> new file mode 100644
> index 000000000000..41252135a0fe
> --- /dev/null
> +++ b/include/uapi/linux/if_xdp.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
> + *
> + * if_xdp: XDP socket user-space interface
> + * Copyright(c) 2018 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * Author(s): Björn Töpel <bjorn.topel@intel.com>
> + *	      Magnus Karlsson <magnus.karlsson@intel.com>
> + */
> +
> +#ifndef _LINUX_IF_XDP_H
> +#define _LINUX_IF_XDP_H
> +
> +#include <linux/types.h>
> +
> +/* XDP socket options */
> +#define XDP_UMEM_REG			3
> +
> +struct xdp_umem_reg {
> +	__u64 addr; /* Start of packet data area */
> +	__u64 len; /* Length of packet data area */
> +	__u32 frame_size; /* Frame size */
> +	__u32 frame_headroom; /* Frame head room */
> +};
> +
> +#endif /* _LINUX_IF_XDP_H */
> diff --git a/net/Makefile b/net/Makefile
> index a6147c61b174..77aaddedbd29 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -85,3 +85,4 @@ obj-y				+= l3mdev/
>  endif
>  obj-$(CONFIG_QRTR)		+= qrtr/
>  obj-$(CONFIG_NET_NCSI)		+= ncsi/
> +obj-$(CONFIG_XDP_SOCKETS)	+= xdp/
> diff --git a/net/xdp/Makefile b/net/xdp/Makefile
> new file mode 100644
> index 000000000000..a5d736640a0f
> --- /dev/null
> +++ b/net/xdp/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o
> +
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> new file mode 100644
> index 000000000000..bff058f5a769
> --- /dev/null
> +++ b/net/xdp/xdp_umem.c
> @@ -0,0 +1,237 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* XDP user-space packet buffer
> + * Copyright(c) 2018 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#include <linux/init.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
> +#include <linux/sched/task.h>
> +#include <linux/uaccess.h>
> +#include <linux/slab.h>
> +#include <linux/bpf.h>
> +#include <linux/mm.h>
> +
> +#include "xdp_umem.h"
> +
> +#define XDP_UMEM_MIN_FRAME_SIZE 2048
> +
> +int xdp_umem_create(struct xdp_umem **umem)
> +{
> +	*umem = kzalloc(sizeof(**umem), GFP_KERNEL);
> +
> +	if (!(*umem))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> +{
> +	unsigned int i;
> +
> +	if (umem->pgs) {
> +		for (i = 0; i < umem->npgs; i++)

Since you pin them with FOLL_WRITE, I assume these pages
are written to.
Don't you need set_page_dirty_lock here?

> +			put_page(umem->pgs[i]);
> +
> +		kfree(umem->pgs);
> +		umem->pgs = NULL;
> +	}
> +}
> +
> +static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
> +{
> +	if (umem->user) {
> +		atomic_long_sub(umem->npgs, &umem->user->locked_vm);
> +		free_uid(umem->user);
> +	}
> +}
> +
> +static void xdp_umem_release(struct xdp_umem *umem)
> +{
> +	struct task_struct *task;
> +	struct mm_struct *mm;
> +	unsigned long diff;
> +
> +	if (umem->pgs) {
> +		xdp_umem_unpin_pages(umem);
> +
> +		task = get_pid_task(umem->pid, PIDTYPE_PID);
> +		put_pid(umem->pid);
> +		if (!task)
> +			goto out;
> +		mm = get_task_mm(task);
> +		put_task_struct(task);
> +		if (!mm)
> +			goto out;
> +
> +		diff = umem->size >> PAGE_SHIFT;
> +
> +		down_write(&mm->mmap_sem);
> +		mm->pinned_vm -= diff;
> +		up_write(&mm->mmap_sem);
> +		mmput(mm);
> +		umem->pgs = NULL;
> +	}
> +
> +	xdp_umem_unaccount_pages(umem);
> +out:
> +	kfree(umem);
> +}
> +
> +void xdp_put_umem(struct xdp_umem *umem)
> +{
> +	if (!umem)
> +		return;
> +
> +	if (atomic_dec_and_test(&umem->users))
> +		xdp_umem_release(umem);
> +}
> +
> +static int xdp_umem_pin_pages(struct xdp_umem *umem)
> +{
> +	unsigned int gup_flags = FOLL_WRITE;
> +	long npgs;
> +	int err;
> +
> +	umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL);
> +	if (!umem->pgs)
> +		return -ENOMEM;
> +
> +	npgs = get_user_pages(umem->address, umem->npgs,
> +			      gup_flags, &umem->pgs[0], NULL);
> +	if (npgs != umem->npgs) {
> +		if (npgs >= 0) {
> +			umem->npgs = npgs;
> +			err = -ENOMEM;
> +			goto out_pin;
> +		}
> +		err = npgs;
> +		goto out_pgs;
> +	}
> +	return 0;
> +
> +out_pin:
> +	xdp_umem_unpin_pages(umem);
> +out_pgs:
> +	kfree(umem->pgs);
> +	umem->pgs = NULL;
> +	return err;
> +}
> +
> +static int xdp_umem_account_pages(struct xdp_umem *umem)
> +{
> +	unsigned long lock_limit, new_npgs, old_npgs;
> +
> +	if (capable(CAP_IPC_LOCK))
> +		return 0;
> +
> +	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +	umem->user = get_uid(current_user());
> +
> +	do {
> +		old_npgs = atomic_long_read(&umem->user->locked_vm);
> +		new_npgs = old_npgs + umem->npgs;
> +		if (new_npgs > lock_limit) {
> +			free_uid(umem->user);
> +			umem->user = NULL;
> +			return -ENOBUFS;
> +		}
> +	} while (atomic_long_cmpxchg(&umem->user->locked_vm, old_npgs,
> +				     new_npgs) != old_npgs);
> +	return 0;
> +}
> +
> +static int __xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
> +{
> +	u32 frame_size = mr->frame_size, frame_headroom = mr->frame_headroom;
> +	u64 addr = mr->addr, size = mr->len;
> +	unsigned int nframes;
> +	int size_chk, err;
> +
> +	if (frame_size < XDP_UMEM_MIN_FRAME_SIZE || frame_size > PAGE_SIZE) {
> +		/* Strictly speaking we could support this, if:
> +		 * - huge pages, or*

what does "or*" here mean?

> +		 * - using an IOMMU, or
> +		 * - making sure the memory area is consecutive
> +		 * but for now, we simply say "computer says no".
> +		 */
> +		return -EINVAL;
> +	}
> +
> +	if (!is_power_of_2(frame_size))
> +		return -EINVAL;
> +
> +	if (!PAGE_ALIGNED(addr)) {
> +		/* Memory area has to be page size aligned. For
> +		 * simplicity, this might change.
> +		 */
> +		return -EINVAL;
> +	}
> +
> +	if ((addr + size) < addr)
> +		return -EINVAL;
> +
> +	nframes = size / frame_size;
> +	if (nframes == 0 || nframes > UINT_MAX)
> +		return -EINVAL;
> +
> +	frame_headroom = ALIGN(frame_headroom, 64);
> +
> +	size_chk = frame_size - frame_headroom - XDP_PACKET_HEADROOM;
> +	if (size_chk < 0)
> +		return -EINVAL;
> +
> +	umem->pid = get_task_pid(current, PIDTYPE_PID);
> +	umem->size = (size_t)size;
> +	umem->address = (unsigned long)addr;
> +	umem->props.frame_size = frame_size;
> +	umem->props.nframes = nframes;
> +	umem->frame_headroom = frame_headroom;
> +	umem->npgs = size / PAGE_SIZE;
> +	umem->pgs = NULL;
> +	umem->user = NULL;
> +
> +	umem->frame_size_log2 = ilog2(frame_size);
> +	umem->nfpp_mask = (PAGE_SIZE / frame_size) - 1;
> +	umem->nfpplog2 = ilog2(PAGE_SIZE / frame_size);
> +	atomic_set(&umem->users, 1);
> +
> +	err = xdp_umem_account_pages(umem);
> +	if (err)
> +		goto out;
> +
> +	err = xdp_umem_pin_pages(umem);
> +	if (err)
> +		goto out;
> +	return 0;
> +
> +out:
> +	put_pid(umem->pid);
> +	return err;
> +}
> +
> +int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
> +{
> +	int err;
> +
> +	if (!umem)
> +		return -EINVAL;
> +
> +	down_write(&current->mm->mmap_sem);
> +
> +	err = __xdp_umem_reg(umem, mr);
> +
> +	up_write(&current->mm->mmap_sem);
> +	return err;
> +}
> +
> diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
> new file mode 100644
> index 000000000000..58714f4f7f25
> --- /dev/null
> +++ b/net/xdp/xdp_umem.h
> @@ -0,0 +1,42 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + * XDP user-space packet buffer
> + * Copyright(c) 2018 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#ifndef XDP_UMEM_H_
> +#define XDP_UMEM_H_
> +
> +#include <linux/mm.h>
> +#include <linux/if_xdp.h>
> +
> +#include "xdp_umem_props.h"
> +
> +struct xdp_umem {
> +	struct page **pgs;
> +	struct xdp_umem_props props;
> +	u32 npgs;
> +	u32 frame_headroom;
> +	u32 nfpp_mask;
> +	u32 nfpplog2;
> +	u32 frame_size_log2;
> +	struct user_struct *user;
> +	struct pid *pid;
> +	unsigned long address;
> +	size_t size;
> +	atomic_t users;
> +};
> +
> +int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr);
> +void xdp_put_umem(struct xdp_umem *umem);
> +int xdp_umem_create(struct xdp_umem **umem);
> +
> +#endif /* XDP_UMEM_H_ */
> diff --git a/net/xdp/xdp_umem_props.h b/net/xdp/xdp_umem_props.h
> new file mode 100644
> index 000000000000..77fb5daf29f3
> --- /dev/null
> +++ b/net/xdp/xdp_umem_props.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + * XDP user-space packet buffer
> + * Copyright(c) 2018 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#ifndef XDP_UMEM_PROPS_H_
> +#define XDP_UMEM_PROPS_H_
> +
> +struct xdp_umem_props {
> +	u32 frame_size;
> +	u32 nframes;
> +};
> +
> +#endif /* XDP_UMEM_PROPS_H_ */
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> new file mode 100644
> index 000000000000..19fc719cbe0d
> --- /dev/null
> +++ b/net/xdp/xsk.c
> @@ -0,0 +1,223 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* XDP sockets
> + *
> + * AF_XDP sockets allows a channel between XDP programs and userspace
> + * applications.
> + * Copyright(c) 2018 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * Author(s): Björn Töpel <bjorn.topel@intel.com>
> + *	      Magnus Karlsson <magnus.karlsson@intel.com>
> + */
> +
> +#define pr_fmt(fmt) "AF_XDP: %s: " fmt, __func__
> +
> +#include <linux/if_xdp.h>
> +#include <linux/init.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
> +#include <linux/sched/task.h>
> +#include <linux/socket.h>
> +#include <linux/file.h>
> +#include <linux/uaccess.h>
> +#include <linux/net.h>
> +#include <linux/netdevice.h>
> +#include <net/sock.h>
> +
> +#include "xdp_umem.h"
> +
> +struct xdp_sock {
> +	/* struct sock must be the first member of struct xdp_sock */
> +	struct sock sk;
> +	struct xdp_umem *umem;
> +	/* Protects multiple processes in the control path */
> +	struct mutex mutex;
> +};
> +
> +static struct xdp_sock *xdp_sk(struct sock *sk)
> +{
> +	return (struct xdp_sock *)sk;
> +}
> +
> +static int xsk_release(struct socket *sock)
> +{
> +	struct sock *sk = sock->sk;
> +	struct net *net;
> +
> +	if (!sk)
> +		return 0;
> +
> +	net = sock_net(sk);
> +
> +	local_bh_disable();
> +	sock_prot_inuse_add(net, sk->sk_prot, -1);
> +	local_bh_enable();
> +
> +	sock_orphan(sk);
> +	sock->sk = NULL;
> +
> +	sk_refcnt_debug_release(sk);
> +	sock_put(sk);
> +
> +	return 0;
> +}
> +
> +static int xsk_setsockopt(struct socket *sock, int level, int optname,
> +			  char __user *optval, unsigned int optlen)
> +{
> +	struct sock *sk = sock->sk;
> +	struct xdp_sock *xs = xdp_sk(sk);
> +	int err;
> +
> +	if (level != SOL_XDP)
> +		return -ENOPROTOOPT;
> +
> +	switch (optname) {
> +	case XDP_UMEM_REG:
> +	{
> +		struct xdp_umem_reg mr;
> +		struct xdp_umem *umem;
> +
> +		if (xs->umem)
> +			return -EBUSY;
> +
> +		if (copy_from_user(&mr, optval, sizeof(mr)))
> +			return -EFAULT;
> +
> +		mutex_lock(&xs->mutex);
> +		err = xdp_umem_create(&umem);
> +
> +		err = xdp_umem_reg(umem, &mr);
> +		if (err) {
> +			kfree(umem);
> +			mutex_unlock(&xs->mutex);
> +			return err;
> +		}
> +
> +		/* Make sure umem is ready before it can be seen by others */
> +		smp_wmb();
> +
> +		xs->umem = umem;
> +		mutex_unlock(&xs->mutex);
> +		return 0;
> +	}
> +	default:
> +		break;
> +	}
> +
> +	return -ENOPROTOOPT;
> +}
> +
> +static struct proto xsk_proto = {
> +	.name =		"XDP",
> +	.owner =	THIS_MODULE,
> +	.obj_size =	sizeof(struct xdp_sock),
> +};
> +
> +static const struct proto_ops xsk_proto_ops = {
> +	.family =	PF_XDP,
> +	.owner =	THIS_MODULE,
> +	.release =	xsk_release,
> +	.bind =		sock_no_bind,
> +	.connect =	sock_no_connect,
> +	.socketpair =	sock_no_socketpair,
> +	.accept =	sock_no_accept,
> +	.getname =	sock_no_getname,
> +	.poll =		sock_no_poll,
> +	.ioctl =	sock_no_ioctl,
> +	.listen =	sock_no_listen,
> +	.shutdown =	sock_no_shutdown,
> +	.setsockopt =	xsk_setsockopt,
> +	.getsockopt =	sock_no_getsockopt,
> +	.sendmsg =	sock_no_sendmsg,
> +	.recvmsg =	sock_no_recvmsg,
> +	.mmap =		sock_no_mmap,
> +	.sendpage =	sock_no_sendpage,
> +};
> +
> +static void xsk_destruct(struct sock *sk)
> +{
> +	struct xdp_sock *xs = xdp_sk(sk);
> +
> +	if (!sock_flag(sk, SOCK_DEAD))
> +		return;
> +
> +	xdp_put_umem(xs->umem);
> +
> +	sk_refcnt_debug_dec(sk);
> +}
> +
> +static int xsk_create(struct net *net, struct socket *sock, int protocol,
> +		      int kern)
> +{
> +	struct sock *sk;
> +	struct xdp_sock *xs;
> +
> +	if (!ns_capable(net->user_ns, CAP_NET_RAW))
> +		return -EPERM;
> +	if (sock->type != SOCK_RAW)
> +		return -ESOCKTNOSUPPORT;
> +
> +	if (protocol)
> +		return -EPROTONOSUPPORT;
> +
> +	sock->state = SS_UNCONNECTED;
> +
> +	sk = sk_alloc(net, PF_XDP, GFP_KERNEL, &xsk_proto, kern);
> +	if (!sk)
> +		return -ENOBUFS;
> +
> +	sock->ops = &xsk_proto_ops;
> +
> +	sock_init_data(sock, sk);
> +
> +	sk->sk_family = PF_XDP;
> +
> +	sk->sk_destruct = xsk_destruct;
> +	sk_refcnt_debug_inc(sk);
> +
> +	xs = xdp_sk(sk);
> +	mutex_init(&xs->mutex);
> +
> +	local_bh_disable();
> +	sock_prot_inuse_add(net, &xsk_proto, 1);
> +	local_bh_enable();
> +
> +	return 0;
> +}
> +
> +static const struct net_proto_family xsk_family_ops = {
> +	.family = PF_XDP,
> +	.create = xsk_create,
> +	.owner	= THIS_MODULE,
> +};
> +
> +static int __init xsk_init(void)
> +{
> +	int err;
> +
> +	err = proto_register(&xsk_proto, 0 /* no slab */);
> +	if (err)
> +		goto out;
> +
> +	err = sock_register(&xsk_family_ops);
> +	if (err)
> +		goto out_proto;
> +
> +	return 0;
> +
> +out_proto:
> +	proto_unregister(&xsk_proto);
> +out:
> +	return err;
> +}
> +
> +fs_initcall(xsk_init);
> -- 
> 2.14.1

^ permalink raw reply

* Re: [PATCH net] sfc: ARFS filter IDs
From: David Miller @ 2018-04-23 16:16 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, netdev
In-Reply-To: <2ee1ef47-886d-278d-4a8d-234d74e26ad7@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Mon, 23 Apr 2018 14:08:51 +0100

> Associate an arbitrary ID with each ARFS filter, allowing to properly query
>  for expiry.  The association is maintained in a hash table, which is
>  protected by a spinlock.
> 
> Fixes: 3af0f34290f6 ("sfc: replace asynchronous filter operations")
> Signed-off-by: Edward Cree <ecree@solarflare.com>

Hmmm, this adds a warning:

drivers/net/ethernet/sfc/farch.c: In function ‘efx_farch_filter_rfs_expire_one’:
drivers/net/ethernet/sfc/farch.c:2938:7: warning: ‘rule’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (rule)
       ^

^ permalink raw reply

* Re: [PATCH net-next] net: init sk_cookie for inet socket
From: Eric Dumazet @ 2018-04-23 16:09 UTC (permalink / raw)
  To: David Miller, laoar.shao
  Cc: eric.dumazet, alexei.starovoitov, netdev, linux-kernel
In-Reply-To: <20180423.115821.640630949143585629.davem@davemloft.net>



On 04/23/2018 08:58 AM, David Miller wrote:
> From: Yafang Shao <laoar.shao@gmail.com>
> Date: Sun, 22 Apr 2018 21:50:04 +0800
> 
>> With sk_cookie we can identify a socket, that is very helpful for
>> traceing and statistic, i.e. tcp tracepiont and ebpf.
>> So we'd better init it by default for inet socket.
>> When using it, we just need call atomic64_read(&sk->sk_cookie).
>>
>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> 
> Applied, thank you.
> 

This is adding yet another atomic_inc on a global cache line.

Most applications do not need the cookie being ever set.

The existing mechanism was fine. Set it on demand.

^ permalink raw reply

* Re: [PATCH net-next 0/2] Add configuration information to register dump and debug data
From: David Miller @ 2018-04-23 16:06 UTC (permalink / raw)
  To: denis.bolotin; +Cc: netdev, ariel.elior
In-Reply-To: <20180423115605.8531-1-denis.bolotin@cavium.com>

From: Denis Bolotin <denis.bolotin@cavium.com>
Date: Mon, 23 Apr 2018 14:56:03 +0300

> The purpose of this patchset is to add configuration information to the
> debug data collection, which already contains register dump.
> The first patch (removing the ptt) is essential because it prevents the
> unnecessary ptt acquirement when calling mcp APIs.

Series applied, thank you.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox