Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 2/2] PTP: add support for one-shot output
From: Richard Cochran @ 2019-08-29 17:25 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: Christopher S Hall, netdev, linux-kernel
In-Reply-To: <20190829095825.2108-2-felipe.balbi@linux.intel.com>

On Thu, Aug 29, 2019 at 12:58:25PM +0300, Felipe Balbi wrote:
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index 98ec1395544e..a407e5f76e2d 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -177,9 +177,8 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
>  			err = -EFAULT;
>  			break;
>  		}
> -		if ((req.perout.flags || req.perout.rsv[0] || req.perout.rsv[1]
> -				|| req.perout.rsv[2] || req.perout.rsv[3])
> -			&& cmd == PTP_PEROUT_REQUEST2) {
> +		if ((req.perout.rsv[0] || req.perout.rsv[1] || req.perout.rsv[2]
> +			|| req.perout.rsv[3]) && cmd == PTP_PEROUT_REQUEST2) {

Please check that the reserved bits of req.perout.flags, namely
~PTP_PEROUT_ONE_SHOT, are clear.

>  			err = -EINVAL;
>  			break;
>  		} else if (cmd == PTP_PEROUT_REQUEST) {
> diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
> index 039cd62ec706..95840e5f5c53 100644
> --- a/include/uapi/linux/ptp_clock.h
> +++ b/include/uapi/linux/ptp_clock.h
> @@ -67,7 +67,9 @@ struct ptp_perout_request {
>  	struct ptp_clock_time start;  /* Absolute start time. */
>  	struct ptp_clock_time period; /* Desired period, zero means disable. */
>  	unsigned int index;           /* Which channel to configure. */
> -	unsigned int flags;           /* Reserved for future use. */
> +
> +#define PTP_PEROUT_ONE_SHOT BIT(0)
> +	unsigned int flags;

@davem  Any CodingStyle policy on #define within a struct?  (Some
maintainers won't allow it.)

>  	unsigned int rsv[4];          /* Reserved for future use. */
>  };
>  
> -- 
> 2.23.0
> 

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/3] capability: introduce CAP_BPF and CAP_TRACING
From: Alexei Starovoitov @ 2019-08-29 17:24 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, luto, davem, peterz, rostedt, netdev, bpf,
	kernel-team, linux-api
In-Reply-To: <87ef14iffx.fsf@toke.dk>

On Thu, Aug 29, 2019 at 09:44:18AM +0200, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <ast@kernel.org> writes:
> 
> > CAP_BPF allows the following BPF operations:
> > - Loading all types of BPF programs
> > - Creating all types of BPF maps except:
> >    - stackmap that needs CAP_TRACING
> >    - devmap that needs CAP_NET_ADMIN
> >    - cpumap that needs CAP_SYS_ADMIN
> 
> Why CAP_SYS_ADMIN instead of CAP_NET_ADMIN for cpumap?

Currently it's cap_sys_admin and I think it should stay this way
because it creates kthreads.


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-29 17:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steven Rostedt, Peter Zijlstra, Alexei Starovoitov, Kees Cook,
	LSM List, James Morris, Jann Horn, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <CALCETrWYu0XB_d-MhXFgopEmBu-pog493G1e+KsE3dS32UULgA@mail.gmail.com>

On Thu, Aug 29, 2019 at 08:43:23AM -0700, Andy Lutomirski wrote:
> 
> I can imagine splitting it into three capabilities:
> 
> CAP_TRACE_KERNEL: learn which kernel functions are called when.  This
> would allow perf profiling, for example, but not sampling of kernel
> regs.
> 
> CAP_TRACE_READ_KERNEL_DATA: allow the tracing, profiling, etc features
> that can read the kernel's data.  So you get function arguments via
> kprobe, kernel regs, and APIs that expose probe_kernel_read()
> 
> CAP_TRACE_USER: trace unrelated user processes
> 
> I'm not sure the code is written in a way that makes splitting
> CAP_TRACE_KERNEL and CAP_TRACE_READ_KERNEL_DATA, and I'm not sure that
> CAP_TRACE_KERNEL is all that useful except for plain perf record
> without CAP_TRACE_READ_KERNEL_DATA.  What do you all think?  I suppose
> it could also be:
> 
> CAP_PROFILE_KERNEL: Use perf with events that aren't kprobes or
> tracepoints.  Does not grant the ability to sample regs or the kernel
> stack directly.
> 
> CAP_TRACE_KERNEL: Use all of perf, ftrace, kprobe, etc.
> 
> CAP_TRACE_USER: Use all of perf with scope limited to user mode and uprobes.

imo that makes little sense from security pov, since
such CAP_TRACE_KERNEL (ex kprobe) can trace "unrelated user process"
just as well. Yet not letting it do cleanly via uprobe.
Sort of like giving a spare key for back door of the house and
saying no, you cannot have main door key.


^ permalink raw reply

* Re: [PATCH v2 1/2] PTP: introduce new versions of IOCTLs
From: Richard Cochran @ 2019-08-29 17:21 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: Christopher S Hall, netdev, linux-kernel
In-Reply-To: <20190829095825.2108-1-felipe.balbi@linux.intel.com>

> @@ -139,11 +141,24 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
>  		break;
>  
>  	case PTP_EXTTS_REQUEST:
> +	case PTP_EXTTS_REQUEST2:
> +		memset(&req, 0, sizeof(req));
> +
>  		if (copy_from_user(&req.extts, (void __user *)arg,
>  				   sizeof(req.extts))) {
>  			err = -EFAULT;
>  			break;
>  		}
> +		if ((req.extts.flags || req.extts.rsv[0] || req.extts.rsv[1])
> +			&& cmd == PTP_EXTTS_REQUEST2) {
> +			err = -EINVAL;
> +			break;
> +		} else if (cmd == PTP_EXTTS_REQUEST) {
> +			req.extts.flags = 0;

This still isn't quite right.  Sorry that was my fault.

The req.extts.flags can be (PTP_ENABLE_FEATURE | PTP_RISING_EDGE |
PTP_FALLING_EDGE), and ENABLE is used immediately below in this case.

Please #define those bits into a valid mask, and then:

- for PTP_EXTTS_REQUEST2 check that ~mask is zero, and
- for PTP_EXTTS_REQUEST clear the ~mask bits for the drivers. 

Thanks again for cleaning this up!

Richard

> +			req.extts.rsv[0] = 0;
> +			req.extts.rsv[1] = 0;
> +		}
> +			
>  		if (req.extts.index >= ops->n_ext_ts) {
>  			err = -EINVAL;
>  			break;

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-29 17:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Andy Lutomirski, Alexei Starovoitov, Kees Cook,
	LSM List, James Morris, Jann Horn, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190829093434.36540972@gandalf.local.home>

On Thu, Aug 29, 2019 at 09:34:34AM -0400, Steven Rostedt wrote:
> 
> As the above seems to favor the idea of CAP_TRACING allowing write
> access to tracefs, should we have a CAP_TRACING_RO for just read access
> and limited perf abilities?

read only vs writeable is an attribute of the file system.
Bringing such things into caps seem wrong to me.


^ permalink raw reply

* Re: auto-split of commit. Was: [PATCH bpf-next 04/10] tools/bpf: add libbpf_prog_type_(from|to)_str helpers
From: Alexei Starovoitov @ 2019-08-29 17:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jakub Kicinski, Julia Kartseva, ast, Thomas Gleixner, rdna, bpf,
	daniel, netdev, kernel-team
In-Reply-To: <20190829065151.GB30423@kroah.com>

On Thu, Aug 29, 2019 at 08:51:51AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Aug 28, 2019 at 04:46:28PM -0700, Alexei Starovoitov wrote:
> > On Wed, Aug 28, 2019 at 04:34:22PM -0700, Jakub Kicinski wrote:
> > > 
> > > Greg, Thomas, libbpf is extracted from the kernel sources and
> > > maintained in a clone repo on GitHub for ease of packaging.
> > > 
> > > IIUC Alexei's concern is that since we are moving the commits from
> > > the kernel repo to the GitHub one we have to preserve the commits
> > > exactly as they are, otherwise SOB lines lose their power.
> > > 
> > > Can you provide some guidance on whether that's a valid concern, 
> > > or whether it's perfectly fine to apply a partial patch?
> > 
> > Right. That's exactly the concern.
> > 
> > Greg, Thomas,
> > could you please put your legal hat on and clarify the following.
> > Say some developer does a patch that modifies
> > include/uapi/linux/bpf.h
> > ..some other kernel code...and
> > tools/include/uapi/linux/bpf.h
> > 
> > That tools/include/uapi/linux/bpf.h is used by perf and by libbpf.
> > We have automatic mirror of tools/libbpf into github/libbpf/
> > so that external projects and can do git submodule of it,
> > can build packages out of it, etc.
> > 
> > The question is whether it's ok to split tools/* part out of
> > original commit, keep Author and SOB, create new commit out of it,
> > and automatically push that auto-generated commit into github mirror.
> 
> Note, I am not a laywer, and am not _your_ lawyer either, only _your_
> lawyer can answer questions as to what is best for you.
> 
> That being said, from a "are you keeping the correct authorship info",
> yes, it sounds like you are doing the correct thing here.
> 
> Look at what I do for stable kernels, I take the original commit and add
> it to "another tree" keeping the original author and s-o-b chain intact,
> and adding a "this is the original git commit id" type message to the
> changelog text so that people can link it back to the original.

I think you're describing 'git cherry-pick -x'.
The question was about taking pieces of the original commit. Not the whole commit.
Author field obviously stays, but SOB is questionable.
If author meant to change X and Y and Z. Silently taking only Z chunk of the diff
doesn't quite seem right.
If we document that such commit split happens in Documentation/bpf/bpf_devel_QA.rst
do you think it will be enough to properly inform developers?
The main concern is the surprise factor when people start seeing their commits
in the mirror, but not their full commits.


^ permalink raw reply

* Re: [PATCH net-next] net: dsa: mv88e6xxx: fix freeing unused SERDES IRQ
From: Andrew Lunn @ 2019-08-29 17:08 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, davem, Marek Behún, f.fainelli
In-Reply-To: <20190828185511.21956-1-vivien.didelot@gmail.com>

On Wed, Aug 28, 2019 at 02:55:11PM -0400, Vivien Didelot wrote:
> Now mv88e6xxx does not enable its ports at setup itself and let
> the DSA core handle this, unused ports are disabled without being
> powered on first. While that is expected, the SERDES powering code
> was assuming that a port was already set up before powering it down,
> resulting in freeing an unused IRQ. The patch fixes this assumption.
> 
> Fixes: b759f528ca3d ("net: dsa: mv88e6xxx: enable SERDES after setup")
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>

Tested-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH net-next v2 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag
From: Davide Caratti @ 2019-08-29 16:48 UTC (permalink / raw)
  To: borisp, jakub.kicinski, Eric Dumazet
  Cc: aviadye, davejwatson, davem, john.fastabend, Matthieu Baerts,
	netdev
In-Reply-To: <cover.1567095873.git.dcaratti@redhat.com>

When an application configures kernel TLS on top of a TCP socket, it's
now possible for inet_diag_handler() to collect information regarding the
protocol version, the cipher type and TX / RX configuration, in case
INET_DIAG_INFO is requested.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 include/net/tls.h              | 19 ++++++++++
 include/uapi/linux/inet_diag.h |  1 +
 include/uapi/linux/tls.h       | 15 ++++++++
 net/tls/tls_main.c             | 64 ++++++++++++++++++++++++++++++++++
 4 files changed, 99 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4997742475cd..990f1d9182a3 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -431,6 +431,25 @@ static inline bool is_tx_ready(struct tls_sw_context_tx *ctx)
 	return READ_ONCE(rec->tx_ready);
 }
 
+static inline u16 tls_user_config(struct tls_context *ctx, bool tx)
+{
+	u16 config = tx ? ctx->tx_conf : ctx->rx_conf;
+
+	switch (config) {
+	case TLS_BASE:
+		return TLS_CONF_BASE;
+	case TLS_SW:
+		return TLS_CONF_SW;
+#ifdef CONFIG_TLS_DEVICE
+	case TLS_HW:
+		return TLS_CONF_HW;
+#endif
+	case TLS_HW_RECORD:
+		return TLS_CONF_HW_RECORD;
+	}
+	return 0;
+}
+
 struct sk_buff *
 tls_validate_xmit_skb(struct sock *sk, struct net_device *dev,
 		      struct sk_buff *skb);
diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index e2c6273274f3..a1ff345b3f33 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -162,6 +162,7 @@ enum {
 enum {
 	INET_ULP_INFO_UNSPEC,
 	INET_ULP_INFO_NAME,
+	INET_ULP_INFO_TLS,
 	__INET_ULP_INFO_MAX,
 };
 #define INET_ULP_INFO_MAX (__INET_ULP_INFO_MAX - 1)
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index 5b9c26753e46..bcd2869ed472 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -109,4 +109,19 @@ struct tls12_crypto_info_aes_ccm_128 {
 	unsigned char rec_seq[TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE];
 };
 
+enum {
+	TLS_INFO_UNSPEC,
+	TLS_INFO_VERSION,
+	TLS_INFO_CIPHER,
+	TLS_INFO_TXCONF,
+	TLS_INFO_RXCONF,
+	__TLS_INFO_MAX,
+};
+#define TLS_INFO_MAX (__TLS_INFO_MAX - 1)
+
+#define TLS_CONF_BASE 1
+#define TLS_CONF_SW 2
+#define TLS_CONF_HW 3
+#define TLS_CONF_HW_RECORD 4
+
 #endif /* _UAPI_LINUX_TLS_H */
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index f8f2d2c3d627..3351a2ace369 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -39,6 +39,7 @@
 #include <linux/netdevice.h>
 #include <linux/sched/signal.h>
 #include <linux/inetdevice.h>
+#include <linux/inet_diag.h>
 
 #include <net/tls.h>
 
@@ -835,6 +836,67 @@ static void tls_update(struct sock *sk, struct proto *p)
 	}
 }
 
+static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
+{
+	struct tls_context *ctx;
+	u16 version, cipher_type;
+	struct nlattr *start;
+	int err;
+
+	start = nla_nest_start_noflag(skb, INET_ULP_INFO_TLS);
+	if (!start)
+		return -EMSGSIZE;
+
+	rcu_read_lock();
+	ctx = rcu_dereference(inet_csk(sk)->icsk_ulp_data);
+	if (!ctx) {
+		err = 0;
+		goto nla_failure;
+	}
+	version = ctx->prot_info.version;
+	if (version) {
+		err = nla_put_u16(skb, TLS_INFO_VERSION, version);
+		if (err)
+			goto nla_failure;
+	}
+	cipher_type = ctx->prot_info.cipher_type;
+	if (cipher_type) {
+		err = nla_put_u16(skb, TLS_INFO_CIPHER, cipher_type);
+		if (err)
+			goto nla_failure;
+	}
+	err = nla_put_u16(skb, TLS_INFO_TXCONF, tls_user_config(ctx, true));
+	if (err)
+		goto nla_failure;
+
+	err = nla_put_u16(skb, TLS_INFO_RXCONF, tls_user_config(ctx, false));
+	if (err)
+		goto nla_failure;
+
+	rcu_read_unlock();
+	nla_nest_end(skb, start);
+	return 0;
+
+nla_failure:
+	rcu_read_unlock();
+	nla_nest_cancel(skb, start);
+	return err;
+}
+
+static size_t tls_get_info_size(const struct sock *sk)
+{
+	size_t size = 0;
+
+	size += nla_total_size(0) +		/* INET_ULP_INFO_TLS */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_VERSION */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_CIPHER */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_RXCONF */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TXCONF */
+		0;
+
+	return size;
+}
+
 void tls_register_device(struct tls_device *device)
 {
 	spin_lock_bh(&device_spinlock);
@@ -856,6 +918,8 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
 	.owner			= THIS_MODULE,
 	.init			= tls_init,
 	.update			= tls_update,
+	.get_info		= tls_get_info,
+	.get_info_size		= tls_get_info_size,
 };
 
 static int __init tls_register(void)
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next v2 2/3] tcp: ulp: add functions to dump ulp-specific information
From: Davide Caratti @ 2019-08-29 16:48 UTC (permalink / raw)
  To: borisp, jakub.kicinski, Eric Dumazet
  Cc: aviadye, davejwatson, davem, john.fastabend, Matthieu Baerts,
	netdev
In-Reply-To: <cover.1567095873.git.dcaratti@redhat.com>

currently, only getsockopt(TCP_ULP) can be invoked to know if a ULP is on
top of a TCP socket. Extend idiag_get_aux() and idiag_get_aux_size(),
introduced by commit b37e88407c1d ("inet_diag: allow protocols to provide
additional data"), to report the ULP name and other information that can
be made available by the ULP through optional functions.

Users having CAP_NET_ADMIN privileges will then be able to retrieve this
information through inet_diag_handler, if they specify INET_DIAG_INFO in
the request.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 include/net/tcp.h              |  3 ++
 include/uapi/linux/inet_diag.h |  8 ++++++
 net/ipv4/tcp_diag.c            | 52 +++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 77fe87f7a992..c9a3f9688223 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2122,6 +2122,9 @@ struct tcp_ulp_ops {
 	void (*update)(struct sock *sk, struct proto *p);
 	/* cleanup ulp */
 	void (*release)(struct sock *sk);
+	/* diagnostic */
+	int (*get_info)(const struct sock *sk, struct sk_buff *skb);
+	size_t (*get_info_size)(const struct sock *sk);
 
 	char		name[TCP_ULP_NAME_MAX];
 	struct module	*owner;
diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index e8baca85bac6..e2c6273274f3 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -153,11 +153,19 @@ enum {
 	INET_DIAG_BBRINFO,	/* request as INET_DIAG_VEGASINFO */
 	INET_DIAG_CLASS_ID,	/* request as INET_DIAG_TCLASS */
 	INET_DIAG_MD5SIG,
+	INET_DIAG_ULP_INFO,
 	__INET_DIAG_MAX,
 };
 
 #define INET_DIAG_MAX (__INET_DIAG_MAX - 1)
 
+enum {
+	INET_ULP_INFO_UNSPEC,
+	INET_ULP_INFO_NAME,
+	__INET_ULP_INFO_MAX,
+};
+#define INET_ULP_INFO_MAX (__INET_ULP_INFO_MAX - 1)
+
 /* INET_DIAG_MEM */
 
 struct inet_diag_meminfo {
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index a3a386236d93..babc156deabb 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -81,13 +81,42 @@ static int tcp_diag_put_md5sig(struct sk_buff *skb,
 }
 #endif
 
+static int tcp_diag_put_ulp(struct sk_buff *skb, struct sock *sk,
+			    const struct tcp_ulp_ops *ulp_ops)
+{
+	struct nlattr *nest;
+	int err;
+
+	nest = nla_nest_start_noflag(skb, INET_DIAG_ULP_INFO);
+	if (!nest)
+		return -EMSGSIZE;
+
+	err = nla_put_string(skb, INET_ULP_INFO_NAME, ulp_ops->name);
+	if (err)
+		goto nla_failure;
+
+	if (ulp_ops->get_info)
+		err = ulp_ops->get_info(sk, skb);
+	if (err)
+		goto nla_failure;
+
+	nla_nest_end(skb, nest);
+	return 0;
+
+nla_failure:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
 static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
 			    struct sk_buff *skb)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
+	int err = 0;
+
 #ifdef CONFIG_TCP_MD5SIG
 	if (net_admin) {
 		struct tcp_md5sig_info *md5sig;
-		int err = 0;
 
 		rcu_read_lock();
 		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
@@ -99,11 +128,21 @@ static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
 	}
 #endif
 
+	if (net_admin) {
+		const struct tcp_ulp_ops *ulp_ops;
+
+		ulp_ops = icsk->icsk_ulp_ops;
+		if (ulp_ops)
+			err = tcp_diag_put_ulp(skb, sk, ulp_ops);
+		if (err)
+			return err;
+	}
 	return 0;
 }
 
 static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	size_t size = 0;
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -124,6 +163,17 @@ static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 	}
 #endif
 
+	if (net_admin) {
+		const struct tcp_ulp_ops *ulp_ops;
+
+		ulp_ops = icsk->icsk_ulp_ops;
+		if (ulp_ops) {
+			size += nla_total_size(0) +
+				nla_total_size(TCP_ULP_NAME_MAX);
+			if (ulp_ops->get_info_size)
+				size += ulp_ops->get_info_size(sk);
+		}
+	}
 	return size;
 }
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next v2 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data
From: Davide Caratti @ 2019-08-29 16:48 UTC (permalink / raw)
  To: borisp, jakub.kicinski, Eric Dumazet
  Cc: aviadye, davejwatson, davem, john.fastabend, Matthieu Baerts,
	netdev
In-Reply-To: <cover.1567095873.git.dcaratti@redhat.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>

We need to make sure context does not get freed while diag
code is interrogating it. Free struct tls_context with
kfree_rcu().

We add the __rcu annotation directly in icsk, and cast it
away in the datapath accessor. Presumably all ULPs will
do a similar thing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/inet_connection_sock.h |  2 +-
 include/net/tls.h                  |  9 +++++++--
 net/core/sock_map.c                |  2 +-
 net/tls/tls_device.c               |  2 +-
 net/tls/tls_main.c                 | 26 +++++++++++++++++++-------
 5 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c57d53e7e02c..895546058a20 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -97,7 +97,7 @@ struct inet_connection_sock {
 	const struct tcp_congestion_ops *icsk_ca_ops;
 	const struct inet_connection_sock_af_ops *icsk_af_ops;
 	const struct tcp_ulp_ops  *icsk_ulp_ops;
-	void			  *icsk_ulp_data;
+	void __rcu		  *icsk_ulp_data;
 	void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
 	struct hlist_node         icsk_listen_portaddr_node;
 	unsigned int		  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
diff --git a/include/net/tls.h b/include/net/tls.h
index 41b2d41bb1b8..4997742475cd 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -41,6 +41,7 @@
 #include <linux/tcp.h>
 #include <linux/skmsg.h>
 #include <linux/netdevice.h>
+#include <linux/rcupdate.h>
 
 #include <net/tcp.h>
 #include <net/strparser.h>
@@ -290,6 +291,7 @@ struct tls_context {
 
 	struct list_head list;
 	refcount_t refcount;
+	struct rcu_head rcu;
 };
 
 enum tls_offload_ctx_dir {
@@ -348,7 +350,7 @@ struct tls_offload_context_rx {
 #define TLS_OFFLOAD_CONTEXT_SIZE_RX					\
 	(sizeof(struct tls_offload_context_rx) + TLS_DRIVER_STATE_SIZE_RX)
 
-void tls_ctx_free(struct tls_context *ctx);
+void tls_ctx_free(struct sock *sk, struct tls_context *ctx);
 int wait_on_pending_writer(struct sock *sk, long *timeo);
 int tls_sk_query(struct sock *sk, int optname, char __user *optval,
 		int __user *optlen);
@@ -467,7 +469,10 @@ static inline struct tls_context *tls_get_ctx(const struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
-	return icsk->icsk_ulp_data;
+	/* Use RCU on icsk_ulp_data only for sock diag code,
+	 * TLS data path doesn't need rcu_dereference().
+	 */
+	return (__force void *)icsk->icsk_ulp_data;
 }
 
 static inline void tls_advance_record_sn(struct sock *sk,
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 1330a7442e5b..01998860afaa 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -345,7 +345,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 		return -EINVAL;
 	if (unlikely(idx >= map->max_entries))
 		return -E2BIG;
-	if (unlikely(icsk->icsk_ulp_data))
+	if (unlikely(rcu_access_pointer(icsk->icsk_ulp_data)))
 		return -EINVAL;
 
 	link = sk_psock_init_link();
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a470df7ffcf9..e188139f0464 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -61,7 +61,7 @@ static void tls_device_free_ctx(struct tls_context *ctx)
 	if (ctx->rx_conf == TLS_HW)
 		kfree(tls_offload_ctx_rx(ctx));
 
-	tls_ctx_free(ctx);
+	tls_ctx_free(NULL, ctx);
 }
 
 static void tls_device_gc_task(struct work_struct *work)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 43252a801c3f..f8f2d2c3d627 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -251,14 +251,26 @@ static void tls_write_space(struct sock *sk)
 	ctx->sk_write_space(sk);
 }
 
-void tls_ctx_free(struct tls_context *ctx)
+/**
+ * tls_ctx_free() - free TLS ULP context
+ * @sk:  socket to with @ctx is attached
+ * @ctx: TLS context structure
+ *
+ * Free TLS context. If @sk is %NULL caller guarantees that the socket
+ * to which @ctx was attached has no outstanding references.
+ */
+void tls_ctx_free(struct sock *sk, struct tls_context *ctx)
 {
 	if (!ctx)
 		return;
 
 	memzero_explicit(&ctx->crypto_send, sizeof(ctx->crypto_send));
 	memzero_explicit(&ctx->crypto_recv, sizeof(ctx->crypto_recv));
-	kfree(ctx);
+
+	if (sk)
+		kfree_rcu(ctx, rcu);
+	else
+		kfree(ctx);
 }
 
 static void tls_sk_proto_cleanup(struct sock *sk,
@@ -306,7 +318,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 
 	write_lock_bh(&sk->sk_callback_lock);
 	if (free_ctx)
-		icsk->icsk_ulp_data = NULL;
+		rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
 	sk->sk_prot = ctx->sk_proto;
 	if (sk->sk_write_space == tls_write_space)
 		sk->sk_write_space = ctx->sk_write_space;
@@ -321,7 +333,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 	ctx->sk_proto_close(sk, timeout);
 
 	if (free_ctx)
-		tls_ctx_free(ctx);
+		tls_ctx_free(sk, ctx);
 }
 
 static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
@@ -610,7 +622,7 @@ static struct tls_context *create_ctx(struct sock *sk)
 	if (!ctx)
 		return NULL;
 
-	icsk->icsk_ulp_data = ctx;
+	rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
 	ctx->setsockopt = sk->sk_prot->setsockopt;
 	ctx->getsockopt = sk->sk_prot->getsockopt;
 	ctx->sk_proto_close = sk->sk_prot->close;
@@ -651,8 +663,8 @@ static void tls_hw_sk_destruct(struct sock *sk)
 
 	ctx->sk_destruct(sk);
 	/* Free ctx */
-	tls_ctx_free(ctx);
-	icsk->icsk_ulp_data = NULL;
+	rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
+	tls_ctx_free(sk, ctx);
 }
 
 static int tls_hw_prot(struct sock *sk)
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next v2 0/3] net: tls: add socket diag
From: Davide Caratti @ 2019-08-29 16:48 UTC (permalink / raw)
  To: borisp, jakub.kicinski, Eric Dumazet
  Cc: aviadye, davejwatson, davem, john.fastabend, Matthieu Baerts,
	netdev

The current kernel does not provide any diagnostic tool, except
getsockopt(TCP_ULP), to know more about TCP sockets that have an upper
layer protocol (ULP) on top of them. This series extends the set of
information exported by INET_DIAG_INFO, to include data that are
specific to the ULP (and that might be meaningful for debug/testing
purposes).

patch 1/3 ensures that the control plane reads/updates ULP specific data
using RCU.

patch 2/3 extends INET_DIAG_INFO and allows knowing the ULP name for
each TCP socket that has done setsockopt(TCP_ULP) successfully.

patch 3/3 extends kTLS to let programs like 'ss' know the protocol
version and the cipher in use.

Changes since v1:
- don't worry about grace period when accessing ulp_ops, thanks to
  Jakub Kicinski and Eric Dumazet
- use rcu_dereference() to access ULP data in tls get_info(), and 
  test against NULL value, thanks to Jakub Kicinski
- move RCU protected section inside tls get_info(), thanks to Jakub
  Kicinski

Changes since RFC:
- some coding style fixes, thanks to Jakub Kicinski
- add X_UNSPEC as lowest value of uAPI enums, thanks to Jakub Kicinski
- fix assignment of struct nlattr *start, thanks to Jakub Kicinski
- let tls dump RXCONF and TXCONF, suggested by Jakub Kicinski
- don't dump anything if TLS version or cipher are 0 (but still return a
  constant size in get_aux_size()), thanks to Boris Pismenny
- constify first argument of get_info() and get_size()
- use RCU to access access ulp_ops, like it's done for ca_ops
- add patch 1/3, from Jakub Kicinski


Davide Caratti (2):
  tcp: ulp: add functions to dump ulp-specific information
  net: tls: export protocol version, cipher, tx_conf/rx_conf to socket
    diag

Jakub Kicinski (1):
  net/tls: use RCU protection on icsk->icsk_ulp_data

 include/net/inet_connection_sock.h |  2 +-
 include/net/tcp.h                  |  3 +
 include/net/tls.h                  | 28 +++++++++-
 include/uapi/linux/inet_diag.h     |  9 +++
 include/uapi/linux/tls.h           | 15 +++++
 net/core/sock_map.c                |  2 +-
 net/ipv4/tcp_diag.c                | 52 ++++++++++++++++-
 net/tls/tls_device.c               |  2 +-
 net/tls/tls_main.c                 | 90 +++++++++++++++++++++++++++---
 9 files changed, 190 insertions(+), 13 deletions(-)

-- 
2.20.1


^ permalink raw reply

* [PATCH v3 10/11] udp: Remove unlikely() from IS_ERR*() condition
From: Denis Efremov @ 2019-08-29 16:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denis Efremov, David S. Miller, Joe Perches, Andrew Morton,
	netdev
In-Reply-To: <20190829165025.15750-1-efremov@linux.com>

"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses
unlikely() internally.

Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netdev@vger.kernel.org
---
 include/net/udp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 79d141d2103b..bad74f780831 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -480,7 +480,7 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
 	 * CB fragment
 	 */
 	segs = __skb_gso_segment(skb, features, false);
-	if (unlikely(IS_ERR_OR_NULL(segs))) {
+	if (IS_ERR_OR_NULL(segs)) {
 		int segs_nr = skb_shinfo(skb)->gso_segs;
 
 		atomic_add(segs_nr, &sk->sk_drops);
-- 
2.21.0


^ permalink raw reply related

* [PATCH v3 03/11] net/mlx5e: Remove unlikely() from WARN*() condition
From: Denis Efremov @ 2019-08-29 16:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denis Efremov, Boris Pismenny, Saeed Mahameed, Leon Romanovsky,
	Joe Perches, Andrew Morton, netdev
In-Reply-To: <20190829165025.15750-1-efremov@linux.com>

"unlikely(WARN_ON_ONCE(x))" is excessive. WARN_ON_ONCE() already uses
unlikely() internally.

Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: Boris Pismenny <borisp@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 7833ddef0427..e5222d17df35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -408,7 +408,7 @@ struct sk_buff *mlx5e_ktls_handle_tx_skb(struct net_device *netdev,
 		goto out;
 
 	tls_ctx = tls_get_ctx(skb->sk);
-	if (unlikely(WARN_ON_ONCE(tls_ctx->netdev != netdev)))
+	if (WARN_ON_ONCE(tls_ctx->netdev != netdev))
 		goto err_out;
 
 	priv_tx = mlx5e_get_ktls_tx_priv_ctx(tls_ctx);
-- 
2.21.0


^ permalink raw reply related

* [PATCH v3 01/11] checkpatch: check for nested (un)?likely() calls
From: Denis Efremov @ 2019-08-29 16:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denis Efremov, Alexander Viro, Anton Altaparmakov,
	Boris Ostrovsky, Boris Pismenny, Darrick J. Wong, David S. Miller,
	Dennis Dalessandro, Dmitry Torokhov, Inaky Perez-Gonzalez,
	Juergen Gross, Leon Romanovsky, Mike Marciniszyn, Pali Rohár,
	Rob Clark, Saeed Mahameed, Sean Paul, linux-arm-msm,
	linux-fsdevel, linux-input, linux-ntfs-dev, linux-rdma,
	linux-wimax, linux-xfs, xen-devel, netdev, dri-devel, Joe Perches,
	Andrew Morton, Andy Whitcroft

IS_ERR(), IS_ERR_OR_NULL(), IS_ERR_VALUE() and WARN*() already contain
unlikely() optimization internally. Thus, there is no point in calling
these functions and defines under likely()/unlikely().

This check is based on the coccinelle rule developed by Enrico Weigelt
https://lore.kernel.org/lkml/1559767582-11081-1-git-send-email-info@metux.net/

Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Whitcroft <apw@canonical.com>
---
 scripts/checkpatch.pl | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 93a7edfe0f05..56969ce06df4 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -6480,6 +6480,12 @@ sub process {
 			     "Using $1 should generally have parentheses around the comparison\n" . $herecurr);
 		}
 
+# nested likely/unlikely calls
+		if ($line =~ /\b(?:(?:un)?likely)\s*\(\s*!?\s*(IS_ERR(?:_OR_NULL|_VALUE)?|WARN)/) {
+			WARN("LIKELY_MISUSE",
+			     "nested (un)?likely() calls, $1 already uses unlikely() internally\n" . $herecurr);
+		}
+
 # whine mightly about in_atomic
 		if ($line =~ /\bin_atomic\s*\(/) {
 			if ($realfile =~ m@^drivers/@) {
-- 
2.21.0


^ permalink raw reply related

* Re: general protection fault in tls_sk_proto_close (2)
From: Jakub Kicinski @ 2019-08-29 16:43 UTC (permalink / raw)
  To: Hillf Danton
  Cc: john.fastabend, syzbot, aviadye, borisp, daniel, davejwatson,
	davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <20190829035200.3340-1-hdanton@sina.com>

On Thu, 29 Aug 2019 11:52:00 +0800, Hillf Danton wrote:
> Alternatively work is done if sock is closed again. Anyway ctx is reset
> under sock's callback lock in write mode.
> 
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -295,6 +295,8 @@ static void tls_sk_proto_close(struct so
>  	long timeo = sock_sndtimeo(sk, 0);
>  	bool free_ctx;
>  
> +	if (!ctx)
> +		return;
>  	if (ctx->tx_conf == TLS_SW)
>  		tls_sw_cancel_work_tx(ctx);

That's no bueno, the real socket's close will never get called.

^ permalink raw reply

* Re: [PATCH net-next] r8152: fix accessing skb after napi_gro_receive
From: Eric Dumazet @ 2019-08-29 16:31 UTC (permalink / raw)
  To: Hayes Wang, netdev; +Cc: nic_swsd, linux-kernel
In-Reply-To: <1394712342-15778-302-Taiwan-albertk@realtek.com>



On 8/19/19 5:15 AM, Hayes Wang wrote:
> Fix accessing skb after napi_gro_receive which is caused by
> commit 47922fcde536 ("r8152: support skb_add_rx_frag").
> 
> Fixes: 47922fcde536 ("r8152: support skb_add_rx_frag")
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>
> ---

It is customary to add a tag to credit the reporter...

Something like :

Reported-by: ....

Thanks.

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/3] capability: introduce CAP_BPF and CAP_TRACING
From: Andy Lutomirski @ 2019-08-29 16:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, David S. Miller, Peter Zijlstra,
	Steven Rostedt, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <536636ad-0baf-31e9-85fe-2591b65068df@iogearbox.net>

> On Aug 29, 2019, at 8:47 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
>> On 8/29/19 7:12 AM, Alexei Starovoitov wrote:
>> [...]
>>  +/*
>> + * CAP_BPF allows the following BPF operations:
>> + * - Loading all types of BPF programs
>> + * - Creating all types of BPF maps except:
>> + *    - stackmap that needs CAP_TRACING
>> + *    - devmap that needs CAP_NET_ADMIN
>> + *    - cpumap that needs CAP_SYS_ADMIN
>> + * - Advanced verifier features
>> + *   - Indirect variable access
>> + *   - Bounded loops
>> + *   - BPF to BPF function calls
>> + *   - Scalar precision tracking
>> + *   - Larger complexity limits
>> + *   - Dead code elimination
>> + *   - And potentially other features
>> + * - Use of pointer-to-integer conversions in BPF programs
>> + * - Bypassing of speculation attack hardening measures
>> + * - Loading BPF Type Format (BTF) data
>> + * - Iterate system wide loaded programs, maps, BTF objects
>> + * - Retrieve xlated and JITed code of BPF programs
>> + * - Access maps and programs via id
>> + * - Use bpf_spin_lock() helper
>
> This is still very wide. Consider following example: app has CAP_BPF +
> CAP_NET_ADMIN. Why can't we in this case *only* allow loading networking
> related [plus generic] maps and programs? If it doesn't have CAP_TRACING,
> what would be a reason to allow loading it? Same vice versa. There are
> some misc program types like the infraread stuff, but they could continue
> to live under [CAP_BPF +] CAP_SYS_ADMIN as fallback. I think categorizing
> a specific list of prog and map types might be more clear than disallowing
> some helpers like below (e.g. why choice of bpf_probe_read() but not
> bpf_probe_write_user() etc).

Wow, I didn’t notice that bpf_probe_write_user() existed. That should
need something like CAP_PTRACE or CAP_SYS_ADMIN.

I'm starting to think that something like this:

https://lore.kernel.org/bpf/968f3551247a43e1104b198f2e58fb0595d425e7.1565040372.git.luto@kernel.org/

should maybe be finished before CAP_BPF happens at all.  It really
looks like the bpf operations that need privilege need to get fully
catalogued and dealt with rather than just coming up with a new
capability that covers a huge swath.

(bpf_probe_write_user() is also terminally broken on architectures
like s390x, but that's not really relevant right now.  I'm a bit
surprised it works on x86 with SMAP, though.)

^ permalink raw reply

* [PATCH net-next v2 0/2] Fixes for unlocked cls hardware offload API refactoring
From: Vlad Buslov @ 2019-08-29 16:15 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, saeedm, idosch, sergei.shtylyov,
	Vlad Buslov

Two fixes for my "Refactor cls hardware offload API to support
rtnl-independent drivers" series.

Vlad Buslov (2):
  net: sched: cls_matchall: cleanup flow_action before deallocating
  net/mlx5e: Move local var definition into ifdef block

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 6 ++++--
 net/sched/cls_matchall.c                          | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [PATCH net-next v2 2/2] net/mlx5e: Move local var definition into ifdef block
From: Vlad Buslov @ 2019-08-29 16:15 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, saeedm, idosch, sergei.shtylyov,
	Vlad Buslov, tanhuazhong
In-Reply-To: <20190829161517.20935-1-vladbu@mellanox.com>

New local variable "struct flow_block_offload *f" was added to
mlx5e_setup_tc() in recent rtnl lock removal patches. The variable is used
in code that is only compiled when CONFIG_MLX5_ESWITCH is enabled. This
results compilation warning about unused variable when CONFIG_MLX5_ESWITCH
is not set. Move the variable definition into eswitch-specific code block
from the beginning of mlx5e_setup_tc() function.

Fixes: c9f14470d048 ("net: sched: add API for registering unlocked offload block callbacks")
Reported-by: tanhuazhong <tanhuazhong@huawei.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---

Notes:
    Changes V1 -> V2:
    
    - Fixed error in commit message.

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8592b98d0e70..c10a1fc8e469 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3470,16 +3470,18 @@ static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			  void *type_data)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
-	struct flow_block_offload *f = type_data;
 
 	switch (type) {
 #ifdef CONFIG_MLX5_ESWITCH
-	case TC_SETUP_BLOCK:
+	case TC_SETUP_BLOCK: {
+		struct flow_block_offload *f = type_data;
+
 		f->unlocked_driver_cb = true;
 		return flow_block_cb_setup_simple(type_data,
 						  &mlx5e_block_cb_list,
 						  mlx5e_setup_tc_block_cb,
 						  priv, priv, true);
+	}
 #endif
 	case TC_SETUP_QDISC_MQPRIO:
 		return mlx5e_setup_tc_mqprio(priv, type_data);
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 1/2] net: sched: cls_matchall: cleanup flow_action before deallocating
From: Vlad Buslov @ 2019-08-29 16:15 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, saeedm, idosch, sergei.shtylyov,
	Vlad Buslov
In-Reply-To: <20190829161517.20935-1-vladbu@mellanox.com>

Recent rtnl lock removal patch changed flow_action infra to require proper
cleanup besides simple memory deallocation. However, matchall classifier
was not updated to call tc_cleanup_flow_action(). Add proper cleanup to
mall_replace_hw_filter() and mall_reoffload().

Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/cls_matchall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 3266f25011cc..7fc2eb62aa98 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -111,6 +111,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 
 	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSMATCHALL, &cls_mall,
 			      skip_sw, &head->flags, &head->in_hw_count, true);
+	tc_cleanup_flow_action(&cls_mall.rule->action);
 	kfree(cls_mall.rule);
 
 	if (err) {
@@ -313,6 +314,7 @@ static int mall_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 	err = tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSMATCHALL,
 				    &cls_mall, cb_priv, &head->flags,
 				    &head->in_hw_count);
+	tc_cleanup_flow_action(&cls_mall.rule->action);
 	kfree(cls_mall.rule);
 
 	if (err)
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH net-next 2/2] net/mlx5e: Move local var definition into ifdef block
From: Vlad Buslov @ 2019-08-29 16:13 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Vlad Buslov, netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	Saeed Mahameed, Ido Schimmel, tanhuazhong
In-Reply-To: <365fea71-4f35-188d-89e1-d8b97e3df141@cogentembedded.com>


On Thu 29 Aug 2019 at 12:29, Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> wrote:
> Hello!
>
> On 28.08.2019 19:41, Vlad Buslov wrote:
>
>> New local variable "struct flow_block_offload *f" was added to
>> mlx5e_setup_tc() in recent rtnl lock removal patches. The variable is used
>> in code that is only compiled when CONFIG_MLX5_ESWITCH is enabled. This
>> results compilation warning about unused variable when CONFIG_MLX5_ESWITCH
>> is not set. Move the variable definition into eswitch-specific code block
>> from the begging of mlx5e_setup_tc() function.
>
>    Beginning?

Yep. Thanks for spotting it!

>
>> Fixes: c9f14470d048 ("net: sched: add API for registering unlocked offload block callbacks")
>> Reported-by: tanhuazhong <tanhuazhong@huawei.com>
>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> [...]
>
> MBR, Sergei


^ permalink raw reply

* Re: [PATCH bpf-next 2/3] tools: bpftool: improve and check builds for different make invocations
From: Ilya Leoshkevich @ 2019-08-29 16:03 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, netdev, oss-drivers,
	Lorenz Bauer, Jakub Kicinski
In-Reply-To: <20190829105645.12285-3-quentin.monnet@netronome.com>

> Am 29.08.2019 um 12:56 schrieb Quentin Monnet <quentin.monnet@netronome.com>:
> 
> +make_and_clean() {
> +	echo -e "\$PWD:    $PWD"
> +	echo -e "command: make -s $* >/dev/null"
> +	make $J -s $* >/dev/null

Would it make sense to set ERROR=1 if make produces a bpftool binary,
but still fails with a non-zero RC for whatever reason?


^ permalink raw reply

* [PATCH v2 net-next 04/15] net: sgi: ioc3-eth: use defines for constants dealing with desc rings
From: Thomas Bogendoerfer @ 2019-08-29 15:50 UTC (permalink / raw)
  To: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190829155014.9229-1-tbogendoerfer@suse.de>

Descriptor ring sizes of the IOC3 are more or less fixed size. To
make clearer where there is a relation to ring sizes use defines.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
---
 drivers/net/ethernet/sgi/ioc3-eth.c | 42 +++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/sgi/ioc3-eth.c b/drivers/net/ethernet/sgi/ioc3-eth.c
index 51cc1389e204..ba18a53fbbe6 100644
--- a/drivers/net/ethernet/sgi/ioc3-eth.c
+++ b/drivers/net/ethernet/sgi/ioc3-eth.c
@@ -61,10 +61,16 @@
 #include <asm/sn/ioc3.h>
 #include <asm/pci/bridge.h>
 
-/* 64 RX buffers.  This is tunable in the range of 16 <= x < 512.  The
- * value must be a power of two.
+/* Number of RX buffers.  This is tunable in the range of 16 <= x < 512.
+ * The value must be a power of two.
  */
-#define RX_BUFFS 64
+#define RX_BUFFS		64
+#define RX_RING_ENTRIES		512		/* fixed in hardware */
+#define RX_RING_MASK		(RX_RING_ENTRIES - 1)
+
+/* 128 TX buffers (not tunable) */
+#define TX_RING_ENTRIES		128
+#define TX_RING_MASK		(TX_RING_ENTRIES - 1)
 
 #define ETCSR_FD   ((17 << ETCSR_IPGR2_SHIFT) | (11 << ETCSR_IPGR1_SHIFT) | 21)
 #define ETCSR_HD   ((21 << ETCSR_IPGR2_SHIFT) | (21 << ETCSR_IPGR1_SHIFT) | 21)
@@ -76,8 +82,8 @@ struct ioc3_private {
 	u32 *ssram;
 	unsigned long *rxr;		/* pointer to receiver ring */
 	struct ioc3_etxd *txr;
-	struct sk_buff *rx_skbs[512];
-	struct sk_buff *tx_skbs[128];
+	struct sk_buff *rx_skbs[RX_RING_ENTRIES];
+	struct sk_buff *tx_skbs[TX_RING_ENTRIES];
 	int rx_ci;			/* RX consumer index */
 	int rx_pi;			/* RX producer index */
 	int tx_ci;			/* TX consumer index */
@@ -573,10 +579,10 @@ static inline void ioc3_rx(struct net_device *dev)
 		ip->rx_skbs[n_entry] = new_skb;
 		rxr[n_entry] = cpu_to_be64(ioc3_map(rxb, 1));
 		rxb->w0 = 0;				/* Clear valid flag */
-		n_entry = (n_entry + 1) & 511;		/* Update erpir */
+		n_entry = (n_entry + 1) & RX_RING_MASK;	/* Update erpir */
 
 		/* Now go on to the next ring entry.  */
-		rx_entry = (rx_entry + 1) & 511;
+		rx_entry = (rx_entry + 1) & RX_RING_MASK;
 		skb = ip->rx_skbs[rx_entry];
 		rxb = (struct ioc3_erxbuf *)(skb->data - RX_OFFSET);
 		w0 = be32_to_cpu(rxb->w0);
@@ -598,7 +604,7 @@ static inline void ioc3_tx(struct net_device *dev)
 	spin_lock(&ip->ioc3_lock);
 	etcir = readl(&regs->etcir);
 
-	tx_entry = (etcir >> 7) & 127;
+	tx_entry = (etcir >> 7) & TX_RING_MASK;
 	o_entry = ip->tx_ci;
 	packets = 0;
 	bytes = 0;
@@ -610,17 +616,17 @@ static inline void ioc3_tx(struct net_device *dev)
 		dev_consume_skb_irq(skb);
 		ip->tx_skbs[o_entry] = NULL;
 
-		o_entry = (o_entry + 1) & 127;		/* Next */
+		o_entry = (o_entry + 1) & TX_RING_MASK;	/* Next */
 
 		etcir = readl(&regs->etcir);		/* More pkts sent?  */
-		tx_entry = (etcir >> 7) & 127;
+		tx_entry = (etcir >> 7) & TX_RING_MASK;
 	}
 
 	dev->stats.tx_packets += packets;
 	dev->stats.tx_bytes += bytes;
 	ip->txqlen -= packets;
 
-	if (ip->txqlen < 128)
+	if (netif_queue_stopped(dev) && ip->txqlen < TX_RING_ENTRIES)
 		netif_wake_queue(dev);
 
 	ip->tx_ci = o_entry;
@@ -765,10 +771,10 @@ static inline void ioc3_clean_rx_ring(struct ioc3_private *ip)
 		ip->rx_skbs[ip->rx_pi] = ip->rx_skbs[ip->rx_ci];
 		ip->rxr[ip->rx_pi++] = ip->rxr[ip->rx_ci++];
 	}
-	ip->rx_pi &= 511;
-	ip->rx_ci &= 511;
+	ip->rx_pi &= RX_RING_MASK;
+	ip->rx_ci &= RX_RING_MASK;
 
-	for (i = ip->rx_ci; i != ip->rx_pi; i = (i + 1) & 511) {
+	for (i = ip->rx_ci; i != ip->rx_pi; i = (i + 1) & RX_RING_MASK) {
 		skb = ip->rx_skbs[i];
 		rxb = (struct ioc3_erxbuf *)(skb->data - RX_OFFSET);
 		rxb->w0 = 0;
@@ -780,7 +786,7 @@ static inline void ioc3_clean_tx_ring(struct ioc3_private *ip)
 	struct sk_buff *skb;
 	int i;
 
-	for (i = 0; i < 128; i++) {
+	for (i = 0; i < TX_RING_ENTRIES; i++) {
 		skb = ip->tx_skbs[i];
 		if (skb) {
 			ip->tx_skbs[i] = NULL;
@@ -812,7 +818,7 @@ static void ioc3_free_rings(struct ioc3_private *ip)
 			if (skb)
 				dev_kfree_skb_any(skb);
 
-			n_entry = (n_entry + 1) & 511;
+			n_entry = (n_entry + 1) & RX_RING_MASK;
 		}
 		free_page((unsigned long)ip->rxr);
 		ip->rxr = NULL;
@@ -1425,13 +1431,13 @@ static netdev_tx_t ioc3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	mb(); /* make sure all descriptor changes are visible */
 
 	ip->tx_skbs[produce] = skb;			/* Remember skb */
-	produce = (produce + 1) & 127;
+	produce = (produce + 1) & TX_RING_MASK;
 	ip->tx_pi = produce;
 	writel(produce << 7, &ip->regs->etpir);		/* Fire ... */
 
 	ip->txqlen++;
 
-	if (ip->txqlen >= 127)
+	if (ip->txqlen >= (TX_RING_ENTRIES - 1))
 		netif_stop_queue(dev);
 
 	spin_unlock_irq(&ip->ioc3_lock);
-- 
2.13.7


^ permalink raw reply related

* [PATCH v2 net-next 07/15] net: sgi: ioc3-eth: separate tx and rx ring handling
From: Thomas Bogendoerfer @ 2019-08-29 15:50 UTC (permalink / raw)
  To: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190829155014.9229-1-tbogendoerfer@suse.de>

After allocation of descriptor memory is now done once in probe
handling of tx ring is completely done by ioc3_clean_tx_ring. So
we remove the remaining tx ring actions out of ioc3_alloc_rings
and ioc3_free_rings and rename it to ioc3_[alloc|free]_rx_bufs
to better describe what they are doing.

Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
---
 drivers/net/ethernet/sgi/ioc3-eth.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/sgi/ioc3-eth.c b/drivers/net/ethernet/sgi/ioc3-eth.c
index e3867ea9abb7..de20f644e07d 100644
--- a/drivers/net/ethernet/sgi/ioc3-eth.c
+++ b/drivers/net/ethernet/sgi/ioc3-eth.c
@@ -778,13 +778,11 @@ static inline void ioc3_clean_tx_ring(struct ioc3_private *ip)
 	ip->tx_ci = 0;
 }
 
-static void ioc3_free_rings(struct ioc3_private *ip)
+static void ioc3_free_rx_bufs(struct ioc3_private *ip)
 {
 	struct sk_buff *skb;
 	int rx_entry, n_entry;
 
-	ioc3_clean_tx_ring(ip);
-
 	n_entry = ip->rx_ci;
 	rx_entry = ip->rx_pi;
 
@@ -797,7 +795,7 @@ static void ioc3_free_rings(struct ioc3_private *ip)
 	}
 }
 
-static void ioc3_alloc_rings(struct net_device *dev)
+static void ioc3_alloc_rx_bufs(struct net_device *dev)
 {
 	struct ioc3_private *ip = netdev_priv(dev);
 	struct ioc3_erxbuf *rxb;
@@ -827,9 +825,6 @@ static void ioc3_alloc_rings(struct net_device *dev)
 	}
 	ip->rx_ci = 0;
 	ip->rx_pi = RX_BUFFS;
-
-	ip->tx_pi = 0;
-	ip->tx_ci = 0;
 }
 
 static void ioc3_init_rings(struct net_device *dev)
@@ -838,8 +833,8 @@ static void ioc3_init_rings(struct net_device *dev)
 	struct ioc3_ethregs *regs = ip->regs;
 	unsigned long ring;
 
-	ioc3_free_rings(ip);
-	ioc3_alloc_rings(dev);
+	ioc3_free_rx_bufs(ip);
+	ioc3_alloc_rx_bufs(dev);
 
 	ioc3_clean_tx_ring(ip);
 
@@ -965,7 +960,9 @@ static int ioc3_close(struct net_device *dev)
 	ioc3_stop(ip);
 	free_irq(dev->irq, dev);
 
-	ioc3_free_rings(ip);
+	ioc3_free_rx_bufs(ip);
+	ioc3_clean_tx_ring(ip);
+
 	return 0;
 }
 
@@ -1266,7 +1263,7 @@ static int ioc3_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 out_stop:
 	ioc3_stop(ip);
 	del_timer_sync(&ip->ioc3_timer);
-	ioc3_free_rings(ip);
+	ioc3_free_rx_bufs(ip);
 	kfree(ip->rxr);
 	kfree(ip->txr);
 out_res:
-- 
2.13.7


^ permalink raw reply related

* [PATCH v2 net-next 06/15] net: sgi: ioc3-eth: get rid of ioc3_clean_rx_ring()
From: Thomas Bogendoerfer @ 2019-08-29 15:50 UTC (permalink / raw)
  To: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190829155014.9229-1-tbogendoerfer@suse.de>

Move clearing of the descriptor valid bit into ioc3_alloc_rings. This
makes ioc3_clean_rx_ring obsolete.

Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
---
 drivers/net/ethernet/sgi/ioc3-eth.c | 22 +---------------------
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/sgi/ioc3-eth.c b/drivers/net/ethernet/sgi/ioc3-eth.c
index d9d94a55ac34..e3867ea9abb7 100644
--- a/drivers/net/ethernet/sgi/ioc3-eth.c
+++ b/drivers/net/ethernet/sgi/ioc3-eth.c
@@ -761,26 +761,6 @@ static void ioc3_mii_start(struct ioc3_private *ip)
 	add_timer(&ip->ioc3_timer);
 }
 
-static inline void ioc3_clean_rx_ring(struct ioc3_private *ip)
-{
-	struct ioc3_erxbuf *rxb;
-	struct sk_buff *skb;
-	int i;
-
-	for (i = ip->rx_ci; i & 15; i++) {
-		ip->rx_skbs[ip->rx_pi] = ip->rx_skbs[ip->rx_ci];
-		ip->rxr[ip->rx_pi++] = ip->rxr[ip->rx_ci++];
-	}
-	ip->rx_pi &= RX_RING_MASK;
-	ip->rx_ci &= RX_RING_MASK;
-
-	for (i = ip->rx_ci; i != ip->rx_pi; i = (i + 1) & RX_RING_MASK) {
-		skb = ip->rx_skbs[i];
-		rxb = (struct ioc3_erxbuf *)(skb->data - RX_OFFSET);
-		rxb->w0 = 0;
-	}
-}
-
 static inline void ioc3_clean_tx_ring(struct ioc3_private *ip)
 {
 	struct sk_buff *skb;
@@ -841,6 +821,7 @@ static void ioc3_alloc_rings(struct net_device *dev)
 		/* Because we reserve afterwards. */
 		skb_put(skb, (1664 + RX_OFFSET));
 		rxb = (struct ioc3_erxbuf *)skb->data;
+		rxb->w0 = 0;	/* Clear valid flag */
 		ip->rxr[i] = cpu_to_be64(ioc3_map(rxb, 1));
 		skb_reserve(skb, RX_OFFSET);
 	}
@@ -860,7 +841,6 @@ static void ioc3_init_rings(struct net_device *dev)
 	ioc3_free_rings(ip);
 	ioc3_alloc_rings(dev);
 
-	ioc3_clean_rx_ring(ip);
 	ioc3_clean_tx_ring(ip);
 
 	/* Now the rx ring base, consume & produce registers.  */
-- 
2.13.7


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox