Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] tcp: restore autocorking
From: Eric Dumazet @ 2018-05-03 13:55 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Miller, Tal Gilboa, netdev, Michael Wenig, Eric Dumazet
In-Reply-To: <298c3b30-d3d3-6069-dd46-bcf0ce315126@mellanox.com>

On Thu, May 3, 2018 at 4:06 AM Tariq Toukan <tariqt@mellanox.com> wrote:



> On 03/05/2018 6:25 AM, Eric Dumazet wrote:
> > When adding rb-tree for TCP retransmit queue, we inadvertently broke
> > TCP autocorking.
> >
> > tcp_should_autocork() should really check if the rtx queue is not empty.
> >

> Hi Eric,

> We are glad to see that the issue that Tal investigated and reported [1]
> is now addressed.
> Thanks for doing that!

> Tal, let’s perf test to see the effect of this fix.

> Best,
> Tariq

> [1] https://patchwork.ozlabs.org/cover/822218/

Yes, but you never really gave any  insights of what exact tests you were
running :/

Sometimes the crystal ball is silent, sometimes it gives a hint :)

^ permalink raw reply

* Re: DSA switch
From: Andrew Lunn @ 2018-05-03 14:01 UTC (permalink / raw)
  To: Ran Shalit; +Cc: Jiri Pirko, netdev
In-Reply-To: <CAJ2oMhJx-n6qWchO27BV2u9Jd5X7ymtSfWeqL971DQJVQQNM-g@mail.gmail.com>

> It seems that although the bridge command functions, it takes several
> seconds (~6-7 seconds !) from the time it resturns to shell till a
> real communication works (ping between 2 PCs connected to switch).
> Does it usually takes so much time ?

PHY link up can take a second or two. Less if you use interrupts.

What do you have the bridge forwarding delay set to?

I would suggest profiling it, see what is take time, and how you can
tune it.

     Andrew

^ permalink raw reply

* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close()
From: Steven Sistare @ 2018-05-03 14:01 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Alexander Duyck, Daniel Jordan, LKML, Jeff Kirsher,
	intel-wired-lan, Netdev, Greg KH
In-Reply-To: <CAKgT0UfNBugQywBZuepUJ6deEYsSWmTw3b6VbQBJLn-kKXe7ng@mail.gmail.com>

On 5/3/2018 9:46 AM, Alexander Duyck wrote:
> On Thu, May 3, 2018 at 6:30 AM, Pavel Tatashin
> <pasha.tatashin@oracle.com> wrote:
>> Hi Alex,
> 
> Hi Pavel,
> 
>>> I'm not a fan of dropping the mutex while we go through
>>> ixgbe_close_suspend. I'm concerned it will result in us having a
>>> number of races on shutdown.
>>
>> I would agree, but ixgbe_close_suspend() is already called without this
>> mutex from ixgbe_close(). This path is executed also during machine
>> shutdown but when kexec'ed. So, it is either an existing bug or there are
>> no races. But, because ixgbe_close() is called from the userland, and a
>> little earlier than ixgbe_shutdown() I think this means there are no races.
> 
> All callers of ixgbe_close are supposed to already be holding the RTNL
> mutex. That is the whole reason why we need to hold it here as the
> shutdown path doesn't have the mutex held otherwise. The mutex was
> added here because shutdown was racing with the open/close calls.
> 
>>
>>> If anything, I think we would need to find a replacement for the mutex
>>> that can operate at the per-netdev level if you are wanting to
>>> parallelize this.
>>
>> Yes, if lock is necessary, it can be replaced in this place (and added to
>> ixgbe_close())  with something scalable.
> 
> I wouldn't suggest adding yet another lock for all this. Instead it
> might make more sense for us to start looking at breaking up the
> locking since most of the networking subsystem uses the rtnl_lock call
> it might be time to start looking at doing something like cutting back
> on the use of this in cases where all the resources needed are
> specific to one device and instead have a per device lock that could
> address those kind of cases.

Hi Pavel, you might want to pull lock optimization out of this patch series.
The parallelization by itself is valuable, and optimizations for individual 
devices including locks can come later.

- Steve

^ permalink raw reply

* [PATCH] netfilter: nf_queue: Replace conntrack entry
From: Kristian Evensen @ 2018-05-03 14:07 UTC (permalink / raw)
  To: netfilter-devel
  Cc: Kristian Evensen, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, David S. Miller, coreteam, netdev, linux-kernel

SKBs are assigned a conntrack entry before being passed to any NFQUEUEs,
and if no entry is found then a new one is created. This behavior causes
problems for some traffic patterns. For example, if two UDP packets
to/from the same host (using the same ports) arrive at the "same" time,
both are assigned a new conntrack entry. After the first packet have
traversed all chains, the conntrack entry will be inserted into the
global table. The second packet will then be dropped during the
insertion step, as an entry for the same flow already exists. One type
of application that frequently generates this traffic pattern, is DNS
resolvers.

This commit introduces a new function that checks, and potentially
replaces, the conntrack entry for any additional "new" SKBs mapping to
an existing flow. While not a perfect solution, there are still
situations where to-be-dropped SKBs can slip through, the situations is
improved considerably. On the routers I have used for testing, packets
belonging to the same UDP flow are let through (when generating the
traffic pattern described above). Without the change in this commit, all
packets except the first one was dropped.

With the change in this commit, a user can implement "perfect" solutions
in user-space. An application can for example keep track of seen UDP
flows, and then only release packets belonging to one flow when the
entry has been created. Without the change, and SKB is stuck with the
original conntrack entry.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
---
 net/netfilter/nfnetlink_queue.c | 68 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index c97966298..150c11ff4 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -43,6 +43,9 @@
 
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 #include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_l3proto.h>
+#include <net/netfilter/nf_conntrack_l4proto.h>
 #endif
 
 #define NFQNL_QMAX_DEFAULT 1024
@@ -1046,6 +1049,53 @@ static int nfq_id_after(unsigned int id, unsigned int max)
 	return (int)(id - max) > 0;
 }
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+static void nfqnl_update_ct(struct net *net, struct sk_buff *skb)
+{
+	const struct nf_conntrack_l3proto *l3proto;
+	const struct nf_conntrack_l4proto *l4proto;
+	struct nf_conntrack_tuple_hash *h;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = NULL;
+	unsigned int dataoff;
+	u16 l3num;
+	u8 l4num;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	l3num = nf_ct_l3num(ct);
+	l3proto = nf_ct_l3proto_find_get(l3num);
+
+	if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
+				 &l4num) <= 0) {
+		return;
+	}
+
+	l4proto = nf_ct_l4proto_find_get(l3num, l4num);
+
+	if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
+			     l4num, net, &tuple, l3proto, l4proto)) {
+		return;
+	}
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES)
+	h = nf_conntrack_find_get(net, &ct->zone, &tuple);
+#else
+	h = nf_conntrack_find_get(net, NULL, &tuple);
+#endif
+
+	if (h) {
+		pr_debug("%s: tuple %u %pI4:%hu -> %pI4:%hu\n", __func__,
+			 tuple.dst.protonum, &tuple.src.u3.ip,
+			 ntohs(tuple.src.u.all), &tuple.dst.u3.ip,
+			 ntohs(tuple.dst.u.all));
+		nf_ct_put(ct);
+		ct = nf_ct_tuplehash_to_ctrack(h);
+		nf_ct_set(skb, ct, IP_CT_NEW);
+	}
+}
+#endif
+
 static int nfqnl_recv_verdict_batch(struct net *net, struct sock *ctnl,
 				    struct sk_buff *skb,
 				    const struct nlmsghdr *nlh,
@@ -1060,6 +1110,7 @@ static int nfqnl_recv_verdict_batch(struct net *net, struct sock *ctnl,
 	LIST_HEAD(batch_list);
 	u16 queue_num = ntohs(nfmsg->res_id);
 	struct nfnl_queue_net *q = nfnl_queue_pernet(net);
+	enum ip_conntrack_info ctinfo;
 
 	queue = verdict_instance_lookup(q, queue_num,
 					NETLINK_CB(skb).portid);
@@ -1090,6 +1141,16 @@ static int nfqnl_recv_verdict_batch(struct net *net, struct sock *ctnl,
 	list_for_each_entry_safe(entry, tmp, &batch_list, list) {
 		if (nfqa[NFQA_MARK])
 			entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+			nf_ct_get(entry->skb, &ctinfo);
+
+			if (ctinfo == IP_CT_NEW && verdict != NF_STOLEN &&
+			    verdict != NF_DROP) {
+				nfqnl_update_ct(net, entry->skb);
+			}
+#endif
+
 		nf_reinject(entry, verdict);
 	}
 	return 0;
@@ -1213,6 +1274,13 @@ static int nfqnl_recv_verdict(struct net *net, struct sock *ctnl,
 	if (nfqa[NFQA_MARK])
 		entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	nf_ct_get(entry->skb, &ctinfo);
+
+	if (ctinfo == IP_CT_NEW && verdict != NF_STOLEN && verdict != NF_DROP)
+		nfqnl_update_ct(net, entry->skb);
+#endif
+
 	nf_reinject(entry, verdict);
 	return 0;
 }
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH v1 net-next] net: dsa: mv88e6xxx: Fix name string for 88E6141
From: Marek Behún @ 2018-05-03 14:16 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Greg Kroah-Hartman, Vivien Didelot, Arkadi Sharshevsky,
	David S . Miller
In-Reply-To: <20180503131143.GA17027@lunn.ch>

Hmm, sorry, this is already merged. It seems I forgor to etch new
commits for some time.

On Thu, 3 May 2018 15:11:43 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Thu, May 03, 2018 at 03:06:55PM +0200, Marek Behún wrote:
> > The structure was copied from 88E6341 but the name was not changed.
> > 
> > Signed-off-by: Marek Behun <marek.behun@nic.cz>  
> 
> Hi Marek
> 
> Which tree is this against?
> 
> > ---
> >  drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c
> > b/drivers/net/dsa/mv88e6xxx/chip.c index e9600a82dc83..fae362020305
> > 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > @@ -3206,7 +3206,7 @@ static const struct mv88e6xxx_info
> > mv88e6xxx_table[] = { [MV88E6141] = {
> >  		.prod_num = MV88E6XXX_PORT_SWITCH_ID_PROD_6141,
> >  		.family = MV88E6XXX_FAMILY_6341,
> > -		.name = "Marvell 88E6341",
> > +		.name = "Marvell 88E6141",
> >  		.num_databases = 4096,
> >  		.num_ports = 6,
> >  		.max_vid = 4095,
> > -- 
> > 2.16.1
> >   
> 
> commit 79a68b2631d8ec3e149081b1ecfb23509c040b4e
> Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Date:   Tue Mar 20 10:44:40 2018 +0100
> 
>     net: dsa: mv88e6xxx: Fix name of switch 88E6141
>     
>     The switch name is emitted in the kernel log, so having the right
> name there is nice.
>     
>     Fixes: 1558727a1c1b ("net: dsa: mv88e6xxx: Add support for
> ethernet switch 88E6141") Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>     Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 

^ permalink raw reply

* Re: [PATCH RESEND] SUNRPC: fix include for cmpxchg_relaxed()
From: Jeff Layton @ 2018-05-03 14:16 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, J . Bruce Fields,
	David S . Miller, linux-nfs, netdev
In-Reply-To: <20180503135050.15778-1-mark.rutland@arm.com>

On Thu, 2018-05-03 at 14:50 +0100, Mark Rutland wrote:
> Currently net/sunrpc/xprtmultipath.c is the only file outside of arch/
> headers and asm-generic/ headers to include <asm/cmpxhcg.h>, apparently
> for the use of cmpxchg_relaxed().
> 
> However, many architectures do not provide cmpxchg_relaxed() in their
> <asm/cmpxhcg.h>, and it is necessary to include <linux/atomic.h> to get
> this definition, as noted in Documentation/core-api/atomic_ops.rst:
> 
>   If someone wants to use xchg(), cmpxchg() and their variants,
>   linux/atomic.h should be included rather than asm/cmpxchg.h, unless
>   the code is in arch/* and can take care of itself.
> 
> Evidently we're getting the right header this via some transitive
> include today, but this isn't something we can/should rely upon,
> especially with ongoing rework of the atomic headers for KASAN
> instrumentation.
> 
> Let's fix the code to include <linux/atomic.h>, avoiding fragility.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Trond Myklebust <trond.myklebust@primarydata.com>
> Cc: Anna Schumaker <anna.schumaker@netapp.com>
> Cc: J. Bruce Fields <bfields@fieldses.org>
> Cc: Jeff Layton <jlayton@poochiereds.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: linux-nfs@vger.kernel.org
> Cc: netdev@vger.kernel.org
> ---
>  net/sunrpc/xprtmultipath.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> I sent this about a year ago [1], but got no response. This still applies atop
> of v4.17-rc3.
> 
> I'm currently trying to implement instrumented atomics for arm64, and it would
> be great to have this fixed.
> 
> Mark.
> 
> [1] https://lkml.kernel.org/r/1489574142-20856-1-git-send-email-mark.rutland@arm.com
> 
> diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c
> index e2d64c7138c3..d897f41be244 100644
> --- a/net/sunrpc/xprtmultipath.c
> +++ b/net/sunrpc/xprtmultipath.c
> @@ -13,7 +13,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/rculist.h>
>  #include <linux/slab.h>
> -#include <asm/cmpxchg.h>
> +#include <linux/atomic.h>
>  #include <linux/spinlock.h>
>  #include <linux/sunrpc/xprt.h>
>  #include <linux/sunrpc/addr.h>

Looks fine.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply

* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close()
From: Pavel Tatashin @ 2018-05-03 14:23 UTC (permalink / raw)
  To: Steven Sistare
  Cc: alexander.duyck, Daniel Jordan, LKML, jeffrey.t.kirsher,
	intel-wired-lan, netdev, gregkh
In-Reply-To: <50c9ac34-3971-ac1b-0b5e-1bfa004fd4cf@oracle.com>

> Hi Pavel, you might want to pull lock optimization out of this patch
series.
> The parallelization by itself is valuable, and optimizations for
individual
> devices including locks can come later.

Hi Steve,

Yes, I will pull this patch out of the series. Thank you for the suggestion.

Pavel

^ permalink raw reply

* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
From: Edward Cree @ 2018-05-03 14:23 UTC (permalink / raw)
  To: Alexei Starovoitov, davem
  Cc: daniel, torvalds, gregkh, luto, netdev, linux-kernel, kernel-team
In-Reply-To: <20180503043604.1604587-3-ast@kernel.org>

On 03/05/18 05:36, Alexei Starovoitov wrote:
> bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
> and user mode helper code that is embedded into bpfilter.ko
>
> The steps to build bpfilter.ko are the following:
> - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
> - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
>   is converted into bpfilter_umh.o object file
>   with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
>   Example:
>   $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
>   0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
>   0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
>   0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
> - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
>
> bpfilter_kern.c is a normal kernel module code that calls
> the fork_usermode_blob() helper to execute part of its own data
> as a user mode process.
>
> Notice that _binary_net_bpfilter_bpfilter_umh_start - end
> is placed into .init.rodata section, so it's freed as soon as __init
> function of bpfilter.ko is finished.
> As part of __init the bpfilter.ko does first request/reply action
> via two unix pipe provided by fork_usermode_blob() helper to
> make sure that umh is healthy. If not it will kill it via pid.
>
> Later bpfilter_process_sockopt() will be called from bpfilter hooks
> in get/setsockopt() to pass iptable commands into umh via bpfilter.ko
>
> If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
> kill umh as well.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpfilter.h      | 15 +++++++
>  include/uapi/linux/bpfilter.h | 21 ++++++++++
>  net/Kconfig                   |  2 +
>  net/Makefile                  |  1 +
>  net/bpfilter/Kconfig          | 17 ++++++++
>  net/bpfilter/Makefile         | 24 +++++++++++
>  net/bpfilter/bpfilter_kern.c  | 93 +++++++++++++++++++++++++++++++++++++++++++
>  net/bpfilter/main.c           | 63 +++++++++++++++++++++++++++++
>  net/bpfilter/msgfmt.h         | 17 ++++++++
>  net/ipv4/Makefile             |  2 +
>  net/ipv4/bpfilter/Makefile    |  2 +
>  net/ipv4/bpfilter/sockopt.c   | 42 +++++++++++++++++++
>  net/ipv4/ip_sockglue.c        | 17 ++++++++
>  13 files changed, 316 insertions(+)
>  create mode 100644 include/linux/bpfilter.h
>  create mode 100644 include/uapi/linux/bpfilter.h
>  create mode 100644 net/bpfilter/Kconfig
>  create mode 100644 net/bpfilter/Makefile
>  create mode 100644 net/bpfilter/bpfilter_kern.c
>  create mode 100644 net/bpfilter/main.c
>  create mode 100644 net/bpfilter/msgfmt.h
>  create mode 100644 net/ipv4/bpfilter/Makefile
>  create mode 100644 net/ipv4/bpfilter/sockopt.c
>
> diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h
> new file mode 100644
> index 000000000000..687b1760bb9f
> --- /dev/null
> +++ b/include/linux/bpfilter.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_BPFILTER_H
> +#define _LINUX_BPFILTER_H
> +
> +#include <uapi/linux/bpfilter.h>
> +
> +struct sock;
> +int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char *optval,
> +			    unsigned int optlen);
> +int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char *optval,
> +			    int *optlen);
> +extern int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
> +				       char __user *optval,
> +				       unsigned int optlen, bool is_set);
> +#endif
> diff --git a/include/uapi/linux/bpfilter.h b/include/uapi/linux/bpfilter.h
> new file mode 100644
> index 000000000000..2ec3cc99ea4c
> --- /dev/null
> +++ b/include/uapi/linux/bpfilter.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _UAPI_LINUX_BPFILTER_H
> +#define _UAPI_LINUX_BPFILTER_H
> +
> +#include <linux/if.h>
> +
> +enum {
> +	BPFILTER_IPT_SO_SET_REPLACE = 64,
> +	BPFILTER_IPT_SO_SET_ADD_COUNTERS = 65,
> +	BPFILTER_IPT_SET_MAX,
> +};
> +
> +enum {
> +	BPFILTER_IPT_SO_GET_INFO = 64,
> +	BPFILTER_IPT_SO_GET_ENTRIES = 65,
> +	BPFILTER_IPT_SO_GET_REVISION_MATCH = 66,
> +	BPFILTER_IPT_SO_GET_REVISION_TARGET = 67,
> +	BPFILTER_IPT_GET_MAX,
> +};
> +
> +#endif /* _UAPI_LINUX_BPFILTER_H */
> diff --git a/net/Kconfig b/net/Kconfig
> index b62089fb1332..ed6368b306fa 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -201,6 +201,8 @@ source "net/bridge/netfilter/Kconfig"
>  
>  endif
>  
> +source "net/bpfilter/Kconfig"
> +
>  source "net/dccp/Kconfig"
>  source "net/sctp/Kconfig"
>  source "net/rds/Kconfig"
> diff --git a/net/Makefile b/net/Makefile
> index a6147c61b174..7f982b7682bd 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_TLS)		+= tls/
>  obj-$(CONFIG_XFRM)		+= xfrm/
>  obj-$(CONFIG_UNIX)		+= unix/
>  obj-$(CONFIG_NET)		+= ipv6/
> +obj-$(CONFIG_BPFILTER)		+= bpfilter/
>  obj-$(CONFIG_PACKET)		+= packet/
>  obj-$(CONFIG_NET_KEY)		+= key/
>  obj-$(CONFIG_BRIDGE)		+= bridge/
> diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
> new file mode 100644
> index 000000000000..782a732b9a5c
> --- /dev/null
> +++ b/net/bpfilter/Kconfig
> @@ -0,0 +1,17 @@
> +menuconfig BPFILTER
> +	bool "BPF based packet filtering framework (BPFILTER)"
> +	default n
> +	depends on NET && BPF
> +	help
> +	  This builds experimental bpfilter framework that is aiming to
> +	  provide netfilter compatible functionality via BPF
> +
> +if BPFILTER
> +config BPFILTER_UMH
> +	tristate "bpftiler kernel module with user mode helper"
sp. "bpftiler" -> "bpfilter"
> +	default m
> +	depends on m
> +	help
> +	  This builds bpfilter kernel module with embedded user mode helper
> +endif
> +
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> new file mode 100644
> index 000000000000..897eedae523e
> --- /dev/null
> +++ b/net/bpfilter/Makefile
> @@ -0,0 +1,24 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Makefile for the Linux BPFILTER layer.
> +#
> +
> +hostprogs-y := bpfilter_umh
> +bpfilter_umh-objs := main.o
> +HOSTCFLAGS += -I. -Itools/include/
> +
> +# a bit of elf magic to convert bpfilter_umh binary into a binary blob
> +# inside bpfilter_umh.o elf file referenced by
> +# _binary_net_bpfilter_bpfilter_umh_start symbol
> +# which bpfilter_kern.c passes further into umh blob loader at run-time
> +quiet_cmd_copy_umh = GEN $@
> +      cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> +      $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
> +      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
> +      --rename-section .data=.init.rodata $< $@
> +
> +$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
> +	$(call cmd,copy_umh)
> +
> +obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
> +bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
> new file mode 100644
> index 000000000000..e0a6fdd5842b
> --- /dev/null
> +++ b/net/bpfilter/bpfilter_kern.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/umh.h>
> +#include <linux/bpfilter.h>
> +#include <linux/sched.h>
> +#include <linux/sched/signal.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include "msgfmt.h"
> +
> +#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
> +#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
> +
> +extern char UMH_start;
> +extern char UMH_end;
> +
> +static struct umh_info info;
> +
> +static void shutdown_umh(struct umh_info *info)
> +{
> +	struct task_struct *tsk;
> +
> +	tsk = pid_task(find_vpid(info->pid), PIDTYPE_PID);
> +	if (tsk)
> +		force_sig(SIGKILL, tsk);
> +	fput(info->pipe_to_umh);
> +	fput(info->pipe_from_umh);
> +}
> +
> +static void stop_umh(void)
> +{
> +	if (bpfilter_process_sockopt) {
I worry about locking here.  Is it possible for two calls to
 bpfilter_process_sockopt() to run in parallel, both fail, and thus both
 call stop_umh()?  And if both end up calling shutdown_umh(), we double
 fput().
> +		bpfilter_process_sockopt = NULL;
> +		shutdown_umh(&info);
> +	}
> +}
> +
> +static int __bpfilter_process_sockopt(struct sock *sk, int optname,
> +				      char __user *optval,
> +				      unsigned int optlen, bool is_set)
> +{
> +	struct mbox_request req;
> +	struct mbox_reply reply;
> +	loff_t pos;
> +	ssize_t n;
> +
> +	req.is_set = is_set;
> +	req.pid = current->pid;
> +	req.cmd = optname;
> +	req.addr = (long)optval;
> +	req.len = optlen;
> +	n = __kernel_write(info.pipe_to_umh, &req, sizeof(req), &pos);
> +	if (n != sizeof(req)) {
> +		pr_err("write fail %zd\n", n);
> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	pos = 0;
> +	n = kernel_read(info.pipe_from_umh, &reply, sizeof(reply), &pos);
> +	if (n != sizeof(reply)) {
> +		pr_err("read fail %zd\n", n);
> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	return reply.status;
> +}
> +
> +static int __init load_umh(void)
> +{
> +	int err;
> +
> +	err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
> +	if (err)
> +		return err;
> +	pr_info("Loaded umh pid %d\n", info.pid);
> +	bpfilter_process_sockopt = &__bpfilter_process_sockopt;
> +
> +	if (__bpfilter_process_sockopt(NULL, 0, 0, 0, 0) != 0) {
> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	return 0;
> +}
> +
> +static void __exit fini_umh(void)
> +{
> +	stop_umh();
> +}
> +module_init(load_umh);
> +module_exit(fini_umh);
> +MODULE_LICENSE("GPL");
> diff --git a/net/bpfilter/main.c b/net/bpfilter/main.c
> new file mode 100644
> index 000000000000..81bbc1684896
> --- /dev/null
> +++ b/net/bpfilter/main.c
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <sys/uio.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <sys/socket.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include "include/uapi/linux/bpf.h"
> +#include <asm/unistd.h>
> +#include "msgfmt.h"
> +
> +int debug_fd;
> +
> +static int handle_get_cmd(struct mbox_request *cmd)
> +{
> +	switch (cmd->cmd) {
> +	case 0:
> +		return 0;
> +	default:
> +		break;
> +	}
> +	return -ENOPROTOOPT;
> +}
> +
> +static int handle_set_cmd(struct mbox_request *cmd)
> +{
> +	return -ENOPROTOOPT;
> +}
> +
> +static void loop(void)
> +{
> +	while (1) {
> +		struct mbox_request req;
> +		struct mbox_reply reply;
> +		int n;
> +
> +		n = read(0, &req, sizeof(req));
> +		if (n != sizeof(req)) {
> +			dprintf(debug_fd, "invalid request %d\n", n);
> +			return;
> +		}
> +
> +		reply.status = req.is_set ?
> +			handle_set_cmd(&req) :
> +			handle_get_cmd(&req);
> +
> +		n = write(1, &reply, sizeof(reply));
> +		if (n != sizeof(reply)) {
> +			dprintf(debug_fd, "reply failed %d\n", n);
> +			return;
> +		}
> +	}
> +}
> +
> +int main(void)
> +{
> +	debug_fd = open("/dev/console", 00000002 | 00000100);
Should probably handle failure of this open() call.
> +	dprintf(debug_fd, "Started bpfilter\n");
> +	loop();
> +	close(debug_fd);
> +	return 0;
> +}
> diff --git a/net/bpfilter/msgfmt.h b/net/bpfilter/msgfmt.h
> new file mode 100644
> index 000000000000..94b9ac9e5114
> --- /dev/null
> +++ b/net/bpfilter/msgfmt.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _NET_BPFILTER_MSGFMT_H
> +#define _NET_BPFTILER_MSGFMT_H
Another bpftiler here, should be
+#define _NET_BPFILTER_MSGFMT_H

-Ed
> +
> +struct mbox_request {
> +	__u64 addr;
> +	__u32 len;
> +	__u32 is_set;
> +	__u32 cmd;
> +	__u32 pid;
> +};
> +
> +struct mbox_reply {
> +	__u32 status;
> +};
> +
> +#endif
> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
> index b379520f9133..7018f91c5a39 100644
> --- a/net/ipv4/Makefile
> +++ b/net/ipv4/Makefile
> @@ -16,6 +16,8 @@ obj-y     := route.o inetpeer.o protocol.o \
>  	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
>  	     metrics.o
>  
> +obj-$(CONFIG_BPFILTER) += bpfilter/
> +
>  obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
>  obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
>  obj-$(CONFIG_PROC_FS) += proc.o
> diff --git a/net/ipv4/bpfilter/Makefile b/net/ipv4/bpfilter/Makefile
> new file mode 100644
> index 000000000000..ce262d76cc48
> --- /dev/null
> +++ b/net/ipv4/bpfilter/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_BPFILTER) += sockopt.o
> +
> diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c
> new file mode 100644
> index 000000000000..42a96d2d8d05
> --- /dev/null
> +++ b/net/ipv4/bpfilter/sockopt.c
> @@ -0,0 +1,42 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/uaccess.h>
> +#include <linux/bpfilter.h>
> +#include <uapi/linux/bpf.h>
> +#include <linux/wait.h>
> +#include <linux/kmod.h>
> +
> +int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
> +				char __user *optval,
> +				unsigned int optlen, bool is_set);
> +EXPORT_SYMBOL_GPL(bpfilter_process_sockopt);
> +
> +int bpfilter_mbox_request(struct sock *sk, int optname, char __user *optval,
> +			  unsigned int optlen, bool is_set)
> +{
> +	if (!bpfilter_process_sockopt) {
> +		int err = request_module("bpfilter");
> +
> +		if (err)
> +			return err;
> +		if (!bpfilter_process_sockopt)
> +			return -ECHILD;
> +	}
> +	return bpfilter_process_sockopt(sk, optname, optval, optlen, is_set);
> +}
> +
> +int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval,
> +			    unsigned int optlen)
> +{
> +	return bpfilter_mbox_request(sk, optname, optval, optlen, true);
> +}
> +
> +int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval,
> +			    int __user *optlen)
> +{
> +	int len;
> +
> +	if (get_user(len, optlen))
> +		return -EFAULT;
> +
> +	return bpfilter_mbox_request(sk, optname, optval, len, false);
> +}
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 5ad2d8ed3a3f..e0791faacb24 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -47,6 +47,8 @@
>  #include <linux/errqueue.h>
>  #include <linux/uaccess.h>
>  
> +#include <linux/bpfilter.h>
> +
>  /*
>   *	SOL_IP control messages.
>   */
> @@ -1244,6 +1246,11 @@ int ip_setsockopt(struct sock *sk, int level,
>  		return -ENOPROTOOPT;
>  
>  	err = do_ip_setsockopt(sk, level, optname, optval, optlen);
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_SET_REPLACE &&
> +	    optname < BPFILTER_IPT_SET_MAX)
> +		err = bpfilter_ip_set_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
> @@ -1552,6 +1559,11 @@ int ip_getsockopt(struct sock *sk, int level,
>  	int err;
>  
>  	err = do_ip_getsockopt(sk, level, optname, optval, optlen, 0);
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_GET_INFO &&
> +	    optname < BPFILTER_IPT_GET_MAX)
> +		err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
> @@ -1584,6 +1596,11 @@ int compat_ip_getsockopt(struct sock *sk, int level, int optname,
>  	err = do_ip_getsockopt(sk, level, optname, optval, optlen,
>  		MSG_CMSG_COMPAT);
>  
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_GET_INFO &&
> +	    optname < BPFILTER_IPT_GET_MAX)
> +		err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&

^ permalink raw reply

* RE: [PATCH net] tcp: restore autocorking
From: Michael Wenig @ 2018-05-03 14:27 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

Thank you, Eric, for fixing this!

Michael 

-----Original Message-----
From: Eric Dumazet [mailto:edumazet@google.com] 
Sent: Wednesday, May 2, 2018 8:25 PM
To: David S . Miller <davem@davemloft.net>
Cc: netdev <netdev@vger.kernel.org>; Eric Dumazet <edumazet@google.com>; Michael Wenig <mwenig@vmware.com>; Eric Dumazet <eric.dumazet@gmail.com>
Subject: [PATCH net] tcp: restore autocorking

When adding rb-tree for TCP retransmit queue, we inadvertently broke TCP autocorking.

tcp_should_autocork() should really check if the rtx queue is not empty.

Tested:

Before the fix :
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

540000 262144    500    10.00      2682.85   2.47     1.59     3.618   2.329
TcpExtTCPAutoCorking            33                 0.0

// Same test, but forcing TCP_NODELAY
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

540000 262144    500    10.00      1408.75   2.44     2.96     6.802   8.259
TcpExtTCPAutoCorking            1                  0.0

After the fix :
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

540000 262144    500    10.00      5472.46   2.45     1.43     1.761   1.027
TcpExtTCPAutoCorking            361293             0.0

// With TCP_NODELAY option
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

540000 262144    500    10.00      5454.96   2.46     1.63     1.775   1.174
TcpExtTCPAutoCorking            315448             0.0

Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Wenig <mwenig@vmware.com>
Tested-by: Michael Wenig <mwenig@vmware.com>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 44be7f43455e4aefde8db61e2d941a69abcc642a..c9d00ef54deca15d5760bcbe154001a96fa1e2a7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -697,7 +697,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,  {
 	return skb->len < size_goal &&
 	       sock_net(sk)->ipv4.sysctl_tcp_autocorking &&
-	       skb != tcp_write_queue_head(sk) &&
+	       !tcp_rtx_queue_empty(sk) &&
 	       refcount_read(&sk->sk_wmem_alloc) > skb->truesize;  }

^ permalink raw reply

* [PATCH v2 net-next] net: dsa: mv88e6xxx: 88E6141/6341 SERDES support
From: Marek Behún @ 2018-05-03 14:27 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Greg Kroah-Hartman, Vivien Didelot,
	Arkadi Sharshevsky, David S . Miller, Marek Behún

The 88E6141/6341 switches (also known as Topaz) have 1 SGMII lane,
which can be configured the same way as the SERDES lane on 88E6390.

Signed-off-by: Marek Behun <marek.behun@nic.cz>
---
 drivers/net/dsa/mv88e6xxx/chip.c   |  2 ++
 drivers/net/dsa/mv88e6xxx/serdes.c | 20 ++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/serdes.h |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index bd74c45f5495..fae362020305 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2426,6 +2426,7 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
 	.reset = mv88e6352_g1_reset,
 	.vtu_getnext = mv88e6352_g1_vtu_getnext,
 	.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+	.serdes_power = mv88e6341_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6161_ops = {
@@ -2924,6 +2925,7 @@ static const struct mv88e6xxx_ops mv88e6341_ops = {
 	.reset = mv88e6352_g1_reset,
 	.vtu_getnext = mv88e6352_g1_vtu_getnext,
 	.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+	.serdes_power = mv88e6341_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6350_ops = {
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c b/drivers/net/dsa/mv88e6xxx/serdes.c
index f3c01119b3d1..bfecbdf5f64d 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -227,3 +227,23 @@ int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
 
 	return 0;
 }
+
+int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
+{
+	int err;
+	u8 cmode;
+
+	if (port != 5)
+		return 0;
+
+	err = mv88e6xxx_port_get_cmode(chip, port, &cmode);
+	if (err)
+		return err;
+
+	if ((cmode == MV88E6XXX_PORT_STS_CMODE_1000BASE_X) ||
+	     (cmode == MV88E6XXX_PORT_STS_CMODE_SGMII) ||
+	     (cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX))
+		return mv88e6390_serdes_sgmii(chip, MV88E6341_ADDR_SERDES, on);
+
+	return 0;
+}
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.h b/drivers/net/dsa/mv88e6xxx/serdes.h
index 5c1cd6d8e9a5..87bfafc5fb29 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.h
+++ b/drivers/net/dsa/mv88e6xxx/serdes.h
@@ -19,6 +19,8 @@
 #define MV88E6352_ADDR_SERDES		0x0f
 #define MV88E6352_SERDES_PAGE_FIBER	0x01
 
+#define MV88E6341_ADDR_SERDES		0x15
+
 #define MV88E6390_PORT9_LANE0		0x09
 #define MV88E6390_PORT9_LANE1		0x12
 #define MV88E6390_PORT9_LANE2		0x13
@@ -42,6 +44,7 @@
 #define MV88E6390_SGMII_CONTROL_LOOPBACK	BIT(14)
 #define MV88E6390_SGMII_CONTROL_PDOWN		BIT(11)
 
+int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 int mv88e6352_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 
-- 
2.16.1

^ permalink raw reply related

* Re: [PATCH v1 net-next] net: dsa: mv88e6xxx: 88E6141/6341 SERDES support
From: Marek Behún @ 2018-05-03 14:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Greg Kroah-Hartman, Vivien Didelot, Arkadi Sharshevsky,
	David S . Miller
In-Reply-To: <20180503131515.GB17027@lunn.ch>

Sent v2

On Thu, 3 May 2018 15:15:15 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Thu, May 03, 2018 at 03:06:48PM +0200, Marek Behún wrote:
> > The 88E6141/6341 switches (also known as Topaz) have 1 SGMII lane,
> > which can be configured the same way as the SERDES lane on 88E6390.
> > 
> > Signed-off-by: Marek Behun <marek.behun@nic.cz>  
> 
> > +int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port,
> > bool on) +{
> > +	int err;
> > +	u8 cmode;
> > +
> > +	if (port != 5)
> > +		return 0;
> > +
> > +	err = mv88e6xxx_port_get_cmode(chip, port, &cmode);
> > +	if (err)
> > +		return err;
> > +
> > +	if ((cmode == MV88E6XXX_PORT_STS_CMODE_1000BASE_X) ||
> > +	     (cmode == MV88E6XXX_PORT_STS_CMODE_SGMII) ||
> > +	     (cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX))
> > +		return mv88e6390_serdes_sgmii(chip, 0x15, on);  
> 
> Hi Marek
> 
> Please add a #define for this 0x15.
> 
> Otherwise, this looks good.
> 
> 	   Andrew

^ permalink raw reply

* Re: [PATCH net] tcp: restore autocorking
From: Neal Cardwell @ 2018-05-03 14:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Netdev, mwenig, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

On Wed, May 2, 2018 at 11:25 PM Eric Dumazet <edumazet@google.com> wrote:

> When adding rb-tree for TCP retransmit queue, we inadvertently broke
> TCP autocorking.

> tcp_should_autocork() should really check if the rtx queue is not empty.

...

> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Nice. Thanks, Eric!

neal

^ permalink raw reply

* Re: [PATCH net] tcp: restore autocorking
From: Soheil Hassas Yeganeh @ 2018-05-03 14:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Michael Wenig, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

On Wed, May 2, 2018 at 11:25 PM, Eric Dumazet <edumazet@google.com> wrote:
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you for catching and fixing this!

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: David Miller @ 2018-05-03 15:07 UTC (permalink / raw)
  To: bjorn.topel
  Cc: magnus.karlsson, alexander.h.duyck, alexander.duyck,
	john.fastabend, ast, brouer, willemdebruijn.kernel, daniel, mst,
	netdev, bjorn.topel, michael.lundkvist, jesse.brandeburg,
	anjali.singhai, qi.z.zhang
In-Reply-To: <20180502110136.3738-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@gmail.com>
Date: Wed,  2 May 2018 13:01:21 +0200

> This patch set introduces a new address family called AF_XDP that is
> optimized for high performance packet processing and, in upcoming
> patch sets, zero-copy semantics. In this patch set, we have removed
> all zero-copy related code in order to make it smaller, simpler and
> hopefully more review friendly. This patch set only supports copy-mode
> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> for RX using the XDP_DRV path. Zero-copy support requires XDP and
> driver changes that Jesper Dangaard Brouer is working on. Some of his
> work has already been accepted. We will publish our zero-copy support
> for RX and TX on top of his patch sets at a later point in time.
 ...

Looks great.

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Do You Need A Hand?
From: Mavis Wanczyk @ 2018-05-03  9:04 UTC (permalink / raw)





I am Mavis Wanczyk i know you may not know me but am the latest  
largest US Powerball lottery winner of $758.7m just of recent, am  
currently helping out  people in need of financial assistance, i know  
it's hard to believe anything on the internet,  so if you don't need  
my help please don't reply to this message.

Regards
Mavis Wanczyk

^ permalink raw reply

* Re: [PATCH net-next 0/2] Update csum tc action for batch operation.
From: David Miller @ 2018-05-03 15:16 UTC (permalink / raw)
  To: cdillaba; +Cc: netdev, jhs, xiyou.wangcong
In-Reply-To: <1525184264-9436-1-git-send-email-cdillaba@mojatatu.com>

From: Craig Dillabaugh <cdillaba@mojatatu.com>
Date: Tue,  1 May 2018 10:17:42 -0400

> This patchset includes two patches the first updating act_csum.c 
> to include the get_fill_size routine required for batch operation, and
> the second including updated TDC tests for the feature.

Series applied.

^ permalink raw reply

* [PATCH] qed: fix spelling mistake: "offloded" -> "offloaded"
From: Colin King @ 2018-05-03 15:19 UTC (permalink / raw)
  To: Ariel Elior, everest-linux-l2, David S . Miller, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in DP_NOTICE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index fb7c2d1562ae..6acfd43c1a4f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -848,7 +848,7 @@ int qed_roce_query_qp(struct qed_hwfn *p_hwfn,
 
 	if (!(qp->resp_offloaded)) {
 		DP_NOTICE(p_hwfn,
-			  "The responder's qp should be offloded before requester's\n");
+			  "The responder's qp should be offloaded before requester's\n");
 		return -EINVAL;
 	}
 
-- 
2.17.0

^ permalink raw reply related

* Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
From: Naveen N. Rao @ 2018-05-03 15:20 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, netdev, Michael Ellerman
In-Reply-To: <415b415e-f47f-082c-1bc9-87d3e9d3aed1__9575.16645216874$1520270545$gmane$org@ fb.com>

Alexei Starovoitov wrote:
> On 3/1/18 12:51 AM, Naveen N. Rao wrote:
>> Daniel Borkmann wrote:
>>>
>>> Worst case if there's nothing better, potentially what one could do in
>>> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
>>> have the imm part as the index pointing to one of them, just unfortunate
>>> that it's likely only needed in ppc64.
>>
>> Ok. We seem to have discussed a few different aspects in this thread.
>> Let me summarize the different aspects we have discussed:
>> 1. Passing address of JIT'ed function to the JIT engines:
>>    Two approaches discussed:
>>    a. Existing approach, where the subprog address is encoded as an
>> offset from __bpf_call_base() in imm32 field of the BPF call
>> instruction. This requires the JIT'ed function to be within 2GB of
>> __bpf_call_base(), which won't be true on ppc64, at the least. So,
>> this won't on ppc64 (and any other architectures where vmalloc'ed
>> (module_alloc()) memory is from a different, far, address range).
> 
> it looks like ppc64 doesn't guarantee today that all of module_alloc()
> will be within 32-bit, but I think it should be trivial to add such
> guarantee. If so, we can define another __bpf_call_base specifically
> for bpf-to-bpf calls when jit is on.

Ok, we prefer not to do that for powerpc (atleast, not for all of 
module_alloc()) at this point.

And since option (c) below is not preferable, I think we will implement 
what Daniel suggested above. This patchset already handles communicating 
the BPF function addresses to the JIT engine, and enhancing 
bpf_prog_get_info_by_fd() should address the concerns with bpftool.

- Naveen

^ permalink raw reply

* Re: [PATCH net-next v7 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: David Miller @ 2018-05-03 15:24 UTC (permalink / raw)
  To: toke; +Cc: netdev, cake
In-Reply-To: <152527386316.14936.5409621935637217368.stgit@alrua-kau>

From: Toke Høiland-Jørgensen <toke@toke.dk>
Date: Wed, 02 May 2018 17:11:03 +0200

> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> new file mode 100644
> index 000000000000..18bc147f12bc
> --- /dev/null
> +++ b/net/sched/sch_cake.c
> +static inline cobalt_time_t cobalt_get_time(void)
> +{
> +	return ktime_get_ns();
> +}

Please do not use inline in foo.c files, let the compiler decide.

> +static inline u32
> +cake_hash(struct cake_tin_data *q, const struct sk_buff *skb, int flow_mode)
> +{

Especially for this monster!  Yikes!

^ permalink raw reply

* Re: [PATCH net,stable] qmi_wwan: do not steal interfaces from class drivers
From: David Miller @ 2018-05-03 15:25 UTC (permalink / raw)
  To: bjorn; +Cc: netdev, linux-usb
In-Reply-To: <20180502202254.3021-1-bjorn@mork.no>

From: Bjørn Mork <bjorn@mork.no>
Date: Wed,  2 May 2018 22:22:54 +0200

> The USB_DEVICE_INTERFACE_NUMBER matching macro assumes that
> the { vendorid, productid, interfacenumber } set uniquely
> identifies one specific function.  This has proven to fail
> for some configurable devices. One example is the Quectel
> EM06/EP06 where the same interface number can be either
> QMI or MBIM, without the device ID changing either.
> 
> Fix by requiring the vendor-specific class for interface number
> based matching.  Functions of other classes can and should use
> class based matching instead.
> 
> Fixes: 03304bcb5ec4 ("net: qmi_wwan: use fixed interface number matching")
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
> It's quite possible that the fix should be integrated in the
> USB_DEVICE_INTERFACE_NUMBER macro instead.  But that has grown a few
> other users since it was added, so changing it now seems risky. 
> Another option is of course adding a new match macro with the
> USB_CLASS_VENDOR_SPEC match integrated. Maybe best?
> 
> But I'm proposing this as-is for now, since this quickfix seems most
> suitable for stable backporting.

Yes, this simpler approache is better for net and -stable.

Applied.

If you want to do something more sophisticated, that can be done
in net-next.

Thanks.

^ permalink raw reply

* Re: [PATCH net] rds: do not leak kernel memory to user land
From: David Miller @ 2018-05-03 15:26 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, santosh.shilimkar, linux-rdma
In-Reply-To: <20180502215339.117702-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed,  2 May 2018 14:53:39 -0700

> syzbot/KMSAN reported an uninit-value in put_cmsg(), originating
> from rds_cmsg_recv().
> 
> Simply clear the structure, since we have holes there, or since
> rx_traces might be smaller than RDS_MSG_RX_DGRAM_TRACE_MAX.
 ...
> Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next] inet: add bound ports statistic
From: David Miller @ 2018-05-03 15:27 UTC (permalink / raw)
  To: stephen; +Cc: netdev, sthemmin
In-Reply-To: <20180502221108.3261-1-sthemmin@microsoft.com>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed,  2 May 2018 15:11:08 -0700

> +int inet_bind_bucket_count(struct proto *prot);
 ...
> +/* Count how many any entries are in the bind hash table */
> +unsigned int inet_bind_bucket_count(struct proto *prot)

If it doesn't build, it definitely wasn't tested.

Right? :)

^ permalink raw reply

* Re: [PATCH net-next v7 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen @ 2018-05-03 15:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake
In-Reply-To: <20180503.112402.774278226377528271.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Wed, 02 May 2018 17:11:03 +0200
>
>> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
>> new file mode 100644
>> index 000000000000..18bc147f12bc
>> --- /dev/null
>> +++ b/net/sched/sch_cake.c
>> +static inline cobalt_time_t cobalt_get_time(void)
>> +{
>> +	return ktime_get_ns();
>> +}
>
> Please do not use inline in foo.c files, let the compiler decide.

Right, will fix :)

>> +static inline u32
>> +cake_hash(struct cake_tin_data *q, const struct sk_buff *skb, int flow_mode)
>> +{
>
> Especially for this monster!  Yikes!

Fair point...

^ permalink raw reply

* Re: [PATCH net-next] ip6_gre: correct the function name in ip6gre_tnl_addr_conflict() comment
From: David Miller @ 2018-05-03 15:29 UTC (permalink / raw)
  To: sunlw.fnst; +Cc: netdev
In-Reply-To: <20180503013429.4330-1-sunlw.fnst@cn.fujitsu.com>

From: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Date: Thu, 3 May 2018 09:34:29 +0800

> The function name is wrong in ip6gre_tnl_addr_conflict() comment, which
> use ip6_tnl_addr_conflict instead of ip6gre_tnl_addr_conflict.
> 
> Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] tcp: restore autocorking
From: David Miller @ 2018-05-03 15:29 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, mwenig, eric.dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed,  2 May 2018 20:25:13 -0700

> When adding rb-tree for TCP retransmit queue, we inadvertently broke
> TCP autocorking.
> 
> tcp_should_autocork() should really check if the rtx queue is not empty.
> 
> Tested:
 ...
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox