Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: Xin Long @ 2018-05-23  7:04 UTC (permalink / raw)
  To: Neil Horman
  Cc: Michael Tuexen, Marcelo Ricardo Leitner, network dev, linux-sctp,
	davem
In-Reply-To: <20180522115151.GA28740@hmswarspite.think-freely.org>

On Tue, May 22, 2018 at 7:51 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> On Tue, May 22, 2018 at 03:07:57PM +0800, Xin Long wrote:
>> On Mon, May 21, 2018 at 9:48 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> > On Mon, May 21, 2018 at 02:16:56PM +0200, Michael Tuexen wrote:
>> >> > On 21. May 2018, at 13:39, Neil Horman <nhorman@tuxdriver.com> wrote:
>> >> >
>> >> > On Sun, May 20, 2018 at 10:54:04PM -0300, Marcelo Ricardo Leitner wrote:
>> >> >> On Sun, May 20, 2018 at 08:50:59PM -0400, Neil Horman wrote:
>> >> >>> On Sat, May 19, 2018 at 03:44:40PM +0800, Xin Long wrote:
>> >> >>>> This feature is actually already supported by sk->sk_reuse which can be
>> >> >>>> set by SO_REUSEADDR. But it's not working exactly as RFC6458 demands in
>> >> >>>> section 8.1.27, like:
>> >> >>>>
>> >> >>>>  - This option only supports one-to-one style SCTP sockets
>> >> >>>>  - This socket option must not be used after calling bind()
>> >> >>>>    or sctp_bindx().
>> >> >>>>
>> >> >>>> Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
>> >> >>>> Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
>> >> >>>> work in linux.
>> >> >>>>
>> >> >>>> This patch reuses sk->sk_reuse and works pretty much as SO_REUSEADDR,
>> >> >>>> just with some extra setup limitations that are neeeded when it is being
>> >> >>>> enabled.
>> >> >>>>
>> >> >>>> "It should be noted that the behavior of the socket-level socket option
>> >> >>>> to reuse ports and/or addresses for SCTP sockets is unspecified", so it
>> >> >>>> leaves SO_REUSEADDR as is for the compatibility.
>> >> >>>>
>> >> >>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> >> >>>> ---
>> >> >>>> include/uapi/linux/sctp.h |  1 +
>> >> >>>> net/sctp/socket.c         | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>> >> >>>> 2 files changed, 49 insertions(+)
>> >> >>>>
>> >> >>> A few things:
>> >> >>>
>> >> >>> 1) I agree with Tom, this feature is a complete duplication of the SK_REUSEPORT
>> >> >>> socket option.  I understand that this is an implementation of the option in the
>> >> >>> RFC, but its definately a duplication of a feature, which makes several things
>> >> >>> really messy.
>> >> >>>
>> >> >>> 2) The overloading of the sk_reuse opeion is a bad idea, for several reasons.
>> >> >>> Chief among them is the behavioral interference between this patch and the
>> >> >>> SO_REUSEADDR socket level option, that also sets this feature.  If you set
>> >> >>> sk_reuse via SO_REUSEADDR, you will set the SCTP port reuse feature regardless
>> >> >>> of the bind or 1:1/1:m state of the socket.  Vice versa, if you set this socket
>> >> >>> option via the SCTP_PORT_REUSE option you will inadvertently turn on address
>> >> >>> reuse for the socket.  We can't do that.
>> >> >>
>> >> >> Given your comments, going a bit further here, one other big
>> >> >> implication is that a port would never be able to be considered to
>> >> >> fully meet SCTP standards regarding reuse because a rogue application
>> >> >> may always abuse of the socket level opt to gain access to the port.
>> >> >>
>> >> >> IOW, the patch allows the application to use such restrictions against
>> >> >> itself and nothing else, which undermines the patch idea.
>> >> >>
>> >> > Agreed.
>> >> >
>> >> >> I lack the knowledge on why the SCTP option was proposed in the RFC. I
>> >> >> guess they had a good reason to add the restriction on 1:1/1:m style.
>> >> >> Does the usage of the current imply in any risk to SCTP sockets? If
>> >> >> yes, that would give some grounds for going forward with the SCTP
>> >> >> option.
>> >> >>
>> >> > I'm also not privy to why the sctp option was proposed, though I expect that the
>> >> > lack of standardization of SO_REUSEPORT probably had something to do with it.
>> >> > As for the reasoning behind restriction to only 1:1 sockets, if I had to guess,
>> >> > I would say it likely because it creates ordering difficulty at the application
>> >> > level.
>> >> >
>> >> > CC-ing Michael Tuxen, who I believe had some input on this RFC.  Hopefully he
>> >> > can shed some light on this.
>> >> Dear all,
>> >>
>> >> the reason this was added is to have a specified way to allow a system to
>> >> behave like a client and server making use of the INIT collision.
>> >>
>> >> For 1-to-many style sockets you can do this by creating a socket, binding it,
>> >> calling listen on it and trying to connect to the peer.
>> >>
>> >> For 1-to-1 style sockets you need two sockets for it. One listener and one
>> >> you use to connect (and close it in case of failure, open a new one...).
>> >>
>> >> It was not clear if one can achieve this with SO_REUSEPORT and/or SO_REUSEADDR
>> >> on all platforms. We left that unspecified.
>> >>
>> >> I hope this makes the intention clearer.
>> >>
>> > I think it makes the intention clearer yes, but it unfortunately does nothing in
>> > my mind to clarify how the implementation should best handle the potential
>> > overlap in functionality.  What I see here is that we have two functional paths
>> > (the SO_REUSEPORT path and the SCTP_PORT_REUSE path), which may or may not
>> > (depending on the OS implementation achieve the same functional goal (allowing
>> > multiple sockets to share a port while allowing one socket to listen and the
>> > other connect to a remote peer).  If both implementations do the same thing on a
>> > given platform, we can either just alias one to another and be done, but if they
>> > don't then we either have to implement both paths, and ensure that the
>> > SO_REUSEPORT path is a no-op/error return for SCTP sockets, or that each path
>> > implements a distinct feature set that is cleaarly documented.
>> >
>> > That said, I think we may be in luck.  Looking at the connect and listen paths,
>> > it appears to me that:
>> >
>> > 1) Sockets ignore SO_REUSEPORT in the connect and listen paths (save for any
>> > autobinding) so it would appear that the intent of the SCTP rfc can be honored
>> > via SO_REUSEPORT on linux.
>> >
>> > 2) SO_REUSEPORT prevents changing state after a bind has occured, so we can honr
>> > that part of the SCTP RFC.
>> >
>> > The only missing part is the restriction that SCTP_REUSE_PORT has which is
>> > unaccounted for is that 1:M sockets aren't allowed to enable port reuse.
>> > However, I think the implication from Michaels description above is that port
>> > reuse on a 1:M socket is implicit because a single socket can connect and listen
>> > in that use case, rather than there being a danger to doing so.
>> >
>> > As such, I would propose that we implement this socket option by simply setting
>> > the sk->sk_reuseport field in the sock structure, and document the fact that
>> > linux does not restrict port reuse from 1:M sockets.
>> Note that, sk->sk_reuseport is not affecting linux SCTP socket at all now.
>> linux SCTP socket doesn't really have SO_REUSEADDR (sk->sk_reuse)
>> support, but use sk->sk_reuse as REUSE_PORT, (yes, it is confusing).
>> Pls refer to sctp_get_port_local().
>>
> No, its not used now, but if you do use it to do something specific to SCTP (via
> the SCTP_REUSE_PORT socket option), you risk aliasing SO_REUSEPORT behavior to
> it, and if it doesn't match what the RFC behavior mandates, thats a problem.
>
>> So I'm not sure using sk->sk_reuseport here means we will drop sk->sk_reuse
>> use in linux SCTP but use sk->sk_reuseport instead, or we will think that socket
>> enables 'port reuse' when either of them is set.
>>
> I don't think we would drop the behavior of sk_reuse here, why would we?  As far
> as I can see, the behavior of SO_REUSEADDR (not SO_REUSEPORT), isn't in
> question, is it?
>
>> Note some users may be already using SO_REUSEADDR to enable the 'port
>> reuse' in linux sctp socket. If we're changing to sk->sk_reuseport, we may face
>> a compatibility problem.
>>
> I don't see how the behavior of SO_REUSEADDR is in question here.  All I'm
> suggesting is that you simplify this patch so that the SCTP_REUSE_PORT socket
> option set sk_reuseport, as that option to my eyes conforms to the sctp rfc
> requirements.  Or am I' missing something?
No, I am :)
sk_reuseport seems more complicated than sk_reuse. I kind of mixed them.
I need to check more beofore continuing. Thanks.

>
> Neil
>
>>
>> >
>> > Thoughts?
>> > Neil
>> >
>>

^ permalink raw reply

* [PATCH] netfilter: uapi: includes linux/types.h
From: YueHaibing @ 2018-05-23  7:03 UTC (permalink / raw)
  To: pablo, kadlec; +Cc: linux-kernel, netdev, coreteam, YueHaibing

gcc-7.3.0 report following warning:
./usr/include/linux/netfilter/nf_osf.h:27: found __[us]{8,16,32,64} type without #include <linux/types.h>

includes linux/types.h to fix it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 include/uapi/linux/netfilter/nf_osf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/netfilter/nf_osf.h b/include/uapi/linux/netfilter/nf_osf.h
index 45376ea..d1dbe00 100644
--- a/include/uapi/linux/netfilter/nf_osf.h
+++ b/include/uapi/linux/netfilter/nf_osf.h
@@ -1,6 +1,8 @@
 #ifndef _NF_OSF_H
 #define _NF_OSF_H
 
+#include <linux/types.h>
+
 #define MAXGENRELEN	32
 
 #define NF_OSF_GENRE	(1 << 0)
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net-next v2] net: sched: don't disable bh when accessing action idr
From: Vlad Buslov @ 2018-05-23  6:57 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiri Pirko, LKML
In-Reply-To: <CAM_iQpVMFSxnN6dEi0U4=aw7s7TcJ88ShiNTDXcoRUVL82jTZw@mail.gmail.com>


On Wed 23 May 2018 at 01:10, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, May 21, 2018 at 1:03 PM, Vlad Buslov <vladbu@mellanox.com> wrote:
>> Initial net_device implementation used ingress_lock spinlock to synchronize
>> ingress path of device. This lock was used in both process and bh context.
>> In some code paths action map lock was obtained while holding ingress_lock.
>> Commit e1e992e52faa ("[NET_SCHED] protect action config/dump from irqs")
>> modified actions to always disable bh, while using action map lock, in
>> order to prevent deadlock on ingress_lock in softirq. This lock was removed
>> from net_device, so disabling bh, while accessing action map, is no longer
>> necessary.
>>
>> Replace all action idr spinlock usage with regular calls that do not
>> disable bh.
>
> While your patch is probably fine, the above justification seems not.

Sorry if I missed something. My justification is based on commit
description that added bh disable in subject code.

>
> In the past, tc actions could be released in BH context because tc
> filters use call_rcu(). However, I moved them to a workqueue recently.
> So before my change I don't think you can remove the BH protection,
> otherwise race with idr_remove()...

Found commit series that you described. Will modify commit message
accordingly.

Thanks,
Vlad

^ permalink raw reply

* Re: [PATCH net] sctp: fix the issue that flags are ignored when using kernel_connect
From: Xin Long @ 2018-05-23  6:55 UTC (permalink / raw)
  To: David Miller
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Michal Kubecek
In-Reply-To: <20180522.134014.1038689773683362410.davem@davemloft.net>

On Wed, May 23, 2018 at 1:40 AM, David Miller <davem@davemloft.net> wrote:
> From: Xin Long <lucien.xin@gmail.com>
> Date: Sun, 20 May 2018 16:39:10 +0800
>
>> Now sctp uses inet_dgram_connect as its proto_ops .connect, and the flags
>> param can't be passed into its proto .connect where this flags is really
>> needed.
>>
>> sctp works around it by getting flags from socket file in __sctp_connect.
>> It works for connecting from userspace, as inherently the user sock has
>> socket file and it passes f_flags as the flags param into the proto_ops
>> .connect.
>>
>> However, the sock created by sock_create_kern doesn't have a socket file,
>> and it passes the flags (like O_NONBLOCK) by using the flags param in
>> kernel_connect, which calls proto_ops .connect later.
>>
>> So to fix it, this patch defines a new proto_ops .connect for sctp,
>> sctp_inet_connect, which calls __sctp_connect() directly with this
>> flags param. After this, the sctp's proto .connect can be removed.
>>
>> Note that sctp_inet_connect doesn't need to do some checks that are not
>> needed for sctp, which makes thing better than with inet_dgram_connect.
>>
>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>
> Applied, thank you.
>
> I don't see a Fixes: tag, please give me some guidance me wrt. -stable.
The problem is there since the beginning, I think there's no need
to go to -stable.

^ permalink raw reply

* Re: [PATCH net-next 2/3] net/ipv6: Udate fib6_table_lookup tracepoint
From: kbuild test robot @ 2018-05-23  6:46 UTC (permalink / raw)
  To: dsahern; +Cc: kbuild-all, netdev, David Ahern
In-Reply-To: <20180521212443.23612-3-dsahern@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 6415 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/dsahern-kernel-org/net-ipv4-Udate-fib_table_lookup-tracepoint/20180523-083238
config: x86_64-randconfig-s4-05231222 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net/core/net-traces.o: In function `perf_trace_fib6_table_lookup':
>> include/trace/events/fib6.h:13: undefined reference to `ip6_rt_type_to_error'
   net/core/net-traces.o: In function `trace_event_raw_event_fib6_table_lookup':
>> include/trace/events/fib6.h:13: undefined reference to `ip6_rt_type_to_error'

vim +13 include/trace/events/fib6.h

b811580d David Ahern     2015-11-19  12  
b811580d David Ahern     2015-11-19 @13  TRACE_EVENT(fib6_table_lookup,
b811580d David Ahern     2015-11-19  14  
d4bea421 David Ahern     2018-05-09  15  	TP_PROTO(const struct net *net, const struct fib6_info *f6i,
b65f164d Paolo Abeni     2017-10-19  16  		 struct fib6_table *table, const struct flowi6 *flp),
b811580d David Ahern     2015-11-19  17  
d4bea421 David Ahern     2018-05-09  18  	TP_ARGS(net, f6i, table, flp),
b811580d David Ahern     2015-11-19  19  
b811580d David Ahern     2015-11-19  20  	TP_STRUCT__entry(
b811580d David Ahern     2015-11-19  21  		__field(	u32,	tb_id		)
6d233bf4 David Ahern     2018-05-21  22  		__field(	int,	err		)
b811580d David Ahern     2015-11-19  23  		__field(	int,	oif		)
b811580d David Ahern     2015-11-19  24  		__field(	int,	iif		)
b811580d David Ahern     2015-11-19  25  		__field(	__u8,	tos		)
b811580d David Ahern     2015-11-19  26  		__field(	__u8,	scope		)
b811580d David Ahern     2015-11-19  27  		__field(	__u8,	flags		)
b811580d David Ahern     2015-11-19  28  		__array(	__u8,	src,	16	)
b811580d David Ahern     2015-11-19  29  		__array(	__u8,	dst,	16	)
6d233bf4 David Ahern     2018-05-21  30  		__field(        u16,	sport		)
6d233bf4 David Ahern     2018-05-21  31  		__field(        u16,	dport		)
6d233bf4 David Ahern     2018-05-21  32  		__field(        u8,	proto		)
6d233bf4 David Ahern     2018-05-21  33  		__field(        u8,	rt_type		)
b811580d David Ahern     2015-11-19  34  		__dynamic_array(	char,	name,	IFNAMSIZ )
b811580d David Ahern     2015-11-19  35  		__array(		__u8,	gw,	16	 )
b811580d David Ahern     2015-11-19  36  	),
b811580d David Ahern     2015-11-19  37  
b811580d David Ahern     2015-11-19  38  	TP_fast_assign(
b811580d David Ahern     2015-11-19  39  		struct in6_addr *in6;
b811580d David Ahern     2015-11-19  40  
b65f164d Paolo Abeni     2017-10-19  41  		__entry->tb_id = table->tb6_id;
6d233bf4 David Ahern     2018-05-21  42  		__entry->err = ip6_rt_type_to_error(f6i->fib6_type);
b811580d David Ahern     2015-11-19  43  		__entry->oif = flp->flowi6_oif;
b811580d David Ahern     2015-11-19  44  		__entry->iif = flp->flowi6_iif;
69716a2b Daniel Borkmann 2016-03-18  45  		__entry->tos = ip6_tclass(flp->flowlabel);
b811580d David Ahern     2015-11-19  46  		__entry->scope = flp->flowi6_scope;
b811580d David Ahern     2015-11-19  47  		__entry->flags = flp->flowi6_flags;
b811580d David Ahern     2015-11-19  48  
b811580d David Ahern     2015-11-19  49  		in6 = (struct in6_addr *)__entry->src;
b811580d David Ahern     2015-11-19  50  		*in6 = flp->saddr;
b811580d David Ahern     2015-11-19  51  
b811580d David Ahern     2015-11-19  52  		in6 = (struct in6_addr *)__entry->dst;
b811580d David Ahern     2015-11-19  53  		*in6 = flp->daddr;
b811580d David Ahern     2015-11-19  54  
6d233bf4 David Ahern     2018-05-21  55  		__entry->proto = flp->flowi6_proto;
6d233bf4 David Ahern     2018-05-21  56  		if (__entry->proto == IPPROTO_TCP ||
6d233bf4 David Ahern     2018-05-21  57  		    __entry->proto == IPPROTO_UDP) {
6d233bf4 David Ahern     2018-05-21  58  			__entry->sport = ntohs(flp->fl6_sport);
6d233bf4 David Ahern     2018-05-21  59  			__entry->dport = ntohs(flp->fl6_dport);
6d233bf4 David Ahern     2018-05-21  60  		} else {
6d233bf4 David Ahern     2018-05-21  61  			__entry->sport = 0;
6d233bf4 David Ahern     2018-05-21  62  			__entry->dport = 0;
6d233bf4 David Ahern     2018-05-21  63  		}
6d233bf4 David Ahern     2018-05-21  64  
d4bea421 David Ahern     2018-05-09  65  		if (f6i->fib6_nh.nh_dev) {
d4bea421 David Ahern     2018-05-09  66  			__assign_str(name, f6i->fib6_nh.nh_dev);
b811580d David Ahern     2015-11-19  67  		} else {
6d233bf4 David Ahern     2018-05-21  68  			__assign_str(name, "-");
b811580d David Ahern     2015-11-19  69  		}
d4bea421 David Ahern     2018-05-09  70  		if (f6i == net->ipv6.fib6_null_entry) {
b811580d David Ahern     2015-11-19  71  			struct in6_addr in6_zero = {};
b811580d David Ahern     2015-11-19  72  
b811580d David Ahern     2015-11-19  73  			in6 = (struct in6_addr *)__entry->gw;
b811580d David Ahern     2015-11-19  74  			*in6 = in6_zero;
b811580d David Ahern     2015-11-19  75  
d4bea421 David Ahern     2018-05-09  76  		} else if (f6i) {
b811580d David Ahern     2015-11-19  77  			in6 = (struct in6_addr *)__entry->gw;
d4bea421 David Ahern     2018-05-09  78  			*in6 = f6i->fib6_nh.nh_gw;
b811580d David Ahern     2015-11-19  79  		}
b811580d David Ahern     2015-11-19  80  	),
b811580d David Ahern     2015-11-19  81  
6d233bf4 David Ahern     2018-05-21  82  	TP_printk("table %3u oif %d iif %d proto %u %pI6c/%u -> %pI6c/%u tos %d scope %d flags %x ==> dev %s gw %pI6c err %d",
6d233bf4 David Ahern     2018-05-21  83  		  __entry->tb_id, __entry->oif, __entry->iif, __entry->proto,
6d233bf4 David Ahern     2018-05-21  84  		  __entry->src, __entry->sport, __entry->dst, __entry->dport,
6d233bf4 David Ahern     2018-05-21  85  		  __entry->tos, __entry->scope, __entry->flags,
6d233bf4 David Ahern     2018-05-21  86  		  __get_str(name), __entry->gw, __entry->err)
b811580d David Ahern     2015-11-19  87  );
b811580d David Ahern     2015-11-19  88  

:::::: The code at line 13 was first introduced by commit
:::::: b811580d91e9c0945b0a923dcec3e10cce04ac30 net: IPv6 fib lookup tracepoint

:::::: TO: David Ahern <dsa@cumulusnetworks.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30959 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-23  6:27 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Michael S. Tsirkin, stephen, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <8f611f3b-88d7-4d41-fd47-d07f11d0f25a@intel.com>

Tue, May 22, 2018 at 10:54:29PM CEST, sridhar.samudrala@intel.com wrote:
>
>
>On 5/22/2018 9:12 AM, Jiri Pirko wrote:
>> Fixing the subj, sorry about that.
>> 
>> Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>> > On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>> > > Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>> > > > On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>> > > > > Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> > > > > > Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> > > > > > > Use the registration/notification framework supported by the generic
>> > > > > > > failover infrastructure.
>> > > > > > > 
>> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > > > > > In previous patchset versions, the common code did
>> > > > > > netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> > > > > > (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> > > > > > 
>> > > > > > This should be part of the common "failover" code.
>> > > > Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>> > > > netvsc and only commonize the notifier and the main event handler routine.
>> > > > Another complication is that netvsc does part of registration in a delayed workqueue.
>> > > :( This kind of degrades the whole efford of having single solution
>> > > in "failover" module. I think that common parts, as
>> > > netdev_rx_handler_register() and others certainly is should be inside
>> > > the common module. This is not a good time to minimize changes. Let's do
>> > > the thing properly and fix the netvsc mess now.
>> > > 
>> > > 
>> > > > It should be possible to move some of the code from net_failover.c to generic
>> > > > failover.c in future if Stephen is ok with it.
>> > > > 
>> > > > 
>> > > > > Also note that in the current patchset you use IFF_FAILOVER flag for
>> > > > > master, yet for the slave you use IFF_SLAVE. That is wrong.
>> > > > > IFF_FAILOVER_SLAVE should be used.
>> > > > Not sure which code you are referring to.  I only set IFF_FAILOVER_SLAVE
>> > > > in patch 3.
>> > > The existing netvsc driver.
>> > We really can't change netvsc's flags now, even if it's interface is
>> > messy, it's being used in the field. We can add a flag that makes netvsc
>> > behave differently, and if this flag also allows enhanced functionality
>> > userspace will gradually switch.
>> Okay, although in this case, it really does not make much sense, so be
>> it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
>> now. (This once-wrong-forever-wrong policy is flustrating me).
>> 
>> But since this patchset introduces private flag IFF_FAILOVER and
>> IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
>> instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
>> netdevice to get at least some consistency between virtio_net and
>> netvsc.
>
>OK. I can make this change to set/unset IFF_FAILOVER_SLAVE in the netvsc
>register/unregister routines so that it is consistent with virtio_net.
>
>Based on your discussion with mst, i think we can even remove IFF_SLAVE
>setting on netvsc as it should not impact userspace.  If Stephen is OK
>we can make this change too.
>
>Do you see any other items that need to be resolved for this series to go in
>this merge window?

As I wrote previously, the common code including rx_handler registration
and setting of flags and master link should be done in a common code,
moved away from netvsc code.

Thanks.


>
>
>
>> 
>> > Anything breaking userspace I fully expect Stephen to nack and
>> > IMO with good reason.
>> > 
>> > -- 
>> > MST
>

^ permalink raw reply

* [PATCH net-next v3 4/7] net: dsa: qca8k: Force CPU port to its highest bandwidth
From: Michal Vokáč @ 2018-05-23  6:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devicetree, f.fainelli, vivien.didelot, andrew,
	mark.rutland, robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-1-git-send-email-michal.vokac@ysoft.com>

By default autonegotiation is enabled to configure MAC on all ports.
For the CPU port autonegotiation can not be used so we need to set
some sensible defaults manually.

This patch forces the default setting of the CPU port to 1000Mbps/full
duplex which is the chip maximum capability.

Also correct size of the bit field used to configure link speed.

Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes in v3:
 - none

Changes in v2:
 - Add "Fixes" tag as pointed out by Florian.
 - Add "Reviewed-by" tags from Andrew and Florian.

 drivers/net/dsa/qca8k.c | 6 +++++-
 drivers/net/dsa/qca8k.h | 6 ++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 0d224f3..14a108b38 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -537,6 +537,7 @@ qca8k_setup(struct dsa_switch *ds)
 {
 	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
 	int ret, i, phy_mode = -1;
+	u32 mask;
 
 	pr_debug("qca: setup\n");
 
@@ -564,7 +565,10 @@ qca8k_setup(struct dsa_switch *ds)
 	if (ret < 0)
 		return ret;
 
-	/* Enable CPU Port */
+	/* Enable CPU Port, force it to maximum bandwidth and full-duplex */
+	mask = QCA8K_PORT_STATUS_SPEED_1000 | QCA8K_PORT_STATUS_TXFLOW |
+	       QCA8K_PORT_STATUS_RXFLOW | QCA8K_PORT_STATUS_DUPLEX;
+	qca8k_write(priv, QCA8K_REG_PORT_STATUS(QCA8K_CPU_PORT), mask);
 	qca8k_reg_set(priv, QCA8K_REG_GLOBAL_FW_CTRL0,
 		      QCA8K_GLOBAL_FW_CTRL0_CPU_PORT_EN);
 	qca8k_port_set_status(priv, QCA8K_CPU_PORT, 1);
diff --git a/drivers/net/dsa/qca8k.h b/drivers/net/dsa/qca8k.h
index 1cf8a92..5bda165 100644
--- a/drivers/net/dsa/qca8k.h
+++ b/drivers/net/dsa/qca8k.h
@@ -51,8 +51,10 @@
 #define QCA8K_GOL_MAC_ADDR0				0x60
 #define QCA8K_GOL_MAC_ADDR1				0x64
 #define QCA8K_REG_PORT_STATUS(_i)			(0x07c + (_i) * 4)
-#define   QCA8K_PORT_STATUS_SPEED			GENMASK(2, 0)
-#define   QCA8K_PORT_STATUS_SPEED_S			0
+#define   QCA8K_PORT_STATUS_SPEED			GENMASK(1, 0)
+#define   QCA8K_PORT_STATUS_SPEED_10			0
+#define   QCA8K_PORT_STATUS_SPEED_100			0x1
+#define   QCA8K_PORT_STATUS_SPEED_1000			0x2
 #define   QCA8K_PORT_STATUS_TXMAC			BIT(2)
 #define   QCA8K_PORT_STATUS_RXMAC			BIT(3)
 #define   QCA8K_PORT_STATUS_TXFLOW			BIT(4)
-- 
2.7.4

^ permalink raw reply related

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Christoph Hellwig @ 2018-05-23  6:20 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Jason Gunthorpe, davem, dledford, Sindhu Devale, netdev,
	linux-rdma, nhorman, sassmann, jogreene, Shiraz Saleem
In-Reply-To: <2fcc22b088fd04dafbdc1582d03c4bca21eefade.camel@intel.com>

On Tue, May 22, 2018 at 02:50:32PM -0700, Jeff Kirsher wrote:
> The ABI rarely changes, if at all.  The issue OSV's are seeing is that
> upgrading i40e, requires that i40iw be recompiled even though there
> were no updates/changes to the ABI.

So fscking what.  If you upgrade one part of the kernel you have to
rebuild anbything.  If people try to get away without that that is
their problem, an certainly not something in any way supported by
Linux.

^ permalink raw reply

* [PATCH net-next v3 7/7] net: dsa: qca8k: Remove redundant parentheses
From: Michal Vokáč @ 2018-05-23  6:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devicetree, f.fainelli, vivien.didelot, andrew,
	mark.rutland, robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-1-git-send-email-michal.vokac@ysoft.com>

Fix warning reported by checkpatch.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes in v3:
 - none

Changes in v2:
 - Fix typo in subject.
 - Add "Reviewed-by" tags from Andrew and Florian.

 drivers/net/dsa/qca8k.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index c834893..c0da402 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -513,7 +513,7 @@ qca8k_port_set_status(struct qca8k_priv *priv, int port, int enable)
 	pr_debug("qca: port %i set status %i\n", port, enable);
 
 	/* Port 0 and 6 have no internal PHY */
-	if ((port > 0) && (port < 6))
+	if (port > 0 && port < 6)
 		mask |= QCA8K_PORT_STATUS_LINK_AUTO;
 
 	if (enable)
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v3 6/7] net: dsa: qca8k: Replace GPL boilerplate by SPDX
From: Michal Vokáč @ 2018-05-23  6:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devicetree, f.fainelli, vivien.didelot, andrew,
	mark.rutland, robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-1-git-send-email-michal.vokac@ysoft.com>

Replace the GPLv2 license boilerplate with the SPDX license identifier.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes in v3:
 - none

Changes in v2:
 - Add commit message.
 - Add "Reviewed-by" tags from Andrew and Florian.

 drivers/net/dsa/qca8k.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 7eba987..c834893 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -1,17 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (C) 2009 Felix Fietkau <nbd@nbd.name>
  * Copyright (C) 2011-2012 Gabor Juhos <juhosg@openwrt.org>
  * Copyright (c) 2015, The Linux Foundation. All rights reserved.
  * Copyright (c) 2016 John Crispin <john@phrozen.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 and
- * only version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
  */
 
 #define DEBUG
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v3 5/7] net: dsa: qca8k: Allow overwriting CPU port setting
From: Michal Vokáč @ 2018-05-23  6:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devicetree, f.fainelli, vivien.didelot, andrew,
	mark.rutland, robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-1-git-send-email-michal.vokac@ysoft.com>

Implement adjust_link function that allows to overwrite default CPU port
setting using fixed-link device tree subnode.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes in v3:
 - none

Changes in v2:
 - Add "Reviewed-by" tags from Andrew and Florian.

 drivers/net/dsa/qca8k.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 drivers/net/dsa/qca8k.h |  1 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 14a108b38..7eba987 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -636,6 +636,47 @@ qca8k_setup(struct dsa_switch *ds)
 	return 0;
 }
 
+static void
+qca8k_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phy)
+{
+	struct qca8k_priv *priv = ds->priv;
+	u32 reg;
+
+	/* Force fixed-link setting for CPU port, skip others. */
+	if (!phy_is_pseudo_fixed_link(phy))
+		return;
+
+	/* Set port speed */
+	switch (phy->speed) {
+	case 10:
+		reg = QCA8K_PORT_STATUS_SPEED_10;
+		break;
+	case 100:
+		reg = QCA8K_PORT_STATUS_SPEED_100;
+		break;
+	case 1000:
+		reg = QCA8K_PORT_STATUS_SPEED_1000;
+		break;
+	default:
+		dev_dbg(priv->dev, "port%d link speed %dMbps not supported.\n",
+			port, phy->speed);
+		return;
+	}
+
+	/* Set duplex mode */
+	if (phy->duplex == DUPLEX_FULL)
+		reg |= QCA8K_PORT_STATUS_DUPLEX;
+
+	/* Force flow control */
+	if (dsa_is_cpu_port(ds, port))
+		reg |= QCA8K_PORT_STATUS_RXFLOW | QCA8K_PORT_STATUS_TXFLOW;
+
+	/* Force link down before changing MAC options */
+	qca8k_port_set_status(priv, port, 0);
+	qca8k_write(priv, QCA8K_REG_PORT_STATUS(port), reg);
+	qca8k_port_set_status(priv, port, 1);
+}
+
 static int
 qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum)
 {
@@ -909,6 +950,7 @@ qca8k_get_tag_protocol(struct dsa_switch *ds, int port)
 static const struct dsa_switch_ops qca8k_switch_ops = {
 	.get_tag_protocol	= qca8k_get_tag_protocol,
 	.setup			= qca8k_setup,
+	.adjust_link            = qca8k_adjust_link,
 	.get_strings		= qca8k_get_strings,
 	.phy_read		= qca8k_phy_read,
 	.phy_write		= qca8k_phy_write,
@@ -942,6 +984,7 @@ qca8k_sw_probe(struct mdio_device *mdiodev)
 		return -ENOMEM;
 
 	priv->bus = mdiodev->bus;
+	priv->dev = &mdiodev->dev;
 
 	/* read the switches ID register */
 	id = qca8k_read(priv, QCA8K_REG_MASK_CTRL);
diff --git a/drivers/net/dsa/qca8k.h b/drivers/net/dsa/qca8k.h
index 5bda165..613fe5c5 100644
--- a/drivers/net/dsa/qca8k.h
+++ b/drivers/net/dsa/qca8k.h
@@ -167,6 +167,7 @@ struct qca8k_priv {
 	struct ar8xxx_port_status port_sts[QCA8K_NUM_PORTS];
 	struct dsa_switch *ds;
 	struct mutex reg_mutex;
+	struct device *dev;
 };
 
 struct qca8k_mib_desc {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v3 3/7] net: dsa: qca8k: Enable RXMAC when bringing up a port
From: Michal Vokáč @ 2018-05-23  6:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devicetree, f.fainelli, vivien.didelot, andrew,
	mark.rutland, robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-1-git-send-email-michal.vokac@ysoft.com>

When a port is brought up/down do not enable/disable only the TXMAC
but the RXMAC as well. This is essential for the CPU port to work.

Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes in v3:
 - none

Changes in v2:
 - Add "Fixes" tag as pointed out by Florian.
 - Add "Reviewed-by" tags from Andrew and Florian.

 drivers/net/dsa/qca8k.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 6a3ffb2..0d224f3 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -516,7 +516,7 @@ qca8k_set_pad_ctrl(struct qca8k_priv *priv, int port, int mode)
 static void
 qca8k_port_set_status(struct qca8k_priv *priv, int port, int enable)
 {
-	u32 mask = QCA8K_PORT_STATUS_TXMAC;
+	u32 mask = QCA8K_PORT_STATUS_TXMAC | QCA8K_PORT_STATUS_RXMAC;
 
 	pr_debug("qca: port %i set status %i\n", port, enable);
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v3 2/7] net: dsa: qca8k: Add support for QCA8334 switch
From: Michal Vokáč @ 2018-05-23  6:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devicetree, f.fainelli, vivien.didelot, andrew,
	mark.rutland, robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-1-git-send-email-michal.vokac@ysoft.com>

Add support for the four-port variant of the Qualcomm QCA833x switch.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
Changes in v3:
 - Add "Reviewed-by" tag from Andrew

Changes in v2:
 - Add commit message

 drivers/net/dsa/qca8k.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 3684e56..6a3ffb2 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -1010,6 +1010,7 @@ static SIMPLE_DEV_PM_OPS(qca8k_pm_ops,
 			 qca8k_suspend, qca8k_resume);
 
 static const struct of_device_id qca8k_of_match[] = {
+	{ .compatible = "qca,qca8334" },
 	{ .compatible = "qca,qca8337" },
 	{ /* sentinel */ },
 };
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v3 1/7] net: dsa: qca8k: Add QCA8334 binding documentation
From: Michal Vokáč @ 2018-05-23  6:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devicetree, f.fainelli, vivien.didelot, andrew,
	mark.rutland, robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-1-git-send-email-michal.vokac@ysoft.com>

Add support for the four-port variant of the Qualcomm QCA833x switch.

The CPU port default link settings can be reconfigured using
a fixed-link sub-node.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
Changes in v3:
 - Correct fixed-link node documentation term: s/property/node.
 - Add "Reviewed-by" tag from Rob and Andrew.

Changes in v2:
 - Add commit message and document fixed-link binding.

 .../devicetree/bindings/net/dsa/qca8k.txt          | 23 +++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
index 9c67ee4..bbcb255 100644
--- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt
+++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
@@ -2,7 +2,10 @@
 
 Required properties:
 
-- compatible: should be "qca,qca8337"
+- compatible: should be one of:
+    "qca,qca8334"
+    "qca,qca8337"
+
 - #size-cells: must be 0
 - #address-cells: must be 1
 
@@ -14,6 +17,20 @@ port and PHY id, each subnode describing a port needs to have a valid phandle
 referencing the internal PHY connected to it. The CPU port of this switch is
 always port 0.
 
+A CPU port node has the following optional node:
+
+- fixed-link            : Fixed-link subnode describing a link to a non-MDIO
+                          managed entity. See
+                          Documentation/devicetree/bindings/net/fixed-link.txt
+                          for details.
+
+For QCA8K the 'fixed-link' sub-node supports only the following properties:
+
+- 'speed' (integer, mandatory), to indicate the link speed. Accepted
+  values are 10, 100 and 1000
+- 'full-duplex' (boolean, optional), to indicate that full duplex is
+  used. When absent, half duplex is assumed.
+
 Example:
 
 
@@ -53,6 +70,10 @@ Example:
 					label = "cpu";
 					ethernet = <&gmac1>;
 					phy-mode = "rgmii";
+					fixed-link {
+						speed = 1000;
+						full-duplex;
+					};
 				};
 
 				port@1 {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v3 0/7] Add support for QCA8334 switch
From: Michal Vokáč @ 2018-05-23  6:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devicetree, f.fainelli, vivien.didelot, andrew,
	mark.rutland, robh+dt, davem, michal.vokac

This series basically adds support for a QCA8334 ethernet switch to the
qca8k driver. It is a four-port variant of the already supported seven
port QCA8337. Register map is the same for the whole familly and all chips
have the same device ID.

Major part of this series enhances the CPU port setting. Currently the CPU
port is not set to any sensible defaults compatible with the xGMII
interface. This series forces the CPU port to its maximum bandwidth and
also allows to adjust the new defaults using fixed-link device tree
sub-node.

Alongside these changes I fixed two checkpatch warnings regarding SPDX and
redundant parentheses.

Changes in v3:
 - Rebased on latest net-next/master.
 - Corrected fixed-link documentation.

Michal Vokáč (7):
  net: dsa: qca8k: Add QCA8334 binding documentation
  net: dsa: qca8k: Add support for QCA8334 switch
  net: dsa: qca8k: Enable RXMAC when bringing up a port
  net: dsa: qca8k: Force CPU port to its highest bandwidth
  net: dsa: qca8k: Allow overwriting CPU port setting
  net: dsa: qca8k: Replace GPL boilerplate by SPDX
  net: dsa: qca8k: Remove redundant parentheses

 .../devicetree/bindings/net/dsa/qca8k.txt          | 23 +++++++-
 drivers/net/dsa/qca8k.c                            | 64 ++++++++++++++++++----
 drivers/net/dsa/qca8k.h                            |  7 ++-
 3 files changed, 79 insertions(+), 15 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Christoph Hellwig @ 2018-05-23  6:19 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Jason Gunthorpe, davem, dledford, Sindhu Devale, netdev,
	linux-rdma, nhorman, sassmann, jogreene, Shiraz Saleem
In-Reply-To: <079ceee3bc8cd0ea50dd7ddc12b27512ca5ac49e.camel@intel.com>

On Tue, May 22, 2018 at 02:04:06PM -0700, Jeff Kirsher wrote:
> > Why would you want to do this? The rdma driver is non-functional
> > without the ethernet driver, so why on earth would we want to defeat
> > the module dependency mechanism?
> 
> This change is driven by the OSV's like Red Hat, where customer's were
> updating the i40e driver, which in turn broke i40iw.

Doctor it hurts when I do this..

There is no reason to make a mess of our drivers because people are
doing things they should haver never done and that aren't supported
in Linux.

If Intel didn;t offer any out of tree drivers I'm pretty sure no
customer would even attempt this.  So fix this where the problem is.

^ permalink raw reply

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
From: Or Gerlitz @ 2018-05-23  6:15 UTC (permalink / raw)
  To: Huy Nguyen; +Cc: Linux Netdev List
In-Reply-To: <1576e986-6e04-63f3-3569-a105493929a6@mellanox.com>

On Wed, May 23, 2018 at 4:01 AM, Huy Nguyen <huyn@mellanox.com> wrote:
> Dear Jakub, PSB.
> On 5/22/2018 1:32 PM, Jakub Kicinski wrote:

>> Devlink API accommodates requirements of simpler (SwitchX2?) and more
>> advanced schemes (present in Spectrum).  The simpler/basic static
>> threshold configurations is exactly what you are doing here, AFAIU.

> [HQN] Devlink API is tailored specifically for switch. We don't configure
> threshold configuration
> explicitly. It is done via PFC. Once PFC is enabled on priority, threshold
> is setup based on our
> proprietary formula that were tested rigorously for performance.

Huy, please do not prefix your reply lines with your name, it's not needed
and confusing, the email clients used by people in this list do the job.

^ permalink raw reply

* [PATCH net-next] cxgb4: Add new T6 device ids
From: Ganesh Goudar @ 2018-05-23  6:06 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, Ganesh Goudar

Add 0x6088 and 0x6089 device ids for new T6 cards.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
index adacc63..c7f8d04 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
@@ -212,6 +212,8 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN
 	CH_PCI_ID_TABLE_FENTRY(0x6085), /* Custom T6240-SO */
 	CH_PCI_ID_TABLE_FENTRY(0x6086), /* Custom T6225-SO-CR */
 	CH_PCI_ID_TABLE_FENTRY(0x6087), /* Custom T6225-CR */
+	CH_PCI_ID_TABLE_FENTRY(0x6088), /* Custom T62100-CR */
+	CH_PCI_ID_TABLE_FENTRY(0x6089), /* Custom T62100-KR */
 CH_PCI_DEVICE_ID_TABLE_DEFINE_END;
 
 #endif /* __T4_PCI_ID_TBL_H__ */
-- 
2.1.0

^ permalink raw reply related

* [PATCH] net: phy: replace bool members in struct phy_device with bit-fields
From: Heiner Kallweit @ 2018-05-23  6:05 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn; +Cc: netdev@vger.kernel.org

In struct phy_device we have a number of flags being defined as type
bool. Similar to e.g. struct pci_dev we can save some space by using
bit-fields.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 include/linux/phy.h | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 073235e70..6cd090984 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -406,13 +406,17 @@ struct phy_device {
 	u32 phy_id;
 
 	struct phy_c45_device_ids c45_ids;
-	bool is_c45;
-	bool is_internal;
-	bool is_pseudo_fixed_link;
-	bool has_fixups;
-	bool suspended;
-	bool sysfs_links;
-	bool loopback_enabled;
+	unsigned is_c45:1;
+	unsigned is_internal:1;
+	unsigned is_pseudo_fixed_link:1;
+	unsigned has_fixups:1;
+	unsigned suspended:1;
+	unsigned sysfs_links:1;
+	unsigned loopback_enabled:1;
+
+	unsigned autoneg:1;
+	/* The most recently read link state */
+	unsigned link:1;
 
 	enum phy_state state;
 
@@ -429,9 +433,6 @@ struct phy_device {
 	int pause;
 	int asym_pause;
 
-	/* The most recently read link state */
-	int link;
-
 	/* Enabled Interrupts */
 	u32 interrupts;
 
@@ -444,8 +445,6 @@ struct phy_device {
 	/* Energy efficient ethernet modes which should be prohibited */
 	u32 eee_broken_modes;
 
-	int autoneg;
-
 	int link_timeout;
 
 #ifdef CONFIG_LED_TRIGGER_PHY
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
From: Y Song @ 2018-05-23  4:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alban Crequy, netdev, linux-kernel, containers, cgroups,
	Alban Crequy, tj
In-Reply-To: <20180523033550.z3tqo4lhd3zrmtdu@ast-mbp>

On Tue, May 22, 2018 at 8:35 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, May 22, 2018 at 08:33:24PM -0700, Y Song wrote:
>> +       struct cgroup *cgrp = task_dfl_cgroup(current);
>> +       if (!cgrp)
>> +               return -EINVAL;
>
> why this check is needed?

No reason :-) Originally I am concerned whether it is possible cgrp
could be NULL.
By looking at the code, it SEEMS to me that it could not be NULL, but I am not
100% sure (as I am not a cgroup expert). Since you are asking,
probably means it cannot be NULL, so will remove it in formal upstream patch.

^ permalink raw reply

* [PATCH] ppp: remove the PPPIOCDETACH ioctl
From: Eric Biggers @ 2018-05-23  3:59 UTC (permalink / raw)
  To: linux-ppp, Paul Mackerras
  Cc: netdev, linux-fsdevel, linux-kernel, Guillaume Nault,
	syzkaller-bugs, Eric Biggers
In-Reply-To: <20180523032958.GE658@sol.localdomain>

From: Eric Biggers <ebiggers@google.com>

The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea.  It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference.  However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances.  As reported by syzbot, this can trivially
be used to cause a use-after-free.

Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.

All pppd versions released in the last 15 years just close() the file
descriptor instead.

Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices.

Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 Documentation/networking/ppp_generic.txt |  6 -----
 drivers/net/ppp/ppp_generic.c            | 29 ------------------------
 fs/compat_ioctl.c                        |  1 -
 include/uapi/linux/ppp-ioctl.h           |  1 -
 4 files changed, 37 deletions(-)

diff --git a/Documentation/networking/ppp_generic.txt b/Documentation/networking/ppp_generic.txt
index 091d20273dcb..61daf4b39600 100644
--- a/Documentation/networking/ppp_generic.txt
+++ b/Documentation/networking/ppp_generic.txt
@@ -300,12 +300,6 @@ unattached instance are:
 The ioctl calls available on an instance of /dev/ppp attached to a
 channel are:
 
-* PPPIOCDETACH detaches the instance from the channel.  This ioctl is
-  deprecated since the same effect can be achieved by closing the
-  instance.  In order to prevent possible races this ioctl will fail
-  with an EINVAL error if more than one file descriptor refers to this
-  instance (i.e. as a result of dup(), dup2() or fork()).
-
 * PPPIOCCONNECT connects this channel to a PPP interface.  The
   argument should point to an int containing the interface unit
   number.  It will return an EINVAL error if the channel is already
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index dc7c7ec43202..dce8812fe802 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -603,35 +603,6 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		goto out;
 	}
 
-	if (cmd == PPPIOCDETACH) {
-		/*
-		 * We have to be careful here... if the file descriptor
-		 * has been dup'd, we could have another process in the
-		 * middle of a poll using the same file *, so we had
-		 * better not free the interface data structures -
-		 * instead we fail the ioctl.  Even in this case, we
-		 * shut down the interface if we are the owner of it.
-		 * Actually, we should get rid of PPPIOCDETACH, userland
-		 * (i.e. pppd) could achieve the same effect by closing
-		 * this fd and reopening /dev/ppp.
-		 */
-		err = -EINVAL;
-		if (pf->kind == INTERFACE) {
-			ppp = PF_TO_PPP(pf);
-			rtnl_lock();
-			if (file == ppp->owner)
-				unregister_netdevice(ppp->dev);
-			rtnl_unlock();
-		}
-		if (atomic_long_read(&file->f_count) < 2) {
-			ppp_release(NULL, file);
-			err = 0;
-		} else
-			pr_warn("PPPIOCDETACH file->f_count=%ld\n",
-				atomic_long_read(&file->f_count));
-		goto out;
-	}
-
 	if (pf->kind == CHANNEL) {
 		struct channel *pch;
 		struct ppp_channel *chan;
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index ef80085ed564..8285b570d635 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -917,7 +917,6 @@ COMPATIBLE_IOCTL(PPPIOCSDEBUG)
 /* PPPIOCGIDLE is translated */
 COMPATIBLE_IOCTL(PPPIOCNEWUNIT)
 COMPATIBLE_IOCTL(PPPIOCATTACH)
-COMPATIBLE_IOCTL(PPPIOCDETACH)
 COMPATIBLE_IOCTL(PPPIOCSMRRU)
 COMPATIBLE_IOCTL(PPPIOCCONNECT)
 COMPATIBLE_IOCTL(PPPIOCDISCONN)
diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h
index b19a9c249b15..d46caf217ea4 100644
--- a/include/uapi/linux/ppp-ioctl.h
+++ b/include/uapi/linux/ppp-ioctl.h
@@ -106,7 +106,6 @@ struct pppol2tp_ioc_stats {
 #define PPPIOCGIDLE	_IOR('t', 63, struct ppp_idle) /* get idle time */
 #define PPPIOCNEWUNIT	_IOWR('t', 62, int)	/* create new ppp unit */
 #define PPPIOCATTACH	_IOW('t', 61, int)	/* attach to ppp unit */
-#define PPPIOCDETACH	_IOW('t', 60, int)	/* detach from ppp unit/chan */
 #define PPPIOCSMRRU	_IOW('t', 59, int)	/* set multilink MRU */
 #define PPPIOCCONNECT	_IOW('t', 58, int)	/* connect channel to unit */
 #define PPPIOCDISCONN	_IO('t', 57)		/* disconnect channel */
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
From: Alexei Starovoitov @ 2018-05-23  3:35 UTC (permalink / raw)
  To: Y Song
  Cc: Alban Crequy, netdev, linux-kernel, containers, cgroups,
	Alban Crequy, tj
In-Reply-To: <CAH3MdRVdfw52atavT3KL8MpPw7zDM_hR6aUcqDP1PogLn_sH+w@mail.gmail.com>

On Tue, May 22, 2018 at 08:33:24PM -0700, Y Song wrote:
> +       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       if (!cgrp)
> +               return -EINVAL;

why this check is needed?

^ permalink raw reply

* Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
From: Y Song @ 2018-05-23  3:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alban Crequy, netdev, linux-kernel, containers, cgroups,
	Alban Crequy, tj
In-Reply-To: <CAH3MdRWgruVq+3r+2pHTah-c2zTO03vPkepjWDZ0_KrYcroy9A@mail.gmail.com>

I did a quick prototyping and the above interface seems working fine.

The kernel change:
===============

[yhs@localhost bpf-next]$ git diff
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 97446bbe2ca5..669b7383fddb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1976,7 +1976,8 @@ union bpf_attr {
        FN(fib_lookup),                 \
        FN(sock_hash_update),           \
        FN(msg_redirect_hash),          \
-       FN(sk_redirect_hash),
+       FN(sk_redirect_hash),           \
+       FN(get_current_cgroup_id),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ce2cbbff27e4..e11e3298f911 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -493,6 +493,21 @@ static const struct bpf_func_proto
bpf_current_task_under_cgroup_proto = {
        .arg2_type      = ARG_ANYTHING,
 };

+BPF_CALL_0(bpf_get_current_cgroup_id)
+{
+       struct cgroup *cgrp = task_dfl_cgroup(current);
+       if (!cgrp)
+               return -EINVAL;
+
+       return cgrp->kn->id.id;
+}
+
+static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
+       .func           = bpf_get_current_cgroup_id,
+       .gpl_only       = false,
+       .ret_type       = RET_INTEGER,
+};
+
 BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
           const void *, unsafe_ptr)
 {
@@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const
struct bpf_prog *prog)
                return &bpf_get_prandom_u32_proto;
        case BPF_FUNC_probe_read_str:
                return &bpf_probe_read_str_proto;
+       case BPF_FUNC_get_current_cgroup_id:
+               return &bpf_get_current_cgroup_id_proto;
        default:
                return NULL;
        }

The following program can be used to print out a cgroup id given a cgroup path.
[yhs@localhost cg]$ cat get_cgroup_id.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int dirfd, err, flags, mount_id, fhsize;
    struct file_handle *fhp;
    char *pathname;

    if (argc != 2) {
        printf("usage: %s <cgroup_path>\n", argv[0]);
        return 1;
    }

    pathname = argv[1];
    dirfd = AT_FDCWD;
    flags = 0;

    fhsize = sizeof(*fhp);
    fhp = malloc(fhsize);
    if (!fhp)
        return 1;

    err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
    if (err >= 0) {
        printf("error\n");
        return 1;
    }

    fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
    fhp = realloc(fhp, fhsize);
    if (!fhp)
        return 1;

    err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
    if (err < 0)
        perror("name_to_handle_at");
    else {
        int i;

        printf("dir = %s, mount_id = %d\n", pathname, mount_id);
        printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
            fhp->handle_type);
        if (fhp->handle_bytes != 8)
            return 1;

        printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
    }

    return 0;
}
[yhs@localhost cg]$

Given a cgroup path, the user can get cgroup_id and use it in their bpf
program for filtering purpose.

I run a simple program t.c
   int main() { while(1) sleep(1); return 0; }
in the cgroup v2 directory /home/yhs/tmp/yhs
   none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)

$ ./get_cgroup_id /home/yhs/tmp/yhs
dir = /home/yhs/tmp/yhs, mount_id = 124
handle_bytes = 8, handle_type = 1
cgroup_id = 0x1000006b2

// the below command to get cgroup_id from the kernel for the
// process compiled with t.c and ran under /home/yhs/tmp/yhs:
$ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
PID     TID     COMM            FUNC             -
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
^C[yhs@localhost tools]$

The kernel and user space cgid matches. Will provide a
formal patch later.




On Mon, May 21, 2018 at 5:24 PM, Y Song <ys114321@gmail.com> wrote:
> On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>>>
>>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>>> +{
>>> +     // TODO: pick the correct hierarchy instead of the mem controller
>>> +     struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>>> +
>>> +     if (unlikely(!cgrp))
>>> +             return -EINVAL;
>>> +     if (unlikely(hierarchy))
>>> +             return -EINVAL;
>>> +     if (unlikely(flags))
>>> +             return -EINVAL;
>>> +
>>> +     return cgrp->kn->id.ino;
>>
>> ino only is not enough to identify cgroup. It needs generation number too.
>> I don't quite see how hierarchy and flags can be used in the future.
>> Also why limit it to memcg?
>>
>> How about something like this instead:
>>
>> BPF_CALL_2(bpf_get_current_cgroup_id)
>> {
>>         struct cgroup *cgrp = task_dfl_cgroup(current);
>>
>>         return cgrp->kn->id.id;
>> }
>> The user space can use fhandle api to get the same 64-bit id.
>
> I think this should work. This will also be useful to bcc as user
> space can encode desired id
> in the bpf program and compared that id to the current cgroup id, so we can have
> cgroup level tracing (esp. stat collection) support. To cope with
> cgroup hierarchy, user can use
> cgroup-array based approach or explicitly compare against multiple cgroup id's.

^ permalink raw reply related

* Re: [PATCH] e1000: check the return of pci_get_drvdata() in e1000_remove()
From: Bo Chen @ 2018-05-23  3:31 UTC (permalink / raw)
  Cc: netdev, linux-kernel
In-Reply-To: <20180522184702.64cd62f6@xeon-e3>

Re-send to mailing lists as the previous email was rejected because of
not using plain text.

-----

Hi Stephen,

Thanks for the quick reply and your comments. I am new to network
drivers, and certainly trust your assessment.

I was assuming 'pdev->dev.p' can be NULL with a 'kzalloc()' failure in
'pci_set_drvdata()' invoked by 'e1000_probe()', in which case the
"pci_get_drvdata()" will return a NULL pointer. But I did not trace
back to confirm whether 'pdev->dev.p' has to be valid before
'e1000_probe()' is invoked. Maybe this is what I am missing? Please
excuse my newbie questions and mistakes.

Here is some context about why I started this patch. I am building a
tool to perform "grey-box" fault injection on linux-kernel-module
binaries. In my tool, there is a set of kernel API functions that
faults can be generated, for example making the return of
'__kmalloc()' be NULL. My tool fired an alarm (a kernel panic), when a
fault on ''dev_get_drvdata()" is injected to 'e1000.ko' (and other
drivers were fine, such as e100.ko and pcnet32.ko). It seems that I
was wrong to assume 'dev_get_drvdata()' can return NULL and inject
faults from it. Do you have any suggestions about how I can avoid such
wrong assumptions?

Thanks again for your time and attention. I hope this patch is not
wasting too much effort from the netdev community. Any other comments
or suggestions will be appreciated.

Best Regards,
Bo Chen

On Tue, May 22, 2018 at 6:47 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 22 May 2018 17:17:43 -0700
> Bo Chen <chenbo@pdx.edu> wrote:
>
> > This check on pci_get_drvdata() prevents potential invalid pointer dereferences,
> > and is a common practice in *_remove() functions from other drivers, such as
> > 'intel/e100.c', 'amd/pcnet32.c', 'realtek/8139too.c', and 'broadcom/tg3.c'.
> >
> > Signed-off-by: Bo Chen <chenbo@pdx.edu>
>
> Why check for something that can never be true.
> You are creating dead code paths that can never be exercised.

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] qed*: Add support for management firmware TLV request.
From: David Miller @ 2018-05-23  3:31 UTC (permalink / raw)
  To: sudarsana.kalluru; +Cc: netdev, Ariel.Elior, chad.dupuis, manish.rangankar
In-Reply-To: <20180522072846.2454-1-sudarsana.kalluru@cavium.com>

From: Sudarsana Reddy Kalluru <sudarsana.kalluru@cavium.com>
Date: Tue, 22 May 2018 00:28:36 -0700

> From: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
> 
> Management firmware (MFW) requires config and state information from
> the driver. It queries this via TLV (type-length-value) request wherein
> mfw specificies the list of required TLVs. Driver fills the TLV data
> and responds back to MFW.
> This patch series adds qed/qede/qedf/qedi driver implementation for
> supporting the TLV queries from MFW.
> 
> Changes from previous versions:
> -------------------------------
> v2: Split patch (2) into multiple simpler patches.
> v2: Update qed_tlv_parsed_buf->p_val datatype to void pointer to avoid
>     bunch of unnecessary typecasts.
> 
> Please consider applying this series to "net-next".

Series applied, thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox