Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next v2] tools/bpf: add log_level to bpf_load_program_attr
From: Yonghong Song @ 2019-02-07  7:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev@vger.kernel.org, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team
In-Reply-To: <20190207065330.3krsb3ll5wi4kiz3@ast-mbp.dhcp.thefacebook.com>



On 2/6/19 10:53 PM, Alexei Starovoitov wrote:
> On Wed, Feb 06, 2019 at 10:15:50PM -0800, Yonghong Song wrote:
>> The kernel verifier has three levels of logs:
>>      0: no logs
>>      1: logs mostly useful
>>    > 1: verbose
>>
>> Current libbpf API functions bpf_load_program_xattr() and
>> bpf_load_program() cannot specify log_level.
>> The bcc, however, provides an interface for user to
>> specify log_level 2 for verbose output.
>>
>> This patch added log_level into structure
>> bpf_load_program_attr, so users, including bcc, can use
>> bpf_load_program_xattr() to change log_level. The
>> supported log_level is 0, 1, and 2.
>>
>> The bpf selftest test_sock.c is modified to enable log_level = 2.
>> If the "verbose" in test_sock.c is changed to true,
>> the test will output logs like below:
>>    $ ./test_sock
>>    func#0 @0
>>    0: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
>>    0: (bf) r6 = r1
>>    1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
>>    1: (61) r7 = *(u32 *)(r6 +28)
>>    invalid bpf_context access off=28 size=4
>>
>>    Test case: bind4 load with invalid access: src_ip6 .. [PASS]
>>    ...
>>    Test case: bind6 allow all .. [PASS]
>>    Summary: 16 PASSED, 0 FAILED
>>
>> Some test_sock tests are negative tests and verbose verifier
>> log will be printed out as shown in the above.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   tools/lib/bpf/bpf.c                     | 7 ++++++-
>>   tools/lib/bpf/bpf.h                     | 1 +
>>   tools/testing/selftests/bpf/test_sock.c | 9 ++++++++-
>>   3 files changed, 15 insertions(+), 2 deletions(-)
>>
>> Changelog:
>>    v1 -> v2:
>>      . make log_level as the last member of struct bpf_load_program_attr.
>>      . return -EINVAL if bpf_load_program_attr.log_level > 2.
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 3defad77dc7a..6734c167279d 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -214,12 +214,17 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>   {
>>   	void *finfo = NULL, *linfo = NULL;
>>   	union bpf_attr attr;
>> +	__u32 log_level;
>>   	__u32 name_len;
>>   	int fd;
>>   
>>   	if (!load_attr)
>>   		return -EINVAL;
>>   
>> +	log_level = load_attr->log_level;
>> +	if (log_level > 2)
>> +		return -EINVAL;
>> +
>>   	name_len = load_attr->name ? strlen(load_attr->name) : 0;
>>   
>>   	bzero(&attr, sizeof(attr));
>> @@ -292,7 +297,7 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>   	/* Try again with log */
>>   	attr.log_buf = ptr_to_u64(log_buf);
>>   	attr.log_size = log_buf_sz;
>> -	attr.log_level = 1;
>> +	attr.log_level = (log_level == 2) ? log_level : 1;
> 
> I think that if user specified non zero log_level in xattr
> they probably want to get the log even when program was successfully loaded?
> Whereas above hunk will make an effect only when program is rejected.

In most cases, user wants to get log only when error happens.
But user specifying log_level=1 && non-null log_buf could indicate
intention to get the log even when successful.

To support all use cases regarding to log_level, we can do:
   . if log_level = 0, log_buf != NULL, existing behavior
     first without log; if fails, try with log_level=1.
   . if log_level=1/2, only one try with corresponding log_level.

> In addition non-zero log_level and !log_buf should be immediate error
> without calling kernel?

This makes sense. log_level, log_buf and log_buf_sz all need to be
consistent. The consistency of log_buf and log_buf_sz is not currently 
checked and left to kernel, so I did the same for log_level. I can add 
checks for all of them.

^ permalink raw reply

* Re: [PATCH 1/2] xsk: do not use mmap_sem
From: Björn Töpel @ 2019-02-07  7:43 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, linux-mm, LKML, David S . Miller, Bjorn Topel,
	Magnus Karlsson, Netdev, Davidlohr Bueso
In-Reply-To: <20190207053740.26915-2-dave@stgolabs.net>

Den tors 7 feb. 2019 kl 06:38 skrev Davidlohr Bueso <dave@stgolabs.net>:
>
> Holding mmap_sem exclusively for a gup() is an overkill.
> Lets replace the call for gup_fast() and let the mm take
> it if necessary.
>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Bjorn Topel <bjorn.topel@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> CC: netdev@vger.kernel.org
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  net/xdp/xdp_umem.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 5ab236c5c9a5..25e1e76654a8 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -265,10 +265,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
>         if (!umem->pgs)
>                 return -ENOMEM;
>
> -       down_write(&current->mm->mmap_sem);
> -       npgs = get_user_pages(umem->address, umem->npgs,
> -                             gup_flags, &umem->pgs[0], NULL);
> -       up_write(&current->mm->mmap_sem);
> +       npgs = get_user_pages_fast(umem->address, umem->npgs,
> +                                  gup_flags, &umem->pgs[0]);
>

Thanks for the patch!

The lifetime of the pinning is similar to RDMA umem mapping, so isn't
gup_longterm preferred?


Björn

>         if (npgs != umem->npgs) {
>                 if (npgs >= 0) {
> --
> 2.16.4
>

^ permalink raw reply

* Re: [PATCH bpf-next 1/6] bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
From: Martin Lau @ 2019-02-07  7:27 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netdev@vger.kernel.org, Alexei Starovoitov, Kernel Team,
	Lawrence Brakmo
In-Reply-To: <20190205005029.yxowrbz4aiht7jhm@kafai-mbp>

On Mon, Feb 04, 2019 at 04:50:32PM -0800, Martin Lau wrote:
> On Mon, Feb 04, 2019 at 11:33:28PM +0100, Daniel Borkmann wrote:
> > Hi Martin,
> > 
> > On 02/01/2019 08:03 AM, Martin KaFai Lau wrote:
> > > In kernel, it is common to check "!skb->sk && sk_fullsock(skb->sk)"
> > > before accessing the fields in sock.  For example, in __netdev_pick_tx:
> > > 
> > > static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
> > > 			    struct net_device *sb_dev)
> > > {
> > > 	/* ... */
> > > 
> > > 	struct sock *sk = skb->sk;
> > > 
> > > 		if (queue_index != new_index && sk &&
> > > 		    sk_fullsock(sk) &&
> > > 		    rcu_access_pointer(sk->sk_dst_cache))
> > > 			sk_tx_queue_set(sk, new_index);
> > > 
> > > 	/* ... */
> > > 
> > > 	return queue_index;
> > > }
> > > 
> > > This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
> > > where a few of the convert_ctx_access() in filter.c has already been
> > > accessing the skb->sk sock_common's fields,
> > > e.g. sock_ops_convert_ctx_access().
> > > 
> > > "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
> > > Some of the fileds in "bpf_sock" will not be directly
> > > accessible through the "__sk_buff->sk" pointer.  It is limited
> > > by the new "bpf_sock_common_is_valid_access()".
> > > e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
> > >      are not allowed.
> > > 
> > > The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
> > > can be used to get a sk with all accessible fields in "bpf_sock".
> > > This helper is added to both cg_skb and sched_(cls|act).
> > > 
> > > int cg_skb_foo(struct __sk_buff *skb) {
> > > 	struct bpf_sock *sk;
> > > 	__u32 family;
> > > 
> > > 	sk = skb->sk;
> > > 	if (!sk)
> > > 		return 1;
> > > 
> > > 	sk = bpf_sk_fullsock(sk);
> > > 	if (!sk)
> > > 		return 1;
> > > 
> > > 	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
> > > 		return 1;
> > > 
> > > 	/* some_traffic_shaping(); */
> > > 
> > > 	return 1;
> > > }
> > > 
> > > (1) The sk is read only
> > > 
> > > (2) There is no new "struct bpf_sock_common" introduced.
> > > 
> > > (3) Future kernel sock's members could be added to bpf_sock only
> > >     instead of repeatedly adding at multiple places like currently
> > >     in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
> > > 
> > > (4) After "sk = skb->sk", the reg holding sk is in type
> > >     PTR_TO_SOCK_COMMON_OR_NULL.
> > > 
> > > (5) After bpf_sk_fullsock(), the return type will be in type
> > >     PTR_TO_SOCKET_OR_NULL which is the same as the return type of
> > >     bpf_sk_lookup_xxx().
> > > 
> > >     However, bpf_sk_fullsock() does not take refcnt.  The
> > >     acquire_reference_state() is only depending on the return type now.
> > >     To avoid it, a new is_acquire_function() is checked before calling
> > >     acquire_reference_state().
> > 
> > Bit unfortunate that a helper like bpf_sk_fullsock() would be needed, after
> > all this is more of an implementation detail which we would expose here to
> > the developer.
> > 
> > Is there a specific reason why fetching skb->sk couldn't already be of the
> > type PTR_TO_SOCKET_OR_NULL such that the bpf_sk_fullsock() step wouldn't be
> > needed and most logic we have today could already be reused (modulo refcnt
> > avoidance)?
> Not all running context has a fullsock (PTR_TO_SOCKET_OR_NULL).
> 
> Based on how sk_to_full_sk() is used (e.g. in bpf_get_socket_uid()),
> not sure a sk (e.g. tw sock) can always be traced back to a full sk.
> 
> In term of the patch implementation, it is not much difference.  It is a bit
> simplier without bpf_sk_fullsock() and PTR_TO_SOCK_COMMON(_OR_NULL) but
> not a lot.  I have tried both.
> 
> The "fullsock" has already been exposed in another form.
> e.g. In sock_ops, the tcp_sock fields is not read if it is not a fullsock
> while other sock_common fields will still be available.  The bpf_prog
> can test the sock_ops->is_fullsock for what to do.
> 
> > 
> > In particular, do you need the skb->sk without the full-sk part somewhere
> > (e.g. in tw socks)? Why not doing something like sk_to_full_sk() inside the
> > helper or even better as BPF ctx rewrite upon skb->sk to fetch the full sk
> > parent where you could also access remaining bpf_sock fields?
> I am thinking more on what if the bpf_prog only needs the fields from
> sock_common (e.g. the src/dst ip/port) and skb already has
> other needed info (e.g. protocol/mark/priority).
> Enforing skb->sk must be a fullsock will unnecessarily limit those
> bpf_prog from seeing all skb.
> 
> A "struct bpf_common_sock" could be added instead vs a bpf_sk_fullsock()
> tester.  I think having one "struct bpf_sock" is better and less confusing.
> 
> Later, for the running context that is sure to have a fullsock,
> skb->sk can directly have PTR_TO_SOCKET_OR_NULL instead of
> PTR_TO_SOCK_COMMON_OR_NULL.
> 
Following up the discussion in the iovisor conf call.

One of discussion was about:
other than tw, can __sk_buff->sk always return a
fullsock (PTR_TO_SOCKET_OR_NULL).  In request_sock case,
it is doable because it can trace back to the listener sock.

However, that will go back to the sock_common accessing question.
In particular, how to access the sock_common's fields of the
request_sock itself?  Those fields in the request_sock are different
from its listener sock.  e.g. the skc_daddr and skc_dport.

Also, if the sock_common fields of tw is needed, it will become weird
because likely a new "struct bpf_tw_sock" is needed which is OK
but all sock_common fields need to be copied from bpf_sock
to bpf_tw_sock.

I think reading a sk from a ctx should return the
most basic type PTR_TO_SOCK_COMMON_OR_NULL (unless the running
ctx can guarantee that it always has a fullsock).
Currently, it is __sk_buff->sk.  Later, sock_ops->sk...etc.
One single 'struct bpf_sock' and limit fullsock field access
at verification time.  The bpf_prog then moves down the chain
based on what it needs.  It could be fullsock, tcp_sock...etc.

I think that will be the most flexible way to write bpf_prog
while also avoid having duplicate fields in different
bpf struct in uapi.

^ permalink raw reply

* Re: [PATCH bpf-next v2] tools/bpf: add log_level to bpf_load_program_attr
From: Alexei Starovoitov @ 2019-02-07  6:53 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20190207061550.1045583-1-yhs@fb.com>

On Wed, Feb 06, 2019 at 10:15:50PM -0800, Yonghong Song wrote:
> The kernel verifier has three levels of logs:
>     0: no logs
>     1: logs mostly useful
>   > 1: verbose
> 
> Current libbpf API functions bpf_load_program_xattr() and
> bpf_load_program() cannot specify log_level.
> The bcc, however, provides an interface for user to
> specify log_level 2 for verbose output.
> 
> This patch added log_level into structure
> bpf_load_program_attr, so users, including bcc, can use
> bpf_load_program_xattr() to change log_level. The
> supported log_level is 0, 1, and 2.
> 
> The bpf selftest test_sock.c is modified to enable log_level = 2.
> If the "verbose" in test_sock.c is changed to true,
> the test will output logs like below:
>   $ ./test_sock
>   func#0 @0
>   0: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
>   0: (bf) r6 = r1
>   1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
>   1: (61) r7 = *(u32 *)(r6 +28)
>   invalid bpf_context access off=28 size=4
> 
>   Test case: bind4 load with invalid access: src_ip6 .. [PASS]
>   ...
>   Test case: bind6 allow all .. [PASS]
>   Summary: 16 PASSED, 0 FAILED
> 
> Some test_sock tests are negative tests and verbose verifier
> log will be printed out as shown in the above.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  tools/lib/bpf/bpf.c                     | 7 ++++++-
>  tools/lib/bpf/bpf.h                     | 1 +
>  tools/testing/selftests/bpf/test_sock.c | 9 ++++++++-
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> Changelog:
>   v1 -> v2:
>     . make log_level as the last member of struct bpf_load_program_attr.
>     . return -EINVAL if bpf_load_program_attr.log_level > 2.
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 3defad77dc7a..6734c167279d 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -214,12 +214,17 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>  {
>  	void *finfo = NULL, *linfo = NULL;
>  	union bpf_attr attr;
> +	__u32 log_level;
>  	__u32 name_len;
>  	int fd;
>  
>  	if (!load_attr)
>  		return -EINVAL;
>  
> +	log_level = load_attr->log_level;
> +	if (log_level > 2)
> +		return -EINVAL;
> +
>  	name_len = load_attr->name ? strlen(load_attr->name) : 0;
>  
>  	bzero(&attr, sizeof(attr));
> @@ -292,7 +297,7 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>  	/* Try again with log */
>  	attr.log_buf = ptr_to_u64(log_buf);
>  	attr.log_size = log_buf_sz;
> -	attr.log_level = 1;
> +	attr.log_level = (log_level == 2) ? log_level : 1;

I think that if user specified non zero log_level in xattr
they probably want to get the log even when program was successfully loaded?
Whereas above hunk will make an effect only when program is rejected.
In addition non-zero log_level and !log_buf should be immediate error
without calling kernel?


^ permalink raw reply

* Re: [PATCH] ipmr: ip6mr: Create new sockopt to clear mfc cache only
From: Nikolay Aleksandrov @ 2019-02-07  6:32 UTC (permalink / raw)
  To: Callum Sinclair, davem, kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20190207020828.21854-2-callum.sinclair@alliedtelesis.co.nz>

On 07/02/2019 04:08, Callum Sinclair wrote:
> Currently the only way to clear the mfc cache was to delete the entries
> one by one using the MRT_DEL_MFC socket option or to destroy and
> recreate the socket.
> 
> Create a new socket option which will clear the multicast forwarding
> cache on the socket without destroying the socket.
> 
> Signed-off-by: Callum Sinclair <callum.sinclair@alliedtelesis.co.nz>
> ---
>  include/uapi/linux/mroute.h  |  7 +++-
>  include/uapi/linux/mroute6.h |  7 +++-
>  net/ipv4/ipmr.c              | 69 +++++++++++++++++++++-------------
>  net/ipv6/ip6mr.c             | 73 ++++++++++++++++++++++--------------
>  4 files changed, 99 insertions(+), 57 deletions(-)
> 

Hi,
Thanks for working on this. I think you missed one comment, this still seems
to clean all tables even though the socket has a table assigned. Could it
act only on that table ? All of the MRT calls besides the init act only on
the initialized table. 
Also you're not checking if optlen is proper size, and I wonder which kernel is this
based on ? Because in net-next ip_mroute_setsockopt() takes rtnl in the beginning
and releases it in the end with the exception of MRT_DONE which needs to release it
earlier, the code below would cause a deadlock trying to get rtnl again in MRT_FLUSH.
This patch should be targeted at net-next, please indicate that also in your subject:
e.g. [PATCH net-next].

Thanks,
 Nik

> diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h
> index 5d37a9ccce63..2d475edc3ec3 100644
> --- a/include/uapi/linux/mroute.h
> +++ b/include/uapi/linux/mroute.h
> @@ -28,12 +28,17 @@
>  #define MRT_TABLE	(MRT_BASE+9)	/* Specify mroute table ID		*/
>  #define MRT_ADD_MFC_PROXY	(MRT_BASE+10)	/* Add a (*,*|G) mfc entry	*/
>  #define MRT_DEL_MFC_PROXY	(MRT_BASE+11)	/* Del a (*,*|G) mfc entry	*/
> -#define MRT_MAX		(MRT_BASE+11)
> +#define MRT_FLUSH	(MRT_BASE+12)	/* Flush all multicast entries and vifs	*/
> +#define MRT_MAX		(MRT_BASE+12)
>  
>  #define SIOCGETVIFCNT	SIOCPROTOPRIVATE	/* IP protocol privates */
>  #define SIOCGETSGCNT	(SIOCPROTOPRIVATE+1)
>  #define SIOCGETRPF	(SIOCPROTOPRIVATE+2)
>  
> +/* Flags used for MRT_FLUSH */
> +#define MRT_FLUSH_ENTRIES	1	/* For flushing all multicast entries */
> +#define MRT_FLUSH_VIFS		2	/* For flushing all multicast vifs */
> +
>  #define MAXVIFS		32
>  typedef unsigned long vifbitmap_t;	/* User mode code depends on this lot */
>  typedef unsigned short vifi_t;
> diff --git a/include/uapi/linux/mroute6.h b/include/uapi/linux/mroute6.h
> index 9999cc006390..b04094d997c8 100644
> --- a/include/uapi/linux/mroute6.h
> +++ b/include/uapi/linux/mroute6.h
> @@ -31,12 +31,17 @@
>  #define MRT6_TABLE	(MRT6_BASE+9)	/* Specify mroute table ID		*/
>  #define MRT6_ADD_MFC_PROXY	(MRT6_BASE+10)	/* Add a (*,*|G) mfc entry	*/
>  #define MRT6_DEL_MFC_PROXY	(MRT6_BASE+11)	/* Del a (*,*|G) mfc entry	*/
> -#define MRT6_MAX	(MRT6_BASE+11)
> +#define MRT6_FLUSH	(MRT6_BASE+12)	/* Flush all multicast entries and vifs	*/
> +#define MRT6_MAX	(MRT6_BASE+12)
>  
>  #define SIOCGETMIFCNT_IN6	SIOCPROTOPRIVATE	/* IP protocol privates */
>  #define SIOCGETSGCNT_IN6	(SIOCPROTOPRIVATE+1)
>  #define SIOCGETRPF	(SIOCPROTOPRIVATE+2)
>  
> +/* Flags used for MRT6_FLUSH*/
> +#define MRT6_FLUSH_ENTRIES	1	/* For flushing all multicast entries */
> +#define MRT6_FLUSH_VIFS		2	/* For flushing all multicast vifs */
> +
>  #define MAXMIFS		32
>  typedef unsigned long mifbitmap_t;	/* User mode code depends on this lot */
>  typedef unsigned short mifi_t;
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index ddbf8c9a1abb..2eb569138569 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -416,7 +416,7 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
>  static void ipmr_free_table(struct mr_table *mrt)
>  {
>  	del_timer_sync(&mrt->ipmr_expire_timer);
> -	mroute_clean_tables(mrt, true);
> +	mroute_clean_tables(mrt, true, MRT_FLUSH_VIFS | MRT_FLUSH_ENTRIES);
>  	rhltable_destroy(&mrt->mfc_hash);
>  	kfree(mrt);
>  }
> @@ -1299,44 +1299,48 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
>  }
>  
>  /* Close the multicast socket, and clear the vif tables etc */
> -static void mroute_clean_tables(struct mr_table *mrt, bool all)
> +static void mroute_clean_tables(struct mr_table *mrt, bool all, int flags)
>  {
>  	struct net *net = read_pnet(&mrt->net);
> -	struct mr_mfc *c, *tmp;
>  	struct mfc_cache *cache;
> +	struct mr_mfc *c, *tmp;
>  	LIST_HEAD(list);
>  	int i;
>  
>  	/* Shut down all active vif entries */
> -	for (i = 0; i < mrt->maxvif; i++) {
> -		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
> -			continue;
> -		vif_delete(mrt, i, 0, &list);
> +	if (flags & MRT_FLUSH_VIFS) {
> +		for (i = 0; i < mrt->maxvif; i++) {
> +			if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
> +				continue;
> +				vif_delete(mrt, i, 0, &list);
> +		}
> +		unregister_netdevice_many(&list);
>  	}
> -	unregister_netdevice_many(&list);
>  
>  	/* Wipe the cache */
> -	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
> -		if (!all && (c->mfc_flags & MFC_STATIC))
> -			continue;
> -		rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
> -		list_del_rcu(&c->list);
> -		cache = (struct mfc_cache *)c;
> -		call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
> -					      mrt->id);
> -		mroute_netlink_event(mrt, cache, RTM_DELROUTE);
> -		mr_cache_put(c);
> -	}
> -
> -	if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
> -		spin_lock_bh(&mfc_unres_lock);
> -		list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
> -			list_del(&c->list);
> +	if (flags & MRT_FLUSH_ENTRIES) {
> +		list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
> +			if (!all && (c->mfc_flags & MFC_STATIC))
> +				continue;
> +			rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
> +			list_del_rcu(&c->list);
>  			cache = (struct mfc_cache *)c;
> +			call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
> +										  mrt->id);
>  			mroute_netlink_event(mrt, cache, RTM_DELROUTE);
> -			ipmr_destroy_unres(mrt, cache);
> +			mr_cache_put(c);
> +		}
> +
> +		if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
> +			spin_lock_bh(&mfc_unres_lock);
> +			list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
> +				list_del(&c->list);
> +				cache = (struct mfc_cache *)c;
> +				mroute_netlink_event(mrt, cache, RTM_DELROUTE);
> +				ipmr_destroy_unres(mrt, cache);
> +			}
> +			spin_unlock_bh(&mfc_unres_lock);
>  		}
> -		spin_unlock_bh(&mfc_unres_lock);
>  	}
>  }
>  
> @@ -1357,7 +1361,7 @@ static void mrtsock_destruct(struct sock *sk)
>  						    NETCONFA_IFINDEX_ALL,
>  						    net->ipv4.devconf_all);
>  			RCU_INIT_POINTER(mrt->mroute_sk, NULL);
> -			mroute_clean_tables(mrt, false);
> +			mroute_clean_tables(mrt, false, MRT_FLUSH_VIFS | MRT_FLUSH_ENTRIES);
>  		}
>  	}
>  	rtnl_unlock();
> @@ -1482,6 +1486,17 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
>  					   sk == rtnl_dereference(mrt->mroute_sk),
>  					   parent);
>  		break;
> +	case MRT_FLUSH:
> +		if (get_user(val, (int __user *)optval)) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +		rtnl_lock();
> +		ipmr_for_each_table(mrt, net) {
> +			mroute_clean_tables(mrt, true, val);
> +		}
> +		rtnl_unlock();
> +		break;
>  	/* Control PIM assert. */
>  	case MRT_ASSERT:
>  		if (optlen != sizeof(val)) {
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index 30337b38274b..473c83d197fe 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -393,7 +393,7 @@ static struct mr_table *ip6mr_new_table(struct net *net, u32 id)
>  static void ip6mr_free_table(struct mr_table *mrt)
>  {
>  	del_timer_sync(&mrt->ipmr_expire_timer);
> -	mroute_clean_tables(mrt, true);
> +	mroute_clean_tables(mrt, true, MRT6_FLUSH_VIFS | MRT6_FLUSH_ENTRIES);
>  	rhltable_destroy(&mrt->mfc_hash);
>  	kfree(mrt);
>  }
> @@ -1496,43 +1496,47 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
>   *	Close the multicast socket, and clear the vif tables etc
>   */
>  
> -static void mroute_clean_tables(struct mr_table *mrt, bool all)
> +static void mroute_clean_tables(struct mr_table *mrt, bool all, int flags)
>  {
>  	struct mr_mfc *c, *tmp;
>  	LIST_HEAD(list);
>  	int i;
>  
>  	/* Shut down all active vif entries */
> -	for (i = 0; i < mrt->maxvif; i++) {
> -		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
> -			continue;
> -		mif6_delete(mrt, i, 0, &list);
> +	if (flags & MRT6_FLUSH_VIFS) {
> +		for (i = 0; i < mrt->maxvif; i++) {
> +			if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
> +				continue;
> +			mif6_delete(mrt, i, 0, &list);
> +		}
> +		unregister_netdevice_many(&list);
>  	}
> -	unregister_netdevice_many(&list);
>  
>  	/* Wipe the cache */
> -	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
> -		if (!all && (c->mfc_flags & MFC_STATIC))
> -			continue;
> -		rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params);
> -		list_del_rcu(&c->list);
> -		mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE);
> -		mr_cache_put(c);
> -	}
> +	if (flags & MRT6_FLUSH_ENTRIES) {
> +		list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
> +			if (!all && (c->mfc_flags & MFC_STATIC))
> +				continue;
> +			rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params);
> +			list_del_rcu(&c->list);
> +			mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE);
> +			mr_cache_put(c);
> +		}
>  
> -	if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
> -		spin_lock_bh(&mfc_unres_lock);
> -		list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
> -			list_del(&c->list);
> -			call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net),
> -						       FIB_EVENT_ENTRY_DEL,
> -						       (struct mfc6_cache *)c,
> -						       mrt->id);
> -			mr6_netlink_event(mrt, (struct mfc6_cache *)c,
> -					  RTM_DELROUTE);
> -			ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c);
> +		if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
> +			spin_lock_bh(&mfc_unres_lock);
> +			list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
> +				list_del(&c->list);
> +				call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net),
> +											   FIB_EVENT_ENTRY_DEL,
> +											   (struct mfc6_cache *)c,
> +											   mrt->id);
> +				mr6_netlink_event(mrt, (struct mfc6_cache *)c,
> +						  RTM_DELROUTE);
> +				ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c);
> +			}
> +			spin_unlock_bh(&mfc_unres_lock);
>  		}
> -		spin_unlock_bh(&mfc_unres_lock);
>  	}
>  }
>  
> @@ -1588,7 +1592,7 @@ int ip6mr_sk_done(struct sock *sk)
>  						     NETCONFA_IFINDEX_ALL,
>  						     net->ipv6.devconf_all);
>  
> -			mroute_clean_tables(mrt, false);
> +			mroute_clean_tables(mrt, false, MRT6_FLUSH_VIFS | MRT6_FLUSH_ENTRIES);
>  			err = 0;
>  			break;
>  		}
> @@ -1703,6 +1707,19 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
>  					    parent);
>  		rtnl_unlock();
>  		return ret;
> +	case MRT6_DEL_MFC_ALL:
> +	{
> +		int flags;
> +
> +		if (get_user(flags, (int __user *)optval))
> +			return -EFAULT;
> +		rtnl_lock();
> +		ip6mr_for_each_table(mrt, net) {
> +			mroute_clean_tables(mrt, true, flags);
> +		}
> +		rtnl_unlock();
> +		return 0;
> +	}
>  
>  	/*
>  	 *	Control PIM assert (to activate pim will activate assert)
> 


^ permalink raw reply

* [PATCH bpf-next v2] tools/bpf: add log_level to bpf_load_program_attr
From: Yonghong Song @ 2019-02-07  6:15 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Yonghong Song

The kernel verifier has three levels of logs:
    0: no logs
    1: logs mostly useful
  > 1: verbose

Current libbpf API functions bpf_load_program_xattr() and
bpf_load_program() cannot specify log_level.
The bcc, however, provides an interface for user to
specify log_level 2 for verbose output.

This patch added log_level into structure
bpf_load_program_attr, so users, including bcc, can use
bpf_load_program_xattr() to change log_level. The
supported log_level is 0, 1, and 2.

The bpf selftest test_sock.c is modified to enable log_level = 2.
If the "verbose" in test_sock.c is changed to true,
the test will output logs like below:
  $ ./test_sock
  func#0 @0
  0: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  0: (bf) r6 = r1
  1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  1: (61) r7 = *(u32 *)(r6 +28)
  invalid bpf_context access off=28 size=4

  Test case: bind4 load with invalid access: src_ip6 .. [PASS]
  ...
  Test case: bind6 allow all .. [PASS]
  Summary: 16 PASSED, 0 FAILED

Some test_sock tests are negative tests and verbose verifier
log will be printed out as shown in the above.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/bpf.c                     | 7 ++++++-
 tools/lib/bpf/bpf.h                     | 1 +
 tools/testing/selftests/bpf/test_sock.c | 9 ++++++++-
 3 files changed, 15 insertions(+), 2 deletions(-)

Changelog:
  v1 -> v2:
    . make log_level as the last member of struct bpf_load_program_attr.
    . return -EINVAL if bpf_load_program_attr.log_level > 2.

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3defad77dc7a..6734c167279d 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -214,12 +214,17 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 {
 	void *finfo = NULL, *linfo = NULL;
 	union bpf_attr attr;
+	__u32 log_level;
 	__u32 name_len;
 	int fd;
 
 	if (!load_attr)
 		return -EINVAL;
 
+	log_level = load_attr->log_level;
+	if (log_level > 2)
+		return -EINVAL;
+
 	name_len = load_attr->name ? strlen(load_attr->name) : 0;
 
 	bzero(&attr, sizeof(attr));
@@ -292,7 +297,7 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 	/* Try again with log */
 	attr.log_buf = ptr_to_u64(log_buf);
 	attr.log_size = log_buf_sz;
-	attr.log_level = 1;
+	attr.log_level = (log_level == 2) ? log_level : 1;
 	log_buf[0] = 0;
 	fd = sys_bpf_prog_load(&attr, sizeof(attr));
 done:
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index ed09eed2dc3b..6ffdd79bea89 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -85,6 +85,7 @@ struct bpf_load_program_attr {
 	__u32 line_info_rec_size;
 	const void *line_info;
 	__u32 line_info_cnt;
+	__u32 log_level;
 };
 
 /* Flags to direct loading requirements */
diff --git a/tools/testing/selftests/bpf/test_sock.c b/tools/testing/selftests/bpf/test_sock.c
index 561ffb6d6433..fb679ac3d4b0 100644
--- a/tools/testing/selftests/bpf/test_sock.c
+++ b/tools/testing/selftests/bpf/test_sock.c
@@ -20,6 +20,7 @@
 #define MAX_INSNS	512
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
+static bool verbose = false;
 
 struct sock_test {
 	const char *descr;
@@ -325,6 +326,7 @@ static int load_sock_prog(const struct bpf_insn *prog,
 			  enum bpf_attach_type attach_type)
 {
 	struct bpf_load_program_attr attr;
+	int ret;
 
 	memset(&attr, 0, sizeof(struct bpf_load_program_attr));
 	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
@@ -332,8 +334,13 @@ static int load_sock_prog(const struct bpf_insn *prog,
 	attr.insns = prog;
 	attr.insns_cnt = probe_prog_length(attr.insns);
 	attr.license = "GPL";
+	attr.log_level = 2;
 
-	return bpf_load_program_xattr(&attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
+	ret = bpf_load_program_xattr(&attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
+	if (verbose && ret < 0)
+		fprintf(stderr, "%s\n", bpf_log_buf);
+
+	return ret;
 }
 
 static int attach_sock_prog(int cgfd, int progfd,
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] net: stmmac: Variable "val" in function sun8i_dwmac_set_syscon() could be uninitialized
From: Yizhuo Zhai @ 2019-02-07  5:53 UTC (permalink / raw)
  To: David Miller
  Cc: Chengyu Song, Zhiyun Qian, Giuseppe Cavallaro, Alexandre Torgue,
	Maxime Ripard, Chen-Yu Tsai, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <CABvMjLQ6MYtGYeLxwceZsvcyn4oScgMo+BGQMHw7SkZ1uxFmHQ@mail.gmail.com>

Thanks, but why initialization matters here? Is performance the main concern?


On Wed, Feb 6, 2019 at 9:52 PM Yizhuo Zhai <yzhai003@ucr.edu> wrote:
>
> Thanks, but why initialization matters here? Is performance the main concern?
>
> On Wed, Feb 6, 2019 at 8:17 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Yizhuo <yzhai003@ucr.edu>
>> Date: Tue,  5 Feb 2019 14:15:59 -0800
>>
>> > @@ -639,9 +639,14 @@ static int sun8i_dwmac_set_syscon(struct stmmac_priv *priv)
>> >       struct sunxi_priv_data *gmac = priv->plat->bsp_priv;
>> >       struct device_node *node = priv->device->of_node;
>> >       int ret;
>> > -     u32 reg, val;
>> > +     u32 reg, val = 0;
>> > +
>> > +     ret = regmap_read(gmac->regmap, SYSCON_EMAC_REG, &val);
>> > +     if (ret) {
>> > +             dev_err(priv->device, "Fail to read SYSCON_EMAC_REG.\n");
>> > +             return ret;
>> > +     }
>>
>> I agree with the other reviewer that since you check 'ret' the initialization of
>> 'val' is no longer needed.
>
>
>
> --
> Kind Regards,
>
> Yizhuo Zhai
>
> Computer Science, Graduate Student
> University of California, Riverside



-- 
Kind Regards,

Yizhuo Zhai

Computer Science, Graduate Student
University of California, Riverside

^ permalink raw reply

* Re: [ovs-dev] [PATCH net-next V2 1/1] openvswitch: Declare ovs key structures using macros
From: Eli Britstein @ 2019-02-07  5:47 UTC (permalink / raw)
  To: Ben Pfaff, Gregory Rose
  Cc: David Miller, yihung.wei@gmail.com, dev@openvswitch.org,
	netdev@vger.kernel.org, simon.horman@netronome.com
In-Reply-To: <20190205202316.GS27607@ovn.org>


On 2/5/2019 10:23 PM, Ben Pfaff wrote:
> On Tue, Feb 05, 2019 at 10:22:09AM -0800, Gregory Rose wrote:
>> On 2/5/2019 4:02 AM, Eli Britstein wrote:
>>> On 2/4/2019 10:07 PM, David Miller wrote:
>>>> From: Yi-Hung Wei <yihung.wei@gmail.com>
>>>> Date: Mon, 4 Feb 2019 10:47:18 -0800
>>>>
>>>>> For example, to see how 'struct ovs_key_ipv6' is defined, now we need
>>>>> to trace how OVS_KEY_IPV6_FIELDS is defined, and how OVS_KEY_FIELD_ARR
>>>>> and OVS_KEY_FIELD defined.  I think it makes the header file to be
>>>>> more complicated.
>>>> I completely agree.
>>>>
>>>> Unless this is totally unavoidable, I do not want to apply a patch
>>>> which makes reading and auditing the networking code more difficult.
>>> This technique is discussed for example in
>>> https://stackoverflow.com/questions/6635851/real-world-use-of-x-macros,
>>> and I found existing examples of using it in the kernel tree:
>>>
>>> __BPF_FUNC_MAPPER in commit ebb676daa1a34 ("bpf: Print function name in
>>> addition to function id")
>>>
>>> __AAL_STAT_ITEMS and __SONET_ITEMS in commit 607ca46e97a1b ("UAPI:
>>> (Scripted) Disintegrate include/linux"), the successor of commit
>>> 1da177e4c3f4 ("Linux-2.6.12-rc2"). I didn't dig deeper to the past.
>>>
>>> I can agree it makes that H file a bit more complicated, but for sure
>>> less than ## macros that are widely used.
>>>
>>> However, I think the alternatives of generating such defines by some
>>> scripts, or having the fields in more than one place are even worse, so
>>> it is a kind of unavoidable.
>> Why is using a script to generate defines for the requirements of your
>> application or driver so bad?  Your patch
>> turns openvswitch.h into some hardly readable code - using a script and
>> having a machine output the macros
>> your application or driver needs seems like a much better alternative to me.
OK, let's abandon this patch. I'll go with the script alternative.
> In addition, one of the reasons that developers prefer to avoid
> duplication is because of the need to synchronize copies when one of
> them changes.  This doesn't apply in the same way to these structures,
> because they are ABIs that will not change.
This is correct, but still, though it is not likely to change, it might 
will, so I think we must avoid duplication.

^ permalink raw reply

* [PATCH v2 net-next] net: phy: fixed_phy: Fix fixed_phy not checking GPIO
From: Moritz Fischer @ 2019-02-07  5:45 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, hkallweit1, f.fainelli, andrew, moritz,
	Moritz Fischer

Fix fixed_phy not checking GPIO if no link_update callback
is registered.

In the original version all users registered a link_update
callback so the issue was masked.

Fixes: a5597008dbc2 ("phy: fixed_phy: Add gpio to determine link up/down.")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---

Changes from v1:
- Added Andrew's Reviewed-by: tag
- Added Fixes: tag (Thanks for digging, Andrew!)

---
 drivers/net/phy/fixed_phy.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index f136a23c1a35..d810f914aaa4 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -85,11 +85,11 @@ static int fixed_mdio_read(struct mii_bus *bus, int phy_addr, int reg_num)
 				s = read_seqcount_begin(&fp->seqcount);
 				fp->status.link = !fp->no_carrier;
 				/* Issue callback if user registered it. */
-				if (fp->link_update) {
+				if (fp->link_update)
 					fp->link_update(fp->phydev->attached_dev,
 							&fp->status);
-					fixed_phy_update(fp);
-				}
+				/* Check the GPIO for change in status */
+				fixed_phy_update(fp);
 				state = fp->status;
 			} while (read_seqcount_retry(&fp->seqcount, s));
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH 1/2] xsk: do not use mmap_sem
From: Davidlohr Bueso @ 2019-02-07  5:37 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, dave, linux-kernel, David S . Miller, Bjorn Topel,
	Magnus Karlsson, netdev, Davidlohr Bueso
In-Reply-To: <20190207053740.26915-1-dave@stgolabs.net>

Holding mmap_sem exclusively for a gup() is an overkill.
Lets replace the call for gup_fast() and let the mm take
it if necessary.

Cc: David S. Miller <davem@davemloft.net>
Cc: Bjorn Topel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
CC: netdev@vger.kernel.org
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 net/xdp/xdp_umem.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 5ab236c5c9a5..25e1e76654a8 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -265,10 +265,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
 	if (!umem->pgs)
 		return -ENOMEM;
 
-	down_write(&current->mm->mmap_sem);
-	npgs = get_user_pages(umem->address, umem->npgs,
-			      gup_flags, &umem->pgs[0], NULL);
-	up_write(&current->mm->mmap_sem);
+	npgs = get_user_pages_fast(umem->address, umem->npgs,
+				   gup_flags, &umem->pgs[0]);
 
 	if (npgs != umem->npgs) {
 		if (npgs >= 0) {
-- 
2.16.4


^ permalink raw reply related

* Re: Stack sends oversize UDP packet to the driver
From: Mahesh Bandewar (महेश बंडेवार) @ 2019-02-07  4:51 UTC (permalink / raw)
  To: Michael Chan
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet,
	Willem de Bruijn
In-Reply-To: <CACKFLimWHQA_52D_6FZDOeJftXt-YqFOdADMDFTFuUkK__Ay2Q@mail.gmail.com>

On Tue, Feb 5, 2019 at 11:36 AM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Wed, Jan 30, 2019 at 5:00 PM Mahesh Bandewar (महेश बंडेवार)
> <maheshb@google.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 1:07 AM Michael Chan <michael.chan@broadcom.com> wrote:
> > >
> > > On Tue, Jan 22, 2019 at 10:29 AM Mahesh Bandewar (महेश बंडेवार)
> > > <maheshb@google.com> wrote:
> > >
> > > >
> > > > The idea behind the fix is very simple and it is to create a dst-only
> > > > (unregistered) device with a very low MTU and use it instead of 'lo'
> > > > while invalidating the dst. This would make it *not* forward packets
> > > > to driver which might need fragmentation.
> > > >
> > >
> > > We tested the 2 patches many times and including an overnight test.  I
> > > can confirm that the oversize UDP packets are no longer seen with the
> > > patches applied.  However, I don't see the blackhole xmit function
> > > getting called to free the SKBs though.
> > >
> > Thanks for the confirmation Michael. The blackhole device mtu is
> > really small, so I would assume the fragmentation code dropped those
> > packets before calling the xmit function (in ip_fragment), you could
> > verify that with icmp counters.
> >
>
> I've looked at this a little more.  The blackhole_dev is not IFF_UP |
> IFF_RUNNING, right?  May be that's why the packets are never getting
> to the xmit function?
Yes, so I added those two flags and ended up writing a test-module for
the device (which I will include while posting the patch-series).
However, adding those flags is also not sufficient since the qdisc is
initialized to noop_qdisc so qdisc enqueue will drop packets before
hitting the ndo_start_xmit().

^ permalink raw reply

* RE: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Pankaj Bansal @ 2019-02-07  4:42 UTC (permalink / raw)
  To: Leo Li, Andrew Lunn
  Cc: Shawn Guo, Florian Fainelli, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Rob Herring
In-Reply-To: <CADRPPNRFfP1UHSF+99LJ_6QMCFeXi=QhmNJAhAtBPZ_pJ-jmgg@mail.gmail.com>



> -----Original Message-----
> From: Li Yang [mailto:leoyang.li@nxp.com]
> Sent: Thursday, 7 February, 2019 05:09 AM
> To: Andrew Lunn <andrew@lunn.ch>
> Cc: Pankaj Bansal <pankaj.bansal@nxp.com>; Shawn Guo
> <shawnguo@kernel.org>; Florian Fainelli <f.fainelli@gmail.com>;
> netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Rob Herring
> <robh+dt@kernel.org>
> Subject: Re: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
> 
> On Wed, Feb 6, 2019 at 3:46 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > > > >  &i2c0 {
> > > > > >         status = "okay";
> > > > > >
> > > > > > +       fpga@66 {
> > > > > > +               compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> > > > > > +               reg = <0x66>;
> > > > > > +               #address-cells = <1>;
> > > > > > +               #size-cells = <0>;
> > > > > > +
> > > > > > +               mdio-mux-1@54 {
> > > > >
> > > > > Still no compatible string defined for the node.  Probably
> > > > > should be
> > > > > "mdio-mux- mmioreg", "mdio-mux"
> > > >
> > > > it is not a specific device. MDIO mux is meant to be controlled by
> > > > some registers of parent device (FPGA).
> > > > Therefore, IMO this should not be a device and there should not be
> > > > any "compatible" property for it.
> >
> > > If it is not a device why we are defining a device node for it?  It
> > > is probably not a physical device per se, but it can be considered a
> > > virtual device provided by FPGA.
> >
> > It is a physical device. But it happens to be embedded inside another
> > device. And that embedded is not performed as a bus with devices on
> > it, so the device tree concepts don't fit directly.
> 
> Whether or not it is populated as a bus(which probably should as the FPGA does
> contain many different functions and these functions like the mdio-mux we are
> discussing about could have separate drivers), the node should have a new
> binding documentation similar to the mdio_mux_mmioreg binding or even
> covers the mmioreg too.  And the best way to match the node with the binding
> is through compatible strings IMO.  This is why I'm asking the node to have a
> compatible string.

The mdio_mux is NOT a device. FPGA is a device that provides the mdio mux functionality
(among other functions). The mdio mux is controlled via some bits In one of the FPGA register.

In my previous approach, I also used a compatible field for mdio_mux node in FPGA.
https://www.mail-archive.com/netdev@vger.kernel.org/msg252744.html
The FPGA driver would create as many platform devices for each subnode, and those devices
Would attach to mdio_mux_regmap driver based on compatible field.

BUT, this platform device creation is the problem. Since mdio_mux is not an actual device, how can
We create a platform device for it?

Which is why I removed the compatible field from mdio_mux nodes. The FPGA driver detects the mdio_mux
Using their name i.e. " mdio-mux-1@54". Like this: 
	for_each_child_of_node(dev->of_node, child) {
		if (!of_node_name_prefix(child, "mdio-mux"))

Refer : https://patchwork.kernel.org/patch/10797227/ 

> 
> >
> > > This also bring up another question that why this device cannot
> > > reuse the existing drivers/net/phy/mdio-mux-mmioreg.c driver?
> >
> > Because it is on an i2c bus, not an mmio bus.
> 
> Oops, I missed that.
> 
> >
> > > If we think regmap is a better solution, shall we replace the
> > > mmioreg driver with the regmap driver?
> >
> > regmap can be used with mmio. But for a single MMIO register it is a
> > huge framework. So it makes sense to keep mdio-mux-mmioreg simple.
> >
> > If however the device is already using regmap, adding one more
> > register is very little overhead. And it might be possible to use this
> > new mux with an mmio regmap, or an spi regmap, etc. So we seem to be
> > covering the best of both worlds.
> 
> Ya.  It would be ideal if the new driver can cover the legacy mdio-mux-mmioreg
> case too.

Refer : https://patchwork.kernel.org/patch/10797227/
The mdio_mux_regmap can be used by any FPGA be it i2c connected or MMIO based.

> 
> Regards,
> Leo

^ permalink raw reply

* Re: [PATCH] net: stmmac: Variable "val" in function sun8i_dwmac_set_syscon() could be uninitialized
From: David Miller @ 2019-02-07  4:17 UTC (permalink / raw)
  To: yzhai003
  Cc: csong, zhiyunq, peppe.cavallaro, alexandre.torgue, maxime.ripard,
	wens, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <20190205221559.17545-1-yzhai003@ucr.edu>

From: Yizhuo <yzhai003@ucr.edu>
Date: Tue,  5 Feb 2019 14:15:59 -0800

> @@ -639,9 +639,14 @@ static int sun8i_dwmac_set_syscon(struct stmmac_priv *priv)
>  	struct sunxi_priv_data *gmac = priv->plat->bsp_priv;
>  	struct device_node *node = priv->device->of_node;
>  	int ret;
> -	u32 reg, val;
> +	u32 reg, val = 0;
> +
> +	ret = regmap_read(gmac->regmap, SYSCON_EMAC_REG, &val);
> +	if (ret) {
> +		dev_err(priv->device, "Fail to read SYSCON_EMAC_REG.\n");
> +		return ret;
> +	}

I agree with the other reviewer that since you check 'ret' the initialization of
'val' is no longer needed.

^ permalink raw reply

* Re: [PATCH v3] net: emac: remove IBM_EMAC_RX_SKB_HEADROOM
From: David Miller @ 2019-02-07  4:16 UTC (permalink / raw)
  To: chunkeey; +Cc: netdev
In-Reply-To: <20190205212009.8820-1-chunkeey@gmail.com>

From: Christian Lamparter <chunkeey@gmail.com>
Date: Tue,  5 Feb 2019 22:20:09 +0100

> The EMAC driver had a custom IBM_EMAC_RX_SKB_HEADROOM
> Kconfig option that reserved additional skb headroom for RX.
> This patch removes the option and migrates the code
> to use napi_alloc_skb() and netdev_alloc_skb_ip_align()
> in its place.
> 
> Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
> ---
>  v2 -> v3: drop inlines

Applied to net-next.

^ permalink raw reply

* Re: [PATCH] net: Don't default Cavium PTP driver to 'y'
From: David Miller @ 2019-02-07  3:48 UTC (permalink / raw)
  To: helgaas
  Cc: aleksey.makarov, rad, pombredanne, netdev, linux-kernel,
	linux-arm-kernel
In-Reply-To: <154939964148.34496.18016063453530224332.stgit@bhelgaas-glaptop.roam.corp.google.com>

From: Bjorn Helgaas <helgaas@kernel.org>
Date: Tue, 05 Feb 2019 14:47:21 -0600

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> 8c56df372bc1 ("net: add support for Cavium PTP coprocessor") added the
> Cavium PTP coprocessor driver and enabled it by default.  Remove the
> "default y" because the driver only applies to Cavium ThunderX processors.
> 
> Fixes: 8c56df372bc1 ("net: add support for Cavium PTP coprocessor")
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: phy: improve genphy_c45_read_link
From: David Miller @ 2019-02-07  3:47 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, linux, netdev
In-Reply-To: <303235da-0957-32bb-29ff-014c85b93f72@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Tue, 5 Feb 2019 20:41:37 +0100

> Let's make genphy_c45_read_link behave the same as genphy_update_link
> and set phydev->link in the function directly. This allows to simplify
> the callers. In addition don't check further devices once we detect
> that at least one device reports link as down.
> 
> v2:
> - remove an unused variable
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied, thanks Heiner.

^ permalink raw reply

* Re: general protection fault in __bfs (2)
From: syzbot @ 2019-02-07  3:42 UTC (permalink / raw)
  To: andy.shevchenko, bp, douly.fnst, hpa, konrad.wilk, len.brown,
	linux-kernel, mingo, netdev, puwen, rppt, syzkaller-bugs, tglx,
	x86
In-Reply-To: <00000000000086d87305801011c4@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    8834f5600cf3 Linux 5.0-rc5
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=143fd61f400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8f00801d7b7c4fe6
dashboard link: https://syzkaller.appspot.com/bug?extid=c58fa3b1231d2ea0c4d3
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15bab650c00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17a331df400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c58fa3b1231d2ea0c4d3@syzkaller.appspotmail.com

Started in network mode
Own node identity 7f000001, cluster identity 4711
Enabled bearer <udp:syz0>, priority 10
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7800 Comm: syz-executor309 Not tainted 5.0.0-rc5 #59
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:lock_accessed kernel/locking/lockdep.c:968 [inline]
RIP: 0010:__bfs kernel/locking/lockdep.c:1035 [inline]
RIP: 0010:__bfs+0x18a/0x5a0 kernel/locking/lockdep.c:989
Code: 8b 3f 4d 39 fd 0f 84 77 01 00 00 49 8d 7f 10 4c 89 f8 4c 8b 0d f7 94  
fa 08 48 89 f9 48 2d a0 27 31 8a 48 c1 e9 03 48 c1 f8 06 <42> 80 3c 31 00  
0f 85 db 02 00 00 49 8b 77 10 4c 8d 46 2c 4c 89 c1
RSP: 0018:ffff8880ae807828 EFLAGS: 00010003
RAX: 0000000001d73b61 RBX: ffff8880ae8078f0 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 1ffffffff14aad54 RDI: 0000000000000010
RBP: ffff8880ae807890 R08: 0000000000000001 R09: 0000000000001c9c
R10: ffffed1015d05bcf R11: 0000000000000000 R12: 0000000000000006
R13: ffffffff8a220a50 R14: dffffc0000000000 R15: 0000000000000000
FS:  00007f2ebceb4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000340 CR3: 000000009ad36000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <IRQ>
  __bfs_forwards kernel/locking/lockdep.c:1063 [inline]
  find_usage_forwards kernel/locking/lockdep.c:1363 [inline]
  check_usage_forwards+0x119/0x340 kernel/locking/lockdep.c:2575
  mark_lock_irq kernel/locking/lockdep.c:2690 [inline]
  mark_lock+0x427/0x1380 kernel/locking/lockdep.c:3062
  mark_irqflags kernel/locking/lockdep.c:2940 [inline]
  __lock_acquire+0x128f/0x4700 kernel/locking/lockdep.c:3295
  lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3841
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
  spin_lock include/linux/spinlock.h:329 [inline]
  __queue_work+0x9ce/0x1180 kernel/workqueue.c:1434
  delayed_work_timer_fn+0x5d/0x90 kernel/workqueue.c:1520
  call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
  expire_timers kernel/time/timer.c:1358 [inline]
  __run_timers kernel/time/timer.c:1681 [inline]
  __run_timers kernel/time/timer.c:1649 [inline]
  run_timer_softirq+0x44c/0x1700 kernel/time/timer.c:1694
  __do_softirq+0x266/0x95a kernel/softirq.c:292
  invoke_softirq kernel/softirq.c:373 [inline]
  irq_exit+0x180/0x1d0 kernel/softirq.c:413
  exiting_irq arch/x86/include/asm/apic.h:536 [inline]
  smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
  </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:766  
[inline]
RIP: 0010:console_unlock+0xc0d/0x10a0 kernel/printk/printk.c:2416
Code: fc ff df 48 c1 e8 03 80 3c 08 00 0f 85 49 04 00 00 48 83 3d 94 56 38  
07 00 0f 84 af 02 00 00 e8 b9 af 15 00 48 8b 7d 98 57 9d <0f> 1f 44 00 00  
e9 5f ff ff ff e8 a4 af 15 00 48 8b 7d 08 c7 05 56
RSP: 0018:ffff88809accf1b8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
RAX: ffff88808cc40300 RBX: 0000000000000200 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff815a2b67 RDI: 0000000000000293
RBP: ffff88809accf248 R08: ffff88808cc40300 R09: ffff88808cc40c18
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff841a1770 R14: ffffffff88f81d70 R15: 0000000000006601
  vprintk_emit+0x280/0x6d0 kernel/printk/printk.c:1931
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1958
  vprintk_func+0x7e/0x189 kernel/printk/printk_safe.c:398
  printk+0xba/0xed kernel/printk/printk.c:1991
  tipc_enable_bearer.cold+0x2d/0xda net/tipc/bearer.c:335
  __tipc_nl_bearer_enable+0x2d1/0x3b0 net/tipc/bearer.c:899
  tipc_nl_bearer_enable+0x23/0x40 net/tipc/bearer.c:907
  genl_family_rcv_msg+0x6e1/0xd90 net/netlink/genetlink.c:601
  genl_rcv_msg+0xca/0x16c net/netlink/genetlink.c:626
  netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2477
  genl_rcv+0x29/0x40 net/netlink/genetlink.c:637
  netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
  netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336
  netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1917
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg+0xdd/0x130 net/socket.c:631
  ___sys_sendmsg+0x806/0x930 net/socket.c:2116
  __sys_sendmsg+0x105/0x1d0 net/socket.c:2154
  __do_sys_sendmsg net/socket.c:2163 [inline]
  __se_sys_sendmsg net/socket.c:2161 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44ddd9
Code: e8 5c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 1b c9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2ebceb3ce8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000700028 RCX: 000000000044ddd9
RDX: 0000000000000004 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000700020 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000002 R11: 0000000000000246 R12: 000000000070002c
R13: 000000000080fc2f R14: 00007f2ebceb49c0 R15: 0000000000000000
Modules linked in:
---[ end trace 23d22854ea5c89a7 ]---
RIP: 0010:lock_accessed kernel/locking/lockdep.c:968 [inline]
RIP: 0010:__bfs kernel/locking/lockdep.c:1035 [inline]
RIP: 0010:__bfs+0x18a/0x5a0 kernel/locking/lockdep.c:989
Code: 8b 3f 4d 39 fd 0f 84 77 01 00 00 49 8d 7f 10 4c 89 f8 4c 8b 0d f7 94  
fa 08 48 89 f9 48 2d a0 27 31 8a 48 c1 e9 03 48 c1 f8 06 <42> 80 3c 31 00  
0f 85 db 02 00 00 49 8b 77 10 4c 8d 46 2c 4c 89 c1
RSP: 0018:ffff8880ae807828 EFLAGS: 00010003
RAX: 0000000001d73b61 RBX: ffff8880ae8078f0 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 1ffffffff14aad54 RDI: 0000000000000010
RBP: ffff8880ae807890 R08: 0000000000000001 R09: 0000000000001c9c
R10: ffffed1015d05bcf R11: 0000000000000000 R12: 0000000000000006
R13: ffffffff8a220a50 R14: dffffc0000000000 R15: 0000000000000000
FS:  00007f2ebceb4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000340 CR3: 000000009ad36000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


^ permalink raw reply

* Re: [PATCH net-next] net: phy: fixed_phy: Fix fixed_phy not checking GPIO
From: Andrew Lunn @ 2019-02-07  3:04 UTC (permalink / raw)
  To: Moritz Fischer; +Cc: netdev, f.fainelli, hkallweit1, davem
In-Reply-To: <20190206181040.29539-1-mdf@kernel.org>

On Wed, Feb 06, 2019 at 10:10:40AM -0800, Moritz Fischer wrote:
> Fix fixed_phy not checking GPIO if no link_update callback
> is registered.
> 
> Signed-off-by: Moritz Fischer <mdf@kernel.org>

Fixes: a5597008dbc2 ("phy: fixed_phy: Add gpio to determine link up/down.")

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next] net: phy: fixed_phy: Fix fixed_phy not checking GPIO
From: Andrew Lunn @ 2019-02-07  3:00 UTC (permalink / raw)
  To: Moritz Fischer; +Cc: netdev, f.fainelli, hkallweit1, davem
In-Reply-To: <20190206181040.29539-1-mdf@kernel.org>

On Wed, Feb 06, 2019 at 10:10:40AM -0800, Moritz Fischer wrote:
> Fix fixed_phy not checking GPIO if no link_update callback
> is registered.
> 
> Signed-off-by: Moritz Fischer <mdf@kernel.org>
> ---
> 
> Hi all,
> 
> I've been trying to figure out where exactly this broke,
> it must've been somewhere when the file was refactored
> in connection with phylink?

After some digging, i now understand. It was 'broken' right from the
beginning. The only user of this when i introduced it was a board with
an Ethernet switch driven by a DSA driver. The GPIO is used to
indicate if an SFP has a loss of signal indication.

DSA would look for the fixed-link properties for the switch port, and
if it found it, use of_phy_register_fixed_link() to register a fixed
link.

However, the broadcom SF2 switch driver needs to use the callback
method of reporting link up/down for a fixed-link. So the DSA core
always registers a generic DSA callback, which then calls into the DSA
driver if its driver structure implements the callback.

So we always had the case of both, so despite it being broken, it
worked...

	Andrew

^ permalink raw reply

* [PATCH] ipmr: ip6mr: Create new sockopt to clear mfc cache only
From: Callum Sinclair @ 2019-02-07  2:08 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, nikolay, netdev, linux-kernel; +Cc: Callum Sinclair
In-Reply-To: <20190207020828.21854-1-callum.sinclair@alliedtelesis.co.nz>

Currently the only way to clear the mfc cache was to delete the entries
one by one using the MRT_DEL_MFC socket option or to destroy and
recreate the socket.

Create a new socket option which will clear the multicast forwarding
cache on the socket without destroying the socket.

Signed-off-by: Callum Sinclair <callum.sinclair@alliedtelesis.co.nz>
---
 include/uapi/linux/mroute.h  |  7 +++-
 include/uapi/linux/mroute6.h |  7 +++-
 net/ipv4/ipmr.c              | 69 +++++++++++++++++++++-------------
 net/ipv6/ip6mr.c             | 73 ++++++++++++++++++++++--------------
 4 files changed, 99 insertions(+), 57 deletions(-)

diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h
index 5d37a9ccce63..2d475edc3ec3 100644
--- a/include/uapi/linux/mroute.h
+++ b/include/uapi/linux/mroute.h
@@ -28,12 +28,17 @@
 #define MRT_TABLE	(MRT_BASE+9)	/* Specify mroute table ID		*/
 #define MRT_ADD_MFC_PROXY	(MRT_BASE+10)	/* Add a (*,*|G) mfc entry	*/
 #define MRT_DEL_MFC_PROXY	(MRT_BASE+11)	/* Del a (*,*|G) mfc entry	*/
-#define MRT_MAX		(MRT_BASE+11)
+#define MRT_FLUSH	(MRT_BASE+12)	/* Flush all multicast entries and vifs	*/
+#define MRT_MAX		(MRT_BASE+12)
 
 #define SIOCGETVIFCNT	SIOCPROTOPRIVATE	/* IP protocol privates */
 #define SIOCGETSGCNT	(SIOCPROTOPRIVATE+1)
 #define SIOCGETRPF	(SIOCPROTOPRIVATE+2)
 
+/* Flags used for MRT_FLUSH */
+#define MRT_FLUSH_ENTRIES	1	/* For flushing all multicast entries */
+#define MRT_FLUSH_VIFS		2	/* For flushing all multicast vifs */
+
 #define MAXVIFS		32
 typedef unsigned long vifbitmap_t;	/* User mode code depends on this lot */
 typedef unsigned short vifi_t;
diff --git a/include/uapi/linux/mroute6.h b/include/uapi/linux/mroute6.h
index 9999cc006390..b04094d997c8 100644
--- a/include/uapi/linux/mroute6.h
+++ b/include/uapi/linux/mroute6.h
@@ -31,12 +31,17 @@
 #define MRT6_TABLE	(MRT6_BASE+9)	/* Specify mroute table ID		*/
 #define MRT6_ADD_MFC_PROXY	(MRT6_BASE+10)	/* Add a (*,*|G) mfc entry	*/
 #define MRT6_DEL_MFC_PROXY	(MRT6_BASE+11)	/* Del a (*,*|G) mfc entry	*/
-#define MRT6_MAX	(MRT6_BASE+11)
+#define MRT6_FLUSH	(MRT6_BASE+12)	/* Flush all multicast entries and vifs	*/
+#define MRT6_MAX	(MRT6_BASE+12)
 
 #define SIOCGETMIFCNT_IN6	SIOCPROTOPRIVATE	/* IP protocol privates */
 #define SIOCGETSGCNT_IN6	(SIOCPROTOPRIVATE+1)
 #define SIOCGETRPF	(SIOCPROTOPRIVATE+2)
 
+/* Flags used for MRT6_FLUSH*/
+#define MRT6_FLUSH_ENTRIES	1	/* For flushing all multicast entries */
+#define MRT6_FLUSH_VIFS		2	/* For flushing all multicast vifs */
+
 #define MAXMIFS		32
 typedef unsigned long mifbitmap_t;	/* User mode code depends on this lot */
 typedef unsigned short mifi_t;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ddbf8c9a1abb..2eb569138569 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -416,7 +416,7 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
 static void ipmr_free_table(struct mr_table *mrt)
 {
 	del_timer_sync(&mrt->ipmr_expire_timer);
-	mroute_clean_tables(mrt, true);
+	mroute_clean_tables(mrt, true, MRT_FLUSH_VIFS | MRT_FLUSH_ENTRIES);
 	rhltable_destroy(&mrt->mfc_hash);
 	kfree(mrt);
 }
@@ -1299,44 +1299,48 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
 }
 
 /* Close the multicast socket, and clear the vif tables etc */
-static void mroute_clean_tables(struct mr_table *mrt, bool all)
+static void mroute_clean_tables(struct mr_table *mrt, bool all, int flags)
 {
 	struct net *net = read_pnet(&mrt->net);
-	struct mr_mfc *c, *tmp;
 	struct mfc_cache *cache;
+	struct mr_mfc *c, *tmp;
 	LIST_HEAD(list);
 	int i;
 
 	/* Shut down all active vif entries */
-	for (i = 0; i < mrt->maxvif; i++) {
-		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
-			continue;
-		vif_delete(mrt, i, 0, &list);
+	if (flags & MRT_FLUSH_VIFS) {
+		for (i = 0; i < mrt->maxvif; i++) {
+			if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
+				continue;
+				vif_delete(mrt, i, 0, &list);
+		}
+		unregister_netdevice_many(&list);
 	}
-	unregister_netdevice_many(&list);
 
 	/* Wipe the cache */
-	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
-		if (!all && (c->mfc_flags & MFC_STATIC))
-			continue;
-		rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
-		list_del_rcu(&c->list);
-		cache = (struct mfc_cache *)c;
-		call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
-					      mrt->id);
-		mroute_netlink_event(mrt, cache, RTM_DELROUTE);
-		mr_cache_put(c);
-	}
-
-	if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
-		spin_lock_bh(&mfc_unres_lock);
-		list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
-			list_del(&c->list);
+	if (flags & MRT_FLUSH_ENTRIES) {
+		list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
+			if (!all && (c->mfc_flags & MFC_STATIC))
+				continue;
+			rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
+			list_del_rcu(&c->list);
 			cache = (struct mfc_cache *)c;
+			call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
+										  mrt->id);
 			mroute_netlink_event(mrt, cache, RTM_DELROUTE);
-			ipmr_destroy_unres(mrt, cache);
+			mr_cache_put(c);
+		}
+
+		if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
+			spin_lock_bh(&mfc_unres_lock);
+			list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
+				list_del(&c->list);
+				cache = (struct mfc_cache *)c;
+				mroute_netlink_event(mrt, cache, RTM_DELROUTE);
+				ipmr_destroy_unres(mrt, cache);
+			}
+			spin_unlock_bh(&mfc_unres_lock);
 		}
-		spin_unlock_bh(&mfc_unres_lock);
 	}
 }
 
@@ -1357,7 +1361,7 @@ static void mrtsock_destruct(struct sock *sk)
 						    NETCONFA_IFINDEX_ALL,
 						    net->ipv4.devconf_all);
 			RCU_INIT_POINTER(mrt->mroute_sk, NULL);
-			mroute_clean_tables(mrt, false);
+			mroute_clean_tables(mrt, false, MRT_FLUSH_VIFS | MRT_FLUSH_ENTRIES);
 		}
 	}
 	rtnl_unlock();
@@ -1482,6 +1486,17 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 					   sk == rtnl_dereference(mrt->mroute_sk),
 					   parent);
 		break;
+	case MRT_FLUSH:
+		if (get_user(val, (int __user *)optval)) {
+			ret = -EFAULT;
+			break;
+		}
+		rtnl_lock();
+		ipmr_for_each_table(mrt, net) {
+			mroute_clean_tables(mrt, true, val);
+		}
+		rtnl_unlock();
+		break;
 	/* Control PIM assert. */
 	case MRT_ASSERT:
 		if (optlen != sizeof(val)) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 30337b38274b..473c83d197fe 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -393,7 +393,7 @@ static struct mr_table *ip6mr_new_table(struct net *net, u32 id)
 static void ip6mr_free_table(struct mr_table *mrt)
 {
 	del_timer_sync(&mrt->ipmr_expire_timer);
-	mroute_clean_tables(mrt, true);
+	mroute_clean_tables(mrt, true, MRT6_FLUSH_VIFS | MRT6_FLUSH_ENTRIES);
 	rhltable_destroy(&mrt->mfc_hash);
 	kfree(mrt);
 }
@@ -1496,43 +1496,47 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
  *	Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct mr_table *mrt, bool all)
+static void mroute_clean_tables(struct mr_table *mrt, bool all, int flags)
 {
 	struct mr_mfc *c, *tmp;
 	LIST_HEAD(list);
 	int i;
 
 	/* Shut down all active vif entries */
-	for (i = 0; i < mrt->maxvif; i++) {
-		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
-			continue;
-		mif6_delete(mrt, i, 0, &list);
+	if (flags & MRT6_FLUSH_VIFS) {
+		for (i = 0; i < mrt->maxvif; i++) {
+			if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
+				continue;
+			mif6_delete(mrt, i, 0, &list);
+		}
+		unregister_netdevice_many(&list);
 	}
-	unregister_netdevice_many(&list);
 
 	/* Wipe the cache */
-	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
-		if (!all && (c->mfc_flags & MFC_STATIC))
-			continue;
-		rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params);
-		list_del_rcu(&c->list);
-		mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE);
-		mr_cache_put(c);
-	}
+	if (flags & MRT6_FLUSH_ENTRIES) {
+		list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
+			if (!all && (c->mfc_flags & MFC_STATIC))
+				continue;
+			rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params);
+			list_del_rcu(&c->list);
+			mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE);
+			mr_cache_put(c);
+		}
 
-	if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
-		spin_lock_bh(&mfc_unres_lock);
-		list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
-			list_del(&c->list);
-			call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net),
-						       FIB_EVENT_ENTRY_DEL,
-						       (struct mfc6_cache *)c,
-						       mrt->id);
-			mr6_netlink_event(mrt, (struct mfc6_cache *)c,
-					  RTM_DELROUTE);
-			ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c);
+		if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
+			spin_lock_bh(&mfc_unres_lock);
+			list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
+				list_del(&c->list);
+				call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net),
+											   FIB_EVENT_ENTRY_DEL,
+											   (struct mfc6_cache *)c,
+											   mrt->id);
+				mr6_netlink_event(mrt, (struct mfc6_cache *)c,
+						  RTM_DELROUTE);
+				ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c);
+			}
+			spin_unlock_bh(&mfc_unres_lock);
 		}
-		spin_unlock_bh(&mfc_unres_lock);
 	}
 }
 
@@ -1588,7 +1592,7 @@ int ip6mr_sk_done(struct sock *sk)
 						     NETCONFA_IFINDEX_ALL,
 						     net->ipv6.devconf_all);
 
-			mroute_clean_tables(mrt, false);
+			mroute_clean_tables(mrt, false, MRT6_FLUSH_VIFS | MRT6_FLUSH_ENTRIES);
 			err = 0;
 			break;
 		}
@@ -1703,6 +1707,19 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 					    parent);
 		rtnl_unlock();
 		return ret;
+	case MRT6_DEL_MFC_ALL:
+	{
+		int flags;
+
+		if (get_user(flags, (int __user *)optval))
+			return -EFAULT;
+		rtnl_lock();
+		ip6mr_for_each_table(mrt, net) {
+			mroute_clean_tables(mrt, true, flags);
+		}
+		rtnl_unlock();
+		return 0;
+	}
 
 	/*
 	 *	Control PIM assert (to activate pim will activate assert)
-- 
2.20.1


^ permalink raw reply related

* ipmr: ip6mr: Create new sockopt to clear mfc cache only
From: Callum Sinclair @ 2019-02-07  2:08 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, nikolay, netdev, linux-kernel; +Cc: Callum Sinclair

Created a way to clear the multicast forwarding cache on a socket
without having to either remove the entries manually using the delete
entry socket option or destroy and recreate the multicast socket.

Using the flags MRT_FLUSH_ENTRIES and MRT_FLUSH_VIFS, all multicast
entries can be cleared, all multicast interfaces can be closed or both
can be cleared using one sockopt call.

Patch Set 2:
  - Fix Compile Errors

Patch Set 3:
  - Fix Style Errors

Patch Set 4:
  - Implemented a way to clear the entries or vifs based off an input flag.

Callum Sinclair (1):
  ipmr: ip6mr: Create new sockopt to clear mfc cache only

 include/uapi/linux/mroute.h  |  7 +++-
 include/uapi/linux/mroute6.h |  7 +++-
 net/ipv4/ipmr.c              | 69 +++++++++++++++++++++-------------
 net/ipv6/ip6mr.c             | 73 ++++++++++++++++++++++--------------
 4 files changed, 99 insertions(+), 57 deletions(-)

-- 
2.20.1

^ permalink raw reply

* Re: Waiting for vrf to become free on rmmod of bridge...
From: David Ahern @ 2019-02-07  1:50 UTC (permalink / raw)
  To: Ben Greear, netdev
In-Reply-To: <cc2ceb6c-3d31-b19e-0e26-3e8689ff651e@candelatech.com>

On 2/6/19 3:20 PM, Ben Greear wrote:
> Hello,
> 
> I just saw this warning on a system running a hacked 4.20.2+ kernel. 
> Any known bugs
> of this nature in this (upstream) kernel?  The command that is blocked is:
> 'rmmod bridge llc'
> 
> [17069.299135] unregister_netdevice: waiting for _vrf13 to become free.
> Usage count = 1
> [17079.306438] unregister_netdevice: waiting for _vrf13 to become free.
> Usage count = 1
> [17089.314656] unregister_netdevice: waiting for _vrf13 to become free.
> Usage count = 1
> [17099.322870] unregister_netdevice: waiting for _vrf13 to become free.
> Usage count = 1
> 
> Thanks,
> Ben
> 

No known refcount issues with vrf.

I use namespaces for testing which creates devices, adds routes, runs
traffic and deletes the device and namespace. That series in the tests
has been known to trigger refcount problems in the past.

^ permalink raw reply

* Re: [PATCH net] net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: Yanjun Zhu @ 2019-02-07  1:51 UTC (permalink / raw)
  To: Yang Wei, netdev; +Cc: davem, shannon.nelson, robh, yang.wei9
In-Reply-To: <1549383584-5605-1-git-send-email-albin_yang@163.com>


On 2019/2/6 0:19, Yang Wei wrote:
> From: Yang Wei <yang.wei9@zte.com.cn>
>
> dev_consume_skb_irq() should be called when skb xmit done. It makes
> drop profiles(dropwatch, perf) more friendly.

Thanks a lot. I am OK.

Zhu Yanjun

>
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
> ---
>   drivers/net/ethernet/sun/cassini.c | 2 +-
>   drivers/net/ethernet/sun/sunbmac.c | 2 +-
>   drivers/net/ethernet/sun/sunhme.c  | 2 +-
>   3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/sun/cassini.c b/drivers/net/ethernet/sun/cassini.c
> index 7ec4eb7..6fc05c1 100644
> --- a/drivers/net/ethernet/sun/cassini.c
> +++ b/drivers/net/ethernet/sun/cassini.c
> @@ -1898,7 +1898,7 @@ static inline void cas_tx_ringN(struct cas *cp, int ring, int limit)
>   		cp->net_stats[ring].tx_packets++;
>   		cp->net_stats[ring].tx_bytes += skb->len;
>   		spin_unlock(&cp->stat_lock[ring]);
> -		dev_kfree_skb_irq(skb);
> +		dev_consume_skb_irq(skb);
>   	}
>   	cp->tx_old[ring] = entry;
>   
> diff --git a/drivers/net/ethernet/sun/sunbmac.c b/drivers/net/ethernet/sun/sunbmac.c
> index 720b7ac..e9b757b 100644
> --- a/drivers/net/ethernet/sun/sunbmac.c
> +++ b/drivers/net/ethernet/sun/sunbmac.c
> @@ -781,7 +781,7 @@ static void bigmac_tx(struct bigmac *bp)
>   
>   		DTX(("skb(%p) ", skb));
>   		bp->tx_skbs[elem] = NULL;
> -		dev_kfree_skb_irq(skb);
> +		dev_consume_skb_irq(skb);
>   
>   		elem = NEXT_TX(elem);
>   	}
> diff --git a/drivers/net/ethernet/sun/sunhme.c b/drivers/net/ethernet/sun/sunhme.c
> index ff641cf..d007dfe 100644
> --- a/drivers/net/ethernet/sun/sunhme.c
> +++ b/drivers/net/ethernet/sun/sunhme.c
> @@ -1962,7 +1962,7 @@ static void happy_meal_tx(struct happy_meal *hp)
>   			this = &txbase[elem];
>   		}
>   
> -		dev_kfree_skb_irq(skb);
> +		dev_consume_skb_irq(skb);
>   		dev->stats.tx_packets++;
>   	}
>   	hp->tx_old = elem;

^ permalink raw reply

* Re: [PATCH net] net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-07  1:43 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, f.fainelli, andrew, yang.wei9
In-Reply-To: <1549383954-5843-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed,  6 Feb 2019 00:25:54 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in sbdma_tx_process() when
> skb xmit done. It makes drop profiles(dropwatch, perf) more
> friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

^ permalink raw reply

* Re: [PATCH net] net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-07  1:43 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, romieu, yang.wei9
In-Reply-To: <1549383774-5726-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed,  6 Feb 2019 00:22:54 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in velocity_free_tx_buf()
> when skb xmit done. It makes drop profiles(dropwatch, perf) more
> friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox