Netdev List
 help / color / mirror / Atom feed
* Re: Reminder: 99 open syzbot bugs in net subsystem
From: Eric Dumazet @ 2019-07-25  5:04 UTC (permalink / raw)
  To: David Miller, eric.dumazet, dvyukov, netdev, fw, i.maximets,
	edumazet, dsahern, linux-kernel, syzkaller-bugs
In-Reply-To: <20190724210950.GH213255@gmail.com>



On 7/24/19 11:09 PM, Eric Biggers wrote:
> On Wed, Jul 24, 2019 at 01:09:28PM -0700, David Miller wrote:
>> From: Eric Biggers <ebiggers@kernel.org>
>> Date: Wed, 24 Jul 2019 11:37:12 -0700
>>
>>> We can argue about what words to use to describe this situation, but
>>> it doesn't change the situation itself.
>>
>> And we should argue about those words because it matters to humans and
>> effects how they feel, and humans ultimately fix these bugs.
>>
>> So please stop with the hyperbole.
>>
>> Thank you.
> 
> Okay, there are 151 bugs that syzbot saw on the mainline Linux kernel in the
> last 7 days (90.1% with reproducers).  Of those, 59 were reported over 3 months
> ago (89.8% with reproducers).  Of those, 12 were reported over a year ago (83.3%
> with reproducers).
> 
> No opinion on whether those are small/medium/large numbers, in case it would
> hurt someone's feelings.
> 
> These numbers do *not* include bugs that are still valid but weren't seen on
> mainline in last 7 days, e.g.:
> 
> - Bugs that are seen only rarely, so by chance weren't seen in last 7 days.
> - Bugs only in linux-next and/or subsystem branches.
> - Bugs that were seen in mainline more than 7 days ago, and then only on
>   linux-next or subsystem branch in last 7 days.
> - Bugs that stopped being seen due to a change in syzkaller.
> - Bugs that stopped being seen due to a change in kernel config.
> - Bugs that stopped being seen due to other environment changes such as kernel
>   command line parameters.
> - Bugs that stopped being seen due to a kernel change that hid the bug but
>   didn't actually fix it, i.e. still reachable in other ways.
> 

We do not doubt syzkaller is an incredible tool.

But netdev@ and lkml@ are mailing lists for humans to interact,
exchange ideas, send patches and review them.

To me, an issue that was reported to netdev by a real user is _way_ more important
than potential issues that a bot might have found doing crazy things.

We need to keep optimal S/N on mailing lists, so any bots trying to interact
with these lists must be very cautious and damn smart.

When I have time to spare and can work on syzbot reports, I am going to a web
page where I can see them and select the ones it makes sense to fix.
I hate having to set up email filters.


^ permalink raw reply

* Re: [PATCH] [v2 net-next] ipvs: reduce kernel stack usage
From: Julian Anastasov @ 2019-07-25  5:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Wensong Zhang, Simon Horman, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, Jacky Hu, Matteo Croce, netdev, lvs-devel,
	linux-kernel, netfilter-devel, coreteam
In-Reply-To: <20190722113009.2704377-1-arnd@arndb.de>


	Hello,

On Mon, 22 Jul 2019, Arnd Bergmann wrote:

> With the new CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL option, the stack
> usage in the ipvs debug output grows because each instance of
> IP_VS_DBG_BUF() now has its own buffer of 160 bytes that add up
> rather than reusing the stack slots:
> 
> net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_sched_persist':
> net/netfilter/ipvs/ip_vs_core.c:427:1: error: the frame size of 1052 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_new_conn_out':
> net/netfilter/ipvs/ip_vs_core.c:1231:1: error: the frame size of 1048 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> net/netfilter/ipvs/ip_vs_ftp.c: In function 'ip_vs_ftp_out':
> net/netfilter/ipvs/ip_vs_ftp.c:397:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> net/netfilter/ipvs/ip_vs_ftp.c: In function 'ip_vs_ftp_in':
> net/netfilter/ipvs/ip_vs_ftp.c:555:1: error: the frame size of 1200 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> 
> Since printk() already has a way to print IPv4/IPv6 addresses using
> the %pISc format string, use that instead, combined with a macro that
> creates a local sockaddr structure on the stack. These will still
> add up, but the stack frames are now under 200 bytes.
> 
> So far I have also only added three files that caused the warning
> messages to be reported. There are still a lot of other instances of
> IP_VS_DBG_BUF() that could be converted the same way after the
> basic idea is confirmed.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2:
>  - use compressed ipv6 format
>  - fix port number endianess

	Thanks! Some comments below...

> ---
>  include/net/ip_vs.h             | 59 ++++++++++++++++-----------------
>  net/netfilter/ipvs/ip_vs_core.c | 44 ++++++++++++------------
>  net/netfilter/ipvs/ip_vs_ftp.c  | 20 +++++------
>  3 files changed, 60 insertions(+), 63 deletions(-)
> 
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 3759167f91f5..eace4374e69f 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -227,6 +227,16 @@ static inline const char *ip_vs_dbg_addr(int af, char *buf, size_t buf_len,
>  		       sizeof(ip_vs_dbg_buf), addr,			\
>  		       &ip_vs_dbg_idx)
>  
> +#define IP_VS_DBG_SOCKADDR4(fam, addr, port)				\
> +	(struct sockaddr*)&(struct sockaddr_in)				\
> +	{ .sin_family = (fam), .sin_addr = (addr)->in, .sin_port = (port) }
> +#define IP_VS_DBG_SOCKADDR6(fam, addr, port)				\
> +	(struct sockaddr*)&(struct sockaddr_in6) \
> +	{ .sin6_family = (fam), .sin6_addr = (addr)->in6, .sin6_port = (port) }
> +#define IP_VS_DBG_SOCKADDR(fam, addr, port) (fam == AF_INET ?		\
> +			IP_VS_DBG_SOCKADDR4(fam, addr, port) :		\
> +			IP_VS_DBG_SOCKADDR6(fam, addr, port))
> +
>  #define IP_VS_DBG(level, msg, ...)					\
>  	do {								\
>  		if (level <= ip_vs_get_debug_level())			\
> @@ -251,6 +261,7 @@ static inline const char *ip_vs_dbg_addr(int af, char *buf, size_t buf_len,
>  #else	/* NO DEBUGGING at ALL */
>  #define IP_VS_DBG_BUF(level, msg...)  do {} while (0)
>  #define IP_VS_ERR_BUF(msg...)  do {} while (0)
> +#define IP_VS_DBG_SOCKADDR(fam, addr, port) NULL
>  #define IP_VS_DBG(level, msg...)  do {} while (0)
>  #define IP_VS_DBG_RL(msg...)  do {} while (0)
>  #define IP_VS_DBG_PKT(level, af, pp, skb, ofs, msg)	do {} while (0)
> @@ -1244,31 +1255,23 @@ static inline void ip_vs_control_del(struct ip_vs_conn *cp)
>  {
>  	struct ip_vs_conn *ctl_cp = cp->control;
>  	if (!ctl_cp) {
> -		IP_VS_ERR_BUF("request control DEL for uncontrolled: "
> -			      "%s:%d to %s:%d\n",
> -			      IP_VS_DBG_ADDR(cp->af, &cp->caddr),
> -			      ntohs(cp->cport),
> -			      IP_VS_DBG_ADDR(cp->af, &cp->vaddr),
> -			      ntohs(cp->vport));
> +		pr_err("request control DEL for uncontrolled: "
> +		       "%pISpc to %pISpcn",

	Backslash was lost

> +		       IP_VS_DBG_SOCKADDR(cp->af, &cp->caddr, cp->cport),
> +		       IP_VS_DBG_SOCKADDR(cp->af, &cp->vaddr, cp->vport));
>  
>  		return;
>  	}
>  
> -	IP_VS_DBG_BUF(7, "DELeting control for: "
> -		      "cp.dst=%s:%d ctl_cp.dst=%s:%d\n",
> -		      IP_VS_DBG_ADDR(cp->af, &cp->caddr),
> -		      ntohs(cp->cport),
> -		      IP_VS_DBG_ADDR(cp->af, &ctl_cp->caddr),
> -		      ntohs(ctl_cp->cport));
> +	IP_VS_DBG(7, "DELeting control for: cp.dst=%pISpc ctl_cp.dst=%pISpc\n",
> +		  IP_VS_DBG_SOCKADDR(cp->af, &cp->caddr, cp->cport),
> +		  IP_VS_DBG_SOCKADDR(cp->af, &ctl_cp->caddr, ctl_cp->cport));
>  
>  	cp->control = NULL;
>  	if (atomic_read(&ctl_cp->n_control) == 0) {
> -		IP_VS_ERR_BUF("BUG control DEL with n=0 : "
> -			      "%s:%d to %s:%d\n",
> -			      IP_VS_DBG_ADDR(cp->af, &cp->caddr),
> -			      ntohs(cp->cport),
> -			      IP_VS_DBG_ADDR(cp->af, &cp->vaddr),
> -			      ntohs(cp->vport));
> +		pr_err("BUG control DEL with n=0 : %pISpc to %pISpc\n",

	We can remove the space before colon

> +		       IP_VS_DBG_SOCKADDR(cp->af, &cp->caddr, cp->cport),
> +		       IP_VS_DBG_SOCKADDR(cp->af, &cp->vaddr, cp->vport));
>  
>  		return;
>  	}
> @@ -1279,22 +1282,18 @@ static inline void
>  ip_vs_control_add(struct ip_vs_conn *cp, struct ip_vs_conn *ctl_cp)
>  {
>  	if (cp->control) {
> -		IP_VS_ERR_BUF("request control ADD for already controlled: "
> -			      "%s:%d to %s:%d\n",
> -			      IP_VS_DBG_ADDR(cp->af, &cp->caddr),
> -			      ntohs(cp->cport),
> -			      IP_VS_DBG_ADDR(cp->af, &cp->vaddr),
> -			      ntohs(cp->vport));
> +		pr_err("request control ADD for already controlled: "
> +		       "%pISpc to %pISpc\n",
> +		       IP_VS_DBG_SOCKADDR(cp->af, &cp->caddr, cp->cport),
> +		       IP_VS_DBG_SOCKADDR(cp->af, &cp->vaddr, cp->vport));
>  
>  		ip_vs_control_del(cp);
>  	}
>  
> -	IP_VS_DBG_BUF(7, "ADDing control for: "
> -		      "cp.dst=%s:%d ctl_cp.dst=%s:%d\n",
> -		      IP_VS_DBG_ADDR(cp->af, &cp->caddr),
> -		      ntohs(cp->cport),
> -		      IP_VS_DBG_ADDR(cp->af, &ctl_cp->caddr),
> -		      ntohs(ctl_cp->cport));
> +	IP_VS_DBG(7, "ADDing control for: "
> +		  "cp.dst=%pISpc ctl_cp.dst=%pISpc\n",

	May be we can join above lines

> +		  IP_VS_DBG_SOCKADDR(cp->af, &cp->caddr, cp->cport),
> +		  IP_VS_DBG_SOCKADDR(cp->af, &ctl_cp->caddr, ctl_cp->cport));
>  
>  	cp->control = ctl_cp;
>  	atomic_inc(&ctl_cp->n_control);
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 46f06f92ab8f..06ab0bf9ae08 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -52,7 +52,6 @@
>  #include <net/ip_vs.h>
>  #include <linux/indirect_call_wrapper.h>
>  
> -
>  EXPORT_SYMBOL(register_ip_vs_scheduler);
>  EXPORT_SYMBOL(unregister_ip_vs_scheduler);
>  EXPORT_SYMBOL(ip_vs_proto_name);
> @@ -295,11 +294,11 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
>  #endif
>  		snet.ip = src_addr->ip & svc->netmask;
>  
> -	IP_VS_DBG_BUF(6, "p-schedule: src %s:%u dest %s:%u "
> -		      "mnet %s\n",
> -		      IP_VS_DBG_ADDR(svc->af, src_addr), ntohs(src_port),
> -		      IP_VS_DBG_ADDR(svc->af, dst_addr), ntohs(dst_port),
> -		      IP_VS_DBG_ADDR(svc->af, &snet));
> +	IP_VS_DBG(6, "p-schedule: src %pISpc dest %pISpc "
> +		      "mnet %pIS\n",

	Join? mnet %pISc ?

> +		      IP_VS_DBG_SOCKADDR(svc->af, src_addr, src_port),
> +		      IP_VS_DBG_SOCKADDR(svc->af, dst_addr, dst_port),
> +		      IP_VS_DBG_SOCKADDR(svc->af, &snet, 0));
>  
>  	/*
>  	 * As far as we know, FTP is a very complicated network protocol, and
> @@ -567,12 +566,12 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
>  		}
>  	}
>  
> -	IP_VS_DBG_BUF(6, "Schedule fwd:%c c:%s:%u v:%s:%u "
> -		      "d:%s:%u conn->flags:%X conn->refcnt:%d\n",
> +	IP_VS_DBG(6, "Schedule fwd:%c c:%pISpc v:%pISpc "
> +		      "d:%pISp conn->flags:%X conn->refcnt:%d\n",

	d:%pISpc

>  		      ip_vs_fwd_tag(cp),
> -		      IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport),
> -		      IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport),
> -		      IP_VS_DBG_ADDR(cp->daf, &cp->daddr), ntohs(cp->dport),
> +		      IP_VS_DBG_SOCKADDR(cp->af, &cp->caddr, cp->cport),
> +		      IP_VS_DBG_SOCKADDR(cp->af, &cp->vaddr, cp->vport),
> +		      IP_VS_DBG_SOCKADDR(cp->daf, &cp->daddr, cp->dport),
>  		      cp->flags, refcount_read(&cp->refcnt));
>  
>  	ip_vs_conn_stats(cp, svc);
> @@ -886,8 +885,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
>  	/* Ensure the checksum is correct */
>  	if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) {
>  		/* Failed checksum! */
> -		IP_VS_DBG_BUF(1, "Forward ICMP: failed checksum from %s!\n",
> -			      IP_VS_DBG_ADDR(af, snet));
> +		IP_VS_DBG(1, "Forward ICMP: failed checksum from %pISc!\n",
> +			      IP_VS_DBG_SOCKADDR(af, snet, 0));
>  		goto out;
>  	}
>  
> @@ -1220,13 +1219,13 @@ struct ip_vs_conn *ip_vs_new_conn_out(struct ip_vs_service *svc,
>  	ip_vs_conn_stats(cp, svc);
>  
>  	/* return connection (will be used to handle outgoing packet) */
> -	IP_VS_DBG_BUF(6, "New connection RS-initiated:%c c:%s:%u v:%s:%u "
> -		      "d:%s:%u conn->flags:%X conn->refcnt:%d\n",
> -		      ip_vs_fwd_tag(cp),
> -		      IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport),
> -		      IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport),
> -		      IP_VS_DBG_ADDR(cp->af, &cp->daddr), ntohs(cp->dport),
> -		      cp->flags, refcount_read(&cp->refcnt));
> +	IP_VS_DBG(6, "New connection RS-initiated:%c c:%pISpc v:%pISpc "
> +		  "d:%pISp conn->flags:%X conn->refcnt:%d\n",

	d:%pISpc

> +		  ip_vs_fwd_tag(cp),
> +		  IP_VS_DBG_SOCKADDR(cp->af, &cp->caddr, cp->cport),
> +		  IP_VS_DBG_SOCKADDR(cp->af, &cp->vaddr, cp->vport),
> +		  IP_VS_DBG_SOCKADDR(cp->af, &cp->daddr, cp->dport),
> +		  cp->flags, refcount_read(&cp->refcnt));
>  	LeaveFunction(12);
>  	return cp;
>  }
> @@ -1969,7 +1968,6 @@ static int ip_vs_in_icmp_v6(struct netns_ipvs *ipvs, struct sk_buff *skb,
>  }
>  #endif
>  
> -
>  /*
>   *	Check if it's for virtual services, look it up,
>   *	and send it on its way...
> @@ -1998,10 +1996,10 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
>  		      hooknum != NF_INET_LOCAL_OUT) ||
>  		     !skb_dst(skb))) {
>  		ip_vs_fill_iph_skb(af, skb, false, &iph);
> -		IP_VS_DBG_BUF(12, "packet type=%d proto=%d daddr=%s"
> +		IP_VS_DBG(12, "packet type=%d proto=%d daddr=%pISc"
>  			      " ignored in hook %u\n",
>  			      skb->pkt_type, iph.protocol,
> -			      IP_VS_DBG_ADDR(af, &iph.daddr), hooknum);
> +			      IP_VS_DBG_SOCKADDR(af, &iph.daddr, 0), hooknum);
>  		return NF_ACCEPT;
>  	}
>  	/* ipvs enabled in this netns ? */
> diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
> index cf925906f59b..22099c42e184 100644
> --- a/net/netfilter/ipvs/ip_vs_ftp.c
> +++ b/net/netfilter/ipvs/ip_vs_ftp.c
> @@ -306,9 +306,9 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
>  					   &start, &end) != 1)
>  			return 1;
>  
> -		IP_VS_DBG_BUF(7, "EPSV response (%s:%u) -> %s:%u detected\n",
> -			      IP_VS_DBG_ADDR(cp->af, &from), ntohs(port),
> -			      IP_VS_DBG_ADDR(cp->af, &cp->caddr), 0);
> +		IP_VS_DBG(7, "EPSV response (%pISpc) -> %pISc detected\n",
> +			  IP_VS_DBG_SOCKADDR(cp->af, &from, port),
> +			  IP_VS_DBG_SOCKADDR(cp->af, &cp->caddr, 0));
>  	} else {
>  		return 1;
>  	}
> @@ -510,15 +510,15 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,
>  					  &to, &port, cp->af,
>  					  &start, &end) == 1) {
>  
> -		IP_VS_DBG_BUF(7, "EPRT %s:%u detected\n",
> -			      IP_VS_DBG_ADDR(cp->af, &to), ntohs(port));
> +		IP_VS_DBG(7, "EPRT %pISpc detected\n",
> +			  IP_VS_DBG_SOCKADDR(cp->af, &to, port));
>  
>  		/* Now update or create a connection entry for it */
> -		IP_VS_DBG_BUF(7, "protocol %s %s:%u %s:%u\n",
> -			      ip_vs_proto_name(ipvsh->protocol),
> -			      IP_VS_DBG_ADDR(cp->af, &to), ntohs(port),
> -			      IP_VS_DBG_ADDR(cp->af, &cp->vaddr),
> -			      ntohs(cp->vport)-1);
> +		IP_VS_DBG(7, "protocol %s %pISpc %pISpc\n",
> +			  ip_vs_proto_name(ipvsh->protocol),
> +			  IP_VS_DBG_SOCKADDR(cp->af, &to, port),
> +			  IP_VS_DBG_SOCKADDR(cp->af, &cp->vaddr,
> +					     htons(ntohs(cp->vport)-1)));
>  	} else {
>  		return 1;
>  	}
> -- 
> 2.20.0

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: Reminder: 99 open syzbot bugs in net subsystem
From: Eric Biggers @ 2019-07-25  4:40 UTC (permalink / raw)
  To: Theodore Y. Ts'o, David Miller, eric.dumazet, dvyukov, netdev,
	fw, i.maximets, edumazet, dsahern, linux-kernel, syzkaller-bugs
In-Reply-To: <20190725033913.GB13651@mit.edu>

On Wed, Jul 24, 2019 at 11:39:13PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Jul 24, 2019 at 01:09:28PM -0700, David Miller wrote:
> > From: Eric Biggers <ebiggers@kernel.org>
> > Date: Wed, 24 Jul 2019 11:37:12 -0700
> > 
> > > We can argue about what words to use to describe this situation, but
> > > it doesn't change the situation itself.
> > 
> > And we should argue about those words because it matters to humans and
> > effects how they feel, and humans ultimately fix these bugs.
> > 
> > So please stop with the hyperbole.
> 
> Perhaps it would be better to call them, "syzbot reports".  Not all
> syzbot reports are bugs.  In fact, Dmitry has steadfastly refused to
> add features which any basic bug-tracking system would have, claiming
> that syzbot should not be a bug-tracking system, and something like
> bugzilla should be forcibly imposed on all kernel developers.  So I
> don't consider syzkaller reports as bugs --- they are just reports.
> 
> In order for developers to want to engage with "syzbot reports", we
> need to reduce developer toil which syzbot imposes on developers, such
> that it is a net benefit, instead of it being just a source of
> annoying e-mails, some of which are actionable, and some of which are
> noise.
> 
> In particular, asking developers to figure out which syzbot reports
> should be closed, because developers found the problem independently,
> and fixed it without hearing about from syzbot first, really isn't a
> fair thing to ask.  Especially if we can automate away the problem.
> 
> If there is a reproducer, it should be possible to automatically
> categorize the reproducer as a reliable reproducer or a flakey one.
> If it is a reliable reproducer on version X, and it fails to be
> reliably reproduce on version X+N, then it should be able to figure
> out that it has been fixed, instead of requesting that a human confirm
> it.  If you really want a human to look at it, now that syzkaller has
> a bisection feature, it should be possible to use the reliable
> reproducer to do a negative bisection search to report a candidate
> fix.  This would significantly reproduce the developer toil imposed as
> a tax on developers.  And if Dmitry doesn't want to auto-close those
> reports that appear to be fixed already, at the very least they should
> be down-prioritized on Eric's reports, so people who don't want to
> waste their time on "bureaucracy" can do so.
> 
> Cheers,
> 
> 						- Ted
> 
> P.S.  Another criteria I'd suggest down-prioritizing on is, "does it
> require root privileges?"  After all, since root has so many different
> ways of crashing a system already, and if we're all super-busy, we
> need to prioritize which reports should be addressed first.
> 

I agree with all this.  Fix bisection would be really useful.  I think what we'd
actually need to do to get decent results, though, is consider many different
signals (days since last occurred, repro type, fix bisected, bug bisected,
occurred in mainline or not, does the repro work as root, is it clearly a "bad"
bug like use-after-free, etc.) and compute an appropriate timeout based on that.

However, I'd like to emphasize that in my reminder emails, I've *already*
considered many of these factors when sorting the bug reports, and in particular
the bugs/reports that have been seen recently are strongly weighted towards
being listed first, especially if they were seen in mainline.  In this
particular reminder email, for example, the first 18 bugs/reports have *all*
been seen in the last 4 days.

These first 18 bugs/reports are ready to be worked on and fixed now.  It's
unclear to me what is most impeding this.  Is it part of the syzbot process?
Bad reproducers?  Too much noise?  Or is it no funding?  Not enough qualified
people?  No maintainers?  Not enough reminders?  Lack of CVEs and demonstrable
exploits?  What is most impeding these 18 bugs from being fixed?

- Eric

^ permalink raw reply

* Re: [PATCH 4.4 stable net] net: tcp: Fix use-after-free in tcp_write_xmit
From: maowenan @ 2019-07-25  4:29 UTC (permalink / raw)
  To: Eric Dumazet, davem, gregkh, netdev, linux-kernel
In-Reply-To: <44f0ba0d-fd19-d44b-9c5c-686e2f8ef988@gmail.com>



On 2019/7/24 22:07, Eric Dumazet wrote:
> 
> 
> On 7/24/19 12:46 PM, maowenan wrote:
>>
>>
>> On 2019/7/24 17:45, Eric Dumazet wrote:
>>>
>>>
>>> On 7/24/19 11:17 AM, Mao Wenan wrote:
>>>> There is one report about tcp_write_xmit use-after-free with version 4.4.136:
>>>>
>>>> BUG: KASAN: use-after-free in tcp_skb_pcount include/net/tcp.h:796 [inline]
>>>> BUG: KASAN: use-after-free in tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
>>>> BUG: KASAN: use-after-free in tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
>>>> Read of size 2 at addr ffff8801d6fc87b0 by task syz-executor408/4195
>>>>
>>>> CPU: 0 PID: 4195 Comm: syz-executor408 Not tainted 4.4.136-gfb7e319 #59
>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>>>  0000000000000000 7d8f38ecc03be946 ffff8801d73b7710 ffffffff81e0edad
>>>>  ffffea00075bf200 ffff8801d6fc87b0 0000000000000000 ffff8801d6fc87b0
>>>>  dffffc0000000000 ffff8801d73b7748 ffffffff815159b6 ffff8801d6fc87b0
>>>> Call Trace:
>>>>  [<ffffffff81e0edad>] __dump_stack lib/dump_stack.c:15 [inline]
>>>>  [<ffffffff81e0edad>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
>>>>  [<ffffffff815159b6>] print_address_description+0x6c/0x216 mm/kasan/report.c:252
>>>>  [<ffffffff81515cd5>] kasan_report_error mm/kasan/report.c:351 [inline]
>>>>  [<ffffffff81515cd5>] kasan_report.cold.7+0x175/0x2f7 mm/kasan/report.c:408
>>>>  [<ffffffff814f9784>] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:427
>>>>  [<ffffffff83286582>] tcp_skb_pcount include/net/tcp.h:796 [inline]
>>>>  [<ffffffff83286582>] tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
>>>>  [<ffffffff83286582>] tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
>>>>  [<ffffffff83287a40>] __tcp_push_pending_frames+0xa0/0x290 net/ipv4/tcp_output.c:2307
>>>>  [<ffffffff8328e966>] tcp_send_fin+0x176/0xab0 net/ipv4/tcp_output.c:2883
>>>>  [<ffffffff8324c0d0>] tcp_close+0xca0/0xf70 net/ipv4/tcp.c:2112
>>>>  [<ffffffff832f8d0f>] inet_release+0xff/0x1d0 net/ipv4/af_inet.c:435
>>>>  [<ffffffff82f1a156>] sock_release+0x96/0x1c0 net/socket.c:586
>>>>  [<ffffffff82f1a296>] sock_close+0x16/0x20 net/socket.c:1037
>>>>  [<ffffffff81522da5>] __fput+0x235/0x6f0 fs/file_table.c:208
>>>>  [<ffffffff815232e5>] ____fput+0x15/0x20 fs/file_table.c:244
>>>>  [<ffffffff8118bd7f>] task_work_run+0x10f/0x190 kernel/task_work.c:115
>>>>  [<ffffffff81135285>] exit_task_work include/linux/task_work.h:21 [inline]
>>>>  [<ffffffff81135285>] do_exit+0x9e5/0x26b0 kernel/exit.c:759
>>>>  [<ffffffff8113b1d1>] do_group_exit+0x111/0x330 kernel/exit.c:889
>>>>  [<ffffffff8115e5cc>] get_signal+0x4ec/0x14b0 kernel/signal.c:2321
>>>>  [<ffffffff8100e02b>] do_signal+0x8b/0x1d30 arch/x86/kernel/signal.c:712
>>>>  [<ffffffff8100360a>] exit_to_usermode_loop+0x11a/0x160 arch/x86/entry/common.c:248
>>>>  [<ffffffff81006535>] prepare_exit_to_usermode arch/x86/entry/common.c:283 [inline]
>>>>  [<ffffffff81006535>] syscall_return_slowpath+0x1b5/0x1f0 arch/x86/entry/common.c:348
>>>>  [<ffffffff838c29b5>] int_ret_from_sys_call+0x25/0xa3
>>>>
>>>> Allocated by task 4194:
>>>>  [<ffffffff810341d6>] save_stack_trace+0x26/0x50 arch/x86/kernel/stacktrace.c:63
>>>>  [<ffffffff814f8873>] save_stack+0x43/0xd0 mm/kasan/kasan.c:512
>>>>  [<ffffffff814f8b57>] set_track mm/kasan/kasan.c:524 [inline]
>>>>  [<ffffffff814f8b57>] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:616
>>>>  [<ffffffff814f9122>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:554
>>>>  [<ffffffff814f4c1e>] slab_post_alloc_hook mm/slub.c:1349 [inline]
>>>>  [<ffffffff814f4c1e>] slab_alloc_node mm/slub.c:2615 [inline]
>>>>  [<ffffffff814f4c1e>] slab_alloc mm/slub.c:2623 [inline]
>>>>  [<ffffffff814f4c1e>] kmem_cache_alloc+0xbe/0x2a0 mm/slub.c:2628
>>>>  [<ffffffff82f380a6>] kmem_cache_alloc_node include/linux/slab.h:350 [inline]
>>>>  [<ffffffff82f380a6>] __alloc_skb+0xe6/0x600 net/core/skbuff.c:218
>>>>  [<ffffffff832466c3>] alloc_skb_fclone include/linux/skbuff.h:856 [inline]
>>>>  [<ffffffff832466c3>] sk_stream_alloc_skb+0xa3/0x5d0 net/ipv4/tcp.c:833
>>>>  [<ffffffff83249164>] tcp_sendmsg+0xd34/0x2b00 net/ipv4/tcp.c:1178
>>>>  [<ffffffff83300ef3>] inet_sendmsg+0x203/0x4d0 net/ipv4/af_inet.c:755
>>>>  [<ffffffff82f1e1fc>] sock_sendmsg_nosec net/socket.c:625 [inline]
>>>>  [<ffffffff82f1e1fc>] sock_sendmsg+0xcc/0x110 net/socket.c:635
>>>>  [<ffffffff82f1eedc>] SYSC_sendto+0x21c/0x370 net/socket.c:1665
>>>>  [<ffffffff82f21560>] SyS_sendto+0x40/0x50 net/socket.c:1633
>>>>  [<ffffffff838c2825>] entry_SYSCALL_64_fastpath+0x22/0x9e
>>>>
>>>> Freed by task 4194:
>>>>  [<ffffffff810341d6>] save_stack_trace+0x26/0x50 arch/x86/kernel/stacktrace.c:63
>>>>  [<ffffffff814f8873>] save_stack+0x43/0xd0 mm/kasan/kasan.c:512
>>>>  [<ffffffff814f91a2>] set_track mm/kasan/kasan.c:524 [inline]
>>>>  [<ffffffff814f91a2>] kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:589
>>>>  [<ffffffff814f632e>] slab_free_hook mm/slub.c:1383 [inline]
>>>>  [<ffffffff814f632e>] slab_free_freelist_hook mm/slub.c:1405 [inline]
>>>>  [<ffffffff814f632e>] slab_free mm/slub.c:2859 [inline]
>>>>  [<ffffffff814f632e>] kmem_cache_free+0xbe/0x340 mm/slub.c:2881
>>>>  [<ffffffff82f3527f>] kfree_skbmem+0xcf/0x100 net/core/skbuff.c:635
>>>>  [<ffffffff82f372fd>] __kfree_skb+0x1d/0x20 net/core/skbuff.c:676
>>>>  [<ffffffff83288834>] sk_wmem_free_skb include/net/sock.h:1447 [inline]
>>>>  [<ffffffff83288834>] tcp_write_queue_purge include/net/tcp.h:1460 [inline]
>>>>  [<ffffffff83288834>] tcp_connect_init net/ipv4/tcp_output.c:3122 [inline]
>>>>  [<ffffffff83288834>] tcp_connect+0xb24/0x30c0 net/ipv4/tcp_output.c:3261
>>>>  [<ffffffff8329b991>] tcp_v4_connect+0xf31/0x1890 net/ipv4/tcp_ipv4.c:246
>>>>  [<ffffffff832f9ca9>] __inet_stream_connect+0x2a9/0xc30 net/ipv4/af_inet.c:615
>>>>  [<ffffffff832fa685>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:676
>>>>  [<ffffffff82f1eb78>] SYSC_connect+0x1b8/0x300 net/socket.c:1557
>>>>  [<ffffffff82f214b4>] SyS_connect+0x24/0x30 net/socket.c:1538
>>>>  [<ffffffff838c2825>] entry_SYSCALL_64_fastpath+0x22/0x9e
>>>>
>>>> Syzkaller reproducer():
>>>> r0 = socket$packet(0x11, 0x3, 0x300)
>>>> r1 = socket$inet_tcp(0x2, 0x1, 0x0)
>>>> bind$inet(r1, &(0x7f0000000300)={0x2, 0x4e21, @multicast1}, 0x10)
>>>> connect$inet(r1, &(0x7f0000000140)={0x2, 0x1000004e21, @loopback}, 0x10)
>>>> recvmmsg(r1, &(0x7f0000001e40)=[{{0x0, 0x0, &(0x7f0000000100)=[{&(0x7f00000005c0)=""/88, 0x58}], 0x1}}], 0x1, 0x40000000, 0x0)
>>>> sendto$inet(r1, &(0x7f0000000000)="e2f7ad5b661c761edf", 0x9, 0x8080, 0x0, 0x0)
>>>> r2 = fcntl$dupfd(r1, 0x0, r0)
>>>> connect$unix(r2, &(0x7f00000001c0)=@file={0x0, './file0\x00'}, 0x6e)
>>>>
>>>> C repro link: https://syzkaller.appspot.com/text?tag=ReproC&x=14db474f800000
>>>>
>>>> This is because when tcp_connect_init call tcp_write_queue_purge, it will
>>>> kfree all the skb in the write_queue, but the sk->sk_send_head forget to set NULL,
>>>> then tcp_write_xmit try to send skb, which has freed in tcp_write_queue_purge, UAF happens.
>>>>
>>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>>> ---
>>>>  include/net/tcp.h | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>>>> index bf8a0dae977a..8f8aace28cf8 100644
>>>> --- a/include/net/tcp.h
>>>> +++ b/include/net/tcp.h
>>>> @@ -1457,6 +1457,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
>>>>  
>>>>  	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
>>>>  		sk_wmem_free_skb(sk, skb);
>>>> +	sk->sk_send_head = NULL;
>>>>  	sk_mem_reclaim(sk);
>>>>  	tcp_clear_all_retrans_hints(tcp_sk(sk));
>>>>  	inet_csk(sk)->icsk_backoff = 0;
>>>>
>>>
>>> This is strange, because tcp_init_send_head() is called from tcp_disconnect()
>>> which is the syzkaller way to trigger this kind of bugs.
>>>
>>
>> syzkaller reproduce program duplicate one socket that have multiple skb in write queue,
>> and new socket have purged skb but original socket still try to send skb. In this program,
>> it does not call tcp_disconnect?
> 
> 
> It does call tcp_disconnect(), by one of the connect() call.

yes, __inet_stream_connect will call tcp_disconnect when sa_family == AF_UNSPEC, in c repro if it
passes sa_family with AF_INET it won't call disconnect, and then sk_send_head won't be NULL when tcp_connect.

#define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */

...
  case 2:
    *(uint16_t*)0x20e68000 = 2;
    *(uint16_t*)0x20e68002 = htobe16(0x4e23);
    *(uint8_t*)0x20e68004 = 0xac;
    *(uint8_t*)0x20e68005 = 0x14;
    *(uint8_t*)0x20e68006 = 0x14;
    *(uint8_t*)0x20e68007 = 0xaa;
    *(uint8_t*)0x20e68008 = 0;
    *(uint8_t*)0x20e68009 = 0;
    *(uint8_t*)0x20e6800a = 0;
    *(uint8_t*)0x20e6800b = 0;
    *(uint8_t*)0x20e6800c = 0;
    *(uint8_t*)0x20e6800d = 0;
    *(uint8_t*)0x20e6800e = 0;
    *(uint8_t*)0x20e6800f = 0;
    syscall(__NR_sendto, r[0], 0x20a88f88, 0xfffffffffffffe6e, 0x20000000,       //with fastopen, it will call __inet_stream_connect
            0x20e68000, 0x10);
    break;
  case 3:
    memcpy((void*)0x20000140, "\x7f", 1);
    syscall(__NR_write, r[0], 0x20000140, 1);
    break;
  case 4:
    *(uint16_t*)0x200012c0 = 0;   //sa_family
    *(uint16_t*)0x200012c2 = 0;
    *(uint32_t*)0x200012c4 = 0;
    *(uint32_t*)0x200012c8 = 0;
    syscall(__NR_connect, r[0], 0x200012c0, 0x80);
    break;
  case 5:
    *(uint16_t*)0x20000040 = 2;  //sa_family
    *(uint16_t*)0x20000042 = htobe16(0x4e23);
    *(uint8_t*)0x20000044 = 0xac;
    *(uint8_t*)0x20000045 = 0x14;
    *(uint8_t*)0x20000046 = 0x14;
    *(uint8_t*)0x20000047 = 0xaa;
    *(uint8_t*)0x20000048 = 0;
    *(uint8_t*)0x20000049 = 0;
    *(uint8_t*)0x2000004a = 0;
    *(uint8_t*)0x2000004b = 0;
    *(uint8_t*)0x2000004c = 0;
    *(uint8_t*)0x2000004d = 0;
    *(uint8_t*)0x2000004e = 0;
    *(uint8_t*)0x2000004f = 0;
    syscall(__NR_connect, r[0], 0x20000040, 0x10);
    break;
...

> 
> 
> .
> 


^ permalink raw reply

* Re: [PATCH net] net: hns: fix LED configuration for marvell phy
From: Andrew Lunn @ 2019-07-25  4:28 UTC (permalink / raw)
  To: liuyonglong
  Cc: David Miller, netdev, linux-kernel, linuxarm, salil.mehta,
	yisen.zhuang, shiju.jose
In-Reply-To: <72061222-411f-a58c-5873-ad873394cdb5@huawei.com>

On Thu, Jul 25, 2019 at 11:00:08AM +0800, liuyonglong wrote:
> > Revert "net: hns: fix LED configuration for marvell phy"
> > This reverts commit f4e5f775db5a4631300dccd0de5eafb50a77c131.
> >
> > Andrew Lunn says this should be handled another way.
> >
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 
> Hi Andrew:
> 
> I see this patch have been reverted, can you tell me the better way to do this?
> Thanks very much!

Please take a look at the work Matthias Kaehlcke is doing. It has not
got too far yet, but when it is complete, it should define a generic
way to configure PHY LEDs.

    Andrew

^ permalink raw reply

* Re: [PATCH net-next] netfilter: nf_table_offload: Fix zero prio of flow_cls_common_offload
From: Marcelo Ricardo Leitner @ 2019-07-25  3:45 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, davem, netfilter-devel, netdev
In-Reply-To: <9775e2da-78ce-95f8-c215-b35b464ea5a9@ucloud.cn>

On Thu, Jul 25, 2019 at 11:03:52AM +0800, wenxu wrote:
> 
> On 7/25/2019 7:51 AM, Marcelo Ricardo Leitner wrote:
> > On Thu, Jul 11, 2019 at 04:03:30PM +0800, wenxu@ucloud.cn wrote:
> >> From: wenxu <wenxu@ucloud.cn>
> >>
> >> The flow_cls_common_offload prio should be not zero
> >>
> >> It leads the invalid table prio in hw.
> >>
> >> # nft add table netdev firewall
> >> # nft add chain netdev firewall acl { type filter hook ingress device mlx_pf0vf0 priority - 300 \; }
> >> # nft add rule netdev firewall acl ip daddr 1.1.1.7 drop
> >> Error: Could not process rule: Invalid argument
> >>
> >> kernel log
> >> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22 (table prio: 65535, level: 0, size: 4194304)
> >>
> >> Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
> >> Signed-off-by: wenxu <wenxu@ucloud.cn>
> >> ---
> >>  net/netfilter/nf_tables_offload.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
> >> index 2c33028..01d8133 100644
> >> --- a/net/netfilter/nf_tables_offload.c
> >> +++ b/net/netfilter/nf_tables_offload.c
> >> @@ -7,6 +7,8 @@
> >>  #include <net/netfilter/nf_tables_offload.h>
> >>  #include <net/pkt_cls.h>
> >>  
> >> +#define FLOW_OFFLOAD_DEFAUT_PRIO 1U
> >> +
> >>  static struct nft_flow_rule *nft_flow_rule_alloc(int num_actions)
> >>  {
> >>  	struct nft_flow_rule *flow;
> >> @@ -107,6 +109,7 @@ static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
> >>  					struct netlink_ext_ack *extack)
> >>  {
> >>  	common->protocol = proto;
> >> +	common->prio = TC_H_MAKE(FLOW_OFFLOAD_DEFAUT_PRIO << 16, 0);
> > Note that tc semantics for this is to auto-generate a priority in such
> > cases, instead of using a default.
> >
> > @tc_new_tfilter():
> >         if (prio == 0) {
> >                 /* If no priority is provided by the user,
> >                  * we allocate one.
> >                  */
> >                 if (n->nlmsg_flags & NLM_F_CREATE) {
> >                         prio = TC_H_MAKE(0x80000000U, 0U);
> >                         prio_allocate = true;
> > ...
> >                 if (prio_allocate)
> >                         prio = tcf_auto_prio(tcf_chain_tp_prev(chain,
> >                                                                &chain_info));
> 
> Yes,The tc auto-generate a priority.  But if there is no pre
> tcf_proto, the priority is also set as a default.

After the first filter, there will be a tcf_proto. Please see the test below.

> 
> In nftables each rule no priortiy for each other. So It is enough to
> set a default value which is similar as the tc.

Yep, maybe it works for nftables. I'm just highlighting this because
it is reusing tc infrastructure and will expose a different behavior
to the user.  But if nftables already has this defined, that probably
takes precedence by now and all that is left to do is to make sure any
documentation on it is updated.  Pablo?

> 
> static inline u32 tcf_auto_prio(struct tcf_proto *tp)
> {
>     u32 first = TC_H_MAKE(0xC0000000U, 0U);
                              ^^^^  base default prio, 0xC0000 = 49152

> 
>     if (tp)
>         first = tp->prio - 1;
> 
>     return TC_H_MAJ(first);
> }

# tc qdisc add dev veth1 ingress
# tc filter add dev veth1 ingress proto ip flower src_mac ec:13:db:00:00:00 action drop
                                                           1st filter  --^^
# tc filter add dev veth1 ingress proto ip flower src_mac ec:13:db:00:00:01 action drop
                                                           2nd filter  --^^
# tc filter add dev veth1 ingress proto ip flower src_mac ec:13:db:00:00:02 action drop

With no 'prio X' parameter, it uses 0 as default, and when dumped:

# tc filter show dev veth1 ingress
filter protocol ip pref 49150 flower
filter protocol ip pref 49150 flower handle 0x1
  src_mac ec:13:db:00:00:02
  eth_type ipv4
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 40003 ref 1 bind 1

filter protocol ip pref 49151 flower
filter protocol ip pref 49151 flower handle 0x1
                        ^vv^^---- 2nd filter
  src_mac ec:13:db:00:00:01
  eth_type ipv4
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 40002 ref 1 bind 1

filter protocol ip pref 49152 flower
filter protocol ip pref 49152 flower handle 0x1
                        ^vv^^---- 1st filter
  src_mac ec:13:db:00:00:00
  eth_type ipv4
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 40001 ref 1 bind 1



^ permalink raw reply

* Re: Reminder: 99 open syzbot bugs in net subsystem
From: Theodore Y. Ts'o @ 2019-07-25  3:39 UTC (permalink / raw)
  To: David Miller
  Cc: ebiggers, eric.dumazet, dvyukov, netdev, fw, i.maximets, edumazet,
	dsahern, linux-kernel, syzkaller-bugs
In-Reply-To: <20190724.130928.1854327585456756387.davem@davemloft.net>

On Wed, Jul 24, 2019 at 01:09:28PM -0700, David Miller wrote:
> From: Eric Biggers <ebiggers@kernel.org>
> Date: Wed, 24 Jul 2019 11:37:12 -0700
> 
> > We can argue about what words to use to describe this situation, but
> > it doesn't change the situation itself.
> 
> And we should argue about those words because it matters to humans and
> effects how they feel, and humans ultimately fix these bugs.
> 
> So please stop with the hyperbole.

Perhaps it would be better to call them, "syzbot reports".  Not all
syzbot reports are bugs.  In fact, Dmitry has steadfastly refused to
add features which any basic bug-tracking system would have, claiming
that syzbot should not be a bug-tracking system, and something like
bugzilla should be forcibly imposed on all kernel developers.  So I
don't consider syzkaller reports as bugs --- they are just reports.

In order for developers to want to engage with "syzbot reports", we
need to reduce developer toil which syzbot imposes on developers, such
that it is a net benefit, instead of it being just a source of
annoying e-mails, some of which are actionable, and some of which are
noise.

In particular, asking developers to figure out which syzbot reports
should be closed, because developers found the problem independently,
and fixed it without hearing about from syzbot first, really isn't a
fair thing to ask.  Especially if we can automate away the problem.

If there is a reproducer, it should be possible to automatically
categorize the reproducer as a reliable reproducer or a flakey one.
If it is a reliable reproducer on version X, and it fails to be
reliably reproduce on version X+N, then it should be able to figure
out that it has been fixed, instead of requesting that a human confirm
it.  If you really want a human to look at it, now that syzkaller has
a bisection feature, it should be possible to use the reliable
reproducer to do a negative bisection search to report a candidate
fix.  This would significantly reproduce the developer toil imposed as
a tax on developers.  And if Dmitry doesn't want to auto-close those
reports that appear to be fixed already, at the very least they should
be down-prioritized on Eric's reports, so people who don't want to
waste their time on "bureaucracy" can do so.

Cheers,

						- Ted

P.S.  Another criteria I'd suggest down-prioritizing on is, "does it
require root privileges?"  After all, since root has so many different
ways of crashing a system already, and if we're all super-busy, we
need to prioritize which reports should be addressed first.

^ permalink raw reply

* [PATCH] ipip: validate header length in ipip_tunnel_xmit
From: Haishuang Yan @ 2019-07-25  3:07 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov; +Cc: netdev, linux-kernel, Haishuang Yan

We need the same checks introduced by commit cb9f1b783850
("ip: validate header length on virtual device xmit") for
ipip tunnel.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv4/ipip.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 43adfc1..2f01cf6 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -275,6 +275,9 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb,
 	const struct iphdr  *tiph = &tunnel->parms.iph;
 	u8 ipproto;
 
+	if (!pskb_inet_may_pull(skb))
+		goto tx_error;
+
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
 		ipproto = IPPROTO_IPIP;
-- 
1.8.3.1




^ permalink raw reply related

* [PATCH] ipip: validate header length in ipip_tunnel_xmit
From: Haishuang Yan @ 2019-07-25  3:07 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov; +Cc: netdev, linux-kernel, Haishuang Yan
In-Reply-To: <1564024076-13764-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

We need the same checks introduced by commit cb9f1b783850
("ip: validate header length on virtual device xmit") for
ipip tunnel.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv4/ipip.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 43adfc1..2f01cf6 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -275,6 +275,9 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb,
 	const struct iphdr  *tiph = &tunnel->parms.iph;
 	u8 ipproto;
 
+	if (!pskb_inet_may_pull(skb))
+		goto tx_error;
+
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
 		ipproto = IPPROTO_IPIP;
-- 
1.8.3.1




^ permalink raw reply related

* Re: [PATCH net-next] netfilter: nf_table_offload: Fix zero prio of flow_cls_common_offload
From: wenxu @ 2019-07-25  3:03 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: pablo, davem, netfilter-devel, netdev
In-Reply-To: <20190724235151.GB4063@localhost.localdomain>


On 7/25/2019 7:51 AM, Marcelo Ricardo Leitner wrote:
> On Thu, Jul 11, 2019 at 04:03:30PM +0800, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> The flow_cls_common_offload prio should be not zero
>>
>> It leads the invalid table prio in hw.
>>
>> # nft add table netdev firewall
>> # nft add chain netdev firewall acl { type filter hook ingress device mlx_pf0vf0 priority - 300 \; }
>> # nft add rule netdev firewall acl ip daddr 1.1.1.7 drop
>> Error: Could not process rule: Invalid argument
>>
>> kernel log
>> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22 (table prio: 65535, level: 0, size: 4194304)
>>
>> Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> ---
>>  net/netfilter/nf_tables_offload.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
>> index 2c33028..01d8133 100644
>> --- a/net/netfilter/nf_tables_offload.c
>> +++ b/net/netfilter/nf_tables_offload.c
>> @@ -7,6 +7,8 @@
>>  #include <net/netfilter/nf_tables_offload.h>
>>  #include <net/pkt_cls.h>
>>  
>> +#define FLOW_OFFLOAD_DEFAUT_PRIO 1U
>> +
>>  static struct nft_flow_rule *nft_flow_rule_alloc(int num_actions)
>>  {
>>  	struct nft_flow_rule *flow;
>> @@ -107,6 +109,7 @@ static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
>>  					struct netlink_ext_ack *extack)
>>  {
>>  	common->protocol = proto;
>> +	common->prio = TC_H_MAKE(FLOW_OFFLOAD_DEFAUT_PRIO << 16, 0);
> Note that tc semantics for this is to auto-generate a priority in such
> cases, instead of using a default.
>
> @tc_new_tfilter():
>         if (prio == 0) {
>                 /* If no priority is provided by the user,
>                  * we allocate one.
>                  */
>                 if (n->nlmsg_flags & NLM_F_CREATE) {
>                         prio = TC_H_MAKE(0x80000000U, 0U);
>                         prio_allocate = true;
> ...
>                 if (prio_allocate)
>                         prio = tcf_auto_prio(tcf_chain_tp_prev(chain,
>                                                                &chain_info));

Yes,The tc auto-generate a priority.  But if there is no pre tcf_proto, the priority is also set as a default.

In nftables each rule no priortiy for each other. So It is enough to set a default value which is similar as the

tc.

static inline u32 tcf_auto_prio(struct tcf_proto *tp)
{
    u32 first = TC_H_MAKE(0xC0000000U, 0U);

    if (tp)
        first = tp->prio - 1;

    return TC_H_MAJ(first);
}


^ permalink raw reply

* Re: [PATCH mlx5-next] net/mlx5: Fix modify_cq_in alignment
From: Leon Romanovsky @ 2019-07-25  3:02 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: davem@davemloft.net, Jason Gunthorpe, Yishai Hadas,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	dledford@redhat.com, Edward Srouji
In-Reply-To: <5447fded90dfd133ef002177b77bfd3685bf8b42.camel@mellanox.com>

On Wed, Jul 24, 2019 at 08:56:08PM +0000, Saeed Mahameed wrote:
> On Tue, 2019-07-23 at 22:04 +0300, Leon Romanovsky wrote:
> > On Tue, Jul 23, 2019 at 11:28:50AM -0700, David Miller wrote:
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Date: Tue, 23 Jul 2019 10:12:55 +0300
> > >
> > > > From: Edward Srouji <edwards@mellanox.com>
> > > >
> > > > Fix modify_cq_in alignment to match the device specification.
> > > > After this fix the 'cq_umem_valid' field will be in the right
> > > > offset.
> > > >
> > > > Cc: <stable@vger.kernel.org> # 4.19
> > > > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID
> > > > bits")
>
> Leon, I applied this patch to my tree, it got marked for -stable 4.20
> and not 4.19, i checked manually and indeed the offending patch came to
> light only on 4.20

Thanks

>

^ permalink raw reply

* Re: [PATCH net] net: hns: fix LED configuration for marvell phy
From: liuyonglong @ 2019-07-25  3:00 UTC (permalink / raw)
  To: David Miller, Andrew Lunn
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang,
	shiju.jose
In-Reply-To: <20190722.181906.2225538844348045066.davem@davemloft.net>

> Revert "net: hns: fix LED configuration for marvell phy"
> This reverts commit f4e5f775db5a4631300dccd0de5eafb50a77c131.
>
> Andrew Lunn says this should be handled another way.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>


Hi Andrew:

I see this patch have been reverted, can you tell me the better way to do this?
Thanks very much!

On 2019/7/23 9:19, David Miller wrote:
> From: Yonglong Liu <liuyonglong@huawei.com>
> Date: Mon, 22 Jul 2019 13:59:12 +0800
> 
>> Since commit(net: phy: marvell: change default m88e1510 LED configuration),
>> the active LED of Hip07 devices is always off, because Hip07 just
>> use 2 LEDs.
>> This patch adds a phy_register_fixup_for_uid() for m88e1510 to
>> correct the LED configuration.
>>
>> Fixes: 077772468ec1 ("net: phy: marvell: change default m88e1510 LED configuration")
>> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
>> Reviewed-by: linyunsheng <linyunsheng@huawei.com>
> 
> Applied and queued up for -stable.
> 
> .
> 


^ permalink raw reply

* Re: [PATCH] rtlwifi: remove unneeded function _rtl_dump_channel_map()
From: Pkshih @ 2019-07-25  2:50 UTC (permalink / raw)
  To: yuehaibing@huawei.com, kvalo@codeaurora.org
  Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190724141020.58800-1-yuehaibing@huawei.com>

On Wed, 2019-07-24 at 22:10 +0800, YueHaibing wrote:
> Now _rtl_dump_channel_map() does not do any actual
> thing using the channel. So remove it.
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  drivers/net/wireless/realtek/rtlwifi/regd.c | 18 ------------------
>  1 file changed, 18 deletions(-)
> 
> diff --git a/drivers/net/wireless/realtek/rtlwifi/regd.c
> b/drivers/net/wireless/realtek/rtlwifi/regd.c
> index 6ccb5b9..c10432c 100644
> --- a/drivers/net/wireless/realtek/rtlwifi/regd.c
> +++ b/drivers/net/wireless/realtek/rtlwifi/regd.c
> @@ -276,22 +276,6 @@ static void _rtl_reg_apply_world_flags(struct wiphy
> *wiphy,
>  	return;
>  }
>  
> -static void _rtl_dump_channel_map(struct wiphy *wiphy)
> -{
> -	enum nl80211_band band;
> -	struct ieee80211_supported_band *sband;
> -	struct ieee80211_channel *ch;
> -	unsigned int i;
> -
> -	for (band = 0; band < NUM_NL80211_BANDS; band++) {
> -		if (!wiphy->bands[band])
> -			continue;
> -		sband = wiphy->bands[band];
> -		for (i = 0; i < sband->n_channels; i++)
> -			ch = &sband->channels[i];
> -	}
> -}
> -
>  static int _rtl_reg_notifier_apply(struct wiphy *wiphy,
>  				   struct regulatory_request *request,
>  				   struct rtl_regulatory *reg)
> @@ -309,8 +293,6 @@ static int _rtl_reg_notifier_apply(struct wiphy *wiphy,
>  		break;
>  	}
>  
> -	_rtl_dump_channel_map(wiphy);
> -
>  	return 0;
>  }
>  

Acked-by: Ping-Ke Shih <pkshih@realtek.com>



^ permalink raw reply

* Re: [PATCH net-next v2 3/3] netlink: add validation of NLA_F_NESTED flag
From: David Ahern @ 2019-07-25  2:46 UTC (permalink / raw)
  To: Thomas Haller, Michal Kubecek, David S. Miller
  Cc: netdev, Johannes Berg, linux-kernel
In-Reply-To: <0fc58a4883f6656208b9250876e53d723919e342.camel@redhat.com>

On 7/23/19 1:57 AM, Thomas Haller wrote:
> Does this flag and strict validation really provide any value? Commonly a netlink message
> is a plain TLV blob, and the meaning depends entirely on the policy.

Strict checking enables kernel side filtering and other features that
require passing attributes as part of the dump request - like address
dumps in a specific namespace.

^ permalink raw reply

* Re: [PATCH net-next 06/11] net: hns3: modify firmware version display format
From: tanhuazhong @ 2019-07-25  2:34 UTC (permalink / raw)
  To: Saeed Mahameed, davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	moyufeng@huawei.com
In-Reply-To: <4c4ce27c9a9372340c0e2b0f654b3fb9cd85b3e4.camel@mellanox.com>



On 2019/7/25 2:34, Saeed Mahameed wrote:
> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote:
>> From: Yufeng Mo <moyufeng@huawei.com>
>>
>> This patch modifies firmware version display format in
>> hclge(vf)_cmd_init() and hns3_get_drvinfo(). Also, adds
>> some optimizations for firmware version display format.
>>
>> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
>> Signed-off-by: Peng Li <lipeng321@huawei.com>
>> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
>> ---
>>   drivers/net/ethernet/hisilicon/hns3/hnae3.h              |  9
>> +++++++++
>>   drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c       | 15
>> +++++++++++++--
>>   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c   | 10
>> +++++++++-
>>   drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c | 11
>> +++++++++--
>>   4 files changed, 40 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
>> b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
>> index 48c7b70..a4624db 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
>> @@ -179,6 +179,15 @@ struct hnae3_vector_info {
>>   #define HNAE3_RING_GL_RX 0
>>   #define HNAE3_RING_GL_TX 1
>>   
>> +#define HNAE3_FW_VERSION_BYTE3_SHIFT	24
>> +#define HNAE3_FW_VERSION_BYTE3_MASK	GENMASK(31, 24)
>> +#define HNAE3_FW_VERSION_BYTE2_SHIFT	16
>> +#define HNAE3_FW_VERSION_BYTE2_MASK	GENMASK(23, 16)
>> +#define HNAE3_FW_VERSION_BYTE1_SHIFT	8
>> +#define HNAE3_FW_VERSION_BYTE1_MASK	GENMASK(15, 8)
>> +#define HNAE3_FW_VERSION_BYTE0_SHIFT	0
>> +#define HNAE3_FW_VERSION_BYTE0_MASK	GENMASK(7, 0)
>> +
>>   struct hnae3_ring_chain_node {
>>   	struct hnae3_ring_chain_node *next;
>>   	u32 tqp_index;
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> index 5bff98a..e71c92b 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> @@ -527,6 +527,7 @@ static void hns3_get_drvinfo(struct net_device
>> *netdev,
>>   {
>>   	struct hns3_nic_priv *priv = netdev_priv(netdev);
>>   	struct hnae3_handle *h = priv->ae_handle;
>> +	u32 fw_version;
>>   
>>   	if (!h->ae_algo->ops->get_fw_version) {
>>   		netdev_err(netdev, "could not get fw version!\n");
>> @@ -545,8 +546,18 @@ static void hns3_get_drvinfo(struct net_device
>> *netdev,
>>   		sizeof(drvinfo->bus_info));
>>   	drvinfo->bus_info[ETHTOOL_BUSINFO_LEN - 1] = '\0';
>>   
>> -	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
>> "0x%08x",
>> -		 priv->ae_handle->ae_algo->ops->get_fw_version(h));
>> +	fw_version = priv->ae_handle->ae_algo->ops->get_fw_version(h);
>> +
>> +	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
>> +		 "%lu.%lu.%lu.%lu",
>> +		 hnae3_get_field(fw_version,
>> HNAE3_FW_VERSION_BYTE3_MASK,
>> +				 HNAE3_FW_VERSION_BYTE3_SHIFT),
>> +		 hnae3_get_field(fw_version,
>> HNAE3_FW_VERSION_BYTE2_MASK,
>> +				 HNAE3_FW_VERSION_BYTE2_SHIFT),
>> +		 hnae3_get_field(fw_version,
>> HNAE3_FW_VERSION_BYTE1_MASK,
>> +				 HNAE3_FW_VERSION_BYTE1_SHIFT),
>> +		 hnae3_get_field(fw_version,
>> HNAE3_FW_VERSION_BYTE0_MASK,
>> +				 HNAE3_FW_VERSION_BYTE0_SHIFT));
>>   }
>>   
>>   static u32 hns3_get_link(struct net_device *netdev)
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
>> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
>> index 22f6acd..c2320bf 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
>> @@ -419,7 +419,15 @@ int hclge_cmd_init(struct hclge_dev *hdev)
>>   	}
>>   	hdev->fw_version = version;
>>   
>> -	dev_info(&hdev->pdev->dev, "The firmware version is %08x\n",
>> version);
>> +	pr_info_once("The firmware version is %lu.%lu.%lu.%lu\n",
>> +		     hnae3_get_field(version,
>> HNAE3_FW_VERSION_BYTE3_MASK,
>> +				     HNAE3_FW_VERSION_BYTE3_SHIFT),
>> +		     hnae3_get_field(version,
>> HNAE3_FW_VERSION_BYTE2_MASK,
>> +				     HNAE3_FW_VERSION_BYTE2_SHIFT),
>> +		     hnae3_get_field(version,
>> HNAE3_FW_VERSION_BYTE1_MASK,
>> +				     HNAE3_FW_VERSION_BYTE1_SHIFT),
>> +		     hnae3_get_field(version,
>> HNAE3_FW_VERSION_BYTE0_MASK,
>> +				     HNAE3_FW_VERSION_BYTE0_SHIFT));
>>   
> 
> Device name/string will not be printed now, what happens if i have
> multiple devices ? at least print the device name as it was before
>
Since on each board we only have one firmware, the firmware
version is same per device, and will not change when running.
So pr_info_once() looks good for this case.

BTW, maybe we should change below print in the end of
hclge_init_ae_dev(), use dev_info() instead of pr_info(),
then we can know that which device has already initialized.
I will send other patch to do that, is it acceptable for you?

"pr_info("%s driver initialization finished.\n", HCLGE_DRIVER_NAME);"

Thanks.

>>   	return 0;
>>   
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
>> b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
>> index 652b796..004125b 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
>> @@ -405,8 +405,15 @@ int hclgevf_cmd_init(struct hclgevf_dev *hdev)
>>   	}
>>   	hdev->fw_version = version;
>>   
>> -	dev_info(&hdev->pdev->dev, "The firmware version is %08x\n",
>> version);
>> -
>> +	pr_info_once("The firmware version is %lu.%lu.%lu.%lu\n",
>> +		     hnae3_get_field(version,
>> HNAE3_FW_VERSION_BYTE3_MASK,
>> +				     HNAE3_FW_VERSION_BYTE3_SHIFT),
>> +		     hnae3_get_field(version,
>> HNAE3_FW_VERSION_BYTE2_MASK,
>> +				     HNAE3_FW_VERSION_BYTE2_SHIFT),
>> +		     hnae3_get_field(version,
>> HNAE3_FW_VERSION_BYTE1_MASK,
>> +				     HNAE3_FW_VERSION_BYTE1_SHIFT),
>> +		     hnae3_get_field(version,
>> HNAE3_FW_VERSION_BYTE0_MASK,
>> +				     HNAE3_FW_VERSION_BYTE0_SHIFT));
>>   	return 0;
>>   
> 
> Same.
> 

Same
:)


^ permalink raw reply

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
From: Jakub Kicinski @ 2019-07-25  2:22 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jonathan.lemon,
	saeedm, maximmi, stephen, bruce.richardson, ciara.loftus, bpf,
	intel-wired-lan
In-Reply-To: <20190724051043.14348-4-kevin.laatz@intel.com>

On Wed, 24 Jul 2019 05:10:35 +0000, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
> 
> v3:
>   - Add helper function to do address/offset masking/addition
> ---
>  include/net/xdp_sock.h      | 17 ++++++++
>  include/uapi/linux/if_xdp.h |  9 ++++
>  net/xdp/xdp_umem.c          | 18 +++++---
>  net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>  6 files changed, 170 insertions(+), 30 deletions(-)
> 
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 69796d264f06..738996c0f995 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -19,6 +19,7 @@ struct xsk_queue;
>  struct xdp_umem_page {
>  	void *addr;
>  	dma_addr_t dma;
> +	bool next_pg_contig;

IIRC accesses to xdp_umem_page case a lot of cache misses, so having
this structure grow from 16 to 24B is a little unfortunate :(
Can we try to steal lower bits of addr or dma? Or perhaps not pre
compute this info at all?

>  };
>  
>  struct xdp_umem_fq_reuse {
> @@ -48,6 +49,7 @@ struct xdp_umem {
>  	bool zc;
>  	spinlock_t xsk_list_lock;
>  	struct list_head xsk_list;
> +	u32 flags;
>  };
>  
>  struct xdp_sock {
> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>  
>  	rq->handles[rq->length++] = addr;
>  }
> +
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return handle += offset;
> +}
>  #else
>  static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>  {
> @@ -241,6 +252,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>  {
>  }
>  
> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
> +					 u64 offset)
> +{
> +	return NULL;

	return 0?

> +}
> +
>  #endif /* CONFIG_XDP_SOCKETS */
>  
>  #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index faaa5ca2a117..f8dc68fcdf78 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,9 @@
>  #define XDP_COPY	(1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
>  
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
> +
>  struct sockaddr_xdp {
>  	__u16 sxdp_family;
>  	__u16 sxdp_flags;
> @@ -53,6 +56,7 @@ struct xdp_umem_reg {
>  	__u64 len; /* Length of packet data area */
>  	__u32 chunk_size;
>  	__u32 headroom;
> +	__u32 flags;
>  };
>  
>  struct xdp_statistics {
> @@ -74,6 +78,11 @@ struct xdp_options {
>  #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>  #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>  
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>  /* Rx/Tx descriptor */
>  struct xdp_desc {
>  	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..952ca22103e9 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
>  
>  static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
>  	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>  	unsigned int chunks, chunks_per_page;
>  	u64 addr = mr->addr, size = mr->len;
> @@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  	}
>  
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))

parens unnecessary, consider adding a define for known flags.

> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>  		return -EINVAL;
>  
>  	if (!PAGE_ALIGNED(addr)) {
> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  	if (chunks == 0)
>  		return -EINVAL;
>  
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>  
>  	headroom = ALIGN(headroom, 64);
>  
> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  
>  	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);
>  	umem->size = size;
>  	umem->headroom = headroom;
>  	umem->chunk_size_nohr = chunk_size - headroom;
>  	umem->npgs = size / PAGE_SIZE;
>  	umem->pgs = NULL;
>  	umem->user = NULL;
> +	umem->flags = mr->flags;
>  	INIT_LIST_HEAD(&umem->xsk_list);
>  	spin_lock_init(&umem->xsk_list_lock);
>  
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 59b57d708697..b3ab653091c4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>  
>  u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>  {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>  }
>  EXPORT_SYMBOL(xsk_umem_peek_addr);
>  
> @@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
>  }
>  EXPORT_SYMBOL(xsk_umem_discard_addr);
>  
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & (PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}

Why handle this case gracefully? Real XSK use is the zero copy mode,
having extra code to make copy mode more permissive seems a little
counter productive IMHO.

>  static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	void *from_buf;
>  	u32 metalen;
>  	u64 addr;
>  	int err;
>  
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		xs->rx_dropped++;
>  		return -ENOSPC;
>  	}
>  
> -	addr += xs->umem->headroom;
> -
>  	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>  		from_buf = xdp->data;
>  		metalen = 0;
> @@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  		metalen = xdp->data - xdp->data_meta;
>  	}
>  
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
> +
> +	offset += metalen;
> +	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
> +		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	else
> +		addr += offset;
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (!err) {
>  		xskq_discard_addr(xs->umem->fq);
> @@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>  	u32 len = xdp->data_end - xdp->data;
>  	void *buffer;
>  	u64 addr;
> +	u64 offset = xs->umem->headroom;

reverse xmas tree, please

>  	int err;
>  
>  	spin_lock_bh(&xs->rx_lock);


^ permalink raw reply

* Re: [PATCH] net: sctp: fix memory leak in sctp_send_reset_streams
From: Marcelo Ricardo Leitner @ 2019-07-25  2:19 UTC (permalink / raw)
  To: Xin Long
  Cc: Neil Horman, Hillf Danton, linux-sctp, network dev, syzkaller,
	David S . Miller, LKML, syzkaller-bugs, syzbot, Vlad Yasevich,
	Eric Dumazet
In-Reply-To: <CADvbK_ddFyO2iz-QS3bHevHN7Q29VUS4joK3Kxam3Y4tEqHFKA@mail.gmail.com>

On Wed, Jul 24, 2019 at 03:56:40PM +0800, Xin Long wrote:
> On Sun, Jun 2, 2019 at 9:36 PM Xin Long <lucien.xin@gmail.com> wrote:
> >
> > On Sun, Jun 2, 2019 at 6:52 PM Neil Horman <nhorman@tuxdriver.com> wrote:
> > >
> > > On Sun, Jun 02, 2019 at 11:44:29AM +0800, Hillf Danton wrote:
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit:    036e3431 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=153cff12a00000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=8f0f63a62bb5b13c
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=6ad9c3bd0a218a2ab41d
> > > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12561c86a00000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15b76fd8a00000
> > > >
> > > > executing program
> > > > executing program
> > > > executing program
> > > > executing program
> > > > executing program
> > > > BUG: memory leak
> > > > unreferenced object 0xffff888123894820 (size 32):
> > > >   comm "syz-executor045", pid 7267, jiffies 4294943559 (age 13.660s)
> > > >   hex dump (first 32 bytes):
> > > >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > >   backtrace:
> > > >     [<00000000c7e71c69>] kmemleak_alloc_recursive
> > > > include/linux/kmemleak.h:55 [inline]
> > > >     [<00000000c7e71c69>] slab_post_alloc_hook mm/slab.h:439 [inline]
> > > >     [<00000000c7e71c69>] slab_alloc mm/slab.c:3326 [inline]
> > > >     [<00000000c7e71c69>] __do_kmalloc mm/slab.c:3658 [inline]
> > > >     [<00000000c7e71c69>] __kmalloc+0x161/0x2c0 mm/slab.c:3669
> > > >     [<000000003250ed8e>] kmalloc_array include/linux/slab.h:670 [inline]
> > > >     [<000000003250ed8e>] kcalloc include/linux/slab.h:681 [inline]
> > > >     [<000000003250ed8e>] sctp_send_reset_streams+0x1ab/0x5a0 net/sctp/stream.c:302
> > > >     [<00000000cd899c6e>] sctp_setsockopt_reset_streams net/sctp/socket.c:4314 [inline]
> > > >     [<00000000cd899c6e>] sctp_setsockopt net/sctp/socket.c:4765 [inline]
> > > >     [<00000000cd899c6e>] sctp_setsockopt+0xc23/0x2bf0 net/sctp/socket.c:4608
> > > >     [<00000000ff3a21a2>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3130
> > > >     [<000000009eb87ae7>] __sys_setsockopt+0x98/0x120 net/socket.c:2078
> > > >     [<00000000e0ede6ca>] __do_sys_setsockopt net/socket.c:2089 [inline]
> > > >     [<00000000e0ede6ca>] __se_sys_setsockopt net/socket.c:2086 [inline]
> > > >     [<00000000e0ede6ca>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
> > > >     [<00000000c61155f5>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
> > > >     [<00000000e540958c>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >
> > > >
> > > > It was introduced in commit d570a59c5b5f ("sctp: only allow the out stream
> > > > reset when the stream outq is empty"), in orde to check stream outqs before
> > > > sending SCTP_STRRESET_IN_PROGRESS back to the peer of the stream. EAGAIN is
> > > > returned, however, without the nstr_list slab released, if any outq is found
> > > > to be non empty.
> > > >
> > > > Freeing the slab in question before bailing out fixes it.
> > > >
> > > > Fixes: d570a59c5b5f ("sctp: only allow the out stream reset when the stream outq is empty")
> > > > Reported-by: syzbot <syzbot+6ad9c3bd0a218a2ab41d@syzkaller.appspotmail.com>
> > > > Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > > > Tested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > > > Cc: Xin Long <lucien.xin@gmail.com>
> > > > Cc: Neil Horman <nhorman@tuxdriver.com>
> > > > Cc: Vlad Yasevich <vyasevich@gmail.com>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Signed-off-by: Hillf Danton <hdanton@sina.com>
> > > > ---
> > > > net/sctp/stream.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/net/sctp/stream.c b/net/sctp/stream.c
> > > > index 93ed078..d3e2f03 100644
> > > > --- a/net/sctp/stream.c
> > > > +++ b/net/sctp/stream.c
> > > > @@ -310,6 +310,7 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
> > > >
> > > >       if (out && !sctp_stream_outq_is_empty(stream, str_nums, nstr_list)) {
> > > >               retval = -EAGAIN;
> > > > +             kfree(nstr_list);
> > > >               goto out;
> > > >       }
> > > >
> > > > --
> > > >
> > > >
> > > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Reviewed-by: Xin Long <lucien.xin@gmail.com>
> This fix is not applied, pls resend it with:
> to = network dev <netdev@vger.kernel.org>
> cc = davem@davemloft.net
> to = linux-sctp@vger.kernel.org
> cc = Neil Horman <nhorman@tuxdriver.com>
> cc = Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Good catch, thanks Xin. I don't know what happened but I never got
this patch via netdev@, just the direct delivery. If it didn't reach
netdev@, that explains it.

  Marcelo

^ permalink raw reply

* Re: [PATCH 4.4 stable net] net: tcp: Fix use-after-free in tcp_write_xmit
From: maowenan @ 2019-07-25  2:06 UTC (permalink / raw)
  To: Eric Dumazet, davem, gregkh, netdev, linux-kernel
In-Reply-To: <be1aebb5-fee7-e079-d864-a2e4aa13007f@gmail.com>



On 2019/7/24 18:13, Eric Dumazet wrote:
> 
> 
> On 7/24/19 12:01 PM, Eric Dumazet wrote:
>>
>>
>> On 7/24/19 11:17 AM, Mao Wenan wrote:
>>> There is one report about tcp_write_xmit use-after-free with version 4.4.136:
>>
>> Current stable 4.4 is 4.4.186
>>
>> Can you check the bug is still there ?
>>
> 
> BTW, I tried the C repro and another bug showed up.
> 
> It looks like 4.4.186 misses other fixes :/

Do you have logs about this?
Is it the bugs followed up this UAF?

> 
> [  180.811610] skbuff: skb_under_panic: text:ffffffff825ec6ea len:156 put:84 head:ffff8837dd1f0990 data:ffff8837dd1f098c tail:0x98 end:0xc0 dev:ip6gre0
> [  180.825037] ------------[ cut here ]------------
> [  180.829688] kernel BUG at net/core/skbuff.c:104!
> [  180.834316] invalid opcode: 0000 [#1] SMP KASAN
> [  180.839305] gsmi: Log Shutdown Reason 0x03
> [  180.843426] Modules linked in: ipip bonding bridge stp llc tun veth w1_therm wire i2c_mux_pca954x i2c_mux cdc_acm ehci_pci ehci_hcd ip_gre mlx4_en ib_uverbs mlx4_ib ib_sa ib_mad ib_core ib_addr mlx4_core
> [  180.862052] CPU: 22 PID: 1619 Comm: kworker/22:1 Not tainted 4.4.186-smp-DEV #41
> [  180.869475] Hardware name: Intel BIOS 2.56.0 10/19/2018
> [  180.876463] Workqueue: ipv6_addrconf addrconf_dad_work
> [  180.881658] task: ffff8837f1f59d80 ti: ffff8837eeeb8000 task.ti: ffff8837eeeb8000
> [  180.889171] RIP: 0010:[<ffffffff821ef26f>]  [<ffffffff821ef26f>] skb_panic+0x14f/0x210
> [  180.897162] RSP: 0018:ffff8837eeebf4b8  EFLAGS: 00010282
> [  180.902504] RAX: 0000000000000088 RBX: ffff8837eeeeb600 RCX: 0000000000000000
> [  180.909645] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff83508c00
> [  180.916854] RBP: ffff8837eeebf520 R08: 0000000000000016 R09: 0000000000000000
> [  180.924029] R10: ffff881fc8abf038 R11: 0000000000000007 R12: ffff881fc8abe720
> [  180.931213] R13: ffffffff82aa9e80 R14: 00000000000000c0 R15: 0000000000000098
> [  180.938390] FS:  0000000000000000(0000) GS:ffff8837ff280000(0000) knlGS:0000000000000000
> [  180.946519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  180.952290] CR2: 00007f519426f530 CR3: 00000037d37f2000 CR4: 0000000000160670
> [  180.959447] Stack:
> [  180.961458]  ffff8837dd1f098c 0000000000000098 00000000000000c0 ffff881fc8abe720
> [  180.968909]  ffffea00df747c00 ffff881fff404b40 ffff8837ff2a1a20 ffff8837eeebf5b8
> [  180.976371]  ffff8837eeeeb600 ffffffff825ec6ea 1ffff106fddd7eb6 ffff8837eeeeb600
> [  180.983848] Call Trace:
> [  180.986297]  [<ffffffff825ec6ea>] ? ip6gre_header+0xba/0xd50
> [  180.991962]  [<ffffffff821f0e01>] skb_push+0xc1/0x100
> [  180.997023]  [<ffffffff825ec6ea>] ip6gre_header+0xba/0xd50
> [  181.002519]  [<ffffffff8158dc16>] ? memcpy+0x36/0x40
> [  181.007509]  [<ffffffff825ec630>] ? ip6gre_changelink+0x6d0/0x6d0
> [  181.013629]  [<ffffffff82550741>] ? ndisc_constructor+0x5b1/0x770
> [  181.019728]  [<ffffffff82666861>] ? _raw_write_unlock_bh+0x41/0x50
> [  181.025924]  [<ffffffff8226540b>] ? __neigh_create+0xe6b/0x1670
> [  181.031851]  [<ffffffff8225817f>] neigh_connected_output+0x23f/0x480
> [  181.038219]  [<ffffffff824f61ec>] ip6_finish_output2+0x74c/0x1a90
> [  181.044324]  [<ffffffff810f1d33>] ? print_context_stack+0x73/0xf0
> [  181.050429]  [<ffffffff824f5aa0>] ? ip6_xmit+0x1700/0x1700
> [  181.055933]  [<ffffffff82304a28>] ? nf_hook_slow+0x118/0x1b0
> [  181.061617]  [<ffffffff82502d7a>] ip6_finish_output+0x2ba/0x580
> [  181.067546]  [<ffffffff82503179>] ip6_output+0x139/0x380
> [  181.072884]  [<ffffffff82503040>] ? ip6_finish_output+0x580/0x580
> [  181.079004]  [<ffffffff82502ac0>] ? ip6_fragment+0x31b0/0x31b0
> [  181.084852]  [<ffffffff82251b51>] ? dst_init+0x4b1/0x820
> [  181.090172]  [<ffffffff8158da45>] ? kasan_unpoison_shadow+0x35/0x50
> [  181.096437]  [<ffffffff8158da45>] ? kasan_unpoison_shadow+0x35/0x50
> [  181.102712]  [<ffffffff8254f3ca>] NF_HOOK_THRESH.constprop.22+0xca/0x180
> [  181.109421]  [<ffffffff8254f300>] ? ndisc_alloc_skb+0x340/0x340
> [  181.115338]  [<ffffffff8254d820>] ? compat_ipv6_setsockopt+0x180/0x180
> [  181.121874]  [<ffffffff8254fbc2>] ndisc_send_skb+0x742/0xd10
> [  181.127550]  [<ffffffff8254f480>] ? NF_HOOK_THRESH.constprop.22+0x180/0x180
> [  181.134516]  [<ffffffff821f2440>] ? skb_complete_tx_timestamp+0x280/0x280
> [  181.141311]  [<ffffffff8254e2b3>] ? ndisc_fill_addr_option+0x193/0x260
> [  181.147844]  [<ffffffff82553bd9>] ndisc_send_rs+0x179/0x2d0
> [  181.153426]  [<ffffffff8251e7df>] addrconf_dad_completed+0x41f/0x7c0
> [  181.159795]  [<ffffffff81297f78>] ? pick_next_entity+0x198/0x470
> [  181.165807]  [<ffffffff8251e3c0>] ? addrconf_rs_timer+0x4a0/0x4a0
> [  181.171918]  [<ffffffff81aab928>] ? find_next_bit+0x18/0x20
> [  181.177504]  [<ffffffff81a99ec9>] ? prandom_seed+0xd9/0x160
> [  181.183095]  [<ffffffff8251eef5>] addrconf_dad_work+0x375/0x9e0
> [  181.189024]  [<ffffffff8251eb80>] ? addrconf_dad_completed+0x7c0/0x7c0
> [  181.195576]  [<ffffffff81249d8f>] process_one_work+0x52f/0xf60
> [  181.201468]  [<ffffffff8124a89d>] worker_thread+0xdd/0xe80
> [  181.206977]  [<ffffffff8265cf0a>] ? __schedule+0x73a/0x16d0
> [  181.212550]  [<ffffffff8124a7c0>] ? process_one_work+0xf60/0xf60
> [  181.218572]  [<ffffffff8125a115>] kthread+0x205/0x2b0
> [  181.223633]  [<ffffffff81259f10>] ? kthread_worker_fn+0x4e0/0x4e0
> [  181.229743]  [<ffffffff81259f10>] ? kthread_worker_fn+0x4e0/0x4e0
> [  181.235834]  [<ffffffff8266726f>] ret_from_fork+0x3f/0x70
> [  181.241232]  [<ffffffff81259f10>] ? kthread_worker_fn+0x4e0/0x4e0
> 
> 
> .
> 


^ permalink raw reply

* Re: [PATCH net-next 08/11] net: hns3: add interrupt affinity support for misc interrupt
From: Yunsheng Lin @ 2019-07-25  2:05 UTC (permalink / raw)
  To: Saeed Mahameed, tanhuazhong@huawei.com, davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <67b32cdc72c0be03622e78899ac518d807ca7b85.camel@mellanox.com>

On 2019/7/25 3:24, Saeed Mahameed wrote:
> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>
>> The misc interrupt is used to schedule the reset and mailbox
>> subtask, and a 1 sec timer is used to schedule the service
>> subtask, which does periodic work.
>>
>> This patch sets the above three subtask's affinity using the
>> misc interrupt' affinity.
>>
>> Also this patch setups a affinity notify for misc interrupt to
>> allow user to change the above three subtask's affinity.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> Signed-off-by: Peng Li <lipeng321@huawei.com>
>> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
>> ---
>>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 59
>> ++++++++++++++++++++--
>>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  4 ++
>>  2 files changed, 59 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> index f345095..fe45986 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> @@ -1270,6 +1270,12 @@ static int hclge_configure(struct hclge_dev
>> *hdev)
>>  
>>  	hclge_init_kdump_kernel_config(hdev);
>>  
>> +	/* Set the init affinity based on pci func number */
>> +	i = cpumask_weight(cpumask_of_node(dev_to_node(&hdev->pdev-
>>> dev)));
>> +	i = i ? PCI_FUNC(hdev->pdev->devfn) % i : 0;
>> +	cpumask_set_cpu(cpumask_local_spread(i, dev_to_node(&hdev-
>>> pdev->dev)),
>> +			&hdev->affinity_mask);
>> +
>>  	return ret;
>>  }
>>  
>> @@ -2502,14 +2508,16 @@ static void hclge_mbx_task_schedule(struct
>> hclge_dev *hdev)
>>  {
>>  	if (!test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state) &&
>>  	    !test_and_set_bit(HCLGE_STATE_MBX_SERVICE_SCHED, &hdev-
>>> state))
>> -		schedule_work(&hdev->mbx_service_task);
>> +		queue_work_on(cpumask_first(&hdev->affinity_mask),
>> system_wq,
>> +			      &hdev->mbx_service_task);
>>  }
>>  
>>  static void hclge_reset_task_schedule(struct hclge_dev *hdev)
>>  {
>>  	if (!test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
>>  	    !test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev-
>>> state))
>> -		schedule_work(&hdev->rst_service_task);
>> +		queue_work_on(cpumask_first(&hdev->affinity_mask),
>> system_wq,
>> +			      &hdev->rst_service_task);
>>  }
>>  
>>  static void hclge_task_schedule(struct hclge_dev *hdev)
>> @@ -2517,7 +2525,8 @@ static void hclge_task_schedule(struct
>> hclge_dev *hdev)
>>  	if (!test_bit(HCLGE_STATE_DOWN, &hdev->state) &&
>>  	    !test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
>>  	    !test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state))
>> -		(void)schedule_work(&hdev->service_task);
>> +		queue_work_on(cpumask_first(&hdev->affinity_mask),
>> system_wq,
>> +			      &hdev->service_task);
>>  }
>>  
>>  static int hclge_get_mac_link_status(struct hclge_dev *hdev)
>> @@ -2921,6 +2930,39 @@ static void hclge_get_misc_vector(struct
>> hclge_dev *hdev)
>>  	hdev->num_msi_used += 1;
>>  }
>>  
>> +static void hclge_irq_affinity_notify(struct irq_affinity_notify
>> *notify,
>> +				      const cpumask_t *mask)
>> +{
>> +	struct hclge_dev *hdev = container_of(notify, struct hclge_dev,
>> +					      affinity_notify);
>> +
>> +	cpumask_copy(&hdev->affinity_mask, mask);
>> +	del_timer_sync(&hdev->service_timer);
>> +	hdev->service_timer.expires = jiffies + HZ;
>> +	add_timer_on(&hdev->service_timer, cpumask_first(&hdev-
>>> affinity_mask));
>> +}
> 
> I don't see any relation between your misc irq vector and &hdev-
>> service_timer, to me this looks like an abuse of the irq affinity API
> to allow the user to move the service timer affinity.

Hi, thanks for reviewing.

hdev->service_timer is used to schedule the periodic work
queue hdev->service_task, we want all the management work
queue including hdev->service_task to bind to the same cpu
to improve cache and power efficiency, it is better to move
service timer affinity according to that.

The hdev->service_task is changed to delay work queue in
next patch " net: hns3: make hclge_service use delayed workqueue",
So the affinity in the timer of the delay work queue is automatically
set to the affinity of the delay work queue, we will move the
"make hclge_service use delayed workqueue" patch before the
"add interrupt affinity support for misc interrupt" patch, so
we do not have to set service timer affinity explicitly.

Also, There is there work queues(mbx_service_task, service_task,
rst_service_task) in the hns3 driver, we plan to combine them
to one or two workqueue to improve efficiency and readability.

> 
>> +
>> +static void hclge_irq_affinity_release(struct kref *ref)
>> +{
>> +}
>> +
>> +static void hclge_misc_affinity_setup(struct hclge_dev *hdev)
>> +{
>> +	irq_set_affinity_hint(hdev->misc_vector.vector_irq,
>> +			      &hdev->affinity_mask);
>> +
>> +	hdev->affinity_notify.notify = hclge_irq_affinity_notify;
>> +	hdev->affinity_notify.release = hclge_irq_affinity_release;
>> +	irq_set_affinity_notifier(hdev->misc_vector.vector_irq,
>> +				  &hdev->affinity_notify);
>> +}
>> +
>> +static void hclge_misc_affinity_teardown(struct hclge_dev *hdev)
>> +{
>> +	irq_set_affinity_notifier(hdev->misc_vector.vector_irq, NULL);
>> +	irq_set_affinity_hint(hdev->misc_vector.vector_irq, NULL);
>> +}
>> +
>>  static int hclge_misc_irq_init(struct hclge_dev *hdev)
>>  {
>>  	int ret;
>> @@ -6151,7 +6193,10 @@ static void hclge_set_timer_task(struct
>> hnae3_handle *handle, bool enable)
>>  	struct hclge_dev *hdev = vport->back;
>>  
>>  	if (enable) {
>> -		mod_timer(&hdev->service_timer, jiffies + HZ);
>> +		del_timer_sync(&hdev->service_timer);
>> +		hdev->service_timer.expires = jiffies + HZ;
>> +		add_timer_on(&hdev->service_timer,
>> +			     cpumask_first(&hdev->affinity_mask));
>>  	} else {
>>  		del_timer_sync(&hdev->service_timer);
>>  		cancel_work_sync(&hdev->service_task);
>> @@ -8809,6 +8854,11 @@ static int hclge_init_ae_dev(struct
>> hnae3_ae_dev *ae_dev)
>>  	INIT_WORK(&hdev->rst_service_task, hclge_reset_service_task);
>>  	INIT_WORK(&hdev->mbx_service_task, hclge_mailbox_service_task);
>>  
>> +	/* Setup affinity after service timer setup because
>> add_timer_on
>> +	 * is called in affinity notify.
>> +	 */
>> +	hclge_misc_affinity_setup(hdev);
>> +
>>  	hclge_clear_all_event_cause(hdev);
>>  	hclge_clear_resetting_state(hdev);
>>  
>> @@ -8970,6 +9020,7 @@ static void hclge_uninit_ae_dev(struct
>> hnae3_ae_dev *ae_dev)
>>  	struct hclge_dev *hdev = ae_dev->priv;
>>  	struct hclge_mac *mac = &hdev->hw.mac;
>>  
>> +	hclge_misc_affinity_teardown(hdev);
>>  	hclge_state_uninit(hdev);
>>  
>>  	if (mac->phydev)
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
>> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
>> index 6a12285..14df23c 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
>> @@ -864,6 +864,10 @@ struct hclge_dev {
>>  
>>  	DECLARE_KFIFO(mac_tnl_log, struct hclge_mac_tnl_stats,
>>  		      HCLGE_MAC_TNL_LOG_SIZE);
>> +
>> +	/* affinity mask and notify for misc interrupt */
>> +	cpumask_t affinity_mask;
>> +	struct irq_affinity_notify affinity_notify;
>>  };
>>  
>>  /* VPort level vlan tag configuration for TX direction */


^ permalink raw reply

* Re: [PATCH net-next 04/11] net: hns3: fix mis-counting IRQ vector numbers issue
From: tanhuazhong @ 2019-07-25  2:04 UTC (permalink / raw)
  To: Saeed Mahameed, davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	liuyonglong@huawei.com
In-Reply-To: <ad63b46dfb7e36d63d95866a023ef181af40aa76.camel@mellanox.com>



On 2019/7/25 2:28, Saeed Mahameed wrote:
> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote:
>> From: Yonglong Liu <liuyonglong@huawei.com>
>>
>> The num_msi_left means the vector numbers of NIC, but if the
>> PF supported RoCE, it contains the vector numbers of NIC and
>> RoCE(Not expected).
>>
>> This may cause interrupts lost in some case, because of the
>> NIC module used the vector resources which belongs to RoCE.
>>
>> This patch corrects the value of num_msi_left to be equals to
>> the vector numbers of NIC, and adjust the default tqp numbers
>> according to the value of num_msi_left.
>>
>> Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine &
>> Compatibility Layer Support")
>> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
>> Signed-off-by: Peng Li <lipeng321@huawei.com>
>> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
>> ---
>>   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c   |  5 ++++-
>>   drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 12
>> ++++++++++--
>>   2 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> index 3c64d70..a59d13f 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> @@ -1470,13 +1470,16 @@ static int hclge_vport_setup(struct
>> hclge_vport *vport, u16 num_tqps)
>>   {
>>   	struct hnae3_handle *nic = &vport->nic;
>>   	struct hclge_dev *hdev = vport->back;
>> +	u16 alloc_tqps;
>>   	int ret;
>>   
>>   	nic->pdev = hdev->pdev;
>>   	nic->ae_algo = &ae_algo;
>>   	nic->numa_node_mask = hdev->numa_node_mask;
>>   
>> -	ret = hclge_knic_setup(vport, num_tqps,
>> +	alloc_tqps = min_t(u16, hdev->roce_base_msix_offset - 1,
> 
> 
> Why do you need the extra alloc_tqps ? just overwrite num_tqps, the
> original value is not needed afterwards.
> 

Yes, using num_tqps is better.
I will remove the extra alloc_tqps in V2.
Thanks.

>> num_tqps);
>> +
>> +	ret = hclge_knic_setup(vport, alloc_tqps,
>>   			       hdev->num_tx_desc, hdev->num_rx_desc);
>>   	if (ret)
>>   		dev_err(&hdev->pdev->dev, "knic setup failed %d\n",
>> ret);
>>


^ permalink raw reply

* [PATCH] net: mscc: ocelot: null check devm_kcalloc
From: Navid Emamdoost @ 2019-07-25  1:56 UTC (permalink / raw)
  Cc: emamd001, kjlu, smccaman, secalert, Navid Emamdoost,
	Alexandre Belloni, Microchip Linux Driver Support,
	David S. Miller, netdev, linux-kernel

devm_kcalloc may fail and return NULL. Added the null check.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
---
 drivers/net/ethernet/mscc/ocelot_board.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mscc/ocelot_board.c b/drivers/net/ethernet/mscc/ocelot_board.c
index 58bde1a9eacb..52377cfdc31a 100644
--- a/drivers/net/ethernet/mscc/ocelot_board.c
+++ b/drivers/net/ethernet/mscc/ocelot_board.c
@@ -257,6 +257,8 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
 
 	ocelot->ports = devm_kcalloc(&pdev->dev, ocelot->num_phys_ports,
 				     sizeof(struct ocelot_port *), GFP_KERNEL);
+	if (!ocelot->ports)
+		return -ENOMEM;
 
 	INIT_LIST_HEAD(&ocelot->multicast);
 	ocelot_init(ocelot);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH 2/2] arm: Add support for function error injection
From: Leo Yan @ 2019-07-25  1:48 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	netdev, bpf, Masami Hiramatsu, Justin He
In-Reply-To: <20190716111301.1855-3-leo.yan@linaro.org>

Hi Russell,

On Tue, Jul 16, 2019 at 07:13:01PM +0800, Leo Yan wrote:
> This patch implement regs_set_return_value() and
> override_function_with_return() to support function error injection
> for arm.
> 
> In the exception flow, we can update pt_regs::ARM_pc with
> pt_regs::ARM_lr so that can override the probed function return.

Gentle ping.

> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  arch/arm/Kconfig                       |  1 +
>  arch/arm/include/asm/error-injection.h | 13 +++++++++++++
>  arch/arm/include/asm/ptrace.h          |  5 +++++
>  arch/arm/lib/Makefile                  |  2 ++
>  arch/arm/lib/error-inject.c            | 19 +++++++++++++++++++
>  5 files changed, 40 insertions(+)
>  create mode 100644 arch/arm/include/asm/error-injection.h
>  create mode 100644 arch/arm/lib/error-inject.c
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 8869742a85df..f7932a5e29ea 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -74,6 +74,7 @@ config ARM
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
>  	select HAVE_EXIT_THREAD
>  	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
> +	select HAVE_FUNCTION_ERROR_INJECTION if !THUMB2_KERNEL
>  	select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
>  	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
>  	select HAVE_GCC_PLUGINS
> diff --git a/arch/arm/include/asm/error-injection.h b/arch/arm/include/asm/error-injection.h
> new file mode 100644
> index 000000000000..da057e8ed224
> --- /dev/null
> +++ b/arch/arm/include/asm/error-injection.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#ifndef __ASM_ERROR_INJECTION_H_
> +#define __ASM_ERROR_INJECTION_H_
> +
> +#include <linux/compiler.h>
> +#include <linux/linkage.h>
> +#include <asm/ptrace.h>
> +#include <asm-generic/error-injection.h>
> +
> +void override_function_with_return(struct pt_regs *regs);
> +
> +#endif /* __ASM_ERROR_INJECTION_H_ */
> diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
> index 91d6b7856be4..3b41f37b361a 100644
> --- a/arch/arm/include/asm/ptrace.h
> +++ b/arch/arm/include/asm/ptrace.h
> @@ -89,6 +89,11 @@ static inline long regs_return_value(struct pt_regs *regs)
>  	return regs->ARM_r0;
>  }
>  
> +static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
> +{
> +	regs->ARM_r0 = rc;
> +}
> +
>  #define instruction_pointer(regs)	(regs)->ARM_pc
>  
>  #ifdef CONFIG_THUMB2_KERNEL
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index 0bff0176db2c..d3d7430ecd76 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -43,3 +43,5 @@ ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
>    CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
>    obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
>  endif
> +
> +obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> diff --git a/arch/arm/lib/error-inject.c b/arch/arm/lib/error-inject.c
> new file mode 100644
> index 000000000000..96319d017114
> --- /dev/null
> +++ b/arch/arm/lib/error-inject.c
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/error-injection.h>
> +#include <linux/kprobes.h>
> +
> +void override_function_with_return(struct pt_regs *regs)
> +{
> +	/*
> +	 * 'regs' represents the state on entry of a predefined function in
> +	 * the kernel/module and which is captured on a kprobe.
> +	 *
> +	 * 'regs->ARM_lr' contains the the link register for the probed
> +	 * function and assign it to 'regs->ARM_pc', so when kprobe returns
> +	 * back from exception it will override the end of probed function
> +	 * and drirectly return to the predefined function's caller.
> +	 */
> +	regs->ARM_pc = regs->ARM_lr;
> +}
> +NOKPROBE_SYMBOL(override_function_with_return);
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [PATCH 1/2] arm64: Add support for function error injection
From: Leo Yan @ 2019-07-25  1:42 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	netdev, bpf, Masami Hiramatsu, Justin He
In-Reply-To: <20190716111301.1855-2-leo.yan@linaro.org>

On Tue, Jul 16, 2019 at 07:13:00PM +0800, Leo Yan wrote:
> This patch implement regs_set_return_value() and
> override_function_with_return() to support function error injection
> for arm64.
> 
> In the exception flow, arm64's general register x30 contains the value
> for the link register; so we can just update pt_regs::pc with it rather
> than redirecting execution to a dummy function that returns.
> 
> This patch is heavily inspired by the commit 7cd01b08d35f ("powerpc:
> Add support for function error injection").
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>

Catalin, Will:  Gentle ping ...

> ---
>  arch/arm64/Kconfig                       |  1 +
>  arch/arm64/include/asm/error-injection.h | 13 +++++++++++++
>  arch/arm64/include/asm/ptrace.h          |  5 +++++
>  arch/arm64/lib/Makefile                  |  2 ++
>  arch/arm64/lib/error-inject.c            | 19 +++++++++++++++++++
>  5 files changed, 40 insertions(+)
>  create mode 100644 arch/arm64/include/asm/error-injection.h
>  create mode 100644 arch/arm64/lib/error-inject.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 697ea0510729..a6d9e622977d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -142,6 +142,7 @@ config ARM64
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
>  	select HAVE_FTRACE_MCOUNT_RECORD
>  	select HAVE_FUNCTION_TRACER
> +	select HAVE_FUNCTION_ERROR_INJECTION
>  	select HAVE_FUNCTION_GRAPH_TRACER
>  	select HAVE_GCC_PLUGINS
>  	select HAVE_HW_BREAKPOINT if PERF_EVENTS
> diff --git a/arch/arm64/include/asm/error-injection.h b/arch/arm64/include/asm/error-injection.h
> new file mode 100644
> index 000000000000..da057e8ed224
> --- /dev/null
> +++ b/arch/arm64/include/asm/error-injection.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#ifndef __ASM_ERROR_INJECTION_H_
> +#define __ASM_ERROR_INJECTION_H_
> +
> +#include <linux/compiler.h>
> +#include <linux/linkage.h>
> +#include <asm/ptrace.h>
> +#include <asm-generic/error-injection.h>
> +
> +void override_function_with_return(struct pt_regs *regs);
> +
> +#endif /* __ASM_ERROR_INJECTION_H_ */
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index dad858b6adc6..3aafbbe218a2 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -294,6 +294,11 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
>  	return regs->regs[0];
>  }
>  
> +static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
> +{
> +	regs->regs[0] = rc;
> +}
> +
>  /**
>   * regs_get_kernel_argument() - get Nth function argument in kernel
>   * @regs:	pt_regs of that context
> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> index 33c2a4abda04..f182ccb0438e 100644
> --- a/arch/arm64/lib/Makefile
> +++ b/arch/arm64/lib/Makefile
> @@ -33,3 +33,5 @@ UBSAN_SANITIZE_atomic_ll_sc.o	:= n
>  lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
>  
>  obj-$(CONFIG_CRC32) += crc32.o
> +
> +obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> diff --git a/arch/arm64/lib/error-inject.c b/arch/arm64/lib/error-inject.c
> new file mode 100644
> index 000000000000..35661c2de4b0
> --- /dev/null
> +++ b/arch/arm64/lib/error-inject.c
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/error-injection.h>
> +#include <linux/kprobes.h>
> +
> +void override_function_with_return(struct pt_regs *regs)
> +{
> +	/*
> +	 * 'regs' represents the state on entry of a predefined function in
> +	 * the kernel/module and which is captured on a kprobe.
> +	 *
> +	 * 'regs->regs[30]' contains the the link register for the probed
> +	 * function and assign it to 'regs->pc', so when kprobe returns
> +	 * back from exception it will override the end of probed function
> +	 * and drirectly return to the predefined function's caller.
> +	 */
> +	regs->pc = regs->regs[30];
> +}
> +NOKPROBE_SYMBOL(override_function_with_return);
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [PATCH] carl9170: remove set but not used variable 'udev'
From: Yuehaibing @ 2019-07-25  1:40 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Kalle Valo, linux-wireless, Netdev, kernel-janitors, linux-kernel,
	Hulk Robot
In-Reply-To: <CAAd0S9BvTfRyUVkQzcczyNkU_oeU5hNdK3KVQzLsU21b4JGNTQ@mail.gmail.com>

On 2019/7/25 3:42, Christian Lamparter wrote:
> On Wed, Jul 24, 2019 at 3:48 AM YueHaibing <yuehaibing@huawei.com> wrote:
>>
>> Fixes gcc '-Wunused-but-set-variable' warning:
>>
>> drivers/net/wireless/ath/carl9170/usb.c: In function 'carl9170_usb_disconnect':
>> drivers/net/wireless/ath/carl9170/usb.c:1110:21: warning:
>>  variable 'udev' set but not used [-Wunused-but-set-variable]
>>
>> It is not used, so can be removed.
>>
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>> ---
> Isn't this the same patch you sent earlier:
> 
> https://patchwork.kernel.org/patch/11027909/
> 
>>From what I can tell, it's the same but with an extra [-next], I
> remember that I've acked that one
> but your patch now does not have it? Is this an oversight, because I'm
> the maintainer for this
> driver. So, in my opinion at least the "ack" should have some value
> and shouldn't be "ignored".
> 
> Look, from what I know, Kalle is not ignoring you, It's just that
> carl9170 is no longer top priority.
> So please be patient. As long as its queued in the patchwork it will
> get considered.

Thank you for reminder. I forget the previous patch,and our CI robot
report it again, So I do it again, sorry for confusion.

Just pls drop this and use previous one.

> 
> Cheers,
> Christian
> 
>>  drivers/net/wireless/ath/carl9170/usb.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/carl9170/usb.c b/drivers/net/wireless/ath/carl9170/usb.c
>> index 99f1897a775d..486957a04bd1 100644
>> --- a/drivers/net/wireless/ath/carl9170/usb.c
>> +++ b/drivers/net/wireless/ath/carl9170/usb.c
>> @@ -1107,12 +1107,10 @@ static int carl9170_usb_probe(struct usb_interface *intf,
>>  static void carl9170_usb_disconnect(struct usb_interface *intf)
>>  {
>>         struct ar9170 *ar = usb_get_intfdata(intf);
>> -       struct usb_device *udev;
>>
>>         if (WARN_ON(!ar))
>>                 return;
>>
>> -       udev = ar->udev;
>>         wait_for_completion(&ar->fw_load_wait);
>>
>>         if (IS_INITIALIZED(ar)) {
>>
>>
>>
> 
> .
> 


^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2019-07-25  0:58 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Wen Yang,
	Sean Nyekjaer

[-- Attachment #1: Type: text/plain, Size: 1294 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/can/flexcan.c

between commit:

  e9f2a856e102 ("can: flexcan: fix an use-after-free in flexcan_setup_stop_mode()")

from the net tree and commit:

  915f9666421c ("can: flexcan: add support for DT property 'wakeup-source'")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/can/flexcan.c
index fcec8bcb53d6,09d8e623dcf6..000000000000
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@@ -1473,9 -1473,10 +1491,12 @@@ static int flexcan_setup_stop_mode(stru
  
  	device_set_wakeup_capable(&pdev->dev, true);
  
+ 	if (of_property_read_bool(np, "wakeup-source"))
+ 		device_set_wakeup_enable(&pdev->dev, true);
+ 
 -	return 0;
 +out_put_node:
 +	of_node_put(gpr_np);
 +	return ret;
  }
  
  static const struct of_device_id flexcan_of_match[] = {

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox