Netdev List

* Re: BUG: spinlock bad magic in tun_do_read
From: Cong Wang @ 2018-05-08  5:54 UTC
  To: syzbot
  Cc: David Miller, Eric Dumazet, Jason Wang, LKML, Michael S. Tsirkin,
	Linux Kernel Network Developers, peterpenkov96, Sabrina Dubroca,
	syzkaller-bugs
In-Reply-To: <0000000000003f06aa056bab0943@google.com>

On Mon, May 7, 2018 at 10:27 PM, syzbot
<syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    75bc37fefc44 Linux 4.17-rc4
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1162c697800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=31f4b3733894ef79
> dashboard link: https://syzkaller.appspot.com/bug?extid=e8b902c3c3fadf0a9dba
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> userspace arch: i386
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=172e4c97800000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com
>
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> IPVS: ftp: loaded support on port[0] = 21
> BUG: spinlock bad magic on CPU#0, syz-executor0/4586
>  lock: 0xffff8801ae8928c8, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> CPU: 0 PID: 4586 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #62
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  spin_dump+0x160/0x169 kernel/locking/spinlock_debug.c:67
>  spin_bug kernel/locking/spinlock_debug.c:75 [inline]
>  debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
>  do_raw_spin_lock.cold.3+0x37/0x3c kernel/locking/spinlock_debug.c:112
>  __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
>  _raw_spin_lock+0x32/0x40 kernel/locking/spinlock.c:144
>  spin_lock include/linux/spinlock.h:310 [inline]
>  ptr_ring_consume include/linux/ptr_ring.h:335 [inline]
>  tun_ring_recv drivers/net/tun.c:2143 [inline]

Yeah, we should return early before hitting this uninitialized ptr ring...
Something like:

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ef33950a45d9..638c87a95247 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2128,6 +2128,9 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
        void *ptr = NULL;
        int error = 0;

+       if (!tfile->tx_ring.queue)
+               goto out;
+

Or, checking if tun is detached...


>  tun_do_read+0x18b1/0x29f0 drivers/net/tun.c:2182
>  tun_chr_read_iter+0xe5/0x1e0 drivers/net/tun.c:2214
>  call_read_iter include/linux/fs.h:1778 [inline]
>  new_sync_read fs/read_write.c:406 [inline]
>  __vfs_read+0x696/0xa50 fs/read_write.c:418
>  vfs_read+0x17f/0x3d0 fs/read_write.c:452
>  ksys_pread64+0x174/0x1a0 fs/read_write.c:626
>  __do_compat_sys_x86_pread arch/x86/ia32/sys_ia32.c:177 [inline]
>  __se_compat_sys_x86_pread arch/x86/ia32/sys_ia32.c:174 [inline]
>  __ia32_compat_sys_x86_pread+0xc4/0x130 arch/x86/ia32/sys_ia32.c:174
>  do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
>  do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
>  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
> RIP: 0023:0xf7fc0cb9
> RSP: 002b:00000000f7fbc0ac EFLAGS: 00000282 ORIG_RAX: 00000000000000b4
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000080
> RDX: 000000000000006e RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug report.
> Note: all commands must start from beginning of the line in the email body.

* [PATCH net] stmmac: fix reception of 802.1ad Ethernet tagged frames
From: Elad Nachman @ 2018-05-08  6:01 UTC
  To: davem, netdev; +Cc: eladv6

The stmmac reception handler calls stmmac_rx_vlan() to strip the VLAN tag before calling napi_gro_receive().

The function assumes VLAN-tagged frames are always tagged with the 802.1Q protocol,
and assigns ETH_P_8021Q to the skb by hard-coding that parameter in the call to __vlan_hwaccel_put_tag().

As a result, packets are not passed to the VLAN slave if it was created with the 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).

This fix passes the protocol from the VLAN header into __vlan_hwaccel_put_tag()
instead of using the hard-coded value of ETH_P_8021Q.

Signed-off-by: Elad Nachman <eladn@gilat.com>

---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index b65e2d1..ced2d34 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3293,17 +3293,19 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
 {
-	struct ethhdr *ehdr;
+	struct vlan_ethhdr *veth;
 	u16 vlanid;
+	__be16 vlan_proto;
 
 	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
 	    NETIF_F_HW_VLAN_CTAG_RX &&
 	    !__vlan_get_tag(skb, &vlanid)) {
 		/* pop the vlan tag */
-		ehdr = (struct ethhdr *)skb->data;
-		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
+		veth = (struct vlan_ethhdr *)skb->data;
+		vlan_proto = veth->h_vlan_proto;
+		memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
 		skb_pull(skb, VLAN_HLEN);
-		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
+		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
 	}
 }
 
-- 
2.7.4
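The idea of the fix can be shown outside the kernel with a minimal userspace sketch: pop the outer tag from a raw Ethernet frame and keep whatever TPID the header carries, instead of assuming 802.1Q. `struct vlan_info` and `pop_vlan_tag()` are hypothetical names, not stmmac or kernel API; only the TPID values 0x8100 (802.1Q, ETH_P_8021Q) and 0x88A8 (802.1ad, ETH_P_8021AD) are the standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6       /* bytes per MAC address */
#define VLAN_HLEN   4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q  0x8100  /* ETH_P_8021Q */
#define TPID_8021AD 0x88A8  /* ETH_P_8021AD */

/* Hypothetical result type: the two values the driver hands to
 * __vlan_hwaccel_put_tag() (protocol and VLAN ID). */
struct vlan_info {
	uint16_t proto;  /* TPID, host byte order */
	uint16_t vid;    /* 12-bit VLAN ID */
};

/* Hypothetical helper: pops the outer VLAN tag from frame[0..*len) in
 * place. Returns 0 on success, -1 if the frame carries no known tag. */
static int pop_vlan_tag(uint8_t *frame, size_t *len, struct vlan_info *out)
{
	uint16_t tpid, tci;

	assert(frame != NULL && len != NULL && out != NULL);
	if (*len < 2 * ETH_ALEN + VLAN_HLEN)
		return -1;

	/* The TPID follows the destination and source MAC addresses. */
	tpid = (uint16_t)(frame[12] << 8 | frame[13]);
	if (tpid != TPID_8021Q && tpid != TPID_8021AD)
		return -1;

	tci = (uint16_t)(frame[14] << 8 | frame[15]);
	out->proto = tpid;          /* read from the header, not hard-coded */
	out->vid = tci & 0x0FFF;

	/* Like the driver's memmove(): slide both MACs over the tag. */
	memmove(frame + VLAN_HLEN, frame, 2 * ETH_ALEN);
	/* skb_pull() just advances skb->data; here the bytes are moved. */
	memmove(frame, frame + VLAN_HLEN, *len - VLAN_HLEN);
	*len -= VLAN_HLEN;
	return 0;
}
```

The returned `proto` plays the role of `vlan_proto` in the patch. As the review later in this thread points out, whether a device that only advertises NETIF_F_HW_VLAN_CTAG_RX should do this for S-tags at all is a separate question.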

* Re: [RFC/PATCH] Add a socketoption IPV6_MULTICAST_ALL analogue to the IPV4 version
From: Andre Naujoks @ 2018-05-08  6:03 UTC
  To: David S. Miller, netdev
In-Reply-To: <b49554c1-f988-b80f-30eb-9472f1f595b4@gmail.com>

On 11.04.2018 13:02, Andre Naujoks wrote:
> Hi.

Hi again.

Since it has been a month now, I'd like to send a little "ping" on this subject.

Is anything wrong with this? Or was it just bad timing?

Regards
  Andre

> 
> I ran into a problem when trying to join multiple multicast groups on a
> single socket, which therefore has to be bound to the any-address. I
> received traffic from multicast groups that I had not joined on that socket,
> and was at first surprised by that. After reading some old e-mails/threads,
> which came to the conclusion "It is, as it is."
> (e.g. https://marc.info/?l=linux-kernel&m=115815686626791&w=2), I discovered
> the IPv4 socket option IP_MULTICAST_ALL, which, when disabled, does exactly
> what I would expect from a socket by default.
> 
> I propose a socket option for IPv6 that does the same and has the same
> default as the IPv4 version. My first thought was to just apply
> IP_MULTICAST_ALL to IPv6 sockets as well, but that would change the behavior
> of current applications and would probably be a big no-no.
> 
> Regards
>   Andre
> 
> 
> From 473653086c05a3de839c3504885053f6254c7bc5 Mon Sep 17 00:00:00 2001
> From: Andre Naujoks <nautsch2@gmail.com>
> Date: Wed, 11 Apr 2018 12:38:28 +0200
> Subject: [PATCH] Add a socketoption IPV6_MULTICAST_ALL analogue to the IPV4
>  version
> 
> The socket option will be enabled by default to ensure current behaviour
> is not changed. This is the same for the IPv4 version.
> 
> A socket bound to in6addr_any and a specific port will receive all traffic
> on that port. Analogous to IP_MULTICAST_ALL, disabling the option turns this
> behaviour off: if one or more multicast groups were joined (using said
> socket), only multicast traffic from groups explicitly joined via this
> socket is passed on.
> 
> Unless this option is disabled, a socket (or even a whole system) joined to
> multiple multicast groups is very hard to get right. Filtering by destination
> address has to take place in user space to avoid receiving multicast
> traffic from other multicast groups that might have traffic on the same
> port.
> 
> IP_MULTICAST_ALL is deliberately not extended to also apply to IPv6
> sockets, to avoid changing the behaviour of current applications.
> 
> Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
> ---
>  include/linux/ipv6.h     |  3 ++-
>  include/uapi/linux/in6.h |  1 +
>  net/ipv6/af_inet6.c      |  1 +
>  net/ipv6/ipv6_sockglue.c | 11 +++++++++++
>  net/ipv6/mcast.c         |  2 +-
>  5 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 8415bf1a9776..495e834c1367 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -274,7 +274,8 @@ struct ipv6_pinfo {
>  						 */
>  				dontfrag:1,
>  				autoflowlabel:1,
> -				autoflowlabel_set:1;
> +				autoflowlabel_set:1,
> +				mc_all:1;
>  	__u8			min_hopcount;
>  	__u8			tclass;
>  	__be32			rcv_flowinfo;
> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> index ed291e55f024..71d82fe15b03 100644
> --- a/include/uapi/linux/in6.h
> +++ b/include/uapi/linux/in6.h
> @@ -177,6 +177,7 @@ struct in6_flowlabel_req {
>  #define IPV6_V6ONLY		26
>  #define IPV6_JOIN_ANYCAST	27
>  #define IPV6_LEAVE_ANYCAST	28
> +#define IPV6_MULTICAST_ALL	29
>  
>  /* IPV6_MTU_DISCOVER values */
>  #define IPV6_PMTUDISC_DONT		0
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 8da0b513f188..7844cd9d2f10 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -209,6 +209,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
>  	np->hop_limit	= -1;
>  	np->mcast_hops	= IPV6_DEFAULT_MCASTHOPS;
>  	np->mc_loop	= 1;
> +	np->mc_all	= 1;
>  	np->pmtudisc	= IPV6_PMTUDISC_WANT;
>  	np->repflow	= net->ipv6.sysctl.flowlabel_reflect;
>  	sk->sk_ipv6only	= net->ipv6.sysctl.bindv6only;
> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index 4d780c7f0130..b2bc1942a2ee 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -664,6 +664,13 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
>  			retv = ipv6_sock_ac_drop(sk, mreq.ipv6mr_ifindex, &mreq.ipv6mr_acaddr);
>  		break;
>  	}
> +	case IPV6_MULTICAST_ALL:
> +		if (optlen < sizeof(int))
> +			goto e_inval;
> +		np->mc_all = valbool;
> +		retv = 0;
> +		break;
> +
>  	case MCAST_JOIN_GROUP:
>  	case MCAST_LEAVE_GROUP:
>  	{
> @@ -1255,6 +1262,10 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
>  		val = np->mcast_oif;
>  		break;
>  
> +	case IPV6_MULTICAST_ALL:
> +		val = np->mc_all;
> +		break;
> +
>  	case IPV6_UNICAST_IF:
>  		val = (__force int)htonl((__u32) np->ucast_oif);
>  		break;
> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> index 793159d77d8a..623ad00eb3c2 100644
> --- a/net/ipv6/mcast.c
> +++ b/net/ipv6/mcast.c
> @@ -622,7 +622,7 @@ bool inet6_mc_check(struct sock *sk, const struct in6_addr *mc_addr,
>  	}
>  	if (!mc) {
>  		rcu_read_unlock();
> -		return true;
> +		return np->mc_all;
>  	}
>  	read_lock(&mc->sflock);
>  	psl = mc->sflist;
> 
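The delivery rule that the patch changes in inet6_mc_check() can be modelled in plain C. This is a simplified sketch, not kernel code: `struct addr6`, `struct sock_model`, `sock_model_join()` and `mc_deliver()` are hypothetical, per-group source filters (sflist) are ignored, and the group list is a fixed-size array. The final `return sk->mc_all;` mirrors the patched `return np->mc_all;` for a group this socket never joined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr. */
struct addr6 { unsigned char b[16]; };

#define MAX_GROUPS 8

/* Hypothetical per-socket state: groups joined on *this* socket and the
 * mc_all bit (1 by default, like np->mc_all = 1 in inet6_create()). */
struct sock_model {
	struct addr6 joined[MAX_GROUPS];
	size_t njoined;
	bool mc_all;
};

static bool sock_model_join(struct sock_model *sk, const struct addr6 *grp)
{
	assert(sk != NULL && grp != NULL);
	if (sk->njoined == MAX_GROUPS)
		return false;
	sk->joined[sk->njoined++] = *grp;
	return true;
}

/* Should a datagram for multicast group `grp` be queued on this socket?
 * Source filtering is deliberately left out of the model. */
static bool mc_deliver(const struct sock_model *sk, const struct addr6 *grp)
{
	for (size_t i = 0; i < sk->njoined; i++)
		if (memcmp(&sk->joined[i], grp, sizeof(*grp)) == 0)
			return true;    /* group joined on this socket */
	/* Not joined here: unpatched code always returned true;
	 * the patch makes this follow the socket option. */
	return sk->mc_all;
}
```

With the default `mc_all = true` the result is identical to the unpatched behaviour, which is why the option is opt-out and existing applications are unaffected.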

* Re: BUG: spinlock bad magic in tun_do_read
From: Eric Dumazet @ 2018-05-08  6:04 UTC
  To: Cong Wang, syzbot
  Cc: David Miller, Eric Dumazet, Jason Wang, LKML, Michael S. Tsirkin,
	Linux Kernel Network Developers, peterpenkov96, Sabrina Dubroca,
	syzkaller-bugs
In-Reply-To: <CAM_iQpXVsMzKochN7SY-CdHeACtJi3cC4oCfgzYdpqDMbo8Lcw@mail.gmail.com>



On 05/07/2018 10:54 PM, Cong Wang wrote:
> On Mon, May 7, 2018 at 10:27 PM, syzbot
> <syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    75bc37fefc44 Linux 4.17-rc4
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1162c697800000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=31f4b3733894ef79
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e8b902c3c3fadf0a9dba
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> userspace arch: i386
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=172e4c97800000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com
>>
>> random: sshd: uninitialized urandom read (32 bytes read)
>> random: sshd: uninitialized urandom read (32 bytes read)
>> random: sshd: uninitialized urandom read (32 bytes read)
>> IPVS: ftp: loaded support on port[0] = 21
>> BUG: spinlock bad magic on CPU#0, syz-executor0/4586
>>  lock: 0xffff8801ae8928c8, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
>> CPU: 0 PID: 4586 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #62
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>  spin_dump+0x160/0x169 kernel/locking/spinlock_debug.c:67
>>  spin_bug kernel/locking/spinlock_debug.c:75 [inline]
>>  debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
>>  do_raw_spin_lock.cold.3+0x37/0x3c kernel/locking/spinlock_debug.c:112
>>  __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
>>  _raw_spin_lock+0x32/0x40 kernel/locking/spinlock.c:144
>>  spin_lock include/linux/spinlock.h:310 [inline]
>>  ptr_ring_consume include/linux/ptr_ring.h:335 [inline]
>>  tun_ring_recv drivers/net/tun.c:2143 [inline]
> 
> Yeah, we should return early before hitting this uninitialized ptr ring...
> Something like:
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index ef33950a45d9..638c87a95247 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2128,6 +2128,9 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
>         void *ptr = NULL;
>         int error = 0;
> 
> +       if (!tfile->tx_ring.queue)
> +               goto out;
> +
> 
> Or, checking if tun is detached...
> 
>

tx_ring was properly initialized when the first ptr_ring_consume() at line 2131 was attempted.

The bug happens later, at line 2143, after the schedule() call at line 2155.

So a single check in the function prologue won't solve the case where the thread had to sleep
and the ring was uninitialized in the meantime.
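The point about sleeping can be illustrated with a small single-threaded userspace model: whatever is checked in the prologue may change while the reader sleeps, so the check has to be repeated after every wakeup, before the ring is touched again. Everything here is hypothetical (`struct fake_ring`, `ring_recv()`, and `detach_hook()`, which stands in for schedule() plus a concurrent detach); it is not tun.c code and does not say which fix was eventually chosen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical ring: queue == NULL models a ring that was torn down. */
struct fake_ring {
	void **queue;
	size_t head, tail;    /* consume at head, produce at tail */
};

/* Like ptr_ring_consume(), only valid on a live ring; the assert is the
 * model's version of the "spinlock bad magic" report. */
static void *fake_ring_consume(struct fake_ring *r)
{
	assert(r->queue != NULL);
	if (r->head == r->tail)
		return NULL;    /* empty */
	return r->queue[r->head++];
}

/* Runs where the real code would schedule(); it may change the ring. */
typedef void (*sleep_hook_t)(struct fake_ring *r);

/* Hypothetical hook: the ring is torn down while the reader sleeps. */
static void detach_hook(struct fake_ring *r)
{
	r->queue = NULL;
}

static void *ring_recv(struct fake_ring *r, sleep_hook_t sleep_hook, int *err)
{
	for (;;) {
		void *ptr;

		/* Re-checked on every iteration, not only in the prologue.
		 * EBADF is an arbitrary choice for this model. */
		if (r->queue == NULL) {
			*err = -EBADF;
			return NULL;
		}
		ptr = fake_ring_consume(r);
		if (ptr != NULL) {
			*err = 0;
			return ptr;
		}
		sleep_hook(r);    /* the hook must change state, or this spins */
	}
}
```

If the `queue == NULL` test were hoisted above the loop, an empty ring plus `detach_hook` would reach the assert in `fake_ring_consume()` after the "sleep", which is the shape of the reported crash. In real concurrent code the re-check would also have to be ordered against the teardown (for example under a lock or RCU), which this single-threaded model ignores.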

* Re: [RFC/PATCH] Add a socketoption IPV6_MULTICAST_ALL analogue to the IPV4 version
From: 吉藤英明 @ 2018-05-08  6:31 UTC
  To: Andre Naujoks; +Cc: David S. Miller, netdev, yoshfuji
In-Reply-To: <45accb95-f712-1758-774d-c729727b89db@gmail.com>

Hi,

2018-05-08 15:03 GMT+09:00 Andre Naujoks <nautsch2@gmail.com>:
> On 11.04.2018 13:02, Andre Naujoks wrote:
>> Hi.
>
> Hi again.
>
> Since it has been a month now, I'd like to send a little "ping" on this subject.
>
> Is anything wrong with this? Or was it just bad timing?

I'm just curious... What kind of behaviour do you expect?

Unless you explicitly join the group, you cannot get traffic for the group
because of multicast filtering at the device level (multicast filtering) or at the
switch level (MLD).

If an application is interested in (several) multicast groups, it should
explicitly join them. So I cannot find a valid (or meaningful) use case.

--yoshfuji

* [PATCH] net: aquantia: Fix an error handling path in 'aq_pci_probe()'
From: Christophe JAILLET @ 2018-05-08  6:39 UTC
  To: davem, igor.russkikh, pavel.belous, weiyongjun1, dan.carpenter
  Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET

The positions of two labels should be swapped in order to release resources
in the correct order and avoid leaks.

Fixes: 23ee07ad3c2f ("net: aquantia: Cleanup pci functions module")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
In the 'aq_pci_remove()' function, 'pci_release_regions()' and 'free_netdev()'
are called in the reverse order compared to this error path.
I don't know whether this is done on purpose and/or needed, so I've left it as-is.
---
 drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index ecc6306f940f..b7f6b5a68b33 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -298,9 +298,9 @@ static int aq_pci_probe(struct pci_dev *pdev,
 	kfree(self->aq_hw);
 err_ioremap:
 	free_netdev(ndev);
-err_pci_func:
-	pci_release_regions(pdev);
 err_ndev:
+	pci_release_regions(pdev);
+err_pci_func:
 	pci_disable_device(pdev);
 	return err;
 }
-- 
2.17.0
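The patch relies on the usual kernel goto-unwind idiom: each error label undoes only the steps that succeeded before the jump, in reverse order of acquisition, so a mis-ordered label either leaks a resource or releases one that was never acquired. The model below is hypothetical (`struct held`, `model_probe()` and its labels are made up; the steps only loosely mirror pci_enable_device(), pci_request_regions(), alloc_etherdev() and an ioremap step, not the exact aq_pci_probe() code) and injects a failure at each step so every error path can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Which resources are currently held; any flag still set after a failed
 * probe is a leak. */
struct held {
	bool enabled, regions, netdev, mapped;
};

/* Hypothetical probe: step `fail_at` (1..4) fails, 0 means success. */
static int model_probe(struct held *h, int fail_at)
{
	int err = -1;

	assert(h != NULL);
	memset(h, 0, sizeof(*h));

	if (fail_at == 1)
		goto err_out;           /* nothing acquired yet */
	h->enabled = true;          /* ~ pci_enable_device() */

	if (fail_at == 2)
		goto err_enable;
	h->regions = true;          /* ~ pci_request_regions() */

	if (fail_at == 3)
		goto err_regions;
	h->netdev = true;           /* ~ alloc_etherdev() */

	if (fail_at == 4)
		goto err_netdev;
	h->mapped = true;           /* ~ ioremap of the register BAR */

	return 0;

	/* Labels in reverse order of acquisition: entering at a label runs
	 * the releases for every step acquired before the failing one. */
err_netdev:
	h->netdev = false;          /* ~ free_netdev() */
err_regions:
	h->regions = false;         /* ~ pci_release_regions() */
err_enable:
	h->enabled = false;         /* ~ pci_disable_device() */
err_out:
	return err;
}
```

Driving every failure index and checking that no flag stays set is a cheap way to catch a label that sits on the wrong side of a release call.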

* Re: [RFC/PATCH] Add a socketoption IPV6_MULTICAST_ALL analogue to the IPV4 version
From: Andre Naujoks @ 2018-05-08  6:41 UTC
  To: 吉藤英明; +Cc: David S. Miller, netdev, yoshfuji
In-Reply-To: <CAPA1RqCJtJS7pk+4MrJXan-QbbQRON=LRio870Xp2yfV2hRN9g@mail.gmail.com>

On 08.05.2018 08:31, 吉藤英明 wrote:
> Hi,
> 
> 2018-05-08 15:03 GMT+09:00 Andre Naujoks <nautsch2@gmail.com>:
>> On 11.04.2018 13:02, Andre Naujoks wrote:
>>> Hi.
>>
>> Hi again.
>>
>> Since it has been a month now, I'd like to send a little "ping" on this subject.
>>
>> Is anything wrong with this? Or was it just bad timing?
> 
> I'm just curious... What kind of behaviour do you expect?
> 
> Unless you explicitly join the group, you cannot get traffic for the group
> because of multicast filtering at device level (multicast fitlering) or at the
> switch level (MLD).
> 
> If an application is interested in (several) multicast groups, it should
> explicitly join the group.  So I cannot find valid (or meaningful) use-case.

I expect to receive only the multicast traffic of groups I explicitly joined on that
socket. This is what the IPv4 version of this socket option already does. The problem
only exists if multiple groups are joined and the socket therefore has to be bound
to the "any"-address. Then we get traffic from all multicast groups joined by any(!)
process on the system (plus anything else on that IP port).

Regards
  Andre
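From userspace, the behaviour described above would be requested roughly as follows: bind to in6addr_any, switch the proposed option off, then join each group on that socket. This is a sketch assuming a kernel that implements IPV6_MULTICAST_ALL with the value 29 proposed in this patch (the fallback #define only covers libc headers that lack it; on a kernel without the option the setsockopt() call fails). `fill_mreq()` and `open_filtered_socket()` are hypothetical helpers, and error handling is minimal.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPV6_MULTICAST_ALL
#define IPV6_MULTICAST_ALL 29   /* value proposed in this patch */
#endif

/* Hypothetical helper: fills an ipv6_mreq for `group` on `ifindex`.
 * Returns 0 on success, -1 if `group` is not a valid IPv6 address. */
static int fill_mreq(struct ipv6_mreq *mreq, const char *group, unsigned int ifindex)
{
	assert(mreq != NULL && group != NULL);
	memset(mreq, 0, sizeof(*mreq));
	if (inet_pton(AF_INET6, group, &mreq->ipv6mr_multiaddr) != 1)
		return -1;
	mreq->ipv6mr_interface = ifindex;
	return 0;
}

/* Hypothetical helper: UDP socket bound to [::]:port that should only
 * receive multicast for the groups joined on it. Returns fd or -1. */
static int open_filtered_socket(unsigned short port, const char *const *groups,
				size_t ngroups, unsigned int ifindex)
{
	struct sockaddr_in6 addr;
	int off = 0;
	int fd = socket(AF_INET6, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin6_family = AF_INET6;
	addr.sin6_addr = in6addr_any;   /* needed to receive several groups */
	addr.sin6_port = htons(port);

	/* Opt out of receiving groups joined elsewhere on the system. */
	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_ALL, &off, sizeof(off)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;

	for (size_t i = 0; i < ngroups; i++) {
		struct ipv6_mreq mreq;

		if (fill_mreq(&mreq, groups[i], ifindex) < 0 ||
		    setsockopt(fd, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq)) < 0)
			goto fail;
	}
	return fd;

fail:
	close(fd);
	return -1;
}
```

For link-local groups (ff02::/16) a real interface index is needed, for example from if_nametoindex(); with index 0 the kernel picks an interface from the routing table.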

* Re: [PATCH net] stmmac: fix reception of 802.1ad Ethernet tagged frames
From: Toshiaki Makita @ 2018-05-08  6:43 UTC
  To: Elad Nachman; +Cc: davem, netdev
In-Reply-To: <b3f73283-cc98-fbb3-7b65-cd0dd4d41d41@gmail.com>

On 2018/05/08 15:01, Elad Nachman wrote:
> stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before calling napi_gro_receive().
> 
> The function assumes VLAN tagged frames are always tagged with 802.1Q protocol,
> and assigns ETH_P_8021Q to the skb by hard-coding the parameter on call to __vlan_hwaccel_put_tag() .
> 
> This causes packets not to be passed to the VLAN slave if it was created with 802.1AD protocol
> (ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).
> 
> This fix passes the protocol from the VLAN header into __vlan_hwaccel_put_tag()
> instead of using the hard-coded value of ETH_P_8021Q.
> 
> Signed-off-by: Elad Nachman <eladn@gilat.com>
> 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index b65e2d1..ced2d34 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3293,17 +3293,19 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
>  {
> -	struct ethhdr *ehdr;
> +	struct vlan_ethhdr *veth;
>  	u16 vlanid;
> +	__be16 vlan_proto;
>  
>  	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
>  	    NETIF_F_HW_VLAN_CTAG_RX &&
>  	    !__vlan_get_tag(skb, &vlanid)) {
>  		/* pop the vlan tag */
> -		ehdr = (struct ethhdr *)skb->data;
> -		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
> +		veth = (struct vlan_ethhdr *)skb->data;
> +		vlan_proto = veth->h_vlan_proto;
> +		memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
>  		skb_pull(skb, VLAN_HLEN);
> -		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
> +		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);

This is what devices with NETIF_F_HW_VLAN_STAG_RX are supposed to do,
not NETIF_F_HW_VLAN_CTAG_RX.

By the way, this looks like it does the same thing as skb_vlan_untag() in
__netif_receive_skb_core(), so it seems unnecessary to add HW_VLAN_STAG_RX.
Alternatively, you can check whether vlan_proto is 8021Q here.

-- 
Toshiaki Makita
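
The alternative Toshiaki suggests is to strip the tag in the CTAG path only when the outer TPID is 802.1Q, and otherwise leave the frame for the core's generic untagging. Below is a minimal user-space sketch of that decision. It is not the stmmac code. The helper name rx_should_strip_ctag() and the TPID_* macros are hypothetical. The values 0x8100 and 0x88a8 are the standard 802.1Q and 802.1ad TPIDs, which the kernel names ETH_P_8021Q and ETH_P_8021AD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard TPID values (the kernel's ETH_P_8021Q and ETH_P_8021AD). */
#define TPID_8021Q  0x8100
#define TPID_8021AD 0x88a8

/*
 * Hypothetical helper: return true (and the VLAN ID) only when the outer
 * tag is an 802.1Q C-tag, i.e. when a CTAG-only RX path may strip it.
 * 802.1ad S-tagged frames return false and are left for the generic
 * untagging path. `frame` starts at the destination MAC address.
 */
static bool rx_should_strip_ctag(const uint8_t *frame, size_t len,
                                 uint16_t *vid_out)
{
    uint16_t tpid, tci;

    /* 12 bytes of MAC addresses + 4-byte tag + 2-byte inner EtherType */
    if (len < 18)
        return false;

    tpid = (uint16_t)((frame[12] << 8) | frame[13]);  /* network byte order */
    if (tpid != TPID_8021Q)
        return false;

    tci = (uint16_t)((frame[14] << 8) | frame[15]);
    *vid_out = tci & 0x0fff;                          /* low 12 bits: VID */
    return true;
}
```

In driver terms, this would correspond to calling __vlan_hwaccel_put_tag() only on the true branch. That mapping is an interpretation of the suggestion, not code from the thread.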


* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-08  6:44 UTC
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, linux-kernel, netdev, wexu,
	jfreimann
In-Reply-To: <12ede490-f674-2b89-d639-266b5fe15466@redhat.com>

On Tue, May 08, 2018 at 01:40:40PM +0800, Jason Wang wrote:
> On 2018年05月08日 11:05, Jason Wang wrote:
> > > 
> > > Because in virtqueue_enable_cb_delayed(), we may set an
> > > event_off which is bigger than new and both of them have
> > > wrapped. And in this case, although new is smaller than
> > > event_off (i.e. the third param -- old), new shouldn't
> > > add vq->num, and actually we are expecting a very big
> > > idx diff.
> > 
> > Yes, so to calculate distance correctly between event and new, we just
> > need to compare the wrap counter and return false if it doesn't match
> > without the need to try to add vq.num here.
> > 
> > Thanks
> 
> Sorry, looks like the following should work; we need to add vq.num if
> used_wrap_counter does not match:
> 
> static bool vhost_vring_packed_need_event(struct vhost_virtqueue *vq,
>                       __u16 off_wrap, __u16 new,
>                       __u16 old)
> {
>     bool wrap = off_wrap >> 15;
>     int off = off_wrap & ~(1 << 15);
>     __u16 d1, d2;
> 
>     if (wrap != vq->used_wrap_counter)
>         d1 = new + vq->num - off - 1;

Just to draw your attention (maybe you have already
noticed this).

In this case (i.e. wrap != vq->used_wrap_counter),
it's also possible that (off < new) is true, because
when virtqueue_enable_cb_delayed_packed() is used,
`off` is calculated in the driver like this:

	off = vq->last_used_idx + bufs;
	if (off >= vq->vring_packed.num) {
		off -= vq->vring_packed.num;
		wrap_counter ^= 1;
	}

And when `new` (in vhost) is close to vq->num, the
vq->last_used_idx + bufs (in the driver) can be bigger
than vq->vring_packed.num, and:

1. `off` will wrap;
2. wrap counters won't match;
3. off < new;

And d1 (i.e. new + vq->num - off - 1) will be a value
bigger than vq->num. I'm okay with this, although it's
a bit weird.

Best regards,
Tiwei Bie

>     else
>         d1 = new - off - 1;
> 
>     if (new > old)
>         d2 = new - old;
>     else
>         d2 = new + vq->num - old;
> 
>     return d1 < d2;
> }
> 
> Thanks
> 
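
Tiwei's three-step argument can be checked numerically. Below is a minimal user-space sketch. The function names are hypothetical: driver_event_off() mirrors the driver snippet quoted above, and d1_on_wrap_mismatch() mirrors the d1 expression in the wrap-mismatch branch of Jason's first proposal. The test values (num = 256, last_used_idx = 250, bufs = 10, vhost new = 254) were chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Driver side, following the snippet quoted above: advance from
 * last_used_idx by bufs and flip the wrap counter if the index passes num. */
static uint16_t driver_event_off(uint16_t last_used_idx, uint16_t bufs,
                                 uint16_t num, bool *wrap_counter)
{
    uint16_t off = (uint16_t)(last_used_idx + bufs);

    if (off >= num) {
        off -= num;
        *wrap_counter = !*wrap_counter;
    }
    return off;
}

/* d1 from the first vhost proposal, for the wrap-mismatch branch. */
static uint16_t d1_on_wrap_mismatch(uint16_t new_idx, uint16_t off,
                                    uint16_t num)
{
    return (uint16_t)(new_idx + num - off - 1);
}
```

With those values, the driver's off wraps to 4 and the wrap counter flips. Then off (4) < new (254), and d1 = 254 + 256 - 4 - 1 = 505, which is larger than num, as Tiwei describes.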


* Re: [PATCH] net: 8390: Fix possible data races in __ei_get_stats
From: Jia-Ju Bai @ 2018-05-08  6:47 UTC
  To: Eric Dumazet, davem, fthain, joe; +Cc: netdev, linux-kernel
In-Reply-To: <dddbec3f-4b63-3e70-d857-4980135d6f31@gmail.com>



On 2018/5/8 13:04, Eric Dumazet wrote:
>
> On 05/07/2018 07:16 PM, Jia-Ju Bai wrote:
>
>> Yes, "&dev->stats" will not change, because it is a fixed address.
>> But the field data in "dev->stats" is changed (rx_frame_errors, rx_crc_errors and rx_missed_errors).
>> So if the driver returns "&dev->stats" without lock protection (like on line 858), the field data value of this return value can be the changed field data value or unchanged field data value.
>
> We do not care.
>
> This function can be called by multiple cpus at the same time.
>
> As soon as one cpu returns from it, another cpu can happily modify dev->stats.ANYFIELD.
>
> Your patch fixes nothing at all.
>

Okay, thanks.
I also find that my patch does not work...


Best wishes,
Jia-Ju Bai


* Re: [PATCH net] stmmac: fix reception of 802.1ad Ethernet tagged frames
From: Elad Nachman @ 2018-05-08  7:11 UTC
  To: Toshiaki Makita; +Cc: davem, netdev, eladv6
In-Reply-To: <dcd6ce9c-e098-d901-ecb9-0c2b6d4219cf@lab.ntt.co.jp>

Currently, running:
ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100

on eth0 (an stmmac device) succeeds, but the received packets are marked with proto 802.1Q instead of 802.1ad, so they are dropped instead of reaching the VLAN device. Without the patch, packets get dropped for a seemingly "correct" 802.1ad ip link configuration.

If NETIF_F_HW_VLAN_STAG_RX is required for a driver to support the 802.1ad protocol, then the kernel should return an error when user space requests a VLAN device with proto 802.1ad on a physical device that lacks NETIF_F_HW_VLAN_STAG_RX, which is currently not the case.

skb_vlan_untag() does nothing if __vlan_hwaccel_put_tag() was already called (in the driver). The only possible alternative is to remove stmmac_rx_vlan() from the stmmac code completely and let skb_vlan_untag() handle things in a generic way.


On 08/05/18 09:43, Toshiaki Makita wrote:
> On 2018/05/08 15:01, Elad Nachman wrote:
>> stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before calling napi_gro_receive().
>>
>> The function assumes VLAN tagged frames are always tagged with 802.1Q protocol,
>> and assigns ETH_P_8021Q to the skb by hard-coding the parameter on call to __vlan_hwaccel_put_tag() .
>>
>> This causes packets not to be passed to the VLAN slave if it was created with 802.1AD protocol
>> (ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).
>>
>> This fix passes the protocol from the VLAN header into __vlan_hwaccel_put_tag()
>> instead of using the hard-coded value of ETH_P_8021Q.
>>
>> Signed-off-by: Elad Nachman <eladn@gilat.com>
>>
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index b65e2d1..ced2d34 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -3293,17 +3293,19 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>>  
>>  static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
>>  {
>> -	struct ethhdr *ehdr;
>> +	struct vlan_ethhdr *veth;
>>  	u16 vlanid;
>> +	__be16 vlan_proto;
>>  
>>  	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
>>  	    NETIF_F_HW_VLAN_CTAG_RX &&
>>  	    !__vlan_get_tag(skb, &vlanid)) {
>>  		/* pop the vlan tag */
>> -		ehdr = (struct ethhdr *)skb->data;
>> -		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
>> +		veth = (struct vlan_ethhdr *)skb->data;
>> +		vlan_proto = veth->h_vlan_proto;
>> +		memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
>>  		skb_pull(skb, VLAN_HLEN);
>> -		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
>> +		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
> 
> This is what devices with NETIF_F_HW_VLAN_STAG_RX are supposed to do,
> not NETIF_F_HW_VLAN_CTAG_RX.
> 
> By the way, this looks like it does the same thing as skb_vlan_untag() in
> __netif_receive_skb_core(), so it seems unnecessary to add HW_VLAN_STAG_RX.
> Alternatively, you can check whether vlan_proto is 8021Q here.
> 


* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Jason Wang @ 2018-05-08  7:16 UTC
  To: Tiwei Bie
  Cc: Michael S. Tsirkin, virtualization, linux-kernel, netdev, wexu,
	jfreimann
In-Reply-To: <20180508064409.kcn6amhsxu7nkuuc@debian>



On 2018年05月08日 14:44, Tiwei Bie wrote:
> On Tue, May 08, 2018 at 01:40:40PM +0800, Jason Wang wrote:
>> On 2018年05月08日 11:05, Jason Wang wrote:
>>>> Because in virtqueue_enable_cb_delayed(), we may set an
>>>> event_off which is bigger than new and both of them have
>>>> wrapped. And in this case, although new is smaller than
>>>> event_off (i.e. the third param -- old), new shouldn't
>>>> add vq->num, and actually we are expecting a very big
>>>> idx diff.
>>> Yes, so to calculate distance correctly between event and new, we just
>>> need to compare the wrap counter and return false if it doesn't match
>>> without the need to try to add vq.num here.
>>>
>>> Thanks
>> Sorry, looks like the following should work; we need to add vq.num if
>> used_wrap_counter does not match:
>>
>> static bool vhost_vring_packed_need_event(struct vhost_virtqueue *vq,
>>                        __u16 off_wrap, __u16 new,
>>                        __u16 old)
>> {
>>      bool wrap = off_wrap >> 15;
>>      int off = off_wrap & ~(1 << 15);
>>      __u16 d1, d2;
>>
>>      if (wrap != vq->used_wrap_counter)
>>          d1 = new + vq->num - off - 1;
> Just to draw your attention (maybe you have already
> noticed this).

I missed this, thanks!

>
> In this case (i.e. wrap != vq->used_wrap_counter),
> it's also possible that (off < new) is true, because
> when virtqueue_enable_cb_delayed_packed() is used,
> `off` is calculated in the driver like this:
>
> 	off = vq->last_used_idx + bufs;
> 	if (off >= vq->vring_packed.num) {
> 		off -= vq->vring_packed.num;
> 		wrap_counter ^= 1;
> 	}
>
> And when `new` (in vhost) is close to vq->num, the
> vq->last_used_idx + bufs (in the driver) can be bigger
> than vq->vring_packed.num, and:
>
> 1. `off` will wrap;
> 2. wrap counters won't match;
> 3. off < new;
>
> And d1 (i.e. new + vq->num - off - 1) will be a value
> bigger than vq->num. I'm okay with this, although it's
> a bit weird.


So I'm considering something more compact: reuse vring_need_event()
by pretending the queue is larger and adding vq->num back when necessary:

static bool vhost_vring_packed_need_event(struct vhost_virtqueue *vq,
                       __u16 off_wrap, __u16 new,
                       __u16 old)
{
     bool wrap = vq->used_wrap_counter;
     int off = off_wrap & ~(1 << 15);
     __u16 d1, d2;

     if (new < old) {
         new += vq->num;
         wrap ^= 1;
     }

     if (wrap != off_wrap >> 15)
         off += vq->num;

     return vring_need_event(off, new, old);
}


>
> Best regards,
> Tiwei Bie
>
>>      else
>>          d1 = new - off - 1;
>>
>>      if (new > old)
>>          d2 = new - old;
>>      else
>>          d2 = new + vq->num - old;
>>
>>      return d1 < d2;
>> }
>>
>> Thanks
>>
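
Jason's compact version can be exercised against the cases discussed in this thread with a small user-space harness. vring_need_event() reproduces the expression from include/uapi/linux/virtio_ring.h. struct vq_model is a hypothetical stand-in for struct vhost_virtqueue that carries only the two fields used. The unused d1/d2 locals are dropped. This is a sketch under those assumptions, not a tested vhost implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same expression as vring_need_event() in include/uapi/linux/virtio_ring.h. */
static int vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/* Hypothetical stand-in for struct vhost_virtqueue: only the fields used. */
struct vq_model {
    uint16_t num;               /* ring size */
    bool used_wrap_counter;     /* vhost's current used wrap counter */
};

/* The compact proposal above, written against vq_model. */
static bool packed_need_event(const struct vq_model *vq, uint16_t off_wrap,
                              uint16_t new_idx, uint16_t old)
{
    bool wrap = vq->used_wrap_counter;
    int off = off_wrap & ~(1 << 15);

    /* new wrapped past old: unroll it onto a ring of size 2 * num and
     * recover the wrap counter that was in effect for old. */
    if (new_idx < old) {
        new_idx += vq->num;
        wrap = !wrap;
    }
    /* Event offset belongs to the next lap: shift it by num as well. */
    if (wrap != (bool)(off_wrap >> 15))
        off += vq->num;

    return vring_need_event(off, new_idx, old);
}
```

For Tiwei's case (vhost wrap counter 1, off_wrap = 4 with wrap bit 0, new = 254, old = 240), off becomes 4 + 256 = 260 on the unrolled ring. The function therefore reports that no event is needed yet, which appears to be the intended behavior.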


* Re: Build regressions/improvements in v4.17-rc4
From: Geert Uytterhoeven @ 2018-05-08  7:17 UTC
  To: Linux Kernel Mailing List; +Cc: Israel Rukshin, netdev
In-Reply-To: <1525762955-26767-1-git-send-email-geert@linux-m68k.org>

On Tue, May 8, 2018 at 9:02 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> JFYI, when comparing v4.17-rc4[1] to v4.17-rc3[3], the summaries are:
>   - build errors: +1/-4

  + /kisskb/src/include/linux/mlx5/driver.h: error: 'struct irq_desc' has no member named 'affinity_hint':  => 1299:13

xtensa-allmodconfig

> [1] http://kisskb.ellerman.id.au/kisskb/head/75bc37fefc4471e718ba8e651aa74673d4e0a9eb/ (244 out of 246 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/head/6da6c0db5316275015e8cc2959f12a17584aeb64/ (244 out of 246 configs)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* [PATCH net-next 1/4] bnxt_en: Fix firmware message delay loop regression.
From: Michael Chan @ 2018-05-08  7:18 UTC
  To: davem; +Cc: netdev
In-Reply-To: <1525763921-20698-1-git-send-email-michael.chan@broadcom.com>

A recent change to reduce delay granularity waiting for firmware
response has caused a regression.  With a tighter delay loop,
the driver may see the beginning part of the response faster.
The original 5 usec delay to wait for the rest of the message
is not long enough and some messages are detected as invalid.

Increase the maximum wait time from 5 usec to 20 usec.  Also, fix
the debug message that shows the total delay time for the response
when the message times out.  With the new logic, the delay time
is not fixed per iteration of the loop, so we define a macro to
show the total delay time.

Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 12 ++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  7 +++++++
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index efe5c72..168342a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3530,6 +3530,8 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		      HWRM_RESP_LEN_SFT;
 		valid = bp->hwrm_cmd_resp_addr + len - 1;
 	} else {
+		int j;
+
 		/* Check if response len is updated */
 		for (i = 0; i < tmo_count; i++) {
 			len = (le32_to_cpu(*resp_len) & HWRM_RESP_LEN_MASK) >>
@@ -3547,14 +3549,15 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 
 		if (i >= tmo_count) {
 			netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d\n",
-				   timeout, le16_to_cpu(req->req_type),
+				   HWRM_TOTAL_TIMEOUT(i),
+				   le16_to_cpu(req->req_type),
 				   le16_to_cpu(req->seq_id), len);
 			return -1;
 		}
 
 		/* Last byte of resp contains valid bit */
 		valid = bp->hwrm_cmd_resp_addr + len - 1;
-		for (i = 0; i < 5; i++) {
+		for (j = 0; j < HWRM_VALID_BIT_DELAY_USEC; j++) {
 			/* make sure we read from updated DMA memory */
 			dma_rmb();
 			if (*valid)
@@ -3562,9 +3565,10 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 			udelay(1);
 		}
 
-		if (i >= 5) {
+		if (j >= HWRM_VALID_BIT_DELAY_USEC) {
 			netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d v:%d\n",
-				   timeout, le16_to_cpu(req->req_type),
+				   HWRM_TOTAL_TIMEOUT(i),
+				   le16_to_cpu(req->req_type),
 				   le16_to_cpu(req->seq_id), len, *valid);
 			return -1;
 		}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 8df1d8b..a9c210e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -539,6 +539,13 @@ struct rx_tpa_end_cmp_ext {
 #define HWRM_MIN_TIMEOUT		25
 #define HWRM_MAX_TIMEOUT		40
 
+#define HWRM_TOTAL_TIMEOUT(n)	(((n) <= HWRM_SHORT_TIMEOUT_COUNTER) ?	\
+	((n) * HWRM_SHORT_MIN_TIMEOUT) :				\
+	(HWRM_SHORT_TIMEOUT_COUNTER * HWRM_SHORT_MIN_TIMEOUT +		\
+	 ((n) - HWRM_SHORT_TIMEOUT_COUNTER) * HWRM_MIN_TIMEOUT))
+
+#define HWRM_VALID_BIT_DELAY_USEC	20
+
 #define BNXT_RX_EVENT	1
 #define BNXT_AGG_EVENT	2
 #define BNXT_TX_EVENT	4
-- 
1.8.3.1
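
The arithmetic behind HWRM_TOTAL_TIMEOUT() can be sanity-checked in user space. The macro is copied from the patch above. The values HWRM_SHORT_MIN_TIMEOUT = 3 and HWRM_SHORT_TIMEOUT_COUNTER = 5 are assumptions, since this diff does not show them. total_delay_by_loop() is a hypothetical loop that adds the short minimum delay for the first HWRM_SHORT_TIMEOUT_COUNTER iterations and HWRM_MIN_TIMEOUT afterwards. That is the polling pattern the macro appears to model.

```c
/* Assumed values; only HWRM_MIN_TIMEOUT (25) appears in the diff above. */
#define HWRM_SHORT_MIN_TIMEOUT      3
#define HWRM_SHORT_TIMEOUT_COUNTER  5
#define HWRM_MIN_TIMEOUT            25

/* Copied from the patch: minimum accumulated delay after n iterations. */
#define HWRM_TOTAL_TIMEOUT(n)	(((n) <= HWRM_SHORT_TIMEOUT_COUNTER) ?	\
	((n) * HWRM_SHORT_MIN_TIMEOUT) :				\
	(HWRM_SHORT_TIMEOUT_COUNTER * HWRM_SHORT_MIN_TIMEOUT +		\
	 ((n) - HWRM_SHORT_TIMEOUT_COUNTER) * HWRM_MIN_TIMEOUT))

/* Hypothetical reference: sum the per-iteration minimum delays directly. */
static int total_delay_by_loop(int n)
{
    int total = 0;

    for (int i = 0; i < n; i++)
        total += (i < HWRM_SHORT_TIMEOUT_COUNTER) ? HWRM_SHORT_MIN_TIMEOUT
                                                  : HWRM_MIN_TIMEOUT;
    return total;
}
```

Under those assumed values, 10 iterations give 5 * 3 + 5 * 25 = 140. The separate valid-bit wait is bounded by HWRM_VALID_BIT_DELAY_USEC iterations of udelay(1). That is roughly 20 usec after this patch, instead of 5.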


* [PATCH net-next 0/4] bnxt_en: Fixes for net-next.
From: Michael Chan @ 2018-05-08  7:18 UTC
  To: davem; +Cc: netdev

This series includes a bug fix for a regression in firmware message polling
introduced recently on net-next.  There are 3 additional minor fixes for
unsupported link speed checking, VF MAC address handling, and setting
PHY eeprom length.

Michael Chan (3):
  bnxt_en: Fix firmware message delay loop regression.
  bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.
  bnxt_en: Always forward VF MAC address to the PF.

Vasundhara Volam (1):
  bnxt_en: Read phy eeprom A2h address only when optical diagnostics is
    supported.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 17 ++++++++++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         | 10 ++++++++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 20 ++++++++------------
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c   |  3 ++-
 4 files changed, 30 insertions(+), 20 deletions(-)

-- 
1.8.3.1


* [PATCH net-next 2/4] bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.
From: Michael Chan @ 2018-05-08  7:18 UTC
  To: davem; +Cc: netdev
In-Reply-To: <1525763921-20698-1-git-send-email-michael.chan@broadcom.com>

Only non-NPAR PFs need to actively check and manage unsupported link
speeds.  NPAR functions and VFs do not control the link speed and
should skip the unsupported speed detection logic, to avoid warning
messages from firmware rejecting the unsupported firmware calls.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 168342a..cd3ab78 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6462,6 +6462,9 @@ static int bnxt_update_link(struct bnxt *bp, bool chng_link_state)
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);

+	if (!BNXT_SINGLE_PF(bp))
+		return 0;
+
 	diff = link_info->support_auto_speeds ^ link_info->advertising;
 	if ((link_info->support_auto_speeds | diff) !=
 	    link_info->support_auto_speeds) {
-- 
1.8.3.1


* [PATCH net-next 3/4] bnxt_en: Read phy eeprom A2h address only when optical diagnostics is supported.
From: Michael Chan @ 2018-05-08  7:18 UTC
  To: davem; +Cc: netdev
In-Reply-To: <1525763921-20698-1-git-send-email-michael.chan@broadcom.com>

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

For SFP+ modules, the 0xA2 page is available only when Diagnostic Monitoring
Type [Address A0h, Byte 92] is implemented. Extend bnxt_get_module_info()
to read optical diagnostics support at offset 92 (0x5c) and set eeprom_len
to ETH_MODULE_SFF_8436_LEN (to exclude the A2h page) if diagnostics are
not supported.

Also, in bnxt_get_module_info(), the module ID is read from offset 0x5e,
which is not correct. It was working by accident, as the offset has no
effect unless the enables flag is set in the firmware request. The SFP
module ID is present at location 0. Fix this by removing the offset and
reading it from location 0.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  3 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 20 ++++++++------------
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index a9c210e..9b14eb6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1414,8 +1414,7 @@ struct bnxt {
 
 #define I2C_DEV_ADDR_A0				0xa0
 #define I2C_DEV_ADDR_A2				0xa2
-#define SFP_EEPROM_SFF_8472_COMP_ADDR		0x5e
-#define SFP_EEPROM_SFF_8472_COMP_SIZE		1
+#define SFF_DIAG_SUPPORT_OFFSET			0x5c
 #define SFF_MODULE_ID_SFP			0x3
 #define SFF_MODULE_ID_QSFP			0xc
 #define SFF_MODULE_ID_QSFP_PLUS			0xd
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index ad98b78..7270c8b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2184,9 +2184,8 @@ static int bnxt_read_sfp_module_eeprom_info(struct bnxt *bp, u16 i2c_addr,
 static int bnxt_get_module_info(struct net_device *dev,
 				struct ethtool_modinfo *modinfo)
 {
+	u8 data[SFF_DIAG_SUPPORT_OFFSET + 1];
 	struct bnxt *bp = netdev_priv(dev);
-	struct hwrm_port_phy_i2c_read_input req = {0};
-	struct hwrm_port_phy_i2c_read_output *output = bp->hwrm_cmd_resp_addr;
 	int rc;
 
 	/* No point in going further if phy status indicates
@@ -2201,21 +2200,19 @@ static int bnxt_get_module_info(struct net_device *dev,
 	if (bp->hwrm_spec_code < 0x10202)
 		return -EOPNOTSUPP;
 
-	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_PORT_PHY_I2C_READ, -1, -1);
-	req.i2c_slave_addr = I2C_DEV_ADDR_A0;
-	req.page_number = 0;
-	req.page_offset = cpu_to_le16(SFP_EEPROM_SFF_8472_COMP_ADDR);
-	req.data_length = SFP_EEPROM_SFF_8472_COMP_SIZE;
-	req.port_id = cpu_to_le16(bp->pf.port_id);
-	mutex_lock(&bp->hwrm_cmd_lock);
-	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	rc = bnxt_read_sfp_module_eeprom_info(bp, I2C_DEV_ADDR_A0, 0, 0,
+					      SFF_DIAG_SUPPORT_OFFSET + 1,
+					      data);
 	if (!rc) {
-		u32 module_id = le32_to_cpu(output->data[0]);
+		u8 module_id = data[0];
+		u8 diag_supported = data[SFF_DIAG_SUPPORT_OFFSET];
 
 		switch (module_id) {
 		case SFF_MODULE_ID_SFP:
 			modinfo->type = ETH_MODULE_SFF_8472;
 			modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+			if (!diag_supported)
+				modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
 			break;
 		case SFF_MODULE_ID_QSFP:
 		case SFF_MODULE_ID_QSFP_PLUS:
@@ -2231,7 +2228,6 @@ static int bnxt_get_module_info(struct net_device *dev,
 			break;
 		}
 	}
-	mutex_unlock(&bp->hwrm_cmd_lock);
 	return rc;
 }
 
-- 
1.8.3.1
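
The eeprom_len decision in the SFP branch of the patched bnxt_get_module_info() can be sketched in user space. ETH_MODULE_SFF_8472_LEN (512) and ETH_MODULE_SFF_8436_LEN (256) carry their values from include/uapi/linux/ethtool.h. SFF_MODULE_ID_SFP and SFF_DIAG_SUPPORT_OFFSET are taken from the bnxt.h hunk above. The helper sfp_eeprom_len() is hypothetical and does not handle QSFP modules. SFF-8472 assigns individual bits within byte 92 (bit 6 flags digital diagnostics). Like the patch, this sketch only tests the whole byte for non-zero.

```c
#include <stdint.h>

/* Values as in include/uapi/linux/ethtool.h. */
#define ETH_MODULE_SFF_8472_LEN  512
#define ETH_MODULE_SFF_8436_LEN  256

/* From the bnxt.h hunk above. */
#define SFF_MODULE_ID_SFP        0x3
#define SFF_DIAG_SUPPORT_OFFSET  0x5c   /* A0h byte 92 */

/* Hypothetical helper: given the first SFF_DIAG_SUPPORT_OFFSET + 1 bytes of
 * page A0h, return the EEPROM length to report for an SFP module, or 0 for
 * any other module type (not handled in this sketch). */
static int sfp_eeprom_len(const uint8_t *a0)
{
    if (a0[0] != SFF_MODULE_ID_SFP)     /* module identifier is byte 0 */
        return 0;
    /* The A2h page exists only if diagnostics are implemented (byte 92). */
    return a0[SFF_DIAG_SUPPORT_OFFSET] ? ETH_MODULE_SFF_8472_LEN
                                       : ETH_MODULE_SFF_8436_LEN;
}
```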


* [PATCH net-next 4/4] bnxt_en: Always forward VF MAC address to the PF.
From: Michael Chan @ 2018-05-08  7:18 UTC
  To: davem; +Cc: netdev
In-Reply-To: <1525763921-20698-1-git-send-email-michael.chan@broadcom.com>

The current code already forwards the VF MAC address to the PF, except
in one case.  If the VF driver gets a valid MAC address from the firmware
during probe time, it will not forward the MAC address to the PF,
incorrectly assuming that the PF already knows the MAC address.  This
causes "ip link show" to show zero VF MAC addresses for this case.

This assumption is not correct.  Newer firmware remembers the VF MAC
address last used by the VF and provides it to the VF driver during
probe.  So we need to always forward the VF MAC address to the PF.

The forwarded MAC address may now be the PF assigned MAC address and so we
need to make sure we approve it for this case.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index cd3ab78..dfa0839 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8678,8 +8678,8 @@ static int bnxt_init_mac_addr(struct bnxt *bp)
 			memcpy(bp->dev->dev_addr, vf->mac_addr, ETH_ALEN);
 		} else {
 			eth_hw_addr_random(bp->dev);
-			rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
 		}
+		rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
 #endif
 	}
 	return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index cc21d87..a649108 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -923,7 +923,8 @@ static int bnxt_vf_configure_mac(struct bnxt *bp, struct bnxt_vf_info *vf)
 	if (req->enables & cpu_to_le32(FUNC_VF_CFG_REQ_ENABLES_DFLT_MAC_ADDR)) {
 		if (is_valid_ether_addr(req->dflt_mac_addr) &&
 		    ((vf->flags & BNXT_VF_TRUST) ||
-		     (!is_valid_ether_addr(vf->mac_addr)))) {
+		     !is_valid_ether_addr(vf->mac_addr) ||
+		     ether_addr_equal(req->dflt_mac_addr, vf->mac_addr))) {
 			ether_addr_copy(vf->vf_mac_addr, req->dflt_mac_addr);
 			return bnxt_hwrm_exec_fwd_resp(bp, vf, msg_size);
 		}
-- 
1.8.3.1
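
The approval condition in the patched bnxt_vf_configure_mac() can be modeled in user space. is_valid_ether_addr() and ether_addr_equal() are re-implemented here with the kernel semantics: a valid address is non-zero and not multicast. vf_mac_approved() is a hypothetical name for the combined condition. This is a sketch of the boolean logic only, not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* User-space equivalent of the kernel helper: not multicast, not all-zero. */
static bool is_valid_ether_addr(const uint8_t *a)
{
    static const uint8_t zero[ETH_ALEN];

    return !(a[0] & 1) && memcmp(a, zero, ETH_ALEN) != 0;
}

static bool ether_addr_equal(const uint8_t *a, const uint8_t *b)
{
    return memcmp(a, b, ETH_ALEN) == 0;
}

/* Condition after the patch: a valid requested MAC is approved if the VF is
 * trusted, the PF has not assigned a MAC, or it equals the PF-assigned MAC. */
static bool vf_mac_approved(const uint8_t *req_mac, const uint8_t *pf_assigned,
                            bool trusted)
{
    return is_valid_ether_addr(req_mac) &&
           (trusted || !is_valid_ether_addr(pf_assigned) ||
            ether_addr_equal(req_mac, pf_assigned));
}
```

The new case is an untrusted VF forwarding exactly the PF-assigned MAC. Before the patch, it was rejected because the PF-assigned address is valid.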


* Re: [PATCH] net: aquantia: Fix an error handling path in 'aq_pci_probe()'
From: Igor Russkikh @ 2018-05-08  7:19 UTC
  To: Christophe JAILLET, davem, pavel.belous, weiyongjun1,
	dan.carpenter
  Cc: netdev, linux-kernel, kernel-janitors
In-Reply-To: <20180508063947.11317-1-christophe.jaillet@wanadoo.fr>

Hi Christophe,

On 08.05.2018 09:39, Christophe JAILLET wrote:
> The position of 2 labels should be swapped in order to release resources
> in the correct order and avoid leaks.
> 

>  	kfree(self->aq_hw);
>  err_ioremap:
>  	free_netdev(ndev);
> -err_pci_func:
> -	pci_release_regions(pdev);
>  err_ndev:
> +	pci_release_regions(pdev);
> +err_pci_func:
>  	pci_disable_device(pdev);
>  	return err;
>  }
> 

This was just submitted yesterday and is already accepted in netdev by David:

http://patchwork.ozlabs.org/patch/909746/

Thanks!

BR, Igor
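
The label ordering in Christophe's patch follows the usual goto-unwind convention. Each error label releases only what was acquired before the failing step, in reverse order of acquisition. Below is a minimal user-space sketch of that convention. enable_dev(), request_regions(), alloc_ndev() and map_bar() are hypothetical stand-ins for the PCI/netdev calls, and `fail` selects which step fails. The label names come from the patch, but which step jumps to which label is assumed for illustration.

```c
#include <string.h>

static char trace[32];      /* records release calls in order, e.g. "RD" */

/* Hypothetical acquire steps: each returns -1 when `fail` selects it. */
static int enable_dev(int fail)      { return fail == 1 ? -1 : 0; }
static int request_regions(int fail) { return fail == 2 ? -1 : 0; }
static int alloc_ndev(int fail)      { return fail == 3 ? -1 : 0; }
static int map_bar(int fail)         { return fail == 4 ? -1 : 0; }

/* Hypothetical release steps: append a letter to the trace. */
static void free_ndev(void)       { strcat(trace, "F"); }
static void release_regions(void) { strcat(trace, "R"); }
static void disable_dev(void)     { strcat(trace, "D"); }

static int probe(int fail)
{
    int err;

    trace[0] = '\0';
    err = enable_dev(fail);
    if (err)
        return err;                 /* nothing acquired yet */
    err = request_regions(fail);
    if (err)
        goto err_pci_func;
    err = alloc_ndev(fail);
    if (err)
        goto err_ndev;
    err = map_bar(fail);
    if (err)
        goto err_ioremap;
    return 0;

    /* Labels fall through in reverse order of acquisition. */
err_ioremap:
    free_ndev();
err_ndev:
    release_regions();
err_pci_func:
    disable_dev();
    return err;
}
```

When netdev allocation fails, probe(3) releases the regions and then disables the device ("RD"). With the original label order, the jump to err_ndev would have skipped pci_release_regions().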


* INFO: rcu detected stall in sctp_generate_heartbeat_event
From: syzbot @ 2018-05-08  7:35 UTC
  To: davem, linux-kernel, linux-sctp, marcelo.leitner, netdev, nhorman,
	syzkaller-bugs, vyasevich

Hello,

syzbot found the following crash on:

HEAD commit:    90278871d4b0 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=119a7237800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=aea320d3af5ef99d
dashboard link: https://syzkaller.appspot.com/bug?extid=e4a5bbd54260c93014f9
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e4a5bbd54260c93014f9@syzkaller.appspotmail.com

device bridge0 left promiscuous mode
IPVS: set_ctl: invalid protocol: 56 0.0.0.0:20003 fo
IPVS: set_ctl: invalid protocol: 175 224.0.0.2:20003 dh
INFO: rcu_sched self-detected stall on CPU
	0-...!: (119824 ticks this GP) idle=4b6/1/4611686018427387908 softirq=23864/23864 fqs=5
	 (t=125000 jiffies g=13072 c=13071 q=480954)
NMI backtrace for cpu 0
CPU: 0 PID: 4547 Comm: udevd Not tainted 4.17.0-rc3+ #34
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
  __rcu_pending kernel/rcu/tree.c:3356 [inline]
  rcu_pending kernel/rcu/tree.c:3401 [inline]
  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
RIP: 0010:rep_nop arch/x86/include/asm/processor.h:667 [inline]
RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:672 [inline]
RIP: 0010:virt_spin_lock arch/x86/include/asm/qspinlock.h:69 [inline]
RIP: 0010:native_queued_spin_lock_slowpath+0x204/0xde0 kernel/locking/qspinlock.c:305
RSP: 0018:ffff8801dae07390 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: ffffed003b5c0e8b RCX: 0000000000000004
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8801a9e9d088
RBP: ffff8801dae07700 R08: ffffed00353d3a12 R09: ffffed00353d3a11
R10: ffffed00353d3a11 R11: ffff8801a9e9d08b R12: ffff8801a9e9d088
R13: ffff8801dae076d8 R14: 0000000000000001 R15: dffffc0000000000
  pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:674 [inline]
  queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:30 [inline]
  queued_spin_lock include/asm-generic/qspinlock.h:90 [inline]
  do_raw_spin_lock+0x1a7/0x200 kernel/locking/spinlock_debug.c:113
  __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
  _raw_spin_lock+0x32/0x40 kernel/locking/spinlock.c:144
  spin_lock include/linux/spinlock.h:310 [inline]
  sctp_generate_heartbeat_event+0xa4/0x450 net/sctp/sm_sideeffect.c:386
  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d1/0x200 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
  </IRQ>
RIP: 0010:rcu_is_watching+0x41/0x140 kernel/rcu/tree.c:1071
RSP: 0018:ffff8801ad457848 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff13
RAX: ffffed0035a8af0a RBX: 1ffff10035a8af0a RCX: ffff8801ad4578f0
RDX: 0000000000000000 RSI: ffff8801ad457f58 RDI: ffffffff897bf004
RBP: ffff8801ad4578d8 R08: ffff8801ad457978 R09: ffff8801ad53e040
R10: ffffed0035a8af32 R11: ffff8801ad457997 R12: ffff8801ad457988
R13: 0000000000000000 R14: ffff8801ad4578b0 R15: dffffc0000000000
syz-executor3 (7657) used greatest stack depth: 15968 bytes left
  kernel_text_address+0x61/0xf0 kernel/extable.c:140
  __kernel_text_address+0xd/0x40 kernel/extable.c:107
  unwind_get_return_address+0x61/0xa0 arch/x86/kernel/unwind_frame.c:18
  __save_stack_trace+0x7e/0xd0 arch/x86/kernel/stacktrace.c:45
  save_stack_trace+0x1a/0x20 arch/x86/kernel/stacktrace.c:60
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
  getname_flags+0xd0/0x5a0 fs/namei.c:140
  getname+0x19/0x20 fs/namei.c:211
  do_sys_open+0x39a/0x740 fs/open.c:1087
  __do_sys_open fs/open.c:1111 [inline]
  __se_sys_open fs/open.c:1106 [inline]
  __x64_sys_open+0x7e/0xc0 fs/open.c:1106
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f7621c19120
RSP: 002b:00007fff6a9646f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7621c19120
RDX: 0000000000000124 RSI: 0000000000080000 RDI: 00007fff6a9647a0
RBP: 0000000000ddd744 R08: 0000000000ddd744 R09: 00007f7621c6ec20
R10: 7269762f73656369 R11: 0000000000000246 R12: 0000000000dc3810
R13: 0000000000dc3900 R14: 0000000000dc3250 R15: 0000000000dc8e10
device bridge_slave_1 left promiscuous mode
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_0 left promiscuous mode
bridge0: port 1(bridge_slave_0) entered disabled state
device bridge_slave_1 left promiscuous mode
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_0 left promiscuous mode
bridge0: port 1(bridge_slave_0) entered disabled state
device bridge_slave_1 left promiscuous mode
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_0 left promiscuous mode
bridge0: port 1(bridge_slave_0) entered disabled state
device bridge_slave_1 left promiscuous mode
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_0 left promiscuous mode
bridge0: port 1(bridge_slave_0) entered disabled state
IPVS: ftp: loaded support on port[0] = 21
team0 (unregistering): Port device team_slave_1 removed
team0 (unregistering): Port device team_slave_0 removed
bond0 (unregistering): Releasing backup interface bond_slave_1
bond0 (unregistering): Releasing backup interface bond_slave_0
bond0 (unregistering): Released all slaves
team0 (unregistering): Port device team_slave_1 removed
team0 (unregistering): Port device team_slave_0 removed
bond0 (unregistering): Releasing backup interface bond_slave_1
bond0 (unregistering): Releasing backup interface bond_slave_0
bond0 (unregistering): Released all slaves
team0 (unregistering): Port device team_slave_1 removed
team0 (unregistering): Port device team_slave_0 removed
bond0 (unregistering): Releasing backup interface bond_slave_1
bond0 (unregistering): Releasing backup interface bond_slave_0
bond0 (unregistering): Released all slaves
team0 (unregistering): Port device team_slave_1 removed
team0 (unregistering): Port device team_slave_0 removed
bond0 (unregistering): Releasing backup interface bond_slave_1
bond0 (unregistering): Releasing backup interface bond_slave_0
bond0 (unregistering): Released all slaves
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered disabled state
device bridge_slave_0 entered promiscuous mode
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_1 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): veth0_to_bridge: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1_to_bridge: link is not ready
bond0: Enslaving bond_slave_0 as an active interface with an up link
bond0: Enslaving bond_slave_1 as an active interface with an up link
IPv6: ADDRCONF(NETDEV_UP): veth1_to_bond: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bond: link becomes ready
IPv6: ADDRCONF(NETDEV_UP): team_slave_0: link is not ready
team0: Port device team_slave_0 added
IPv6: ADDRCONF(NETDEV_UP): team_slave_1: link is not ready
team0: Port device team_slave_1 added
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
IPv6: ADDRCONF(NETDEV_UP): bridge_slave_0: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bridge: link becomes ready
IPv6: ADDRCONF(NETDEV_UP): bridge_slave_1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bridge: link becomes ready
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered forwarding state
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered forwarding state
IPv6: ADDRCONF(NETDEV_UP): bridge0: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): bridge0: link becomes ready


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.
Note: all commands must start from beginning of the line in the email body.


* Re: [PATCH net] stmmac: fix reception of 802.1ad Ethernet tagged frames
From: Toshiaki Makita @ 2018-05-08  7:34 UTC (permalink / raw)
  To: Elad Nachman; +Cc: davem, netdev
In-Reply-To: <f82648ab-0cfb-0bc2-e0f4-759e32ee445a@gmail.com>

On 2018/05/08 16:11, Elad Nachman wrote:
> Currently running:
> ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100
> 
> On eth0 (stmmac) this succeeds, but the end result is that the vlan device gets proto 802.1q instead of proto 802.1ad and drops the received packets. Without the patch, packets get dropped for a seemingly "correct" 802.1ad ip link configuration.
> 
> If NETIF_F_HW_VLAN_STAG_RX is required for a driver to support 802.1ad, then the Linux kernel should return an error when user space requests a vlan device with proto 802.1ad on a physical device that lacks NETIF_F_HW_VLAN_STAG_RX, which it currently does not do.

No. You can create 802.1ad devices without HW_VLAN_STAG_RX, but you
should not strip 802.1ad tag in driver without HW_VLAN_STAG_RX.
__netif_receive_skb_core should handle them if the device does not have
HW_VLAN_STAG_RX.

> skb_vlan_untag() does nothing if __vlan_hwaccel_put_tag() was already called before (in the driver). The only possible alternative is to completely remove stmmac_rx_vlan() from the stmmac code and let skb_vlan_untag() handle things in a generic way.

You cannot remove an already added feature in the driver.
Alternatively you can skip stripping vlan if vlan_proto is not 8021Q.
Something like this.

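static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
{
	struct vlan_ethhdr *veth;
	__be16 vlan_proto;
	u16 vlanid;

	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
	    NETIF_F_HW_VLAN_CTAG_RX &&
	    !__vlan_get_tag(skb, &vlanid)) {
		veth = (struct vlan_ethhdr *)skb->data;
		vlan_proto = veth->h_vlan_proto;
		/* only strip 802.1Q tags; leave 802.1ad frames to the core */
		if (vlan_proto == htons(ETH_P_8021Q)) {
			/* pop the vlan tag */
			memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
			skb_pull(skb, VLAN_HLEN);
			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
		}
	}
}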

-- 
Toshiaki Makita
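The suggestion above (strip only 802.1Q tags in the driver, let the core handle 802.1ad) can be modeled in userspace on a plain byte buffer. This is a sketch under stated assumptions, not stmmac code: `pop_ctag_only()` and the `TPID_*` macros are hypothetical names, and the function returns only the 12-bit VID rather than the full TCI the kernel keeps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6
#define VLAN_HLEN   4
#define TPID_8021Q  0x8100u /* C-tag, same value as ETH_P_8021Q */
#define TPID_8021AD 0x88A8u /* S-tag, same value as ETH_P_8021AD */

/* Hypothetical model of the suggested stmmac_rx_vlan() logic.
 * Returns the VLAN ID if an 802.1Q tag was popped, else -1.
 * On success *data/*len describe the untagged frame; on failure the
 * frame is left untouched (e.g. 802.1ad, left for the core to handle). */
static int pop_ctag_only(uint8_t **data, size_t *len)
{
	uint8_t *p;
	uint16_t tpid, tci;

	assert(data != NULL && len != NULL);
	p = *data;
	if (*len < 2 * ETH_ALEN + VLAN_HLEN + 2)
		return -1; /* too short to carry a tag plus inner ethertype */

	tpid = (uint16_t)(p[12] << 8 | p[13]); /* outer TPID, network order */
	if (tpid != TPID_8021Q)
		return -1; /* 802.1ad or untagged: do not strip */

	tci = (uint16_t)(p[14] << 8 | p[15]);
	/* Slide dst+src MACs over the tag, like memmove() + skb_pull(). */
	memmove(p + VLAN_HLEN, p, 2 * ETH_ALEN);
	*data = p + VLAN_HLEN;
	*len -= VLAN_HLEN;
	return tci & 0x0FFF; /* VID is the low 12 bits of the TCI */
}
```

Frames whose outer TPID is 0x88A8 come back unchanged, which mirrors the point that without NETIF_F_HW_VLAN_STAG_RX the S-tag should reach __netif_receive_skb_core intact.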


* [PATCH] net: wireless: ath: ath9k: Fix a possible data race in ath_chanctx_set_next
From: Jia-Ju Bai @ 2018-05-08  8:06 UTC (permalink / raw)
  To: ath9k-devel, kvalo; +Cc: linux-wireless, netdev, linux-kernel, Jia-Ju Bai

The write operation to "sc->next_chan" is protected by
the lock on line 1287, but the read operation to
this data on line 1262 is not protected by the lock.
Thus, there may be a data race on "sc->next_chan".

To fix this data race, the read operation on "sc->next_chan"
should also be protected by the lock.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/wireless/ath/ath9k/channel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/channel.c b/drivers/net/wireless/ath/ath9k/channel.c
index 1b05b5d7a038..ed3cd5523481 100644
--- a/drivers/net/wireless/ath/ath9k/channel.c
+++ b/drivers/net/wireless/ath/ath9k/channel.c
@@ -1257,12 +1257,12 @@ void ath_chanctx_set_next(struct ath_softc *sc, bool force)
 			"Stopping current chanctx: %d\n",
 			sc->cur_chan->chandef.center_freq1);
 		sc->cur_chan->stopped = true;
-		spin_unlock_bh(&sc->chan_lock);
 
 		if (sc->next_chan == &sc->offchannel.chan) {
 			getrawmonotonic(&ts);
 			measure_time = true;
 		}
+		spin_unlock_bh(&sc->chan_lock);
 
 		ath9k_chanctx_stop_queues(sc, sc->cur_chan);
 		queues_stopped = true;
-- 
2.17.0
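The locking rule behind this patch (read a shared field under the same lock its writer holds) can be sketched as a userspace analogue. Assumptions: a pthread mutex stands in for spin_lock_bh(&sc->chan_lock), and `struct softc`, `next_is_offchannel()` and `set_next_chan()` are hypothetical, not ath9k code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the relevant struct ath_softc fields. */
struct chan { int freq; };

struct softc {
	pthread_mutex_t chan_lock; /* models sc->chan_lock */
	struct chan *next_chan;    /* shared, written under chan_lock */
	struct chan offchannel;    /* models sc->offchannel.chan */
};

/* Reader after the fix: next_chan is compared while chan_lock is
 * still held, so it cannot race with a writer taking the same lock. */
static bool next_is_offchannel(struct softc *sc)
{
	bool off;

	pthread_mutex_lock(&sc->chan_lock);
	off = (sc->next_chan == &sc->offchannel);
	pthread_mutex_unlock(&sc->chan_lock);
	return off;
}

/* Writer: always updates next_chan under the same lock. */
static void set_next_chan(struct softc *sc, struct chan *c)
{
	pthread_mutex_lock(&sc->chan_lock);
	sc->next_chan = c;
	pthread_mutex_unlock(&sc->chan_lock);
}
```

In the patch itself the same effect is obtained by moving the existing spin_unlock_bh() below the comparison rather than adding a second lock/unlock pair.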


* Re: linux-next: manual merge of the bpf-next tree with the s390 tree
From: Daniel Borkmann @ 2018-05-08  8:19 UTC (permalink / raw)
  To: Stephen Rothwell, Alexei Starovoitov, Networking,
	Martin Schwidefsky, Heiko Carstens
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20180508102638.1e19b7f2@canb.auug.org.au>

On 05/08/2018 02:26 AM, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the bpf-next tree got a conflict in:
> 
>   arch/s390/net/bpf_jit.S
> 
> between commit:
> 
>   de5cb6eb514e ("s390: use expoline thunks in the BPF JIT")
> 
> from the s390 tree and commit:
> 
>   e1cf4befa297 ("bpf, s390x: remove ld_abs/ld_ind")
> 
> from the bpf-next tree.
> 
> I fixed it up (I just removed the file as the latter does) and can
> carry the fix as necessary. This is now fixed as far as linux-next is
> concerned, but any non trivial conflicts should be mentioned to your
> upstream maintainer when your tree is submitted for merging.  You may
> also want to consider cooperating with the maintainer of the conflicting
> tree to minimise any particularly complex conflicts.

Yep, sounds good, thanks!


* [PATCH net-next] drivers: net: davinci_mdio: prevent spurious timeout
From: Sekhar Nori @ 2018-05-08  8:26 UTC (permalink / raw)
  To: Grygorii Strashko; +Cc: David S . Miller, linux-omap, netdev

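A well-timed kernel preemption in the time_after() loop
in wait_for_idle() can cause a spurious timeout error
to be returned: if the task is preempted right after
reading a non-idle status and only runs again once the
timeout has expired, the loop exits without re-reading
the register.

Fix it by checking the hardware status once more before
returning a timeout error.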

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
I have not observed the issue personally, but it has
been reported by users. Sending for net-next given the
non-critical nature. There seems to be no easy way to
reproduce this.

 drivers/net/ethernet/ti/davinci_mdio.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c b/drivers/net/ethernet/ti/davinci_mdio.c
index 3c33f4504d8e..4fbd04fd38cf 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -231,10 +231,16 @@ static inline int wait_for_idle(struct davinci_mdio_data *data)
 
 	while (time_after(timeout, jiffies)) {
 		if (__raw_readl(&regs->control) & CONTROL_IDLE)
-			return 0;
+			goto out;
 	}
-	dev_err(data->dev, "timed out waiting for idle\n");
-	return -ETIMEDOUT;
+
+	if (!(__raw_readl(&regs->control) & CONTROL_IDLE)) {
+		dev_err(data->dev, "timed out waiting for idle\n");
+		return -ETIMEDOUT;
+	}
+
+out:
+	return 0;
 }
 
 static int davinci_mdio_read(struct mii_bus *bus, int phy_id, int phy_reg)
-- 
2.16.2


* Re: [PATCH net] vhost: Use kzalloc() to allocate vhost_msg_node
From: Kevin Easton @ 2018-05-08  8:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel,
	syzkaller-bugs
In-Reply-To: <20180507155534-mutt-send-email-mst@kernel.org>

On Mon, May 07, 2018 at 04:03:25PM +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 27, 2018 at 11:45:02AM -0400, Kevin Easton wrote:
> > The struct vhost_msg within struct vhost_msg_node is copied to userspace,
> > so it should be allocated with kzalloc() to ensure all structure padding
> > is zeroed.
> > 
> > Signed-off-by: Kevin Easton <kevin@guarana.org>
> > Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
> > ---
> >  drivers/vhost/vhost.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index f3bd8e9..1b84dcff 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -2339,7 +2339,7 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
> >  /* Create a new message. */
> >  struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
> >  {
> > -	struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
> > +	struct vhost_msg_node *node = kzalloc(sizeof *node, GFP_KERNEL);
> >  	if (!node)
> >  		return NULL;
> >  	node->vq = vq;
> 
> 
> Let's just init the msg though.
> 
> OK it seems this is the best we can do for now,
> we need a new feature bit to fix it for 32 bit
> userspace on 64 bit kernels.
> 
> Does the following help?

Yes, the reproducer doesn't trigger the error with that patch applied.

    - Kevin
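The reasoning in the vhost patch (a struct copied to userspace must not carry uninitialized padding) can be illustrated in userspace. Assumptions: calloc() plays the role of kzalloc(), memcpy() plays the role of copy_to_user(), and `struct demo_msg` is a hypothetical layout with a hole, not the real struct vhost_msg. Strictly, C leaves padding bytes unspecified after member stores; zeroing the whole allocation up front is what the patch relies on, and compilers do not write padding on ordinary member stores in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message with an implicit hole between 'type' and 'iova'
 * (7 bytes on x86_64, 3 bytes on i386). */
struct demo_msg {
	uint8_t type;
	uint64_t iova;
};

/* Allocates zeroed memory, fills the fields, then copies the whole
 * struct out, padding included. Returns 0 on success, -1 on ENOMEM. */
static int build_and_copy_out(uint8_t type, uint64_t iova,
			      unsigned char out[sizeof(struct demo_msg)])
{
	struct demo_msg *m;

	assert(out != NULL);
	m = calloc(1, sizeof(*m)); /* kzalloc() analogue */
	if (!m)
		return -1;
	m->type = type;
	m->iova = iova;
	memcpy(out, m, sizeof(*m)); /* copy_to_user() analogue */
	free(m);
	return 0;
}

/* Returns true if every padding byte between 'type' and 'iova' is zero. */
static bool hole_is_zero(const unsigned char *buf)
{
	for (size_t i = offsetof(struct demo_msg, type) + sizeof(uint8_t);
	     i < offsetof(struct demo_msg, iova); i++)
		if (buf[i] != 0)
			return false;
	return true;
}
```

With malloc() instead of calloc(), the hole would contain whatever the allocator left there, which is the kind of stale kernel memory syzbot flagged.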

