Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2] rhashtable: make walk safe from softirq context
From: David Miller @ 2019-02-12 18:43 UTC (permalink / raw)
  To: johannes; +Cc: linux-wireless, netdev, j, tgraf, herbert, johannes.berg
In-Reply-To: <20190206090721.8001-1-johannes@sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Wed,  6 Feb 2019 10:07:21 +0100

> From: Johannes Berg <johannes.berg@intel.com>
> 
> When an rhashtable walk is done from softirq context, we rightfully
> get a lockdep complaint saying that we could get a softirq in the
> middle of a rehash, and thus deadlock on &ht->lock. This happened
> e.g. in mac80211 as it does a walk in softirq context.
> 
> Fix this by using spin_lock_bh() wherever we use the &ht->lock.
> 
> Initially, I thought it would be sufficient to do this only in the
> rehash (rhashtable_rehash_table), but I changed my mind:
>  * the caller doesn't really need to disable softirqs across all
>    of the rhashtable_walk_* functions, only those parts that they
>    actually do within the lock need it
>  * maybe more importantly, it would still lead to massive lockdep
>    complaints - false positives, but hard to fix - because lockdep
>    wouldn't know about different ht->lock instances, and thus one
>    user of the code doing a walk w/o any locking (when it only ever
>    uses process context this is fine) vs. another user like in wifi
>    where we noticed this problem would still cause it to complain.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Jouni Malinen <j@w1.fi>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Herbert and Johannes, I need some guidance.

It seems Herbert wants the softirq usage of rhashtables removed, but
since things have been like this for so long that's not the most
reasonable requirement if we can fix it more simply with Johannes's
patch especially for -stable.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 3/4] mlxsw: spectrum_flower: Fix VLAN modify action support
From: Pablo Neira Ayuso @ 2019-02-12 18:49 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev@vger.kernel.org, davem@davemloft.net, Jiri Pirko,
	Nir Dotan, mlxsw
In-Reply-To: <20190212162924.29777-4-idosch@mellanox.com>

On Tue, Feb 12, 2019 at 04:29:53PM +0000, Ido Schimmel wrote:
> The driver does not support VLAN push and pop, but only VLAN modify.
> 
> Fixes: 738678817573 ("drivers: net: use flow action infrastructure")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply

* KASAN: use-after-free Read in rt_cache_valid
From: syzbot @ 2019-02-12 18:53 UTC (permalink / raw)
  To: davem, kuznet, linux-kernel, netdev, syzkaller-bugs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    aa0c38cf39de Merge branch 'fixes' of git://git.kernel.org/..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1460eca7400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ee434566c893c7b1
dashboard link: https://syzkaller.appspot.com/bug?extid=c4c4b2bb358bb936ad7e
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1197cb88c00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c4c4b2bb358bb936ad7e@syzkaller.appspotmail.com

Enabling of bearer <udp:syz1> rejected, already enabled
Enabling of bearer <udp:syz1> rejected, already enabled
Enabling of bearer <udp:syz1> rejected, already enabled
Enabling of bearer <udp:syz1> rejected, already enabled
==================================================================
BUG: KASAN: use-after-free in rt_cache_valid+0x158/0x190  
net/ipv4/route.c:1510
Read of size 2 at addr ffff88809bfce836 by task udevd/7435

CPU: 1 PID: 7435 Comm: udevd Not tainted 5.0.0-rc6+ #69
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
  kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
  __asan_report_load2_noabort+0x14/0x20 mm/kasan/generic_report.c:133
  rt_cache_valid+0x158/0x190 net/ipv4/route.c:1510
  __mkroute_output net/ipv4/route.c:2260 [inline]
  ip_route_output_key_hash_rcu+0x89d/0x30e0 net/ipv4/route.c:2492
Enabling of bearer <udp:syz1> rejected, already enabled
  ip_route_output_key_hash+0x212/0x380 net/ipv4/route.c:2321
Enabling of bearer <udp:syz1> rejected, already enabled
  __ip_route_output_key include/net/route.h:124 [inline]
  ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2576
  ip_route_output_key include/net/route.h:134 [inline]
  tipc_udp_xmit.isra.0+0x55d/0xcc0 net/tipc/udp_media.c:173
Enabling of bearer <udp:syz1> rejected, already enabled
  tipc_udp_send_msg+0x295/0x4a0 net/tipc/udp_media.c:247
  tipc_bearer_xmit_skb+0x172/0x360 net/tipc/bearer.c:503
Enabling of bearer <udp:syz1> rejected, already enabled
  tipc_disc_timeout+0x933/0xd60 net/tipc/discover.c:332
  call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
Enabling of bearer <udp:syz1> rejected, already enabled
  expire_timers kernel/time/timer.c:1362 [inline]
  __run_timers kernel/time/timer.c:1681 [inline]
  __run_timers kernel/time/timer.c:1649 [inline]
  run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
Enabling of bearer <udp:syz1> rejected, already enabled
  __do_softirq+0x266/0x95a kernel/softirq.c:292
Enabling of bearer <udp:syz1> rejected, already enabled
  invoke_softirq kernel/softirq.c:373 [inline]
  irq_exit+0x180/0x1d0 kernel/softirq.c:413
  exiting_irq arch/x86/include/asm/apic.h:536 [inline]
  smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
  </IRQ>
Enabling of bearer <udp:syz1> rejected, already enabled
RIP: 0010:generic_fillattr+0x54c/0x6e0 fs/stat.c:51
Code: 7c 24 0c 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03  
38 d0 7c 08 84 d2 0f 85 ee 00 00 00 45 8b 64 24 0c 31 ff <41> 81 e4 00 08  
00 00 44 89 e6 e8 15 21 bf ff 45 85 e4 74 2c e8 8b
RSP: 0018:ffff88809894fc58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000007 RBX: ffff88809894fde8 RCX: ffffffff81b0c113
RDX: 0000000000000000 RSI: ffffffff81b0c146 RDI: 0000000000000000
RBP: ffff88809894fc78 R08: ffff8880925de340 R09: ffff88809894fde8
R10: ffffed1013129fcd R11: ffff88809894fe6f R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000800 R15: ffff888098d4d100
  vfs_getattr_nosec+0x140/0x180 fs/stat.c:82
  vfs_getattr+0x4b/0x70 fs/stat.c:116
Enabling of bearer <udp:syz1> rejected, already enabled
  vfs_statx+0x157/0x200 fs/stat.c:189
Enabling of bearer <udp:syz1> rejected, already enabled
  vfs_stat include/linux/fs.h:3171 [inline]
  __do_sys_newstat+0xa4/0x130 fs/stat.c:339
Enabling of bearer <udp:syz1> rejected, already enabled
Enabling of bearer <udp:syz1> rejected, already enabled
kobject: 'loop0' (000000003275aa21): kobject_uevent_env
  __se_sys_newstat fs/stat.c:335 [inline]
  __x64_sys_newstat+0x54/0x80 fs/stat.c:335
kobject: 'loop0' (000000003275aa21): fill_kobj_path: path  
= '/devices/virtual/block/loop0'
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fda7f482c65
Code: 00 00 00 e8 5d 01 00 00 48 83 c4 18 c3 90 90 90 90 90 90 90 90 83 ff  
01 48 89 f0 77 18 48 89 c7 48 89 d6 b8 04 00 00 00 0f 05 <48> 3d 00 f0 ff  
ff 77 17 f3 c3 90 48 8b 05 a1 51 2b 00 64 c7 00 16
RSP: 002b:00007ffefd39a338 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
Enabling of bearer <udp:syz1> rejected, already enabled
RAX: ffffffffffffffda RBX: 00007ffefd39a3d0 RCX: 00007fda7f482c65
RDX: 00007ffefd39a340 RSI: 00007ffefd39a340 RDI: 00007ffefd39a3d0
RBP: 0000000001125980 R08: 00007ffefd39a3e0 R09: 00007fda7f4d9790
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000001114250
R13: 0000000000000000 R14: 00007ffefd39a840 R15: 0000000000000001
Enabling of bearer <udp:syz1> rejected, already enabled

Allocated by task 7724:
  save_stack+0x45/0xd0 mm/kasan/common.c:73
kobject: 'loop0' (000000003275aa21): kobject_uevent_env
  set_track mm/kasan/common.c:85 [inline]
  __kasan_kmalloc mm/kasan/common.c:496 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
kobject: 'loop0' (000000003275aa21): fill_kobj_path: path  
= '/devices/virtual/block/loop0'
  kasan_kmalloc mm/kasan/common.c:504 [inline]
  kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:411
Enabling of bearer <udp:syz1> rejected, already enabled
  kmem_cache_alloc+0x12d/0x710 mm/slab.c:3543
  dst_alloc+0x10e/0x1d0 net/core/dst.c:105
  rt_dst_alloc+0x83/0x3f0 net/ipv4/route.c:1566
  __mkroute_output net/ipv4/route.c:2265 [inline]
  ip_route_output_key_hash_rcu+0x97d/0x30e0 net/ipv4/route.c:2492
Enabling of bearer <udp:syz1> rejected, already enabled
  ip_route_output_key_hash+0x212/0x380 net/ipv4/route.c:2321
  __ip_route_output_key include/net/route.h:124 [inline]
  ip_route_connect include/net/route.h:302 [inline]
  __ip4_datagram_connect+0x6fb/0x1330 net/ipv4/datagram.c:51
  __ip6_datagram_connect+0xa6a/0x1390 net/ipv6/datagram.c:152
kobject: 'loop0' (000000003275aa21): kobject_uevent_env
  ip6_datagram_connect+0x30/0x50 net/ipv6/datagram.c:273
  inet_dgram_connect+0x150/0x2e0 net/ipv4/af_inet.c:571
  __sys_connect+0x266/0x330 net/socket.c:1662
  __do_sys_connect net/socket.c:1673 [inline]
  __se_sys_connect net/socket.c:1670 [inline]
  __x64_sys_connect+0x73/0xb0 net/socket.c:1670
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
kobject: 'loop0' (000000003275aa21): fill_kobj_path: path  
= '/devices/virtual/block/loop0'

Freed by task 0:
  save_stack+0x45/0xd0 mm/kasan/common.c:73
  set_track mm/kasan/common.c:85 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
  __cache_free mm/slab.c:3487 [inline]
  kmem_cache_free+0x86/0x260 mm/slab.c:3749
  dst_destroy+0x2a0/0x3c0 net/core/dst.c:141
  dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
  __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
  rcu_do_batch kernel/rcu/tree.c:2452 [inline]
  invoke_rcu_callbacks kernel/rcu/tree.c:2773 [inline]
  rcu_process_callbacks+0x928/0x1390 kernel/rcu/tree.c:2754
cgroup: fork rejected by pids controller in /syz0
  __do_softirq+0x266/0x95a kernel/softirq.c:292

The buggy address belongs to the object at ffff88809bfce800
  which belongs to the cache ip_dst_cache of size 160
The buggy address is located 54 bytes inside of
  160-byte region [ffff88809bfce800, ffff88809bfce8a0)
The buggy address belongs to the page:
page:ffffea00026ff380 count:1 mapcount:0 mapping:ffff88821af8d1c0 index:0x0
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea000297e848 ffff8880a68abd48 ffff88821af8d1c0
raw: 0000000000000000 ffff88809bfce000 0000000100000010 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88809bfce700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff88809bfce780: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
> ffff88809bfce800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                      ^
  ffff88809bfce880: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88809bfce900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* [PATCH net] net: phy: fix interrupt handling in non-started states
From: Heiner Kallweit @ 2019-02-12 18:56 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: netdev@vger.kernel.org, Russell King - ARM Linux

phylib enables interrupts before phy_start() has been called, and if
we receive an interrupt in a non-started state, the interrupt handler
returns IRQ_NONE. This causes problems with at least one Marvell chip
as reported by Andrew.
Fix this by handling interrupts the same as in phy_mac_interrupt(),
basically always running the phylib state machine. It knows when it
has to do something and when not.
This change allows to handle interrupts gracefully even if they
occur in a non-started state.

Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 189cd2048c3a..ca5e0c0f018c 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -762,9 +762,6 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 {
 	struct phy_device *phydev = phy_dat;

-	if (!phy_is_started(phydev))
-		return IRQ_NONE;		/* It can't be ours.  */
-
 	if (phydev->drv->did_interrupt && !phydev->drv->did_interrupt(phydev))
 		return IRQ_NONE;

-- 
2.20.1

^ permalink raw reply related

* Re: [PATCH net-next 0/4] net: phy: Add 2.5G/5GBASET PHYs support
From: David Miller @ 2019-02-12 19:03 UTC (permalink / raw)
  To: maxime.chevallier
  Cc: netdev, linux-kernel, andrew, f.fainelli, hkallweit1, linux,
	linux-arm-kernel, antoine.tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190211142529.22885-1-maxime.chevallier@bootlin.com>

From: Maxime Chevallier <maxime.chevallier@bootlin.com>
Date: Mon, 11 Feb 2019 15:25:25 +0100

> The 802.3bz standard defines 2 modes based on the NBASET alliance work
> that allow to use 2.5Gbps and 5Gbps speeds on Cat 5e, 6 and 7 cables.
> 
> This series adds the necessary infrastructure to handle these modes with
> C45 PHYs. This series was originally part of a bigger one, that has
> seen 2 iterations [1] [2] that added support for these modes on Marvell
> Alaska PHYs.
 ...

I'll give Andrew, Florian, and Heiner a chance to review this series.

^ permalink raw reply

* Re: [PATCH v2] rhashtable: make walk safe from softirq context
From: Johannes Berg @ 2019-02-12 19:03 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev, j, tgraf, herbert, Bob Copeland
In-Reply-To: <20190212.104339.1794719792249723582.davem@davemloft.net>

On Tue, 2019-02-12 at 10:43 -0800, David Miller wrote:

> Herbert and Johannes, I need some guidance.
> 
> It seems Herbert wants the softirq usage of rhashtables removed, 

Well, specifically of rhashtable walkers. I can only concede that he's
right in that a hashtable walk during softirq (or even with softirqs
disabled) was maybe a bad idea.

At the same time, it's likely going to be pretty deep surgery in this
code, and I'm not sure I can do that right now. Maybe Bob has some
thoughts if it can be achieved more easily, but I think it'd require
adding a new list to each station that tracks which mesh paths it is the
next_hop for, and making sure that's maintained correctly, which feels
tricky but maybe it's not (I could be more familiar with mesh ...)

Evidently this goes back to 

commit 60854fd94573f0d3b80b55b40cf0140a0430f3ab
Author: Bob Copeland <me@bobcopeland.com>
Date:   Wed Mar 2 10:09:20 2016 -0500

    mac80211: mesh: convert path table to rhashtable

which is kinda old. Not sure why this didn't surface before, because the
spinlock was introduced *before*, otherwise certainly the mutex would've
caused us to not be able to do this code to start with (commit
c6ff5268293 - rhashtable: Fix walker list corruption).

That commit also just converted an existing hashtable walk to
rhashtable, so not sure that counts as having introduced the problem :-)

I guess that's not really guidance. If it were my call I'd apply the
patch and issue a stern warning to myself to remove this ASAP ;-) But
sadly, mesh isn't exactly a priority to most, so not sure when that "P"
would be.

But I guess we should also ask Bob first:
1) do you think it'd be easy to maintain a separate list or avoid the
   iteration in some otherway, and make that a small enough patch to be
   applicable for stable?
2) or do you think maybe the mesh_plink_broken() call could just be
   lifted into a workqueue instead?

johannes

^ permalink raw reply

* KASAN: use-after-free Read in sctp_outq_tail
From: syzbot @ 2019-02-12 19:04 UTC (permalink / raw)
  To: davem, linux-kernel, linux-sctp, marcelo.leitner, netdev, nhorman,
	syzkaller-bugs, vyasevich

Hello,

syzbot found the following crash on:

HEAD commit:    d4104460aec1 Add linux-next specific files for 20190211
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14140124c00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c8a112d3b0d6719b
dashboard link: https://syzkaller.appspot.com/bug?extid=7823fa3f3e2d69341ea8
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+7823fa3f3e2d69341ea8@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in list_add_tail include/linux/list.h:93 [inline]
BUG: KASAN: use-after-free in sctp_outq_tail_data net/sctp/outqueue.c:105  
[inline]
BUG: KASAN: use-after-free in sctp_outq_tail+0x816/0x930  
net/sctp/outqueue.c:313
Read of size 8 at addr ffff88807b19a7b8 by task syz-executor.0/30745

CPU: 1 PID: 30745 Comm: syz-executor.0 Not tainted 5.0.0-rc5-next-20190211  
#32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
  kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
  list_add_tail include/linux/list.h:93 [inline]
  sctp_outq_tail_data net/sctp/outqueue.c:105 [inline]
  sctp_outq_tail+0x816/0x930 net/sctp/outqueue.c:313
  sctp_cmd_send_msg net/sctp/sm_sideeffect.c:1109 [inline]
  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1784 [inline]
  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
  sctp_do_sm+0x68e/0x5380 net/sctp/sm_sideeffect.c:1191
  sctp_primitive_SEND+0xa0/0xd0 net/sctp/primitive.c:178
  sctp_sendmsg_to_asoc+0xa63/0x17b0 net/sctp/socket.c:1955
  sctp_sendmsg+0x10a9/0x17e0 net/sctp/socket.c:2113
  inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg+0xdd/0x130 net/socket.c:631
  ___sys_sendmsg+0x806/0x930 net/socket.c:2136
  __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
  __do_sys_sendmsg net/socket.c:2183 [inline]
  __se_sys_sendmsg net/socket.c:2181 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fa9b8630c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e39
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa9b86316d4
R13: 00000000004c4e2b R14: 00000000004d8ab8 R15: 00000000ffffffff

Allocated by task 30745:
  save_stack+0x45/0xd0 mm/kasan/common.c:75
  set_track mm/kasan/common.c:87 [inline]
  __kasan_kmalloc mm/kasan/common.c:498 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:471
  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:506
  kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3615
  kmalloc include/linux/slab.h:548 [inline]
  kzalloc include/linux/slab.h:743 [inline]
  sctp_stream_init_ext+0x51/0x110 net/sctp/stream.c:172
  sctp_sendmsg_to_asoc+0x1273/0x17b0 net/sctp/socket.c:1896
  sctp_sendmsg+0x10a9/0x17e0 net/sctp/socket.c:2113
  inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg+0xdd/0x130 net/socket.c:631
  ___sys_sendmsg+0x806/0x930 net/socket.c:2136
  __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
  __do_sys_sendmsg net/socket.c:2183 [inline]
  __se_sys_sendmsg net/socket.c:2181 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 30745:
  save_stack+0x45/0xd0 mm/kasan/common.c:75
  set_track mm/kasan/common.c:87 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:460
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:468
  __cache_free mm/slab.c:3491 [inline]
  kfree+0xcf/0x230 mm/slab.c:3816
  sctp_stream_outq_migrate+0x3e6/0x540 net/sctp/stream.c:88
  sctp_stream_init+0xbc/0x410 net/sctp/stream.c:139
  sctp_process_init+0x21c3/0x2b20 net/sctp/sm_make_chunk.c:2466
  sctp_cmd_process_init net/sctp/sm_sideeffect.c:682 [inline]
  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1410 [inline]
  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
  sctp_do_sm+0x3145/0x5380 net/sctp/sm_sideeffect.c:1191
  sctp_assoc_bh_rcv+0x343/0x660 net/sctp/associola.c:1074
  sctp_inq_push+0x1ea/0x290 net/sctp/inqueue.c:95
  sctp_backlog_rcv+0x196/0xbe0 net/sctp/input.c:354
  sk_backlog_rcv include/net/sock.h:937 [inline]
  __release_sock+0x12e/0x3a0 net/core/sock.c:2379
  release_sock+0x59/0x1c0 net/core/sock.c:2895
  sctp_wait_for_connect+0x316/0x540 net/sctp/socket.c:8998
  sctp_sendmsg_to_asoc+0x13e3/0x17b0 net/sctp/socket.c:1967
  sctp_sendmsg+0x10a9/0x17e0 net/sctp/socket.c:2113
  inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg+0xdd/0x130 net/socket.c:631
  ___sys_sendmsg+0x806/0x930 net/socket.c:2136
  __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
  __do_sys_sendmsg net/socket.c:2183 [inline]
  __se_sys_sendmsg net/socket.c:2181 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88807b19a780
  which belongs to the cache kmalloc-96 of size 96
The buggy address is located 56 bytes inside of
  96-byte region [ffff88807b19a780, ffff88807b19a7e0)
The buggy address belongs to the page:
page:ffffea0001ec6680 count:1 mapcount:0 mapping:ffff88812c3f04c0  
index:0xffff88807b19a800
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea000262acc8 ffffea0001448348 ffff88812c3f04c0
raw: ffff88807b19a800 ffff88807b19a000 000000010000001d 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88807b19a680: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
  ffff88807b19a700: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
> ffff88807b19a780: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                         ^
  ffff88807b19a800: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
  ffff88807b19a880: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* Re: [PATCH V1 net 0/2] net: ena: race condition bug fix and version update
From: David Miller @ 2019-02-12 19:06 UTC (permalink / raw)
  To: akiyano
  Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, gtzalik,
	netanel, alisaidi
In-Reply-To: <1549905464-13758-1-git-send-email-akiyano@amazon.com>

From: <akiyano@amazon.com>
Date: Mon, 11 Feb 2019 19:17:42 +0200

> From: Arthur Kiyanovski <akiyano@amazon.com>
> 
> This patchset includes a fix to a race condition that can cause
> kernel panic, as well as a driver version update because of this
> fix.

Series applied and patch #1 queued up for -stable.

But I want to reiterate what Andrew said, the version is so increibly
useless and stupid.

I'm going to submit the fix to -stable, and then people will then
doubly and triply have no relationship between driver version number
and what fixes exist.

^ permalink raw reply

* Re: [PATCH net-next v5 09/12] socket: Add SO_TIMESTAMPING_NEW
From: Deepa Dinamani @ 2019-02-12 19:08 UTC (permalink / raw)
  To: Ran Rozenstein
  Cc: davem@davemloft.net, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, arnd@arndb.de, y2038@lists.linaro.org,
	chris@zankel.net, fenghua.yu@intel.com, rth@twiddle.net,
	tglx@linutronix.de, ubraun@linux.ibm.com,
	linux-alpha@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-ia64@vger.kernel.org, linux-mips@linux-mips.org,
	linux-s390@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	sparclinux@vger.kernel.org
In-Reply-To: <CABeXuvqMR7T=3ORvXPihkz-WbTN+oFeHkCu9uvebEq9wTLpJuQ@mail.gmail.com>

On Sun, Feb 10, 2019 at 7:21 PM Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>
> On Feb 10, 2019, at 7:43 AM, Ran Rozenstein <ranro@mellanox.com> wrote:
>
> >> Subject: [PATCH net-next v5 09/12] socket: Add SO_TIMESTAMPING_NEW
> >>
> >> Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
> >> This is the y2038 safe versions of the SO_TIMESTAMPING_OLD for all
> >> architectures.
> >>
> >> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> >> Acked-by: Willem de Bruijn <willemb@google.com>
> >
> >
> > Hi,
> >
> > I have app that include:
> >    #include <linux/errqueue.h>
> >
> > It now fail with this error:
> >    In file included from timestamping.c:6:0:
> >    /usr/include/linux/errqueue.h:46:27: error: array type has incomplete element type 'struct __kernel_timespec'
> >      struct __kernel_timespec ts[3];
> >                                       ^~
> > I tried to do the trivial fix, to include time.h:
> > In include/uapi/linux/errqueue.h
> >    #include <linux/time.h>
> >    #include <linux/types.h>
> >
> > But it just add some other noises:
> >                In file included from /usr/include/linux/errqueue.h:5:0,
> >                                 from timestamping.c:6:
> >                /usr/include/linux/time.h:10:8: error: redefinition of ?struct timespec?
> >                 struct timespec {
> >                        ^~~~~~~~
> >                In file included from /usr/include/sys/select.h:39:0,
> >                                 from /usr/include/sys/types.h:197,
> >                                 from /usr/include/stdlib.h:279,
> >                                 from timestamping.c:2:
> >                /usr/include/bits/types/struct_timespec.h:8:8: note: originally defined here
> >                 struct timespec
> >                        ^~~~~~~~
> >                In file included from /usr/include/linux/errqueue.h:5:0,
> >                                 from timestamping.c:6:
> >                /usr/include/linux/time.h:16:8: error: redefinition of ?struct timeval?
> >                 struct timeval {
> >                        ^~~~~~~
> >                In file included from /usr/include/sys/select.h:37:0,
> >                                 from /usr/include/sys/types.h:197,
> >                                 from /usr/include/stdlib.h:279,
> >                                 from timestamping.c:2:
> >                /usr/include/bits/types/struct_timeval.h:8:8: note: originally defined here
> >                 struct timeval
> >                        ^~~~~~~
> >
> >
> > Can you please advise how to solve it?
> >
> > Thanks,
> > Ran
>
> The errqueue.h already had the same issue reported previously:
> https://lore.kernel.org/netdev/CAF=yD-L2ntuH54J_SwN9WcpBMgkV_v0e-Q2Pu2mrQ3+1RozGFQ@mail.gmail.com/
>
> Earlier when I tested this with kernel selftests such as
> tools/testing/selftests/networking/timestamping/rxtimestamp(the test
> was broken to begin with because of  missing include of unistd.h), I
> was using make.cross to build.
> This does not put the headers in the right place
> (obj-$ARCH/usr/include instead of usr/include). Hence, I did not
> realize that this breaks the inclusion of errqueue.h due to the
> missing __kernel_timespec definition.
> I forgot that nobody seems to be using linux/time.h.
>
> But, if I include guards( #ifndef __KERNEL__) for struct timespec,
> struct timeval etc for linux/time.h, then we can include it from
> userspace/ errqueue.h for __kernel_timespec:
>
> --- a/include/uapi/linux/errqueue.h
> +++ b/include/uapi/linux/errqueue.h
> @@ -2,7 +2,7 @@
>  #ifndef _UAPI_LINUX_ERRQUEUE_H
>  #define _UAPI_LINUX_ERRQUEUE_H
>
> -#include <linux/types.h>
> +#include <linux/time.h>
>
>  struct sock_extended_err {
>         __u32   ee_errno;
> diff --git a/include/uapi/linux/time.h b/include/uapi/linux/time.h
> index a6aca9aaab80..40913d9a5bc8 100644
> --- a/include/uapi/linux/time.h
> +++ b/include/uapi/linux/time.h
> @@ -5,6 +5,8 @@
>  #include <linux/types.h>
>
>
> +#ifdef __KERNEL__
> +
>  #ifndef _STRUCT_TIMESPEC
>  #define _STRUCT_TIMESPEC
>  struct timespec {
> @@ -42,6 +44,8 @@ struct itimerval {
>         struct timeval it_value;        /* current value */
>  };
>
> +#endif /* __KERNEL__ */
>
> Arnd,
>
> I forgot how we plan to include the definition for __kernel_timespec
> for libc or userspace. Does this seem right to you?
> Also these changes to errqueue.h needs to be reverted probably as this
> breaks userspace.

Arnd and I talked about this today morning.
We agreed that we could introduce a new time_types.h along the lines
of posix_types.h. We will move all the time definitions that we plan
to keep in the kernel uapi headers to this header. This header will
also not have any overlap with the sys/time.h and can be included
along with it from userspace.
I will post this patch shortly.

This should fix Ran's issue.

-Deepa

^ permalink raw reply

* Re: [Patch net v2 0/3] net_sched: some fixes for cls_tcindex
From: David Miller @ 2019-02-12 19:14 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev
In-Reply-To: <20190211210616.9592-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 11 Feb 2019 13:06:13 -0800

> This patchset contains 3 bug fixes for tcindex filter. Please check
> each patch for details.
> 
> v2: fix a compile error in patch 2
>     drop netns refcnt in patch 1
> 
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Series applied and queued up for -stable, thanks Cong.

^ permalink raw reply

* Re: [PATCH net-next 4/4] net: phy: Add generic support for 2.5GBaseT and 5GBaseT
From: Heiner Kallweit @ 2019-02-12 19:14 UTC (permalink / raw)
  To: Maxime Chevallier, davem
  Cc: netdev, linux-kernel, Andrew Lunn, Florian Fainelli, Russell King,
	linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190211142529.22885-5-maxime.chevallier@bootlin.com>

On 11.02.2019 15:25, Maxime Chevallier wrote:
> The 802.3bz specification, based on previous by the NBASET alliance,
> defines the 2.5GBaseT and 5GBaseT link modes for ethernet traffic on
> cat5e, cat6 and cat7 cables.
> 
> These mode integrate with the already defined C45 MDIO PMA/PMD registers
> set that added 10G support, by defining some previously reserved bits,
> and adding a new register (2.5G/5G Extended abilities).
> 
> This commit adds the required definitions in include/uapi/linux/mdio.h
> to support these modes, and detect when a link-partner advertises them.
> 
> It also adds support for these mode in the generic C45 PHY
> infrastructure.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> ---
>  drivers/net/phy/phy-c45.c | 37 +++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/mdio.h | 16 ++++++++++++++++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
> index 6f028de4dae1..7af5fa81daf6 100644
> --- a/drivers/net/phy/phy-c45.c
> +++ b/drivers/net/phy/phy-c45.c
> @@ -47,6 +47,16 @@ int genphy_c45_pma_setup_forced(struct phy_device *phydev)
>  		/* Assume 1000base-T */
>  		ctrl2 |= MDIO_PMA_CTRL2_1000BT;
>  		break;
> +	case SPEED_2500:
> +		ctrl1 |= MDIO_CTRL1_SPEED2_5G;
> +		/* Assume 2.5Gbase-T */
> +		ctrl2 |= MDIO_PMA_CTRL2_2_5GBT;
> +		break;
> +	case SPEED_5000:
> +		ctrl1 |= MDIO_CTRL1_SPEED5G;
> +		/* Assume 5Gbase-T */
> +		ctrl2 |= MDIO_PMA_CTRL2_5GBT;
> +		break;
>  	case SPEED_10000:
>  		ctrl1 |= MDIO_CTRL1_SPEED10G;
>  		/* Assume 10Gbase-T */
> @@ -194,6 +204,12 @@ int genphy_c45_read_lpa(struct phy_device *phydev)
>  	if (val < 0)
>  		return val;
>  
> +	if (val & MDIO_AN_10GBT_STAT_LP2_5G)
> +		linkmode_set_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
> +				 phydev->lp_advertising);
> +	if (val & MDIO_AN_10GBT_STAT_LP5G)
> +		linkmode_set_bit(ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
> +				 phydev->lp_advertising);
>  	if (val & MDIO_AN_10GBT_STAT_LP10G)
>  		linkmode_set_bit(ETHTOOL_LINK_MODE_10000baseT_Full_BIT,
>  				 phydev->lp_advertising);
> @@ -224,6 +240,12 @@ int genphy_c45_read_pma(struct phy_device *phydev)
>  	case MDIO_PMA_CTRL1_SPEED1000:
>  		phydev->speed = SPEED_1000;
>  		break;
> +	case MDIO_CTRL1_SPEED2_5G:
> +		phydev->speed = SPEED_2500;
> +		break;
> +	case MDIO_CTRL1_SPEED5G:
> +		phydev->speed = SPEED_5000;
> +		break;
>  	case MDIO_CTRL1_SPEED10G:
>  		phydev->speed = SPEED_10000;
>  		break;
> @@ -339,6 +361,21 @@ int genphy_c45_pma_read_abilities(struct phy_device *phydev)
>  		linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
>  				 phydev->supported,
>  				 val & MDIO_PMA_EXTABLE_10BT);
> +
> +		if (val & MDIO_PMA_EXTABLE_NBT) {
> +			val = phy_read_mmd(phydev, MDIO_MMD_PMAPMD,
> +					   MDIO_PMA_NG_EXTABLE);
> +			if (val < 0)
> +				return val;
> +
> +			linkmode_mod_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
> +					 phydev->supported,
> +					 val & MDIO_PMA_NG_EXTABLE_2_5GBT);
> +
> +			linkmode_mod_bit(ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
> +					 phydev->supported,
> +					 val & MDIO_PMA_NG_EXTABLE_5GBT);
> +		}
>  	}
>  
>  	return 0;
> diff --git a/include/uapi/linux/mdio.h b/include/uapi/linux/mdio.h
> index 0e012b168e4d..0a552061ff1c 100644
> --- a/include/uapi/linux/mdio.h
> +++ b/include/uapi/linux/mdio.h
> @@ -45,6 +45,7 @@
>  #define MDIO_AN_ADVERTISE	16	/* AN advertising (base page) */
>  #define MDIO_AN_LPA		19	/* AN LP abilities (base page) */
>  #define MDIO_PCS_EEE_ABLE	20	/* EEE Capability register */
> +#define MDIO_PMA_NG_EXTABLE	21	/* 2.5G/5G PMA/PMD extended ability */
>  #define MDIO_PCS_EEE_WK_ERR	22	/* EEE wake error counter */
>  #define MDIO_PHYXS_LNSTAT	24	/* PHY XGXS lane state */
>  #define MDIO_AN_EEE_ADV		60	/* EEE advertisement */
> @@ -92,6 +93,10 @@
>  #define MDIO_CTRL1_SPEED10G		(MDIO_CTRL1_SPEEDSELEXT | 0x00)
>  /* 10PASS-TS/2BASE-TL */
>  #define MDIO_CTRL1_SPEED10P2B		(MDIO_CTRL1_SPEEDSELEXT | 0x04)
> +/* 2.5 Gb/s */
> +#define MDIO_CTRL1_SPEED2_5G		(MDIO_CTRL1_SPEEDSELEXT | 0x18)
> +/* 5 Gb/s */
> +#define MDIO_CTRL1_SPEED5G		(MDIO_CTRL1_SPEEDSELEXT | 0x1c)
>  
>  /* Status register 1. */
>  #define MDIO_STAT1_LPOWERABLE		0x0002	/* Low-power ability */
> @@ -145,6 +150,8 @@
>  #define MDIO_PMA_CTRL2_1000BKX		0x000d	/* 1000BASE-KX type */
>  #define MDIO_PMA_CTRL2_100BTX		0x000e	/* 100BASE-TX type */
>  #define MDIO_PMA_CTRL2_10BT		0x000f	/* 10BASE-T type */
> +#define MDIO_PMA_CTRL2_2_5GBT		0x0030  /* 2.5GBaseT type */
> +#define MDIO_PMA_CTRL2_5GBT		0x0031  /* 5GBaseT type */
>  #define MDIO_PCS_CTRL2_TYPE		0x0003	/* PCS type selection */
>  #define MDIO_PCS_CTRL2_10GBR		0x0000	/* 10GBASE-R type */
>  #define MDIO_PCS_CTRL2_10GBX		0x0001	/* 10GBASE-X type */
> @@ -198,6 +205,7 @@
>  #define MDIO_PMA_EXTABLE_1000BKX	0x0040	/* 1000BASE-KX ability */
>  #define MDIO_PMA_EXTABLE_100BTX		0x0080	/* 100BASE-TX ability */
>  #define MDIO_PMA_EXTABLE_10BT		0x0100	/* 10BASE-T ability */
> +#define MDIO_PMA_EXTABLE_NBT		0x4000  /* 2.5/5GBASE-T ability */
>  
>  /* PHY XGXS lane state register. */
>  #define MDIO_PHYXS_LNSTAT_SYNC0		0x0001
> @@ -234,9 +242,13 @@
>  #define MDIO_PCS_10GBRT_STAT2_BER	0x3f00
>  
>  /* AN 10GBASE-T control register. */
> +#define MDIO_AN_10GBT_CTRL_ADV2_5G	0x0080	/* Advertise 2.5GBASE-T */
> +#define MDIO_AN_10GBT_CTRL_ADV5G	0x0100	/* Advertise 5GBASE-T */
>  #define MDIO_AN_10GBT_CTRL_ADV10G	0x1000	/* Advertise 10GBASE-T */
>  
>  /* AN 10GBASE-T status register. */
> +#define MDIO_AN_10GBT_STAT_LP2_5G	0x0020  /* LP is 2.5GBT capable */
> +#define MDIO_AN_10GBT_STAT_LP5G		0x0040  /* LP is 5GBT capable */
>  #define MDIO_AN_10GBT_STAT_LPTRR	0x0200	/* LP training reset req. */
>  #define MDIO_AN_10GBT_STAT_LPLTABLE	0x0400	/* LP loop timing ability */
>  #define MDIO_AN_10GBT_STAT_LP10G	0x0800	/* LP is 10GBT capable */
> @@ -265,6 +277,10 @@
>  #define MDIO_EEE_10GKX4		0x0020	/* 10G KX4 EEE cap */
>  #define MDIO_EEE_10GKR		0x0040	/* 10G KR EEE cap */
>  
> +/* 2.5G/5G Extended abilities register. */
> +#define MDIO_PMA_NG_EXTABLE_2_5GBT	0x0001	/* 2.5GBASET ability */
> +#define MDIO_PMA_NG_EXTABLE_5GBT	0x0002	/* 5GBASET ability */
> +
>  /* LASI RX_ALARM control/status registers. */
>  #define MDIO_PMA_LASI_RX_PHYXSLFLT	0x0001	/* PHY XS RX local fault */
>  #define MDIO_PMA_LASI_RX_PCSLFLT	0x0008	/* PCS RX local fault */
> 
Looks good to me.

Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>

Heiner

^ permalink raw reply

* Re: KASAN: use-after-free Read in sctp_outq_tail
From: Marcelo Ricardo Leitner @ 2019-02-12 19:19 UTC (permalink / raw)
  To: syzbot
  Cc: davem, linux-kernel, linux-sctp, netdev, nhorman, syzkaller-bugs,
	vyasevich
In-Reply-To: <0000000000002015db0581b71858@google.com>

On Tue, Feb 12, 2019 at 11:04:27AM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    d4104460aec1 Add linux-next specific files for 20190211
> git tree:       linux-next

I can't find this commit. How can I know for sure which commits are in?
Recent patches are involved with this code, but not sure what was
included on the test.

  Marcelo

^ permalink raw reply

* Re: [Patch net] team: avoid complex list operations in team_nl_cmd_options_set()
From: David Miller @ 2019-02-12 19:21 UTC (permalink / raw)
  To: xiyou.wangcong
  Cc: netdev, syzbot+4d4af685432dc0e56c91, syzbot+68ee510075cf64260cc4,
	jiri, pabeni
In-Reply-To: <20190212055951.6712-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 11 Feb 2019 21:59:51 -0800

> The current opt_inst_list operations inside team_nl_cmd_options_set()
> is too complex to track:
> 
>     LIST_HEAD(opt_inst_list);
>     nla_for_each_nested(...) {
>         list_for_each_entry(opt_inst, &team->option_inst_list, list) {
>             if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
>                 continue;
>             list_add(&opt_inst->tmp_list, &opt_inst_list);
>         }
>     }
>     team_nl_send_event_options_get(team, &opt_inst_list);
> 
> as while we retrieve 'opt_inst' from team->option_inst_list, it could
> be added to the local 'opt_inst_list' for multiple times. The
> __team_option_inst_tmp_find() doesn't work, as the setter
> team_mode_option_set() still calls team->ops.exit() which uses
> ->tmp_list too in __team_options_change_check().
> 
> Simplify the list operations by moving the 'opt_inst_list' and
> team_nl_send_event_options_get() into the nla_for_each_nested() loop so
> that it can be guranteed that we won't insert a same list entry for
> multiple times. Therefore, __team_option_inst_tmp_find() can be removed
> too.
> 
> Fixes: 4fb0534fb7bb ("team: avoid adding twice the same option to the event list")
> Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")
> Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
> Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable, thanks Cong.

^ permalink raw reply

* Re: [PATCH net-next 1/2] devlink: Return right error code in case of errors for region read
From: David Miller @ 2019-02-12 19:24 UTC (permalink / raw)
  To: parav; +Cc: jiri, netdev
In-Reply-To: <1549955359-15786-1-git-send-email-parav@mellanox.com>

From: Parav Pandit <parav@mellanox.com>
Date: Tue, 12 Feb 2019 01:09:19 -0600

> devlink_nl_cmd_region_read_dumpit() misses to return right error code on
> most error conditions.
> Return the right error code on such errors.
> 
> Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

This has conflicts with the locking changes I applied earlier today.

Please respin, thanks.

^ permalink raw reply

* Re: [PATCH net] net: phy: fix interrupt handling in non-started states
From: Andrew Lunn @ 2019-02-12 19:37 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org,
	Russell King - ARM Linux
In-Reply-To: <25e86edc-0b88-8c03-b692-776e971331f2@gmail.com>

On Tue, Feb 12, 2019 at 07:56:15PM +0100, Heiner Kallweit wrote:
> phylib enables interrupts before phy_start() has been called, and if
> we receive an interrupt in a non-started state, the interrupt handler
> returns IRQ_NONE. This causes problems with at least one Marvell chip
> as reported by Andrew.
> Fix this by handling interrupts the same as in phy_mac_interrupt(),
> basically always running the phylib state machine. It knows when it
> has to do something and when not.
> This change allows to handle interrupts gracefully even if they
> occur in a non-started state.
> 
> Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
> Reported-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/net/phy/phy.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index 189cd2048c3a..ca5e0c0f018c 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -762,9 +762,6 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>  {
>  	struct phy_device *phydev = phy_dat;
>  
> -	if (!phy_is_started(phydev))
> -		return IRQ_NONE;		/* It can't be ours.  */
> -
>  	if (phydev->drv->did_interrupt && !phydev->drv->did_interrupt(phydev))
>  		return IRQ_NONE;

Hi Heiner

Did you look at

ommit 3c3070d713d798f7f9e7ee3614e49b47655d14d8
Author: Maciej W. Rozycki <macro@linux-mips.org>
Date:   Tue Oct 3 16:18:35 2006 +0100

    [PATCH] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
    
     This patch fixes a couple of problems discovered with interrupt handling
    in the phylib core, namely:
    
    1. The driver uses timer and workqueue calls, but does not include
       <linux/timer.h> nor <linux/workqueue.h>.
    
    2. The driver uses schedule_work() for handling interrupts, but does not
       make sure any pending work scheduled thus has been completed before
       driver's structures get freed from memory.  This is especially
       important as interrupts may keep arriving if the line is shared with
       another PHY.
    
       The solution is to ignore phy_interrupt() calls if the reported device
       has already been halted and calling flush_scheduled_work() from
       phy_stop_interrupts() (but guarded with current_is_keventd() in case
       the function has been called through keventd from the MAC device's
       close call to avoid a deadlock on the netlink lock).
    
    Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
    
    patch-mips-2.6.18-20060920-phy-irq-16
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

There has been a lot of change since then, so maybe this is no longer
an issue?

   Thanks
	Andrew

^ permalink raw reply

* patch for ip6_input.c
From: Farrell.Woods @ 2019-02-12 19:53 UTC (permalink / raw)
  To: netdev

Folks,

I'm proposing the following patch for ip6_input.c:

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index c7ed2b6..5aba6a6 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -409,12 +409,10 @@ void ip6_protocol_deliver_rcu(struct net *net, 
struct sk_buff *skb, int nexthdr,
                 }
         } else {
                 if (!raw) {
-                       if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-                               __IP6_INC_STATS(net, idev,
- IPSTATS_MIB_INUNKNOWNPROTOS);
-                               icmpv6_send(skb, ICMPV6_PARAMPROB,
-                                           ICMPV6_UNK_NEXTHDR, nhoff);
-                       }
+                       __IP6_INC_STATS(net, idev,
+                               IPSTATS_MIB_INUNKNOWNPROTOS);
+                       icmpv6_send(skb, ICMPV6_PARAMPROB,
+                               ICMPV6_UNK_NEXTHDR, nhoff);
                         kfree_skb(skb);
                 } else {
                         __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDELIVERS);

The patch fixes an IPv6 conformance test failure (v6LC_1_2_03a in the 
UNH INTACT suite) that occurs specifically when IPsec is in use.  The 
test iterates through the set of unassigned protocol numbers (currently, 
143 through 252) and inserts these into the next header field of a 
Destination Options header.  The expected test result is that an ICMPv6 
Parameter Problem is sent back.  But if there's a policy in place that 
requires an active SA between the Test Node and the Device Under Test 
(and none exists), the inbound packet is quietly dropped.

This behavior is inconsistent with, for example, how unknown tlv's are 
handled in extension headers (see the tlv parsing code in 
ipv6/exthdrs.c) or for instance how misaligned fragment headers are 
handled.  These will always cause a Parameter Problem message to get 
sent back to the source.

I have verified that with the policy check removed, that the unit test 
passes.

FYI here's a trace of the test in question:

No.     Time           Source Destination           Protocol Length Info
       1 0.000000000    fe80::200:10ff:fe10:1080 
fe80::260:16ff:fe97:ebf2 IPv6     71 *Unknown IP Protocol: Unassigned (143)*

Frame 1: 71 bytes on wire (568 bits), 71 bytes captured (568 bits) on 
interface 0
     Interface id: 0 (unknown)
         Interface name: unknown
     Encapsulation type: Ethernet (1)
     Arrival Time: Feb  6, 2019 13:27:29.949609000 EST
     [Time shift for this packet: 0.000000000 seconds]
     Epoch Time: 1549477649.949609000 seconds
     [Time delta from previous captured frame: 0.000000000 seconds]
     [Time delta from previous displayed frame: 0.000000000 seconds]
     [Time since reference or first frame: 0.000000000 seconds]
     Frame Number: 1
     Frame Length: 71 bytes (568 bits)
     Capture Length: 71 bytes (568 bits)
     [Frame is marked: False]
     [Frame is ignored: False]
     [Protocols in frame: eth:ethertype:ipv6:ipv6.dstopts:data]
Ethernet II, Src: Sytek_10:10:80 (00:00:10:10:10:80), Dst: 
Clariion_97:eb:f2 (00:60:16:97:eb:f2)
     Destination: Clariion_97:eb:f2 (00:60:16:97:eb:f2)
         Address: Clariion_97:eb:f2 (00:60:16:97:eb:f2)
         .... ..0. .... .... .... .... = LG bit: Globally unique address 
(factory default)
         .... ...0 .... .... .... .... = IG bit: Individual address 
(unicast)
     Source: Sytek_10:10:80 (00:00:10:10:10:80)
         Address: Sytek_10:10:80 (00:00:10:10:10:80)
         .... ..0. .... .... .... .... = LG bit: Globally unique address 
(factory default)
         .... ...0 .... .... .... .... = IG bit: Individual address 
(unicast)
     Type: IPv6 (0x86dd)
Internet Protocol Version 6, Src: fe80::200:10ff:fe10:1080, Dst: 
fe80::260:16ff:fe97:ebf2
     0110 .... = Version: 6
     .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 
(DSCP: CS0, ECN: Not-ECT)
         .... 0000 00.. .... .... .... .... .... = Differentiated 
Services Codepoint: Default (0)
         .... .... ..00 .... .... .... .... .... = Explicit Congestion 
Notification: Not ECN-Capable Transport (0)
     .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000
     Payload Length: 17
     Next Header: Destination Options for IPv6 (60)
     Hop Limit: 255
     Source: fe80::200:10ff:fe10:1080
     Destination: fe80::260:16ff:fe97:ebf2
     [Source SA MAC: Sytek_10:10:80 (00:00:10:10:10:80)]
     [Destination SA MAC: Clariion_97:eb:f2 (00:60:16:97:eb:f2)]
     Destination Options for IPv6
*Next Header: Unassigned (143**)*
         Length: 0
         [Length: 8 bytes]
         PadN
             Type: PadN (0x01)
                 00.. .... = Action: Skip and continue (0)
                 ..0. .... = May Change: No
                 ...0 0001 = Low-Order Bits: 0x01
             Length: 4
             PadN: 00000000
Data (9 bytes)

0000  80 00 5c eb 00 00 00 00 00 ..\......
     Data: 80005ceb0000000000
     [Length: 9]

I am working on a product that will ship with IPsec enabled and with a 
set of traffic selectors in place that will exclude most inbound 
traffic.  Since this is how it will ship to the customer, we must leave 
IPsec enabled when this goes to UNH for USGv6 certification.


Thanks for your consideration.

     -- Farrell

^ permalink raw reply related

* linux-next: Fixes tag needs some work in the net tree
From: Stephen Rothwell @ 2019-02-12 20:04 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Li RongQing

[-- Attachment #1: Type: text/plain, Size: 433 bytes --]

Hi all,

In commit

  d1f20798a119 ("ipv6: propagate genlmsg_reply return code")

Fixes tag

  Fixes: 915d7e5e593 ("ipv6: sr: add code base for control plane support of SR-IPv6")

has these problem(s):

  - SHA1 should be at least 12 digits long
    Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
    or later) just making sure it is not set (or set to "auto")

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net] net: phy: fix interrupt handling in non-started states
From: Heiner Kallweit @ 2019-02-12 20:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org,
	Russell King - ARM Linux
In-Reply-To: <20190212193716.GD7527@lunn.ch>

On 12.02.2019 20:37, Andrew Lunn wrote:
> On Tue, Feb 12, 2019 at 07:56:15PM +0100, Heiner Kallweit wrote:
>> phylib enables interrupts before phy_start() has been called, and if
>> we receive an interrupt in a non-started state, the interrupt handler
>> returns IRQ_NONE. This causes problems with at least one Marvell chip
>> as reported by Andrew.
>> Fix this by handling interrupts the same as in phy_mac_interrupt(),
>> basically always running the phylib state machine. It knows when it
>> has to do something and when not.
>> This change allows to handle interrupts gracefully even if they
>> occur in a non-started state.
>>
>> Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
>> Reported-by: Andrew Lunn <andrew@lunn.ch>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/net/phy/phy.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index 189cd2048c3a..ca5e0c0f018c 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -762,9 +762,6 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>>  {
>>  	struct phy_device *phydev = phy_dat;
>>  
>> -	if (!phy_is_started(phydev))
>> -		return IRQ_NONE;		/* It can't be ours.  */
>> -
>>  	if (phydev->drv->did_interrupt && !phydev->drv->did_interrupt(phydev))
>>  		return IRQ_NONE;
> 
> Hi Heiner
> 
> Did you look at
> 
Thanks for this interesting old read.

> ommit 3c3070d713d798f7f9e7ee3614e49b47655d14d8
> Author: Maciej W. Rozycki <macro@linux-mips.org>
> Date:   Tue Oct 3 16:18:35 2006 +0100
> 
>     [PATCH] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
>     
>      This patch fixes a couple of problems discovered with interrupt handling
>     in the phylib core, namely:
>     
>     1. The driver uses timer and workqueue calls, but does not include
>        <linux/timer.h> nor <linux/workqueue.h>.
>     
>     2. The driver uses schedule_work() for handling interrupts, but does not
>        make sure any pending work scheduled thus has been completed before
>        driver's structures get freed from memory.  This is especially
>        important as interrupts may keep arriving if the line is shared with
>        another PHY.
>     
Since that time this have massively changed and I *think* this doesn't apply
any longer. cancel_delayed_work_sync() is called always before driver
structure is freed.

>        The solution is to ignore phy_interrupt() calls if the reported device
>        has already been halted and calling flush_scheduled_work() from
>        phy_stop_interrupts() (but guarded with current_is_keventd() in case
>        the function has been called through keventd from the MAC device's
>        close call to avoid a deadlock on the netlink lock).
>     
>     Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
>     
>     patch-mips-2.6.18-20060920-phy-irq-16
>     Signed-off-by: Jeff Garzik <jeff@garzik.org>
> 
> There has been a lot of change since then, so maybe this is no longer
> an issue?
> 
AFAICS, yes. I'll have a closer look at the scenario described by Russell,
this however should be independent of the patch here.

>    Thanks
> 	Andrew
> 
Heiner

^ permalink raw reply

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: John David Anglin @ 2019-02-12 20:09 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit
  Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190212125644.GA7527@lunn.ch>

On 2019-02-12 7:56 a.m., Andrew Lunn wrote:
> It is a little bit old, 5.0-rc1 net-next. I should rebase and
> retest. I'm testing on a ZII board which is not fully in mainline So i
> need some patches.
I attempted to test 5.0.0-rc5 on espressobin but it dies on boot:
Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.0.0-rc5+ (root@espressobin) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PREEMPT Tue Feb 12
10:02:30 EST 2019
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Board
[    0.000000] earlycon: ar3700_uart0 at MMIO 0x00000000d0012000 (options '')
[    0.000000] printk: bootconsole [ar3700_uart0] enabled
[    3.218865] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    3.224521] Modules linked in:
[    3.227661] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5+ #1
[    3.234107] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[    3.240740] pstate: 20000005 (nzCv daif -PAN -UAO)
[    3.245676] pc : dma_direct_map_page+0x48/0x1d8
[    3.250331] lr : mv_xor_channel_add+0x3b0/0xb28
[    3.254984] sp : ffffff8010033a60
[    3.258389] x29: ffffff8010033a60 x28: ffffffc03bf70480
[    3.263854] x27: ffffff8010e97068 x26: 0000000000000000
[    3.269320] x25: 0000000000000029 x24: 0000000000000083
[    3.274785] x23: 0000000000000000 x22: 0000000000000002
[    3.280251] x21: 0000000000000080 x20: ffffff8010ecd000
[    3.285716] x19: 0000000000000000 x18: ffffffffffffffff
[    3.291182] x17: 000000000000000c x16: 000000000000000a
[    3.296648] x15: ffffff8010ecd6c8 x14: ffffffc03bf46783
[    3.302113] x13: ffffffc03bf46782 x12: 0000000000000038
[    3.307579] x11: 0000000000001fff x10: 0000000000000001
[    3.313044] x9 : 0000000000000000 x8 : ffffff8010dbe000
[    3.318510] x7 : ffffff8010fbe000 x6 : ffffffbf00000000
[    3.323976] x5 : 0000000000000000 x4 : 0000000000000002
[    3.329441] x3 : 0000000000000002 x2 : 00000000000006ac
[    3.334907] x1 : ffffffbf00efdc00 x0 : 0000000000000000
[    3.340373] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[    3.347272] Call trace:
[    3.349784]  dma_direct_map_page+0x48/0x1d8
[    3.354084]  mv_xor_channel_add+0x3b0/0xb28
[    3.358384]  mv_xor_probe+0x20c/0x4b8
[    3.362150]  platform_drv_probe+0x50/0xb0
[    3.366269]  really_probe+0x1fc/0x2c0
[    3.370032]  driver_probe_device+0x58/0x100
[    3.374332]  __driver_attach+0xd8/0xe0
[    3.378188]  bus_for_each_dev+0x68/0xc8
[    3.382129]  driver_attach+0x20/0x28
[    3.385803]  bus_add_driver+0x108/0x228
[    3.389744]  driver_register+0x60/0x110
[    3.393687]  __platform_driver_register+0x44/0x50
[    3.398529]  mv_xor_driver_init+0x18/0x20
[    3.402648]  do_one_initcall+0x58/0x170
[    3.406591]  kernel_init_freeable+0x190/0x234
[    3.411072]  kernel_init+0x10/0x108
[    3.414653]  ret_from_fork+0x10/0x1c
[    3.418329] Code: 2a0403f6 934cfc00 aa0503f7 7100047f (f9412663)
[    3.424610] ---[ end trace f7751570455a07a0 ]---
[    3.429423] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    3.437232] SMP: stopping secondary CPUs
[    3.441265] Kernel Offset: disabled
[    3.444849] CPU features: 0x002,2000200c
[    3.448878] Memory Limit: none
[    3.452018] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: Heiner Kallweit @ 2019-02-12 20:11 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Andrew Lunn, John David Anglin, Vivien Didelot, Florian Fainelli,
	netdev
In-Reply-To: <20190212163017.lwstmgtyw76cwrd7@shell.armlinux.org.uk>

On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
>> On 12.02.2019 04:58, Andrew Lunn wrote:
>>> That change means we don't check the PHY device if it caused an
>>> interrupt when its state is less than UP.
>>>
>>> What i'm seeing is that the PHY is interrupting pretty early on after
>>> a reboot when the previous boot had the interface up.
>>>
>> So this means that when going down for reboot the interrupts are not
>> properly masked / disabled? Because (at least for net-next) we enable
>> interrupts in phy_start() only.
> 
> Looking at Linus' tree as opposed to net-next, things do look rather
> broken wrt interrupts:
> 
> +-phy_attach_direct
>   `-phydev->state = PHY_READY
> +-phy_prepare_link
> +-phy_start_machine
>   `-phy_trigger_machine()
> `-phy_start_interrupts
>   +-request_threaded_irq()
>   `-phy_enable_interrupts()
>     +-phy_clear_interrupt()
>     `-phy_config_interrupt(, PHY_INTERRUPT_ENABLED)
> 
> At this point, the PHY is then able to generate interrupts, which,
> because phy_start() has not been called and phy_interrupt() checks
> that phydev->state >= PHY_UP, get ignored by the interrupt handler
> exactly as Andrew is finding.
> 
> So it looks like 5.0-rc is already in need of this being fixed.
> 
> In looking at this, I came across this chunk of code:
> 
> static inline bool __phy_is_started(struct phy_device *phydev)
> {
>         WARN_ON(!mutex_is_locked(&phydev->lock));
> 
>         return phydev->state >= PHY_UP;
> }
> 
> /**
>  * phy_is_started - Convenience function to check whether PHY is started
>  * @phydev: The phy_device struct
>  */
> static inline bool phy_is_started(struct phy_device *phydev)
> {
>         bool started;
> 
>         mutex_lock(&phydev->lock);
>         started = __phy_is_started(phydev);
>         mutex_unlock(&phydev->lock);
> 
>         return started;
> }
> 
> which looks to me like over-complication.  The mutex locking there is
> completely pointless - what are you trying to achieve with it?
> 
Even though this code is new it's kind of heritage in phylib that each
access (read or write) to phydev->state is protected by this lock.
I also once wondered whether it's actually needed but didn't spend
effort so far on challenging this. Seems that now the time has come ..

> Let's go through this.  The above is exactly equivalent to:
> 
> bool phy_is_started(phydev)
> {
> 	int state;
> 
> 	mutex_lock(&phydev->lock);
> 	state = phydev->state;
> 	mutex_unlock(&phydev->lock);
> 
> 	return state >= PHY_UP;
> }
> 
> since when we do the test is irrelevant.  Architectures that Linux
> runs on are single-copy atomic, which means that reading phydev->state
> itself is an atomic operation.  So, the mutex locking around that
> doesn't add to the atomicity of the entire operation.
> 
> How, depending on what you do with the rest of this function depends
> whether the entire operation is safe or not.  For example, let's take
> this code at the end of phy_state_machine():
> 
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> 
> state = PHY_UP
> 		thread 0			thread 1
> 						phy_disconnect()
> 						+-phy_is_started()
> 		phy_is_started()                |
> 						`-phy_stop()
> 						  +-phydev->state = PHY_HALTED
> 						  `-phy_stop_machine()
> 						    `-cancel_delayed_work_sync()
> 		phy_queue_state_machine()
> 		`-mod_delayed_work()
> 
Thanks for describing this scenario, I'll have a closer look at it.

> At this point, the phydev->state_queue() has been added back onto the
> system workqueue despite phy_stop_machine() having been called and
> cancel_delayed_work_sync() called on it.
> 
> The original code in 4.20 did not have this race condition.
> 
> Basically, the lock inside phy_is_started() does nothing useful, and
> I'd say is dangerously misleading.
> 

Heiner

^ permalink raw reply

* RE: [PATCH net-next 1/2] devlink: Return right error code in case of errors for region read
From: Parav Pandit @ 2019-02-12 20:22 UTC (permalink / raw)
  To: David Miller; +Cc: Jiri Pirko, netdev@vger.kernel.org
In-Reply-To: <20190212.112447.1565311464519442439.davem@davemloft.net>



> -----Original Message-----
> From: David Miller <davem@davemloft.net>
> Sent: Tuesday, February 12, 2019 1:25 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 1/2] devlink: Return right error code in case of
> errors for region read
> 
> From: Parav Pandit <parav@mellanox.com>
> Date: Tue, 12 Feb 2019 01:09:19 -0600
> 
> > devlink_nl_cmd_region_read_dumpit() misses to return right error code
> > on most error conditions.
> > Return the right error code on such errors.
> >
> > Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read
> > command")
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > Acked-by: Jiri Pirko <jiri@mellanox.com>
> 
> This has conflicts with the locking changes I applied earlier today.
> 
> Please respin, thanks.
Ok. sending rebased v2.

^ permalink raw reply

* [PATCHv2 net-next 0/2] devlink: 2 fixes for devlink region read
From: Parav Pandit @ 2019-02-12 20:22 UTC (permalink / raw)
  To: jiri, davem, netdev; +Cc: parav

This 2 patches consist of fixes for devlink region read handling.

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
v0->v1:
 - Fixed typo from user to use
v1->v2:
 - Rebased

Parav Pandit (2):
  devlink: Return right error code in case of errors for region read
  devlink: Fix list access without lock while reading region

 net/core/devlink.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* [PATCHv2 net-next 1/2] devlink: Return right error code in case of errors for region read
From: Parav Pandit @ 2019-02-12 20:23 UTC (permalink / raw)
  To: jiri, davem, netdev; +Cc: parav

devlink_nl_cmd_region_read_dumpit() misses to return right error code on
most error conditions.
Return the right error code on such errors.

Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
v1->v2:
 - Rebased
---
 net/core/devlink.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 283c3ed..312084f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3646,26 +3646,34 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto out_free;
 
 	devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs);
-	if (IS_ERR(devlink))
+	if (IS_ERR(devlink)) {
+		err = PTR_ERR(devlink);
 		goto out_free;
+	}
 
 	mutex_lock(&devlink_mutex);
 	mutex_lock(&devlink->lock);
 
 	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
-	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
+	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
+		err = -EINVAL;
 		goto out_unlock;
+	}
 
 	region_name = nla_data(attrs[DEVLINK_ATTR_REGION_NAME]);
 	region = devlink_region_get_by_name(devlink, region_name);
-	if (!region)
+	if (!region) {
+		err = -EINVAL;
 		goto out_unlock;
+	}
 
 	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
 			  &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI,
 			  DEVLINK_CMD_REGION_READ);
-	if (!hdr)
+	if (!hdr) {
+		err = -EMSGSIZE;
 		goto out_unlock;
+	}
 
 	err = devlink_nl_put_handle(skb, devlink);
 	if (err)
@@ -3676,8 +3684,10 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto nla_put_failure;
 
 	chunks_attr = nla_nest_start(skb, DEVLINK_ATTR_REGION_CHUNKS);
-	if (!chunks_attr)
+	if (!chunks_attr) {
+		err = -EMSGSIZE;
 		goto nla_put_failure;
+	}
 
 	if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] &&
 	    attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]) {
@@ -3700,8 +3710,10 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto nla_put_failure;
 
 	/* Check if there was any progress done to prevent infinite loop */
-	if (ret_offset == start_offset)
+	if (ret_offset == start_offset) {
+		err = -EINVAL;
 		goto nla_put_failure;
+	}
 
 	*((u64 *)&cb->args[0]) = ret_offset;
 
@@ -3720,7 +3732,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	mutex_unlock(&devlink_mutex);
 out_free:
 	kfree(attrs);
-	return 0;
+	return err;
 }
 
 struct devlink_info_req {
-- 
1.8.3.1


^ permalink raw reply related

* [PATCHv2 net-next 2/2] devlink: Fix list access without lock while reading region
From: Parav Pandit @ 2019-02-12 20:24 UTC (permalink / raw)
  To: jiri, davem, netdev; +Cc: parav

While finding the devlink device during region reading,
devlink device list is accessed and devlink device is
returned without holding a lock. This could lead to use-after-free
accesses.

While at it, add lockdep assert to ensure that all future callers hold
the lock when calling devlink_get_from_attrs().

Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
v0->v1:
 - Fixed typo from user to use
---
 net/core/devlink.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 312084f..1d7502a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -116,6 +116,8 @@ static struct devlink *devlink_get_from_attrs(struct net *net,
 	busname = nla_data(attrs[DEVLINK_ATTR_BUS_NAME]);
 	devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]);
 
+	lockdep_assert_held(&devlink_mutex);
+
 	list_for_each_entry(devlink, &devlink_list, list) {
 		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
 		    strcmp(dev_name(devlink->dev), devname) == 0 &&
@@ -3645,13 +3647,13 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	if (err)
 		goto out_free;
 
+	mutex_lock(&devlink_mutex);
 	devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs);
 	if (IS_ERR(devlink)) {
 		err = PTR_ERR(devlink);
-		goto out_free;
+		goto out_dev;
 	}
 
-	mutex_lock(&devlink_mutex);
 	mutex_lock(&devlink->lock);
 
 	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
@@ -3729,6 +3731,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	genlmsg_cancel(skb, hdr);
 out_unlock:
 	mutex_unlock(&devlink->lock);
+out_dev:
 	mutex_unlock(&devlink_mutex);
 out_free:
 	kfree(attrs);
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net] net: fix possible overflow in __sk_mem_raise_allocated()
From: Eric Dumazet @ 2019-02-12 20:26 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.

They would increase their share of the memory, instead
of decreasing it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h | 2 +-
 net/core/sock.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2b229f7be8ebbc160706012f7ed03db85c5689d0..f43f935cb113b73c6fc0df35b5f43103ba131ab2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1277,7 +1277,7 @@ static inline void sk_sockets_allocated_inc(struct sock *sk)
 	percpu_counter_inc(sk->sk_prot->sockets_allocated);
 }
 
-static inline int
+static inline u64
 sk_sockets_allocated_read_positive(struct sock *sk)
 {
 	return percpu_counter_read_positive(sk->sk_prot->sockets_allocated);
diff --git a/net/core/sock.c b/net/core/sock.c
index 6aa2e7e0b4fbdbc29d43d6b61a53b8de2a7ba269..bc3512f230a304c97c519a82c69d1e86f115b651 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2380,7 +2380,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 	}
 
 	if (sk_has_memory_pressure(sk)) {
-		int alloc;
+		u64 alloc;
 
 		if (!sk_under_memory_pressure(sk))
 			return 1;
-- 
2.20.1.791.gb4d0f1c61a-goog


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox