Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net: dsa: drop some VLAs in switch.c
From: Kees Cook @ 2018-05-05 18:22 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Salvatore Mesoraca, Florian Fainelli, Vivien Didelot, LKML,
	Kernel Hardening, Network Development, David S. Miller
In-Reply-To: <20180505153905.GA30439@lunn.ch>

On Sat, May 5, 2018 at 8:39 AM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Sat, May 05, 2018 at 12:36:36PM +0200, Salvatore Mesoraca wrote:
>> 2018-03-13 21:06 GMT+01:00 Florian Fainelli <f.fainelli@gmail.com>:
>> > On 03/13/2018 12:58 PM, Vivien Didelot wrote:
>> >> Hi Salvatore,
>> >>
>> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> writes:
>> >>
>> >>> dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid
>> >>> 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports.
>> >>>
>> >>> [1] https://lkml.org/lkml/2018/3/7/621
>> >>>
>> >>> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
>> >>
>> >> NAK.
>> >>
>> >> We are in the process to remove hardcoded limits such as DSA_MAX_PORTS
>> >> and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports.
>> >
>> > Then this means that we need to allocate a bitmap from the heap, which
>> > sounds a bit superfluous and could theoretically fail... not sure which
>> > way is better, but bumping the size to DSA_MAX_PORTS definitively does
>> > help people working on enabling -Wvla.
>>
>> Hi Florian,
>>
>> Should I consider this patch still NAKed or not?
>> Should I resend the patch with some modifications?
>
> Hi Salvatore
>
> We have been removing all uses of DSA_MAX_PORTS. I don't particularly
> like arbitrary limits on how many ports a switch can have, or how many
> switches a board can have.
>
> So i would prefer to not use DSA_MAX_PORTS here.
>
> You could make the bitmap part of the dsa_switch structure. This is
> allocated by dsa_switch_alloc() and is passed the number of ports.
> Doing the allocation there means you don't need to worry about it
> failing in dsa_switch_mdb_add() or dsa_switch_vlan_add().

Are dsa_switch_mdb_add() and dsa_switch_vlan_add() guaranteed to be
single-threaded?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply

* possible deadlock in sk_diag_fill
From: syzbot @ 2018-05-05 17:59 UTC (permalink / raw)
  To: avagin, davem, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    c1c07416cdd4 Merge tag 'kbuild-fixes-v4.17' of git://git.k..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12164c97800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5a1dc06635c10d27
dashboard link: https://syzkaller.appspot.com/bug?extid=c1872be62e587eae9669
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
userspace arch: i386

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c1872be62e587eae9669@syzkaller.appspotmail.com


======================================================
WARNING: possible circular locking dependency detected
4.17.0-rc3+ #59 Not tainted
------------------------------------------------------
syz-executor1/25282 is trying to acquire lock:
000000004fddf743 (&(&u->lock)->rlock/1){+.+.}, at: sk_diag_dump_icons  
net/unix/diag.c:82 [inline]
000000004fddf743 (&(&u->lock)->rlock/1){+.+.}, at:  
sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144

but task is already holding lock:
00000000b6895645 (rlock-AF_UNIX){+.+.}, at: spin_lock  
include/linux/spinlock.h:310 [inline]
00000000b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_dump_icons  
net/unix/diag.c:64 [inline]
00000000b6895645 (rlock-AF_UNIX){+.+.}, at:  
sk_diag_fill.isra.5+0x94e/0x10d0 net/unix/diag.c:144

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (rlock-AF_UNIX){+.+.}:
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
        skb_queue_tail+0x26/0x150 net/core/skbuff.c:2900
        unix_dgram_sendmsg+0xf77/0x1730 net/unix/af_unix.c:1797
        sock_sendmsg_nosec net/socket.c:629 [inline]
        sock_sendmsg+0xd5/0x120 net/socket.c:639
        ___sys_sendmsg+0x525/0x940 net/socket.c:2117
        __sys_sendmmsg+0x3bb/0x6f0 net/socket.c:2205
        __compat_sys_sendmmsg net/compat.c:770 [inline]
        __do_compat_sys_sendmmsg net/compat.c:777 [inline]
        __se_compat_sys_sendmmsg net/compat.c:774 [inline]
        __ia32_compat_sys_sendmmsg+0x9f/0x100 net/compat.c:774
        do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
        do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
        entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139

-> #0 (&(&u->lock)->rlock/1){+.+.}:
        lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
        _raw_spin_lock_nested+0x28/0x40 kernel/locking/spinlock.c:354
        sk_diag_dump_icons net/unix/diag.c:82 [inline]
        sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144
        sk_diag_dump net/unix/diag.c:178 [inline]
        unix_diag_dump+0x35f/0x550 net/unix/diag.c:206
        netlink_dump+0x507/0xd20 net/netlink/af_netlink.c:2226
        __netlink_dump_start+0x51a/0x780 net/netlink/af_netlink.c:2323
        netlink_dump_start include/linux/netlink.h:214 [inline]
        unix_diag_handler_dump+0x3f4/0x7b0 net/unix/diag.c:307
        __sock_diag_cmd net/core/sock_diag.c:230 [inline]
        sock_diag_rcv_msg+0x2e0/0x3d0 net/core/sock_diag.c:261
        netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
        sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:272
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
        sock_sendmsg_nosec net/socket.c:629 [inline]
        sock_sendmsg+0xd5/0x120 net/socket.c:639
        sock_write_iter+0x35a/0x5a0 net/socket.c:908
        call_write_iter include/linux/fs.h:1784 [inline]
        new_sync_write fs/read_write.c:474 [inline]
        __vfs_write+0x64d/0x960 fs/read_write.c:487
        vfs_write+0x1f8/0x560 fs/read_write.c:549
        ksys_write+0xf9/0x250 fs/read_write.c:598
        __do_sys_write fs/read_write.c:610 [inline]
        __se_sys_write fs/read_write.c:607 [inline]
        __ia32_sys_write+0x71/0xb0 fs/read_write.c:607
        do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
        do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
        entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139

other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(rlock-AF_UNIX);
                                lock(&(&u->lock)->rlock/1);
                                lock(rlock-AF_UNIX);
   lock(&(&u->lock)->rlock/1);

  *** DEADLOCK ***

5 locks held by syz-executor1/25282:
  #0: 000000003919e1bd (sock_diag_mutex){+.+.}, at: sock_diag_rcv+0x1b/0x40  
net/core/sock_diag.c:271
  #1: 000000004f328d3e (sock_diag_table_mutex){+.+.}, at: __sock_diag_cmd  
net/core/sock_diag.c:225 [inline]
  #1: 000000004f328d3e (sock_diag_table_mutex){+.+.}, at:  
sock_diag_rcv_msg+0x169/0x3d0 net/core/sock_diag.c:261
  #2: 000000004cc04dbb (nlk_cb_mutex-SOCK_DIAG){+.+.}, at:  
netlink_dump+0x98/0xd20 net/netlink/af_netlink.c:2182
  #3: 00000000accdef41 (unix_table_lock){+.+.}, at: spin_lock  
include/linux/spinlock.h:310 [inline]
  #3: 00000000accdef41 (unix_table_lock){+.+.}, at:  
unix_diag_dump+0x10a/0x550 net/unix/diag.c:192
  #4: 00000000b6895645 (rlock-AF_UNIX){+.+.}, at: spin_lock  
include/linux/spinlock.h:310 [inline]
  #4: 00000000b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_dump_icons  
net/unix/diag.c:64 [inline]
  #4: 00000000b6895645 (rlock-AF_UNIX){+.+.}, at:  
sk_diag_fill.isra.5+0x94e/0x10d0 net/unix/diag.c:144

stack backtrace:
CPU: 1 PID: 25282 Comm: syz-executor1 Not tainted 4.17.0-rc3+ #59
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_circular_bug.isra.36.cold.54+0x1bd/0x27d  
kernel/locking/lockdep.c:1223
  check_prev_add kernel/locking/lockdep.c:1863 [inline]
  check_prevs_add kernel/locking/lockdep.c:1976 [inline]
  validate_chain kernel/locking/lockdep.c:2417 [inline]
  __lock_acquire+0x343e/0x5140 kernel/locking/lockdep.c:3431
  lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
  _raw_spin_lock_nested+0x28/0x40 kernel/locking/spinlock.c:354
  sk_diag_dump_icons net/unix/diag.c:82 [inline]
  sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144
  sk_diag_dump net/unix/diag.c:178 [inline]
  unix_diag_dump+0x35f/0x550 net/unix/diag.c:206
  netlink_dump+0x507/0xd20 net/netlink/af_netlink.c:2226
  __netlink_dump_start+0x51a/0x780 net/netlink/af_netlink.c:2323
  netlink_dump_start include/linux/netlink.h:214 [inline]
  unix_diag_handler_dump+0x3f4/0x7b0 net/unix/diag.c:307
  __sock_diag_cmd net/core/sock_diag.c:230 [inline]
  sock_diag_rcv_msg+0x2e0/0x3d0 net/core/sock_diag.c:261
  netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
  sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:272
  netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
  netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
  netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:639
  sock_write_iter+0x35a/0x5a0 net/socket.c:908
  call_write_iter include/linux/fs.h:1784 [inline]
  new_sync_write fs/read_write.c:474 [inline]
  __vfs_write+0x64d/0x960 fs/read_write.c:487
  vfs_write+0x1f8/0x560 fs/read_write.c:549
  ksys_write+0xf9/0x250 fs/read_write.c:598
  __do_sys_write fs/read_write.c:610 [inline]
  __se_sys_write fs/read_write.c:607 [inline]
  __ia32_sys_write+0x71/0xb0 fs/read_write.c:607
  do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
  do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f8ccb9
RSP: 002b:00000000f5f880ac EFLAGS: 00000282 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 000000002058bfe4
RDX: 0000000000000029 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: [PATCH net-next v2 02/13] net: phy: sfp: handle non-wired SFP connectors
From: Andrew Lunn @ 2018-05-05 17:48 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Antoine Tenart, Florian Fainelli, davem, kishon, linux,
	gregory.clement, jason, sebastian.hesselbarth, netdev,
	linux-kernel, maxime.chevallier, miquel.raynal, nadavh, stefanc,
	ymarkman, mw, linux-arm-kernel
In-Reply-To: <20180505173945.0754d0df@windsurf>

On Sat, May 05, 2018 at 05:39:45PM +0200, Thomas Petazzoni wrote:
> Hello,
> 
> On Fri, 4 May 2018 19:23:37 +0200, Antoine Tenart wrote:
> 
> > On Fri, May 04, 2018 at 10:04:48AM -0700, Florian Fainelli wrote:
> > > On 05/04/2018 06:56 AM, Antoine Tenart wrote:  
> > > > SFP connectors can be solder on a board without having any of their pins
> > > > (LOS, i2c...) wired. In such cases the SFP link state cannot be guessed,
> > > > and the overall link status reporting is left to other layers.
> > > > 
> > > > In order to achieve this, a new SFP_DEV status is added, named UNKNOWN.
> > > > This mode is set when it is not possible for the SFP code to get the
> > > > link status and as a result the link status is reported to be always UP
> > > > from the SFP point of view.  
> > > 
> > > Why represent the SFP in Device Tree then? Why not just declare this is
> > > a fixed link which would avoid having to introduce this "unknown" state.  
> > 
> > The other solution would have been to represent this as a fixed-link.
> > But such a representation would report the link as being up all the
> > time, which is something we wanted to avoid as the GoP in PPv2 can
> > report some link status. This is achieved using SFP+phylink+PPv2.
> > 
> > And representing the SFP cage in the device tree, although it's a
> > "dummy" one, helps describing the hardware.
> 
> Just to add to this: the board physically has a SFP cage, and a cable
> can be connected to it, or not. So it is absolutely not a fixed link
> (cable can be connected or not) and it really is a SFP cage.

Hi Thomas

What i have heard on the rumour mill is that the hardware design is
FUBAR. The i2c-mux and i2c-gpio expanders are using the same
addresses. Or something like that. So the data plane from the MAC to
the SFP works. But the control plane is broken.

I don't really have a problem listing an SFP in device tree. As you
say, it physically exists on the boards. But because of the FUBAR
hardware, it cannot be controlled. phylink is all about the control
plane, and there is no control plane for this hardware. So connecting
this sfp to phylink seems very wrong. When we have no control plain,
we use a fixed-link.

Isn't this hardware a reference design? It is not a real product.  If
it is an RDK, i'm sure Marvell are telling people it is FUBAR, don't
copy it. There will be a new RDK sometime which has the problems
fixed. So how much effort should we put into supporting a broken RDK,
which nobody will copy into a real product? To me, KISS is the right
approach and document it in the device tree what the issue is.

If a real product comes to market which is equally FUBAR, we can then
consider how to get the best out of the hardware. We can extend
phylink to support a fixed link PHY, but still ask the MAC about its
link status.

	 Andrew

^ permalink raw reply

* Re: WARNING in kernfs_add_one
From: Eric Dumazet @ 2018-05-05 17:43 UTC (permalink / raw)
  To: Greg KH, netdev, syzbot; +Cc: linux-kernel, syzkaller-bugs, tj
In-Reply-To: <20180505164041.GD7746@kroah.com>



On 05/05/2018 09:40 AM, Greg KH wrote:
> On Sat, May 05, 2018 at 08:47:02AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    8fb11a9a8d51 net/ipv6: rename rt6_next to fib6_next
>> git tree:       net-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=14b27237800000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c416c61f3cd96be
>> dashboard link: https://syzkaller.appspot.com/bug?extid=df47f81c226b31d89fb1
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=172fb3e7800000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16552e57800000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+df47f81c226b31d89fb1@syzkaller.appspotmail.com
>>
>> RBP: 00007fff808f3e10 R08: 0000000000000002 R09: 00007fff80003534
>> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
>> R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
>> ------------[ cut here ]------------
>> kernfs: ns required in 'ieee80211' for 'phy3'
> 
> That's interesting, this looks like a netfilter bug (adding netdev to
> the report here.)


I do not see anything netfilter related here.

More likely wireless territory

> 
> Yes, we can "tone down" the kernfs warning to just be an error message
> in the log, but there might be something worse going on here.
> 
> Network developers, any idea?  Rest of the callback chain is here:
> 
> 
>> WARNING: CPU: 0 PID: 4538 at fs/kernfs/dir.c:759 kernfs_add_one+0x406/0x4d0
>> fs/kernfs/dir.c:758
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 0 PID: 4538 Comm: syz-executor486 Not tainted 4.17.0-rc3+ #33
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>  panic+0x22f/0x4de kernel/panic.c:184
>>  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
>>  report_bug+0x252/0x2d0 lib/bug.c:186
>>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>>  do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
>>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
>> RIP: 0010:kernfs_add_one+0x406/0x4d0 fs/kernfs/dir.c:758
>> RSP: 0018:ffff8801ca9eece0 EFLAGS: 00010286
>> RAX: 000000000000002d RBX: ffffffff87d5cee0 RCX: ffffffff8160ba7d
>> RDX: 0000000000000000 RSI: ffffffff81610731 RDI: ffff8801ca9ee840
>> RBP: ffff8801ca9eed20 R08: ffff8801d9538500 R09: 0000000000000006
>> R10: ffff8801d9538500 R11: 0000000000000000 R12: ffff8801ad1cb6c0
>> R13: ffffffff885da640 R14: 0000000000000020 R15: 0000000000000000
>>  kernfs_create_link+0x112/0x180 fs/kernfs/symlink.c:41
>>  sysfs_do_create_link_sd.isra.2+0x90/0x130 fs/sysfs/symlink.c:43
>>  sysfs_do_create_link fs/sysfs/symlink.c:79 [inline]
>>  sysfs_create_link+0x65/0xc0 fs/sysfs/symlink.c:91
>>  device_add_class_symlinks drivers/base/core.c:1612 [inline]
>>  device_add+0x7a0/0x16d0 drivers/base/core.c:1810
>>  wiphy_register+0x178a/0x2430 net/wireless/core.c:806
>>  ieee80211_register_hw+0x13cd/0x35d0 net/mac80211/main.c:1047
>>  mac80211_hwsim_new_radio+0x1d9b/0x3410
>> drivers/net/wireless/mac80211_hwsim.c:2772
>>  hwsim_new_radio_nl+0x7a7/0xa60 drivers/net/wireless/mac80211_hwsim.c:3246
>>  genl_family_rcv_msg+0x889/0x1120 net/netlink/genetlink.c:599
>>  genl_rcv_msg+0xc6/0x170 net/netlink/genetlink.c:624
>>  netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
>>  genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
>>  netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
>>  netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
>>  netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
>>  sock_sendmsg_nosec net/socket.c:629 [inline]
>>  sock_sendmsg+0xd5/0x120 net/socket.c:639
>>  ___sys_sendmsg+0x805/0x940 net/socket.c:2117
>>  __sys_sendmsg+0x115/0x270 net/socket.c:2155
>>  __do_sys_sendmsg net/socket.c:2164 [inline]
>>  __se_sys_sendmsg net/socket.c:2162 [inline]
>>  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
>>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x4404c9
>> RSP: 002b:00007fff808f3e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404c9
>> RDX: 0000000000000000 RSI: 0000000020b3dfc8 RDI: 0000000000000005
>> RBP: 00007fff808f3e10 R08: 0000000000000002 R09: 00007fff80003534
>> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
>> R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Kernel Offset: disabled
>> Rebooting in 86400 seconds..
> 
> 
> Any ideas?
> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply

* Re: [net-next PATCH v2 4/8] udp: Do not pass checksum as a parameter to GSO segmentation
From: Alexander Duyck @ 2018-05-05 17:39 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <CAF=yD-LV7JkTkX-c7wyvjYNGW872H5GPbwRfL6M+SfRzL4cb_w@mail.gmail.com>

On Sat, May 5, 2018 at 3:01 AM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Fri, May 4, 2018 at 8:30 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> This patch is meant to allow us to avoid having to recompute the checksum
>> from scratch and have it passed as a parameter.
>>
>> Instead of taking that approach we can take advantage of the fact that the
>> length that was used to compute the existing checksum is included in the
>> UDP header. If we cancel that out by adding the value XOR with 0xFFFF we
>> can then just add the new length in and fold that into the new result.
>>
>> I think this may be fixing a checksum bug in the original code as well
>> since the checksum that was passed included the UDP header in the checksum
>> computation, but then excluded it for the adjustment on the last frame. I
>> believe this may have an effect on things in the cases where the two differ
>> by bits that would result in things crossing the byte boundaries.
>
> The replacement code, below, subtracts original payload size then adds
> the new payload size. mss here excludes the udp header size.
>
>>                 /* last packet can be partial gso_size */
>> -               if (!seg->next)
>> -                       csum_replace2(&uh->check, htons(mss),
>> -                                     htons(seg->len - hdrlen - sizeof(*uh)));

That is my point. When you calculated your checksum you included the
UDP header in the calculation.

-       return __udp_gso_segment(gso_skb, features,
-                                udp_v4_check(sizeof(struct udphdr) + mss,
-                                             iph->saddr, iph->daddr, 0));

Basically the problem is in one spot you are adding the sizeof(struct
udphdr) + mss and then in another you are cancelling it out as mss and
trying to account for it by also dropping the UDP header from the
payload length of the value you are adding. That works in the cases
where the effect doesn't cause any issues with the byte ordering,
however I think when mss + 8 crosses a byte boundary it can lead to
issues since the calculation is done on a byte swapped value.

^ permalink raw reply

* Re: [net-next PATCH v2 5/8] udp: Partially unroll handling of first segment and last segment
From: Alexander Duyck @ 2018-05-05 17:13 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <CAF=yD-J9iK6oWgV5L9gBty8L3AZTb8Rf0HjPz4_-5uz0k00rrA@mail.gmail.com>

On Sat, May 5, 2018 at 1:37 AM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Fri, May 4, 2018 at 8:30 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> This patch allows us to take care of unrolling the first segment and the
>> last segment
>
> Only the last segment needs to be unrolled, right?

I need the first segment as I have to get access to the UDP header to
recompute what the checksum should actually be for the first frame and
all the frames in between.

>> of the loop for processing the segmented skb. Part of the
>> motivation for this is that it makes it easier to process the fact that the
>> first fame and all of the frames in between should be mostly identical
>> in terms of header data, and the last frame has differences in the length
>> and partial checksum.
>>
>> In addition I am dropping the header length calculation since we don't
>> really need it for anything but the last frame and it can be easily
>> obtained by just pulling the data_len and offset of tail from the transport
>> header.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>>
>> v2: New break-out patch based on one patch from earlier series
>>
>>  net/ipv4/udp_offload.c |   35 ++++++++++++++++++++---------------
>>  1 file changed, 20 insertions(+), 15 deletions(-)
>>
>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>> index 5c4bb8b9e83e..946d06d2aa0c 100644
>> --- a/net/ipv4/udp_offload.c
>> +++ b/net/ipv4/udp_offload.c
>> @@ -193,7 +193,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>>         struct sk_buff *seg, *segs = ERR_PTR(-EINVAL);
>>         struct sock *sk = gso_skb->sk;
>>         unsigned int sum_truesize = 0;
>> -       unsigned int hdrlen;
>>         struct udphdr *uh;
>>         unsigned int mss;
>>         __sum16 check;
>> @@ -206,7 +205,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>>         if (!pskb_may_pull(gso_skb, sizeof(*uh)))
>>                 goto out;
>>
>> -       hdrlen = gso_skb->data - skb_mac_header(gso_skb);
>>         skb_pull(gso_skb, sizeof(*uh));
>>
>>         /* clear destructor to avoid skb_segment assigning it to tail */
>> @@ -219,7 +217,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>>                 return segs;
>>         }
>>
>> -       uh = udp_hdr(segs);
>> +       seg = segs;
>> +       uh = udp_hdr(seg);
>>
>>         /* compute checksum adjustment based on old length versus new */
>>         newlen = htons(sizeof(*uh) + mss);
>> @@ -227,25 +226,31 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>>                                             ((__force u32)uh->len ^ 0xFFFF) +
>>                                             (__force u32)newlen));
>>
>> -       for (seg = segs; seg; seg = seg->next) {
>> -               uh = udp_hdr(seg);
>> +       for (;;) {
>> +               seg->destructor = sock_wfree;
>> +               seg->sk = sk;
>> +               sum_truesize += seg->truesize;
>>
>> -               /* last packet can be partial gso_size */
>> -               if (!seg->next) {
>> -                       newlen = htons(seg->len - hdrlen);
>> -                       check = ~csum_fold((__force __wsum)((__force u32)uh->check +
>> -                                                           ((__force u32)uh->len ^ 0xFFFF) +
>> -                                                           (__force u32)newlen));
>> -               }
>> +               if (!seg->next)
>> +                       break;
>
> Not critical, but I find replacing
>
>   for (seg = segs; seg; seg = seg->next) {
>     uh = udp_hdr(seg);
>     ...
>   }
>
> with
>
>   uh = udp_hdr(seg);
>   for (;;) {
>     ...
>     if (!seg->next)
>       break;
>
>     uh = udp_hdr(seg);
>   }
>
> much less easy to parse and it inflates patch size. How about just
>
>   - for (seg = segs; seg; seg = seg->next)
>   + for( seg = segs; seg->next; seg = seg->next)
>
>

The problem is I need access to the UDP header before I start the
loop. That was why I pulled seg = segs and the "uh = udp_hdr(seg)" out
seperately

>>
>>                 uh->len = newlen;
>>                 uh->check = check;
>>
>> -               seg->destructor = sock_wfree;
>> -               seg->sk = sk;
>> -               sum_truesize += seg->truesize;
>> +               seg = seg->next;
>> +               uh = udp_hdr(seg);
>>         }
>>
>> +       /* last packet can be partial gso_size, account for that in checksum */
>> +       newlen = htons(skb_tail_pointer(seg) - skb_transport_header(seg) +
>> +                      seg->data_len);
>> +       check = ~csum_fold((__force __wsum)((__force u32)uh->check +
>> +                                           ((__force u32)uh->len ^ 0xFFFF)  +
>> +                                           (__force u32)newlen));
>> +       uh->len = newlen;
>> +       uh->check = check;
>> +
>> +       /* update refcount for the packet */
>>         refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
>>  out:
>>         return segs;
>>

^ permalink raw reply

* Re: [net-next PATCH v2 2/8] udp: Verify that pulling UDP header in GSO segmentation doesn't fail
From: Alexander Duyck @ 2018-05-05 17:10 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <CAF=yD-KcG5ic9PRPSqdVRZon-W=9_rgSQ0JuSC3ApMqS4vDJgw@mail.gmail.com>

On Sat, May 5, 2018 at 1:12 AM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Fri, May 4, 2018 at 8:29 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> We should verify that we can pull the UDP header before we attempt to do
>> so. Otherwise if this fails we have no way of knowing and GSO will not work
>> correctly.
>
> This already happened in the callers udp[46]_ufo_fragment

Ah. Okay I didn't see that. I may add a comment somewhere indicating
that is the case as it is a bit buried with all the calls.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH rdma-next] MAINTAINERS: Remove bouncing @mellanox.com addresses
From: Matan Barak @ 2018-05-05 16:56 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Or Gerlitz, Linux Netdev List, RDMA mailing list, Jason Gunthorpe
In-Reply-To: <1e1754c4-520f-57c4-f6a8-c6cb2ad0f710@redhat.com>

On Fri, May 4, 2018 at 12:37 AM, Doug Ledford <dledford@redhat.com> wrote:
> On 5/3/2018 5:11 PM, Or Gerlitz wrote:
>> On Thu, May 3, 2018 at 9:37 PM, LR wrote:
>>
>>>  MELLANOX MLX5 core VPI driver
>>>  M:     Saeed Mahameed <saeedm@mellanox.com>
>>> -M:     Matan Barak <matanb@mellanox.com>
>>
>> Goodbye Matan!
>>
>> You were a long time developer, maintainer, hacker and a very deeply thinking,
>> pleasant, nice and open person in our team, enjoy your new adventures and thanks
>> a lot for your long time contributions to the upstream kernel
>
> Indeed, Matan was always a pleasure to work with.  Best of luck on
> whatever you are doing next!
>
>

Thanks Or and Doug!
It was a real pleasure being part of the RDMA and netdev communities,
while seeing the growth of the RDMA community in the last few years
and the importance it gained.
I've enjoyed collaborating with you guys, adding features and
capabilites to our subsystem and drivers.
Hope our paths will cross again sometime in the future:)

^ permalink raw reply

* Re: WARNING in kernfs_add_one
From: Greg KH @ 2018-05-05 16:40 UTC (permalink / raw)
  To: netdev, syzbot; +Cc: linux-kernel, syzkaller-bugs, tj
In-Reply-To: <0000000000000390eb056b77596d@google.com>

On Sat, May 05, 2018 at 08:47:02AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    8fb11a9a8d51 net/ipv6: rename rt6_next to fib6_next
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=14b27237800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c416c61f3cd96be
> dashboard link: https://syzkaller.appspot.com/bug?extid=df47f81c226b31d89fb1
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=172fb3e7800000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16552e57800000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+df47f81c226b31d89fb1@syzkaller.appspotmail.com
> 
> RBP: 00007fff808f3e10 R08: 0000000000000002 R09: 00007fff80003534
> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
> R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
> ------------[ cut here ]------------
> kernfs: ns required in 'ieee80211' for 'phy3'

That's interesting, this looks like a netfilter bug (adding netdev to
the report here.)

Yes, we can "tone down" the kernfs warning to just be an error message
in the log, but there might be something worse going on here.

Network developers, any idea?  Rest of the callback chain is here:


> WARNING: CPU: 0 PID: 4538 at fs/kernfs/dir.c:759 kernfs_add_one+0x406/0x4d0
> fs/kernfs/dir.c:758
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 0 PID: 4538 Comm: syz-executor486 Not tainted 4.17.0-rc3+ #33
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  panic+0x22f/0x4de kernel/panic.c:184
>  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
>  report_bug+0x252/0x2d0 lib/bug.c:186
>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>  do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
> RIP: 0010:kernfs_add_one+0x406/0x4d0 fs/kernfs/dir.c:758
> RSP: 0018:ffff8801ca9eece0 EFLAGS: 00010286
> RAX: 000000000000002d RBX: ffffffff87d5cee0 RCX: ffffffff8160ba7d
> RDX: 0000000000000000 RSI: ffffffff81610731 RDI: ffff8801ca9ee840
> RBP: ffff8801ca9eed20 R08: ffff8801d9538500 R09: 0000000000000006
> R10: ffff8801d9538500 R11: 0000000000000000 R12: ffff8801ad1cb6c0
> R13: ffffffff885da640 R14: 0000000000000020 R15: 0000000000000000
>  kernfs_create_link+0x112/0x180 fs/kernfs/symlink.c:41
>  sysfs_do_create_link_sd.isra.2+0x90/0x130 fs/sysfs/symlink.c:43
>  sysfs_do_create_link fs/sysfs/symlink.c:79 [inline]
>  sysfs_create_link+0x65/0xc0 fs/sysfs/symlink.c:91
>  device_add_class_symlinks drivers/base/core.c:1612 [inline]
>  device_add+0x7a0/0x16d0 drivers/base/core.c:1810
>  wiphy_register+0x178a/0x2430 net/wireless/core.c:806
>  ieee80211_register_hw+0x13cd/0x35d0 net/mac80211/main.c:1047
>  mac80211_hwsim_new_radio+0x1d9b/0x3410
> drivers/net/wireless/mac80211_hwsim.c:2772
>  hwsim_new_radio_nl+0x7a7/0xa60 drivers/net/wireless/mac80211_hwsim.c:3246
>  genl_family_rcv_msg+0x889/0x1120 net/netlink/genetlink.c:599
>  genl_rcv_msg+0xc6/0x170 net/netlink/genetlink.c:624
>  netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
>  genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
>  netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
>  netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
>  netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
>  sock_sendmsg_nosec net/socket.c:629 [inline]
>  sock_sendmsg+0xd5/0x120 net/socket.c:639
>  ___sys_sendmsg+0x805/0x940 net/socket.c:2117
>  __sys_sendmsg+0x115/0x270 net/socket.c:2155
>  __do_sys_sendmsg net/socket.c:2164 [inline]
>  __se_sys_sendmsg net/socket.c:2162 [inline]
>  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4404c9
> RSP: 002b:00007fff808f3e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404c9
> RDX: 0000000000000000 RSI: 0000000020b3dfc8 RDI: 0000000000000005
> RBP: 00007fff808f3e10 R08: 0000000000000002 R09: 00007fff80003534
> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
> R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..


Any ideas?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Alexei Starovoitov @ 2018-05-05 16:24 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Greg Kroah-Hartman, Andy Lutomirski,
	Network Development, kernel list, kernel-team
In-Reply-To: <CAG48ez3gB=zFebJAvFCj+6iuicgTiw_0UDEdeWW1Ri82pz8zkw@mail.gmail.com>

On Sat, May 05, 2018 at 12:48:24AM -0400, Jann Horn wrote:
> On Thu, May 3, 2018 at 12:36 AM, Alexei Starovoitov <ast@kernel.org> wrote:
> > Introduce helper:
> > int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> > struct umh_info {
> >        struct file *pipe_to_umh;
> >        struct file *pipe_from_umh;
> >        pid_t pid;
> > };
> >
> > that GPLed kernel modules (signed or unsigned) can use it to execute part
> > of its own data as swappable user mode process.
> >
> > The kernel will do:
> > - mount "tmpfs"
> > - allocate a unique file in tmpfs
> > - populate that file with [data, data + len] bytes
> > - user-mode-helper code will do_execve that file and, before the process
> >   starts, the kernel will create two unix pipes for bidirectional
> >   communication between kernel module and umh
> > - close tmpfs file, effectively deleting it
> > - the fork_usermode_blob will return zero on success and populate
> >   'struct umh_info' with two unix pipes and the pid of the user process
> >
> > As the first step in the development of the bpfilter project
> > the fork_usermode_blob() helper is introduced to allow user mode code
> > to be invoked from a kernel module. The idea is that user mode code plus
> > normal kernel module code are built as part of the kernel build
> > and installed as traditional kernel module into distro specified location,
> > such that from a distribution point of view, there is
> > no difference between regular kernel modules and kernel modules + umh code.
> > Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> > by a kernel module doesn't make it any special from kernel and user space
> > tooling point of view.
> [...]
> > +static struct vfsmount *umh_fs;
> > +
> > +static int init_tmpfs(void)
> > +{
> > +       struct file_system_type *type;
> > +
> > +       if (umh_fs)
> > +               return 0;
> > +       type = get_fs_type("tmpfs");
> > +       if (!type)
> > +               return -ENODEV;
> > +       umh_fs = kern_mount(type);
> > +       if (IS_ERR(umh_fs)) {
> > +               int err = PTR_ERR(umh_fs);
> > +
> > +               put_filesystem(type);
> > +               umh_fs = NULL;
> > +               return err;
> > +       }
> > +       return 0;
> > +}
> 
> Should init_tmpfs() be holding some sort of mutex if it's fiddling
> with `umh_fs`? The current code only calls it in initcall context, but
> if that ever changes and two processes try to initialize the tmpfs at
> the same time, a few things could go wrong.

I thought that module loading is serialized, so calls to
fork_usermode_blob() will be serialized as well, but looking at the code
again that doesn't seem to be the case, so need to revisit not only
this function, but the rest of it too.

> I guess Luis' suggestion (putting a call to init_tmpfs() in
> do_basic_setup()) might be the easiest way to get rid of that problem.

I still think that two mounts where umh mount is dynamic is cleaner.
Why waste the mount if no module uses this helper?
I'm thinking to wrap init_tmpfs into DO_ONCE instead or use a mutex.
Looks like shmem_file_setup_with_mnt() can be called in parallel
on the same mount, so that should be fine.

^ permalink raw reply

* [PATCH v2 1/1] drivers core: multi-threading device shutdown
From: Pavel Tatashin @ 2018-05-05 15:40 UTC (permalink / raw)
  To: pasha.tatashin, steven.sistare, daniel.m.jordan, linux-kernel,
	jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh,
	alexander.duyck, tobin
In-Reply-To: <20180505154040.28614-1-pasha.tatashin@oracle.com>

When system is rebooted, halted or kexeced device_shutdown() is
called.

This function shuts down every single device by calling either:

	dev->bus->shutdown(dev)
	dev->driver->shutdown(dev)

Even on a machine with just a moderate amount of devices, device_shutdown()
may take multiple seconds to complete. This is because many devices require
a specific delays to perform this operation.

Here is a sample analysis of time it takes to call device_shutdown() on a
two socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz machine.

device_shutdown		2.95s
-----------------------------
 mlx4_shutdown		1.14s
 megasas_shutdown	0.24s
 ixgbe_shutdown		0.37s x 4 (four ixgbe devices on this machine).
 the rest		0.09s

In mlx4 we spent the most time, but that is because there is a 1 second
sleep, which is defined by hardware specifications:
mlx4_shutdown
 mlx4_unload_one
  mlx4_free_ownership
   msleep(1000)

With megasas we spend a quarter of a second, but sometimes longer (up-to
0.5s) in this path:

    megasas_shutdown
      megasas_flush_cache
        megasas_issue_blocked_cmd
          wait_event_timeout

Finally, with ixgbe_shutdown() it takes 0.37 for each device, but that time
is spread all over the place, with bigger offenders:

    ixgbe_shutdown
      __ixgbe_shutdown
        ixgbe_close_suspend
          ixgbe_down
            ixgbe_init_hw_generic
              ixgbe_reset_hw_X540
                msleep(100);                        0.104483472
                ixgbe_get_san_mac_addr_generic      0.048414851
                ixgbe_get_wwn_prefix_generic        0.048409893
              ixgbe_start_hw_X540
                ixgbe_start_hw_generic
                  ixgbe_clear_hw_cntrs_generic      0.048581502
                  ixgbe_setup_fc_generic            0.024225800

    All the ixgbe_*generic functions end-up calling:
    ixgbe_read_eerd_X540()
      ixgbe_acquire_swfw_sync_X540
        usleep_range(5000, 6000);
      ixgbe_release_swfw_sync_X540
        usleep_range(5000, 6000);

While these are short sleeps, they end-up calling them over 24 times!
24 * 0.0055s = 0.132s. Adding-up to 0.528s for four devices. Also we have
four msleep(100). Totaling to:  0.928s

While we should keep optimizing the individual device drivers, in some
cases this is simply a hardware property that forces a specific delay, and
we must wait.

So, the solution for this problem is to shutdown devices in parallel.
However, we must shutdown children before shutting down parents, so parent
device must wait for its children to finish.

With this patch, on the same machine devices_shutdown() takes 1.142s, and
without mlx4 one second delay only 0.38s

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 drivers/base/core.c | 275 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 225 insertions(+), 50 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b610816eb887..110d7970deef 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -25,6 +25,7 @@
 #include <linux/netdevice.h>
 #include <linux/sched/signal.h>
 #include <linux/sysfs.h>
+#include <linux/kthread.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -2102,6 +2103,59 @@ const char *device_get_devnode(struct device *dev,
 	return *tmp = s;
 }
 
+/**
+ * device_children_count - device children count
+ * @parent: parent struct device.
+ *
+ * Returns number of children for this device or 0 if none.
+ */
+static int device_children_count(struct device *parent)
+{
+	struct klist_iter i;
+	int children = 0;
+
+	if (!parent->p)
+		return 0;
+
+	klist_iter_init(&parent->p->klist_children, &i);
+	while (next_device(&i))
+		children++;
+	klist_iter_exit(&i);
+
+	return children;
+}
+
+/**
+ * device_get_child_by_index - Return child using the provided index.
+ * @parent: parent struct device.
+ * @index:  Index of the child, where 0 is the first child in the children list,
+ * and so on.
+ *
+ * Returns child or NULL if child with this index is not present.
+ */
+static struct device *
+device_get_child_by_index(struct device *parent, int index)
+{
+	struct klist_iter i;
+	struct device *dev = NULL, *d;
+	int child_index = 0;
+
+	if (!parent->p || index < 0)
+		return NULL;
+
+	klist_iter_init(&parent->p->klist_children, &i);
+	while ((d = next_device(&i))) {
+		if (child_index == index) {
+			dev = d;
+			break;
+		}
+		child_index++;
+	}
+	klist_iter_exit(&i);
+
+	return dev;
+}
+
 /**
  * device_for_each_child - device child iterator.
  * @parent: parent struct device.
@@ -2765,71 +2819,192 @@ int device_move(struct device *dev, struct device *new_parent,
 }
 EXPORT_SYMBOL_GPL(device_move);
 
+/*
+ * device_shutdown_one - call ->shutdown() for the device passed as
+ * argument.
+ */
+static void device_shutdown_one(struct device *dev)
+{
+	/* Don't allow any more runtime suspends */
+	pm_runtime_get_noresume(dev);
+	pm_runtime_barrier(dev);
+
+	if (dev->class && dev->class->shutdown_pre) {
+		if (initcall_debug)
+			dev_info(dev, "shutdown_pre\n");
+		dev->class->shutdown_pre(dev);
+	}
+	if (dev->bus && dev->bus->shutdown) {
+		if (initcall_debug)
+			dev_info(dev, "shutdown\n");
+		dev->bus->shutdown(dev);
+	} else if (dev->driver && dev->driver->shutdown) {
+		if (initcall_debug)
+			dev_info(dev, "shutdown\n");
+		dev->driver->shutdown(dev);
+	}
+
+	/* decrement the reference counter */
+	put_device(dev);
+}
+
+/**
+ * Passed as an argument to device_shutdown_child_task().
+ * child_next_index	the next available child index.
+ * tasks_running	number of tasks still running. Each tasks decrements it
+ *			when job is finished and the last task signals that the
+ *			job is complete.
+ * complete		Used to signal job competition.
+ * parent		Parent device.
+ */
+struct device_shutdown_task_data {
+	atomic_t		child_next_index;
+	atomic_t		tasks_running;
+	struct completion	complete;
+	struct device		*parent;
+};
+
+static int device_shutdown_child_task(void *data);
+
+/**
+ * These globals are used by tasks that are started for root devices.
+ * device_root_tasks_finished	Number of root devices finished shutting down.
+ * device_root_tasks_started	Total number of root devices tasks started.
+ * device_root_tasks_done	The completion signal to the main thread.
+ */
+static atomic_t		device_root_tasks_finished;
+static atomic_t		device_root_tasks_started;
+struct completion	device_root_tasks_done;
+
+/**
+ * Shutdown device tree with root started in dev. If dev has no children
+ * simply shutdown only this device. If dev has children recursively shutdown
+ * children first, and only then the parent. For performance reasons children
+ * are shutdown in parallel using kernel threads. dev is locked, so its children
+ * cannot be removed while this functions is running.
+ */
+static void device_shutdown_tree(struct device *dev)
+{
+	int children_count = device_children_count(dev);
+
+	if (children_count) {
+		struct device_shutdown_task_data tdata;
+		int i;
+
+		init_completion(&tdata.complete);
+		atomic_set(&tdata.child_next_index, 0);
+		atomic_set(&tdata.tasks_running, children_count);
+		tdata.parent = dev;
+
+		for (i = 0; i < children_count; i++) {
+			kthread_run(device_shutdown_child_task,
+				    &tdata, "device_shutdown.%s",
+				    dev_name(dev));
+		}
+		wait_for_completion(&tdata.complete);
+	}
+	device_shutdown_one(dev);
+}
+
+/**
+ * Only devices with parent are going through this function. The parent is
+ * locked and waits for all of its children to finish shutting down before
+ * calling shutdown function for itself.
+ */
+static int device_shutdown_child_task(void *data)
+{
+	struct device_shutdown_task_data *tdata = data;
+	int cidx = atomic_inc_return(&tdata->child_next_index) - 1;
+	struct device *dev = device_get_child_by_index(tdata->parent, cidx);
+
+	/* ref. counter is going to be decremented in device_shutdown_one() */
+	get_device(dev);
+	device_lock(dev);
+	device_shutdown_tree(dev);
+	device_unlock(dev);
+
+	/* If we are the last to exit, signal the completion */
+	if (atomic_dec_return(&tdata->tasks_running) == 0)
+		complete(&tdata->complete);
+	return 0;
+}
+
+/**
+ * On shutdown each root device (the one that does not have a parent) goes
+ * through this function.
+ */
+static int device_shutdown_root_task(void *data)
+{
+	struct device *dev = (struct device *)data;
+	int root_devices;
+
+	device_lock(dev);
+	device_shutdown_tree(dev);
+	device_unlock(dev);
+
+	/* If we are the last to exit, signal the completion */
+	root_devices = atomic_inc_return(&device_root_tasks_finished);
+	if (root_devices == atomic_read(&device_root_tasks_started))
+		complete(&device_root_tasks_done);
+	return 0;
+}
+
 /**
  * device_shutdown - call ->shutdown() on each device to shutdown.
  */
 void device_shutdown(void)
 {
-	struct device *dev, *parent;
+	int root_devices = 0;
+	struct device *dev;
 
-	spin_lock(&devices_kset->list_lock);
-	/*
-	 * Walk the devices list backward, shutting down each in turn.
-	 * Beware that device unplug events may also start pulling
-	 * devices offline, even as the system is shutting down.
+	atomic_set(&device_root_tasks_finished, 0);
+	atomic_set(&device_root_tasks_started, 0);
+	init_completion(&device_root_tasks_done);
+
+	/* Shutdown the root devices in parallel. The children are going to be
+	 * shutdown first in device_shutdown_tree().
 	 */
+	spin_lock(&devices_kset->list_lock);
 	while (!list_empty(&devices_kset->list)) {
-		dev = list_entry(devices_kset->list.prev, struct device,
-				kobj.entry);
+		dev = list_entry(devices_kset->list.next, struct device,
+				 kobj.entry);
 
-		/*
-		 * hold reference count of device's parent to
-		 * prevent it from being freed because parent's
-		 * lock is to be held
-		 */
-		parent = get_device(dev->parent);
-		get_device(dev);
-		/*
-		 * Make sure the device is off the kset list, in the
-		 * event that dev->*->shutdown() doesn't remove it.
+		/* Make sure the device is off the kset list, in the event that
+		 * dev->*->shutdown() doesn't remove it.
 		 */
 		list_del_init(&dev->kobj.entry);
-		spin_unlock(&devices_kset->list_lock);
-
-		/* hold lock to avoid race with probe/release */
-		if (parent)
-			device_lock(parent);
-		device_lock(dev);
-
-		/* Don't allow any more runtime suspends */
-		pm_runtime_get_noresume(dev);
-		pm_runtime_barrier(dev);
 
-		if (dev->class && dev->class->shutdown_pre) {
-			if (initcall_debug)
-				dev_info(dev, "shutdown_pre\n");
-			dev->class->shutdown_pre(dev);
+		/* Here we start tasks for root devices only */
+		if (!dev->parent) {
+			/* Prevents devices from being freed. The counter is
+			 * going to be decremented in device_shutdown_one() once
+			 * this root device is shutdown.
+			 */
+			get_device(dev);
+
+			/* We unlock list for performance reasons,
+			 * dev->*->shutdown(), may try to take this lock to
+			 * remove us from kset list. To avoid unlocking this
+			 * list we could replace spin lock in:
+			 * dev->kobj.kset->list_lock with a dummy one once
+			 * device is locked in device_shutdown_root_task() and
+			 * in device_shutdown_child_task().
+			 */
+			spin_unlock(&devices_kset->list_lock);
+
+			root_devices++;
+			kthread_run(device_shutdown_root_task,
+				    dev, "device_root_shutdown.%s",
+				    dev_name(dev));
+			spin_lock(&devices_kset->list_lock);
 		}
-		if (dev->bus && dev->bus->shutdown) {
-			if (initcall_debug)
-				dev_info(dev, "shutdown\n");
-			dev->bus->shutdown(dev);
-		} else if (dev->driver && dev->driver->shutdown) {
-			if (initcall_debug)
-				dev_info(dev, "shutdown\n");
-			dev->driver->shutdown(dev);
-		}
-
-		device_unlock(dev);
-		if (parent)
-			device_unlock(parent);
-
-		put_device(dev);
-		put_device(parent);
-
-		spin_lock(&devices_kset->list_lock);
 	}
 	spin_unlock(&devices_kset->list_lock);
+
+	/* Set number of root tasks started, and waits for completion */
+	atomic_set(&device_root_tasks_started, root_devices);
+	if (root_devices != atomic_read(&device_root_tasks_finished))
+		wait_for_completion(&device_root_tasks_done);
 }
 
 /*
-- 
2.17.0

^ permalink raw reply related

* [PATCH v2 0/1] multi-threading device shutdown
From: Pavel Tatashin @ 2018-05-05 15:40 UTC (permalink / raw)
  To: pasha.tatashin, steven.sistare, daniel.m.jordan, linux-kernel,
	jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh,
	alexander.duyck, tobin

Changelog

v1 - v2
	- It turns out we cannot lock more than MAX_LOCK_DEPTH by a single
	  thread. (By default this value is 48), and is used to detect
	  deadlocks. So, I re-wrote the code to only lock one devices per
	  thread instead of pre-locking all devices by the main thread.
	- Addressed comments from Tobin C. Harding.
	- As suggested by Alexander Duyck removed ixgbe changes. It can be
	  done as a separate work scaling RTNL mutex.

Do a faster shutdown by calling dev->*->shutdown(dev) in parallel.
device_shutdown() calls these functions for every single device but
only using one thread.

Since, nothing else is running on the machine by the device_shutdown()
s called, there is no reason not to utilize all the available CPU
resources.

Pavel Tatashin (1):
  drivers core: multi-threading device shutdown

 drivers/base/core.c | 275 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 225 insertions(+), 50 deletions(-)

-- 
2.17.0

^ permalink raw reply

* Re: [PATCH 7/8] rhashtable: add rhashtable_walk_prev()
From: Tom Herbert @ 2018-05-05 15:40 UTC (permalink / raw)
  To: Herbert Xu; +Cc: NeilBrown, Thomas Graf, Linux Kernel Network Developers, LKML
In-Reply-To: <20180505094324.wwjtl76oofgrtpg4@gondor.apana.org.au>

On Sat, May 5, 2018 at 2:43 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
>> rhashtable_walk_prev() returns the object returned by
>> the previous rhashtable_walk_next(), providing it is still in the
>> table (or was during this grace period).
>> This works even if rhashtable_walk_stop() and rhashtable_talk_start()
>> have been called since the last rhashtable_walk_next().
>>
>> If there have been no calls to rhashtable_walk_next(), or if the
>> object is gone from the table, then NULL is returned.
>>
>> This can usefully be used in a seq_file ->start() function.
>> If the pos is the same as was returned by the last ->next() call,
>> then rhashtable_walk_prev() can be used to re-establish the
>> current location in the table.  If it returns NULL, then
>> rhashtable_walk_next() should be used.
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>
> I will ack this if Tom is OK with replacing peek with it.
>
I'm not following why this is any better than peek. The point of
having rhashtable_walk_peek is to to allow the caller to get then
current element not the next one. This is needed when table is read in
multiple parts and we need to pick up with what was returned from the
last call to rhashtable_walk_next (which apparently is what this patch
is also trying to do).

There is one significant difference in that peek will return the
element in the table regardless of where the iterator is at (this is
why peek can move the iterator) and only returns NULL at end of table.
As mentioned above rhashtable_walk_prev can return NULL so then caller
needs and so rhashtable_walk_next "should be used" to get the previous
element. Doing a peek is a lot cleaner and more straightforward API in
this regard.

Tom

> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next v2 02/13] net: phy: sfp: handle non-wired SFP connectors
From: Thomas Petazzoni @ 2018-05-05 15:39 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: Florian Fainelli, davem, kishon, linux, gregory.clement, andrew,
	jason, sebastian.hesselbarth, netdev, linux-kernel,
	maxime.chevallier, miquel.raynal, nadavh, stefanc, ymarkman, mw,
	linux-arm-kernel
In-Reply-To: <20180504172337.GC13899@kwain>

Hello,

On Fri, 4 May 2018 19:23:37 +0200, Antoine Tenart wrote:

> On Fri, May 04, 2018 at 10:04:48AM -0700, Florian Fainelli wrote:
> > On 05/04/2018 06:56 AM, Antoine Tenart wrote:  
> > > SFP connectors can be solder on a board without having any of their pins
> > > (LOS, i2c...) wired. In such cases the SFP link state cannot be guessed,
> > > and the overall link status reporting is left to other layers.
> > > 
> > > In order to achieve this, a new SFP_DEV status is added, named UNKNOWN.
> > > This mode is set when it is not possible for the SFP code to get the
> > > link status and as a result the link status is reported to be always UP
> > > from the SFP point of view.  
> > 
> > Why represent the SFP in Device Tree then? Why not just declare this is
> > a fixed link which would avoid having to introduce this "unknown" state.  
> 
> The other solution would have been to represent this as a fixed-link.
> But such a representation would report the link as being up all the
> time, which is something we wanted to avoid as the GoP in PPv2 can
> report some link status. This is achieved using SFP+phylink+PPv2.
> 
> And representing the SFP cage in the device tree, although it's a
> "dummy" one, helps describing the hardware.

Just to add to this: the board physically has a SFP cage, and a cable
can be connected to it, or not. So it is absolutely not a fixed link
(cable can be connected or not) and it really is a SFP cage.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH] net: dsa: drop some VLAs in switch.c
From: Andrew Lunn @ 2018-05-05 15:39 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: Florian Fainelli, Vivien Didelot, linux-kernel, Kernel Hardening,
	netdev, David S. Miller, Kees Cook
In-Reply-To: <CAJHCu1+V+khxvzhQdG=mo960b5ayiQcSaoMnm9A__51JdWkdig@mail.gmail.com>

On Sat, May 05, 2018 at 12:36:36PM +0200, Salvatore Mesoraca wrote:
> 2018-03-13 21:06 GMT+01:00 Florian Fainelli <f.fainelli@gmail.com>:
> > On 03/13/2018 12:58 PM, Vivien Didelot wrote:
> >> Hi Salvatore,
> >>
> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> writes:
> >>
> >>> dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid
> >>> 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports.
> >>>
> >>> [1] https://lkml.org/lkml/2018/3/7/621
> >>>
> >>> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
> >>
> >> NAK.
> >>
> >> We are in the process to remove hardcoded limits such as DSA_MAX_PORTS
> >> and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports.
> >
> > Then this means that we need to allocate a bitmap from the heap, which
> > sounds a bit superfluous and could theoretically fail... not sure which
> > way is better, but bumping the size to DSA_MAX_PORTS definitively does
> > help people working on enabling -Wvla.
> 
> Hi Florian,
> 
> Should I consider this patch still NAKed or not?
> Should I resend the patch with some modifications?

Hi Salvatore

We have been removing all uses of DSA_MAX_PORTS. I don't particularly
like arbitrary limits on how many ports a switch can have, or how many
switches a board can have.

So i would prefer to not use DSA_MAX_PORTS here.

You could make the bitmap part of the dsa_switch structure. This is
allocated by dsa_switch_alloc() and is passed the number of ports.
Doing the allocation there means you don't need to worry about it
failing in dsa_switch_mdb_add() or dsa_switch_vlan_add().

	Andrew

^ permalink raw reply

* [PATCH net] tls: fix use after free in tls_sk_proto_close
From: Eric Dumazet @ 2018-05-05 15:35 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Atul Gupta, Steve Wise,
	Ilya Lesokhin, Aviad Yehezkel, Dave Watson

syzbot reported a use-after-free in tls_sk_proto_close

Add a boolean value to cleanup a bit this function.

BUG: KASAN: use-after-free in tls_sk_proto_close+0x8ab/0x9c0 net/tls/tls_main.c:297
Read of size 1 at addr ffff8801ae40a858 by task syz-executor363/4503

CPU: 0 PID: 4503 Comm: syz-executor363 Not tainted 4.17.0-rc3+ #34
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
 tls_sk_proto_close+0x8ab/0x9c0 net/tls/tls_main.c:297
 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460
 sock_release+0x96/0x1b0 net/socket.c:594
 sock_close+0x16/0x20 net/socket.c:1149
 __fput+0x34d/0x890 fs/file_table.c:209
 ____fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x1aee/0x2730 kernel/exit.c:865
 do_group_exit+0x16f/0x430 kernel/exit.c:968
 get_signal+0x886/0x1960 kernel/signal.c:2469
 do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
 exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4457b9
RSP: 002b:00007fdf4d766da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00000000006dac3c RCX: 00000000004457b9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000006dac3c
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dac38
R13: 3692738801137283 R14: 6bf92c39443c4c1d R15: 0000000000000006

Allocated by task 4498:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 create_ctx net/tls/tls_main.c:521 [inline]
 tls_init+0x1f9/0xb00 net/tls/tls_main.c:633
 tcp_set_ulp+0x1bc/0x520 net/ipv4/tcp_ulp.c:153
 do_tcp_setsockopt.isra.39+0x44a/0x2600 net/ipv4/tcp.c:2588
 tcp_setsockopt+0xc1/0xe0 net/ipv4/tcp.c:2893
 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4503:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xd9/0x260 mm/slab.c:3813
 tls_sw_free_resources+0x2a3/0x360 net/tls/tls_sw.c:1037
 tls_sk_proto_close+0x67c/0x9c0 net/tls/tls_main.c:288
 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460
 sock_release+0x96/0x1b0 net/socket.c:594
 sock_close+0x16/0x20 net/socket.c:1149
 __fput+0x34d/0x890 fs/file_table.c:209
 ____fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x1aee/0x2730 kernel/exit.c:865
 do_group_exit+0x16f/0x430 kernel/exit.c:968
 get_signal+0x886/0x1960 kernel/signal.c:2469
 do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
 exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ae40a800
 which belongs to the cache kmalloc-256 of size 256
The buggy address is located 88 bytes inside of
 256-byte region [ffff8801ae40a800, ffff8801ae40a900)
The buggy address belongs to the page:
page:ffffea0006b90280 count:1 mapcount:0 mapping:ffff8801ae40a080 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801ae40a080 0000000000000000 000000010000000c
raw: ffffea0006bea9e0 ffffea0006bc94a0 ffff8801da8007c0 0000000000000000
page dumped because: kasan: bad access detected

Fixes: dd0bed1665d6 ("tls: support for Inline tls record")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Atul Gupta <atul.gupta@chelsio.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Ilya Lesokhin <ilyal@mellanox.com>
Cc: Aviad Yehezkel <aviadye@mellanox.com>
Cc: Dave Watson <davejwatson@fb.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/tls/tls_main.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index cc03e00785c7ffefc8c37cac39aecc7e28cc86f9..74ed1e7af3d9da0f4074cb4ee5d7ed705f89c78b 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -248,16 +248,13 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 	struct tls_context *ctx = tls_get_ctx(sk);
 	long timeo = sock_sndtimeo(sk, 0);
 	void (*sk_proto_close)(struct sock *sk, long timeout);
+	bool free_ctx = false;
 
 	lock_sock(sk);
 	sk_proto_close = ctx->sk_proto_close;
 
-	if (ctx->conf == TLS_HW_RECORD)
-		goto skip_tx_cleanup;
-
-	if (ctx->conf == TLS_BASE) {
-		kfree(ctx);
-		ctx = NULL;
+	if (ctx->conf == TLS_BASE || ctx->conf == TLS_HW_RECORD) {
+		free_ctx = true;
 		goto skip_tx_cleanup;
 	}
 
@@ -294,7 +291,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 	/* free ctx for TLS_HW_RECORD, used by tcp_set_state
 	 * for sk->sk_prot->unhash [tls_hw_unhash]
 	 */
-	if (ctx && ctx->conf == TLS_HW_RECORD)
+	if (free_ctx)
 		kfree(ctx);
 }
 
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related

* Re: [PATCH iproute2] rdma: fix header files
From: David Ahern @ 2018-05-05 15:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: swise, netdev
In-Reply-To: <20180504215800.41b8467a@xeon-e3>

On 5/4/18 10:58 PM, Stephen Hemminger wrote:
> On Fri, 4 May 2018 16:13:07 -0600
> David Ahern <dsahern@gmail.com> wrote:
> 
>> On 5/4/18 3:56 PM, Stephen Hemminger wrote:
>>> All user api headers in iproute2 should be in include/uapi
>>> so that script can be used to put correct sanitized kernel headers
>>> there. And the header files for rdma must be a complete set; if one
>>> header file includes another, all must be present.
>>>
>>> This fixes build on older distributions, and Windows Services
>>> for Linux.
>>>
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>> ---
>>>  include/uapi/rdma/ib_user_sa.h                |   77 ++
>>>  include/uapi/rdma/ib_user_verbs.h             | 1210 +++++++++++++++++
>>>  .../uapi/rdma/rdma_netlink.h                  |   13 +
>>>  .../uapi/rdma/rdma_user_cm.h                  |    6 +-
>>>  4 files changed, 1303 insertions(+), 3 deletions(-)
>>>  create mode 100644 include/uapi/rdma/ib_user_sa.h
>>>  create mode 100644 include/uapi/rdma/ib_user_verbs.h
>>>  rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
>>>  rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)
>>>   
>>
>> Stephen:
>>
>> Per a recent discussion the RDMA folks need to take ownership of the
>> uapi files. RDMA features do not hit Dave's net-next tree so the rdma
>> code can never hit iproute2-next during a dev cycle.
> 
> I want all uapi headers in include/uapi because it avoids possible overlap problems,
> During the linux-net/linus release cycle they should match what is Linus's tree.
> 
> During the net-next they can come from two sources.
> 

That creates extra work for me for no reason.

You state above "user api headers in iproute2 should be in include/uapi
so that script can be used to put correct sanitized kernel headers there."

With RDMA's development cycle that will *never* happen. With the
exception of RDMA, all iproute2 features go through net-next and it is
the right tree to pull updates from. Every time I sync from net-next the
header files for rdma will have to be ignored, so they will never be
updated through this mechanism which means the stated goal is not
achievable.

As for linux-next, I will not sync header files to a tree that
disappears; it breaks all traceability. Further, it seems to me that it
does not really solve the problem. I forget the steps now but RDMA
features have to hit some development tree before going to Linus, so
there will be a delay with the headers.

Back in March we discussed options. iproute2 is nothing more than a
delivery vehicle for rdmatool. Since it breaks everything else about the
iproute2 and net-next association, the simplest option for everyone is
for the rdma group to control syncing their own headers and putting them
under rdma directory.

^ permalink raw reply

* Re: [PATCH] net/mlx5: Fix mlx5_get_vector_affinity function
From: Guenter Roeck @ 2018-05-05 14:38 UTC (permalink / raw)
  To: Israel Rukshin
  Cc: Max Gurtovoy, Matan Barak, Doug Ledford, linux-rdma, linux-kernel,
	netdev, Thomas Gleixner

On Thu, Apr 12, 2018 at 09:49:11AM +0000, Israel Rukshin wrote:
> Adding the vector offset when calling to mlx5_vector2eqn() is wrong.
> This is because mlx5_vector2eqn() checks if EQ index is equal to vector number
> and the fact that the internal completion vectors that mlx5 allocates
> don't get an EQ index.
> 
> The second problem here is that using effective_affinity_mask gives the same
> CPU for different vectors.
> This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
> This doesn't happen when using affinity_hint mask.
> 
Except that affinity_hint is only defined if SMP is enabled. Without:

include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
include/linux/mlx5/driver.h:1299:13: error:
        ‘struct irq_desc’ has no member named ‘affinity_hint’

Note that this is the only use of affinity_hint outside kernel/irq.
Don't other drivers have similar problems ?

Guenter

> Fixes: 2572cf57d75a ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
> Fixes: 05e0cc84e00c ("net/mlx5: Fix get vector affinity helper function")
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/infiniband/hw/mlx5/main.c |  2 +-
>  include/linux/mlx5/driver.h       | 12 +++---------
>  2 files changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index daa919e5a442..241cf4ff9901 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -4757,7 +4757,7 @@ mlx5_ib_get_vector_affinity(struct ib_device *ibdev, int comp_vector)
>  {
>  	struct mlx5_ib_dev *dev = to_mdev(ibdev);
>  
> -	return mlx5_get_vector_affinity(dev->mdev, comp_vector);
> +	return mlx5_get_vector_affinity_hint(dev->mdev, comp_vector);
>  }
>  
>  /* The mlx5_ib_multiport_mutex should be held when calling this function */
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index 767d193c269a..2a156c5dfadd 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -1284,25 +1284,19 @@ enum {
>  };
>  
>  static inline const struct cpumask *
> -mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
> +mlx5_get_vector_affinity_hint(struct mlx5_core_dev *dev, int vector)
>  {
> -	const struct cpumask *mask;
>  	struct irq_desc *desc;
>  	unsigned int irq;
>  	int eqn;
>  	int err;
>  
> -	err = mlx5_vector2eqn(dev, MLX5_EQ_VEC_COMP_BASE + vector, &eqn, &irq);
> +	err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
>  	if (err)
>  		return NULL;
>  
>  	desc = irq_to_desc(irq);
> -#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
> -	mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
> -#else
> -	mask = desc->irq_common_data.affinity;
> -#endif
> -	return mask;
> +	return desc->affinity_hint;
>  }
>  
>  #endif /* MLX5_DRIVER_H */
> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH] dt-bindings: can: rcar_can: Fix R8A7796 SoC name
From: Marc Kleine-Budde @ 2018-05-05 13:56 UTC (permalink / raw)
  To: Geert Uytterhoeven, Wolfgang Grandegger, Rob Herring,
	Mark Rutland
  Cc: Chris Paterson, linux-can, netdev, devicetree, linux-renesas-soc
In-Reply-To: <1525352553-17045-1-git-send-email-geert+renesas@glider.be>


[-- Attachment #1.1: Type: text/plain, Size: 461 bytes --]

On 05/03/2018 03:02 PM, Geert Uytterhoeven wrote:
> R8A7796 is R-Car M3-W.
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Applied to linux-can.

thnx,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 38/40] ide: remove ide_driver_proc_write
From: Eric W. Biederman @ 2018-05-05 13:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel,
	linux-kernel, linux-scsi, linux-ide, Greg Kroah-Hartman,
	jfs-discussion, linux-afs, linux-acpi, netdev, netfilter-devel,
	Alexander Viro, Jiri Slaby, Andrew Morton, linux-ext4,
	Alexey Dobriyan, megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180425154827.32251-39-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> The driver proc file hasn't been writeable for a long time, so this is
> just dead code.

It is possible to chmod this file to get at the write method.  Not that
I think anyone does.

It looks like this code was merged in 2.3.99-pre1 with permissions
S_IFREG|S_IRUGO so I don't think the write support was ever finished.

That cap_capable in the write method looks down right scary/buggy.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Eric



>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/ide/ide-proc.c | 46 ------------------------------------------
>  1 file changed, 46 deletions(-)
>
> diff --git a/drivers/ide/ide-proc.c b/drivers/ide/ide-proc.c
> index 863db44c7916..b3b8b8822d6a 100644
> --- a/drivers/ide/ide-proc.c
> +++ b/drivers/ide/ide-proc.c
> @@ -528,58 +528,12 @@ static int ide_driver_proc_open(struct inode *inode, struct file *file)
>  	return single_open(file, ide_driver_proc_show, PDE_DATA(inode));
>  }
>  
> -static int ide_replace_subdriver(ide_drive_t *drive, const char *driver)
> -{
> -	struct device *dev = &drive->gendev;
> -	int ret = 1;
> -	int err;
> -
> -	device_release_driver(dev);
> -	/* FIXME: device can still be in use by previous driver */
> -	strlcpy(drive->driver_req, driver, sizeof(drive->driver_req));
> -	err = device_attach(dev);
> -	if (err < 0)
> -		printk(KERN_WARNING "IDE: %s: device_attach error: %d\n",
> -			__func__, err);
> -	drive->driver_req[0] = 0;
> -	if (dev->driver == NULL) {
> -		err = device_attach(dev);
> -		if (err < 0)
> -			printk(KERN_WARNING
> -				"IDE: %s: device_attach(2) error: %d\n",
> -				__func__, err);
> -	}
> -	if (dev->driver && !strcmp(dev->driver->name, driver))
> -		ret = 0;
> -
> -	return ret;
> -}
> -
> -static ssize_t ide_driver_proc_write(struct file *file, const char __user *buffer,
> -				     size_t count, loff_t *pos)
> -{
> -	ide_drive_t	*drive = PDE_DATA(file_inode(file));
> -	char name[32];
> -
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EACCES;
> -	if (count > 31)
> -		count = 31;
> -	if (copy_from_user(name, buffer, count))
> -		return -EFAULT;
> -	name[count] = '\0';
> -	if (ide_replace_subdriver(drive, name))
> -		return -EINVAL;
> -	return count;
> -}
> -
>  static const struct file_operations ide_driver_proc_fops = {
>  	.owner		= THIS_MODULE,
>  	.open		= ide_driver_proc_open,
>  	.read		= seq_read,
>  	.llseek		= seq_lseek,
>  	.release	= single_release,
> -	.write		= ide_driver_proc_write,
>  };
>  
>  static int ide_media_proc_show(struct seq_file *m, void *v)

^ permalink raw reply

* Re: [PATCH RFC net-next] net: ipvs: Adjust gso_size for IPPROTO_TCP
From: Julian Anastasov @ 2018-05-05 12:58 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, David Ahern, Tom Herbert, Eric Dumazet, Nikita Shirokov,
	kernel-team, lvs-devel
In-Reply-To: <20180503070114.bcuusvzga45klccs@kafai-mbp>


	Hello,

On Thu, 3 May 2018, Martin KaFai Lau wrote:

> > - when exactly we start to use the new PMTU, eg. what happens
> > in case socket caches the route, whether route is killed via
> > dst->obsolete. Or may be while the PMTU expiration is handled
> > per-packet, the PMTU change is noticed only on ICMP...
> Before sk can reuse its dst cache, the sk will notice
> its dst cache is no longer valid by calling dst_check().
> dst_check() should return NULL which is one of the side
> effect of the earlier update_pmtu().  This dst_check()
> is usually only called when the sk needs to do output,
> so the new PMTU route (i.e. the RTF_CACHE IPv6 route)
> only have effect to the later packets.

	I checked again the code and it looks like sockets
are forced to use new exceptional route (RTF_CACHE/fnhe) via
dst_check only when the PMTU update should move them away
from old non-exceptional routes. Later, if PMTU is
reduced/updated this is noticed for every packet via dst_mtu,
as in the case with TCP.

	So, except the RTF_LOCAL check in __ip6_rt_update_pmtu
we should have no other issues. Only one minor bit is strange to me,
why rt6_insert_exception warns for RTF_PCPU if rt6_cache_allowed_for_pmtu
allows it when returning true... 

	Also, commit 0d3f6d297bfb allows rt6_do_update_pmtu() for
routes without RTF_CACHE, RTF_PCPU and rt6i_node. Should we
restrict rt6_do_update_pmtu only to RTF_CACHE routes?

 	if (!rt6_cache_allowed_for_pmtu(rt6)) {
-		rt6_do_update_pmtu(rt6, mtu);
-		/* update rt6_ex->stamp for cache */
-		if (rt6->rt6i_flags & RTF_CACHE)
+		if (rt6->rt6i_flags & RTF_CACHE) {
+			rt6_do_update_pmtu(rt6, mtu);
+			/* update rt6_ex->stamp for cache */
 			rt6_update_exception_stamp_rt(rt6);
+		}
 	} else if (daddr) {

Regards

^ permalink raw reply

* Re: [PATCH 34/40] atm: simplify procfs code
From: Eric W. Biederman @ 2018-05-05 12:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel,
	linux-kernel, linux-scsi, linux-ide, Greg Kroah-Hartman,
	jfs-discussion, linux-afs, linux-acpi, netdev, netfilter-devel,
	Alexander Viro, Jiri Slaby, Andrew Morton, linux-ext4,
	Alexey Dobriyan, megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180425154827.32251-35-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> Use remove_proc_subtree to remove the whole subtree on cleanup, and
> unwind the registration loop into individual calls.  Switch to use
> proc_create_seq where applicable.

Can you please explain why you are removing the error handling when
you are unwinding the registration loop?

>  int __init atm_proc_init(void)
>  {
> -	static struct atm_proc_entry *e;
> -	int ret;
> -
>  	atm_proc_root = proc_net_mkdir(&init_net, "atm", init_net.proc_net);
>  	if (!atm_proc_root)
> -		goto err_out;
> -	for (e = atm_proc_ents; e->name; e++) {
> -		struct proc_dir_entry *dirent;
> -
> -		dirent = proc_create(e->name, 0444,
> -				     atm_proc_root, e->proc_fops);
> -		if (!dirent)
> -			goto err_out_remove;
> -		e->dirent = dirent;
> -	}
> -	ret = 0;
> -out:
> -	return ret;
> -
> -err_out_remove:
> -	atm_proc_dirs_remove();
> -err_out:
> -	ret = -ENOMEM;
> -	goto out;
> +		return -ENOMEM;
> +	proc_create_seq("devices", 0444, atm_proc_root, &atm_dev_seq_ops);
> +	proc_create("pvc", 0444, atm_proc_root, &pvc_seq_fops);
> +	proc_create("svc", 0444, atm_proc_root, &svc_seq_fops);
> +	proc_create("vc", 0444, atm_proc_root, &vcc_seq_fops);
> +	return 0;
>  }

These proc entries could fail to register with -ENOMEM if for no other
reason.  Can you justify the removal of the error handling?

Can you please at least mention that removal in your change description
and explain why it is reasonable.

As it sits this is not a semantics preserving transformation, and the
difference is not documented.  Which raises red flags for me.

Eric

^ permalink raw reply

* Re: [PATCH v3 12/20] media: Remove depends on HAS_DMA in case of platform dependency
From: Mauro Carvalho Chehab @ 2018-05-05 12:47 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Ulf Hansson, Wolfram Sang, linux-iio, linux-fpga,
	linux-remoteproc, alsa-devel, Bjorn Andersson, Eric Anholt,
	netdev, linux-mtd, linux-i2c, linux1394-devel, Christoph Hellwig,
	Marek Szyprowski, Stefan Wahren, Boris Brezillon, Herbert Xu,
	Richard Weinberger, Joerg Roedel, Jassi Brar, Marek Vasut,
	linux-serial, Matias Bjorling, David Woodhouse, linux-media, Ohad
In-Reply-To: <1523987360-18760-13-git-send-email-geert@linux-m68k.org>

Hi Geert,

Em Tue, 17 Apr 2018 19:49:12 +0200
Geert Uytterhoeven <geert@linux-m68k.org> escreveu:

> Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
> symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".

Applying a patch like that is hard, as there are lots of churn at
the code. That's against latest media upstream:

checking file drivers/media/pci/intel/ipu3/Kconfig
Hunk #1 FAILED at 4.
1 out of 1 hunk FAILED
checking file drivers/media/pci/solo6x10/Kconfig
checking file drivers/media/pci/sta2x11/Kconfig
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED
checking file drivers/media/pci/tw5864/Kconfig
checking file drivers/media/pci/tw686x/Kconfig
checking file drivers/media/platform/Kconfig
Hunk #2 FAILED at 63.
Hunk #3 succeeded at 81 (offset 1 line).
Hunk #4 succeeded at 91 (offset 1 line).
Hunk #5 succeeded at 101 (offset 1 line).
Hunk #6 succeeded at 111 (offset 1 line).
Hunk #7 succeeded at 124 (offset 1 line).
Hunk #8 succeeded at 142 (offset 1 line).
Hunk #9 succeeded at 169 (offset 1 line).
Hunk #10 succeeded at 186 (offset 1 line).
Hunk #11 succeeded at 197 (offset 1 line).
Hunk #12 succeeded at 213 (offset 1 line).
Hunk #13 succeeded at 227 (offset 1 line).
Hunk #14 succeeded at 254 (offset 1 line).
Hunk #15 succeeded at 265 (offset 1 line).
Hunk #16 succeeded at 275 (offset 1 line).
Hunk #17 succeeded at 284 (offset 1 line).
Hunk #18 succeeded at 295 (offset 1 line).
Hunk #19 succeeded at 303 (offset 1 line).
Hunk #20 succeeded at 312 (offset 1 line).
Hunk #21 succeeded at 338 (offset 1 line).
Hunk #22 succeeded at 383 (offset 1 line).
Hunk #23 succeeded at 397 (offset 1 line).
Hunk #24 succeeded at 422 (offset 1 line).
Hunk #25 succeeded at 435 (offset 1 line).
Hunk #26 succeeded at 452 (offset 1 line).
Hunk #27 succeeded at 470 (offset 1 line).
Hunk #28 succeeded at 612 (offset 1 line).
1 out of 28 hunks FAILED

> In most cases this other symbol is an architecture or platform specific
> symbol, or PCI.
> 
> Generic symbols and drivers without platform dependencies keep their
> dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
> cannot work anyway.

Actually, depends on HAS_DMA was introduced on media because builds
were failing otherwise. We started adding it before the addition 
of COMPILE_TEST.

Can we just remove all HAS_DMA Kconfig dependencies as a hole from the
entire media subsystem, with something like:

	$ for i in $(find drivers/media -name Kconfig) $(find drivers/staging/media -name Kconfig); do sed '/depends on HAS_DMA/d;s/ && HAS_DMA//g' -i $i; done

Or would it cause build issues?

Regards,
Mauro




Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH 11/40] ipv6/flowlabel: simplify pid namespace lookup
From: Eric W. Biederman @ 2018-05-05 12:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Alexander Viro, Alexey Dobriyan,
	Greg Kroah-Hartman, Jiri Slaby, Alessandro Zummo,
	Alexandre Belloni, linux-acpi, drbd-dev, linux-ide, netdev,
	linux-rtc, megaraidlinux.pdl, linux-scsi, devel, linux-afs,
	linux-ext4, jfs-discussion, netfilter-devel, linux-kernel
In-Reply-To: <20180425154827.32251-12-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> The shole seq_file sequence already operates under a single RCU lock pair,
> so move the pid namespace lookup into it, and stop grabbing a reference
> and remove all kinds of boilerplate code.

This is wrong.

Move task_active_pid_ns(current) from open to seq_start actually means
that the results if you pass this proc file between callers the results
will change.  So this breaks file descriptor passing.

Open is a bad place to access current.  In the middle of read/write is
broken.


In this particular instance looking up the pid namespace with
task_active_pid_ns was a personal brain fart.  What the code should be
doing (with an appropriate helper) is:

struct pid_namespace *pid_ns = inode->i_sb->s_fs_info;

Because each mount of proc is bound to a pid namespace.  Looking up the
pid namespace from the super_block is a much better way to go.

Eric



> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  net/ipv6/ip6_flowlabel.c | 28 +++++-----------------------
>  1 file changed, 5 insertions(+), 23 deletions(-)
>
> diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> index c05c4e82a7ca..a9f221d45ef9 100644
> --- a/net/ipv6/ip6_flowlabel.c
> +++ b/net/ipv6/ip6_flowlabel.c
> @@ -754,7 +754,10 @@ static struct ip6_flowlabel *ip6fl_get_idx(struct seq_file *seq, loff_t pos)
>  static void *ip6fl_seq_start(struct seq_file *seq, loff_t *pos)
>  	__acquires(RCU)
>  {
> +	struct ip6fl_iter_state *state = ip6fl_seq_private(seq);
> +
>  	rcu_read_lock_bh();
> +	state->pid_ns = task_active_pid_ns(current);
>  	return *pos ? ip6fl_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
>  }
>  
> @@ -810,36 +813,15 @@ static const struct seq_operations ip6fl_seq_ops = {
>  
>  static int ip6fl_seq_open(struct inode *inode, struct file *file)
>  {
> -	struct seq_file *seq;
> -	struct ip6fl_iter_state *state;
> -	int err;
> -
> -	err = seq_open_net(inode, file, &ip6fl_seq_ops,
> +	return seq_open_net(inode, file, &ip6fl_seq_ops,
>  			   sizeof(struct ip6fl_iter_state));
> -
> -	if (!err) {
> -		seq = file->private_data;
> -		state = ip6fl_seq_private(seq);
> -		rcu_read_lock();
> -		state->pid_ns = get_pid_ns(task_active_pid_ns(current));
> -		rcu_read_unlock();
> -	}
> -	return err;
> -}
> -
> -static int ip6fl_seq_release(struct inode *inode, struct file *file)
> -{
> -	struct seq_file *seq = file->private_data;
> -	struct ip6fl_iter_state *state = ip6fl_seq_private(seq);
> -	put_pid_ns(state->pid_ns);
> -	return seq_release_net(inode, file);
>  }
>  
>  static const struct file_operations ip6fl_seq_fops = {
>  	.open		=	ip6fl_seq_open,
>  	.read		=	seq_read,
>  	.llseek		=	seq_lseek,
> -	.release	=	ip6fl_seq_release,
> +	.release	=	seq_release_net,
>  };
>  
>  static int __net_init ip6_flowlabel_proc_init(struct net *net)

^ permalink raw reply

* Re: [PATCH] net: dsa: drop some VLAs in switch.c
From: Salvatore Mesoraca @ 2018-05-05 10:36 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vivien Didelot, linux-kernel, Kernel Hardening, netdev,
	David S. Miller, Andrew Lunn, Kees Cook
In-Reply-To: <b948ef55-8b16-fee4-b22f-4c6d09fea0cf@gmail.com>

2018-03-13 21:06 GMT+01:00 Florian Fainelli <f.fainelli@gmail.com>:
> On 03/13/2018 12:58 PM, Vivien Didelot wrote:
>> Hi Salvatore,
>>
>> Salvatore Mesoraca <s.mesoraca16@gmail.com> writes:
>>
>>> dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid
>>> 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports.
>>>
>>> [1] https://lkml.org/lkml/2018/3/7/621
>>>
>>> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
>>
>> NAK.
>>
>> We are in the process to remove hardcoded limits such as DSA_MAX_PORTS
>> and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports.
>
> Then this means that we need to allocate a bitmap from the heap, which
> sounds a bit superfluous and could theoretically fail... not sure which
> way is better, but bumping the size to DSA_MAX_PORTS definitively does
> help people working on enabling -Wvla.

Hi Florian,

Should I consider this patch still NAKed or not?
Should I resend the patch with some modifications?

Thank you,

Salvatore

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox