* [PATCH net] r8169: don't use MSI before RTL8168d
From: Heiner Kallweit @ 2019-07-27 10:45 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Dušan Dragić
It was reported that after resuming from suspend, the network fails with
the error "do_IRQ: 3.38 No irq handler for vector", see [0]. Enabling WoL
works around the issue, but the only actual fix is to disable MSI.
So let's mimic the behavior of the vendor driver and disable MSI on
all chip versions before RTL8168d.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204079
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Reported-by: Dušan Dragić <dragic.dusan@gmail.com>
Tested-by: Dušan Dragić <dragic.dusan@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
This version of the fix applies to kernel versions up to 5.2.
---
drivers/net/ethernet/realtek/r8169.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 6d176be51..038a034ee 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7105,13 +7105,18 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
 {
 	unsigned int flags;
 
-	if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
+	switch (tp->mac_version) {
+	case RTL_GIGA_MAC_VER_02 ... RTL_GIGA_MAC_VER_06:
 		rtl_unlock_config_regs(tp);
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
 		rtl_lock_config_regs(tp);
+		/* fall through */
+	case RTL_GIGA_MAC_VER_07 ... RTL_GIGA_MAC_VER_24:
 		flags = PCI_IRQ_LEGACY;
-	} else {
+		break;
+	default:
 		flags = PCI_IRQ_ALL_TYPES;
+		break;
 	}
 
 	return pci_alloc_irq_vectors(tp->pci_dev, 1, 1, flags);
--
2.22.0
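
The decision made by the patched rtl_alloc_irq() can be looked at in
isolation with a small user-space sketch. Everything below is an
assumption made for illustration: the numeric version values, the flag
constants and the choose_irq() helper are hypothetical stand-ins, not the
driver's definitions, and the if/else chain re-expresses the switch and
its fall-through without relying on GNU case ranges.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RTL_GIGA_MAC_VER_* enum (values assumed). */
enum { VER_02 = 2, VER_06 = 6, VER_07 = 7, VER_24 = 24, VER_25 = 25 };

/* Hypothetical stand-ins for the PCI_IRQ_* flags (values assumed). */
#define IRQ_LEGACY    0x1u
#define IRQ_MSI       0x2u
#define IRQ_MSIX      0x4u
#define IRQ_ALL_TYPES (IRQ_LEGACY | IRQ_MSI | IRQ_MSIX)

struct irq_choice {
	unsigned int flags;     /* what would be passed to the IRQ allocator */
	bool clear_msi_enable;  /* whether the MSIEnable bit in Config2 gets cleared */
};

/* Mirrors the patched switch: two legacy-only ranges, everything else may use MSI. */
struct irq_choice choose_irq(int ver)
{
	struct irq_choice c = { IRQ_ALL_TYPES, false };

	if (ver >= VER_02 && ver <= VER_06) {
		/* Oldest chips: clear MSIEnable, then share the legacy path. */
		c.clear_msi_enable = true;
		c.flags = IRQ_LEGACY;
	} else if (ver >= VER_07 && ver <= VER_24) {
		/* Newly added by the patch: legacy interrupts only. */
		c.flags = IRQ_LEGACY;
	}
	return c;
}
```

In this sketch choose_irq(VER_07) yields the legacy-only flags without
touching MSIEnable, while choose_irq(VER_25) keeps all interrupt types.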
^ permalink raw reply related
* Re: [PATCH 2/2] staging/octeon: Allow test build on !MIPS
From: Greg KH @ 2019-07-27 10:57 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: davem, netdev, aaro.koskinen, arnd
In-Reply-To: <20190726174425.6845-3-willy@infradead.org>
On Fri, Jul 26, 2019 at 10:44:25AM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Add compile test support by moving all includes of files under
> asm/octeon into octeon-ethernet.h, and if we're not on MIPS,
> stub out all the calls into the octeon support code in octeon-stubs.h
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> drivers/staging/octeon/Kconfig | 2 +-
> drivers/staging/octeon/ethernet-defines.h | 2 -
> drivers/staging/octeon/ethernet-mdio.c | 6 +-
> drivers/staging/octeon/ethernet-mem.c | 5 +-
> drivers/staging/octeon/ethernet-rgmii.c | 10 +-
> drivers/staging/octeon/ethernet-rx.c | 13 +-
> drivers/staging/octeon/ethernet-rx.h | 2 -
> drivers/staging/octeon/ethernet-sgmii.c | 8 +-
> drivers/staging/octeon/ethernet-spi.c | 10 +-
> drivers/staging/octeon/ethernet-tx.c | 12 +-
> drivers/staging/octeon/ethernet-util.h | 4 -
> drivers/staging/octeon/ethernet.c | 12 +-
> drivers/staging/octeon/octeon-ethernet.h | 29 +-
> drivers/staging/octeon/octeon-stubs.h | 1429 +++++++++++++++++++++
> 14 files changed, 1466 insertions(+), 78 deletions(-)
> create mode 100644 drivers/staging/octeon/octeon-stubs.h
No real objection from me; being able to build this driver on
non-MIPS systems would be great.
But wow, that stubs.h file is huge. Do you really need all of that?
Is there no way to include the files from the mips "core" directly
instead for some of it?
If not, that's fine, and all of this can go through net-next:
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
^ permalink raw reply
* Re: [PATCH 4.4 stable net] net: tcp: Fix use-after-free in tcp_write_xmit
From: maowenan @ 2019-07-27 11:22 UTC (permalink / raw)
To: Greg KH; +Cc: stable, davem, netdev, linux-kernel
In-Reply-To: <495c2d12-2c18-3498-52a0-71e9e8a05576@huawei.com>
On 2019/7/27 18:44, maowenan wrote:
>
>
> On 2019/7/24 20:13, maowenan wrote:
>>
>>
>> On 2019/7/24 19:05, Greg KH wrote:
>>> On Wed, Jul 24, 2019 at 05:17:15PM +0800, Mao Wenan wrote:
>>>> There is one report about tcp_write_xmit use-after-free with version 4.4.136:
>>>>
>>>> BUG: KASAN: use-after-free in tcp_skb_pcount include/net/tcp.h:796 [inline]
>>>> BUG: KASAN: use-after-free in tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
>>>> BUG: KASAN: use-after-free in tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
>>>> Read of size 2 at addr ffff8801d6fc87b0 by task syz-executor408/4195
>>>>
>>>> CPU: 0 PID: 4195 Comm: syz-executor408 Not tainted 4.4.136-gfb7e319 #59
>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>>> 0000000000000000 7d8f38ecc03be946 ffff8801d73b7710 ffffffff81e0edad
>>>> ffffea00075bf200 ffff8801d6fc87b0 0000000000000000 ffff8801d6fc87b0
>>>> dffffc0000000000 ffff8801d73b7748 ffffffff815159b6 ffff8801d6fc87b0
>>>> Call Trace:
>>>> [<ffffffff81e0edad>] __dump_stack lib/dump_stack.c:15 [inline]
>>>> [<ffffffff81e0edad>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
>>>> [<ffffffff815159b6>] print_address_description+0x6c/0x216 mm/kasan/report.c:252
>>>> [<ffffffff81515cd5>] kasan_report_error mm/kasan/report.c:351 [inline]
>>>> [<ffffffff81515cd5>] kasan_report.cold.7+0x175/0x2f7 mm/kasan/report.c:408
>>>> [<ffffffff814f9784>] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:427
>>>> [<ffffffff83286582>] tcp_skb_pcount include/net/tcp.h:796 [inline]
>>>> [<ffffffff83286582>] tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
>>>> [<ffffffff83286582>] tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
>>>> [<ffffffff83287a40>] __tcp_push_pending_frames+0xa0/0x290 net/ipv4/tcp_output.c:2307
>>>> [<ffffffff8328e966>] tcp_send_fin+0x176/0xab0 net/ipv4/tcp_output.c:2883
>>>> [<ffffffff8324c0d0>] tcp_close+0xca0/0xf70 net/ipv4/tcp.c:2112
>>>> [<ffffffff832f8d0f>] inet_release+0xff/0x1d0 net/ipv4/af_inet.c:435
>>>> [<ffffffff82f1a156>] sock_release+0x96/0x1c0 net/socket.c:586
>>>> [<ffffffff82f1a296>] sock_close+0x16/0x20 net/socket.c:1037
>>>> [<ffffffff81522da5>] __fput+0x235/0x6f0 fs/file_table.c:208
>>>> [<ffffffff815232e5>] ____fput+0x15/0x20 fs/file_table.c:244
>>>> [<ffffffff8118bd7f>] task_work_run+0x10f/0x190 kernel/task_work.c:115
>>>> [<ffffffff81135285>] exit_task_work include/linux/task_work.h:21 [inline]
>>>> [<ffffffff81135285>] do_exit+0x9e5/0x26b0 kernel/exit.c:759
>>>> [<ffffffff8113b1d1>] do_group_exit+0x111/0x330 kernel/exit.c:889
>>>> [<ffffffff8115e5cc>] get_signal+0x4ec/0x14b0 kernel/signal.c:2321
>>>> [<ffffffff8100e02b>] do_signal+0x8b/0x1d30 arch/x86/kernel/signal.c:712
>>>> [<ffffffff8100360a>] exit_to_usermode_loop+0x11a/0x160 arch/x86/entry/common.c:248
>>>> [<ffffffff81006535>] prepare_exit_to_usermode arch/x86/entry/common.c:283 [inline]
>>>> [<ffffffff81006535>] syscall_return_slowpath+0x1b5/0x1f0 arch/x86/entry/common.c:348
>>>> [<ffffffff838c29b5>] int_ret_from_sys_call+0x25/0xa3
>>>>
>>>> Allocated by task 4194:
>>>> [<ffffffff810341d6>] save_stack_trace+0x26/0x50 arch/x86/kernel/stacktrace.c:63
>>>> [<ffffffff814f8873>] save_stack+0x43/0xd0 mm/kasan/kasan.c:512
>>>> [<ffffffff814f8b57>] set_track mm/kasan/kasan.c:524 [inline]
>>>> [<ffffffff814f8b57>] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:616
>>>> [<ffffffff814f9122>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:554
>>>> [<ffffffff814f4c1e>] slab_post_alloc_hook mm/slub.c:1349 [inline]
>>>> [<ffffffff814f4c1e>] slab_alloc_node mm/slub.c:2615 [inline]
>>>> [<ffffffff814f4c1e>] slab_alloc mm/slub.c:2623 [inline]
>>>> [<ffffffff814f4c1e>] kmem_cache_alloc+0xbe/0x2a0 mm/slub.c:2628
>>>> [<ffffffff82f380a6>] kmem_cache_alloc_node include/linux/slab.h:350 [inline]
>>>> [<ffffffff82f380a6>] __alloc_skb+0xe6/0x600 net/core/skbuff.c:218
>>>> [<ffffffff832466c3>] alloc_skb_fclone include/linux/skbuff.h:856 [inline]
>>>> [<ffffffff832466c3>] sk_stream_alloc_skb+0xa3/0x5d0 net/ipv4/tcp.c:833
>>>> [<ffffffff83249164>] tcp_sendmsg+0xd34/0x2b00 net/ipv4/tcp.c:1178
>>>> [<ffffffff83300ef3>] inet_sendmsg+0x203/0x4d0 net/ipv4/af_inet.c:755
>>>> [<ffffffff82f1e1fc>] sock_sendmsg_nosec net/socket.c:625 [inline]
>>>> [<ffffffff82f1e1fc>] sock_sendmsg+0xcc/0x110 net/socket.c:635
>>>> [<ffffffff82f1eedc>] SYSC_sendto+0x21c/0x370 net/socket.c:1665
>>>> [<ffffffff82f21560>] SyS_sendto+0x40/0x50 net/socket.c:1633
>>>> [<ffffffff838c2825>] entry_SYSCALL_64_fastpath+0x22/0x9e
>>>>
>>>> Freed by task 4194:
>>>> [<ffffffff810341d6>] save_stack_trace+0x26/0x50 arch/x86/kernel/stacktrace.c:63
>>>> [<ffffffff814f8873>] save_stack+0x43/0xd0 mm/kasan/kasan.c:512
>>>> [<ffffffff814f91a2>] set_track mm/kasan/kasan.c:524 [inline]
>>>> [<ffffffff814f91a2>] kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:589
>>>> [<ffffffff814f632e>] slab_free_hook mm/slub.c:1383 [inline]
>>>> [<ffffffff814f632e>] slab_free_freelist_hook mm/slub.c:1405 [inline]
>>>> [<ffffffff814f632e>] slab_free mm/slub.c:2859 [inline]
>>>> [<ffffffff814f632e>] kmem_cache_free+0xbe/0x340 mm/slub.c:2881
>>>> [<ffffffff82f3527f>] kfree_skbmem+0xcf/0x100 net/core/skbuff.c:635
>>>> [<ffffffff82f372fd>] __kfree_skb+0x1d/0x20 net/core/skbuff.c:676
>>>> [<ffffffff83288834>] sk_wmem_free_skb include/net/sock.h:1447 [inline]
>>>> [<ffffffff83288834>] tcp_write_queue_purge include/net/tcp.h:1460 [inline]
>>>> [<ffffffff83288834>] tcp_connect_init net/ipv4/tcp_output.c:3122 [inline]
>>>> [<ffffffff83288834>] tcp_connect+0xb24/0x30c0 net/ipv4/tcp_output.c:3261
>>>> [<ffffffff8329b991>] tcp_v4_connect+0xf31/0x1890 net/ipv4/tcp_ipv4.c:246
>>>> [<ffffffff832f9ca9>] __inet_stream_connect+0x2a9/0xc30 net/ipv4/af_inet.c:615
>>>> [<ffffffff832fa685>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:676
>>>> [<ffffffff82f1eb78>] SYSC_connect+0x1b8/0x300 net/socket.c:1557
>>>> [<ffffffff82f214b4>] SyS_connect+0x24/0x30 net/socket.c:1538
>>>> [<ffffffff838c2825>] entry_SYSCALL_64_fastpath+0x22/0x9e
>>>>
>>>> Syzkaller reproducer():
>>>> r0 = socket$packet(0x11, 0x3, 0x300)
>>>> r1 = socket$inet_tcp(0x2, 0x1, 0x0)
>>>> bind$inet(r1, &(0x7f0000000300)={0x2, 0x4e21, @multicast1}, 0x10)
>>>> connect$inet(r1, &(0x7f0000000140)={0x2, 0x1000004e21, @loopback}, 0x10)
>>>> recvmmsg(r1, &(0x7f0000001e40)=[{{0x0, 0x0, &(0x7f0000000100)=[{&(0x7f00000005c0)=""/88, 0x58}], 0x1}}], 0x1, 0x40000000, 0x0)
>>>> sendto$inet(r1, &(0x7f0000000000)="e2f7ad5b661c761edf", 0x9, 0x8080, 0x0, 0x0)
>>>> r2 = fcntl$dupfd(r1, 0x0, r0)
>>>> connect$unix(r2, &(0x7f00000001c0)=@file={0x0, './file0\x00'}, 0x6e)
>>>>
>>>> C repro link: https://syzkaller.appspot.com/text?tag=ReproC&x=14db474f800000
>>>>
>>>> This happens because when tcp_connect_init calls tcp_write_queue_purge, it
>>>> frees every skb in the write_queue but does not reset sk->sk_send_head to NULL.
>>>> tcp_write_xmit then tries to send an skb that tcp_write_queue_purge has already freed, and the UAF happens.
>>>>
>>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>>> ---
>>>> include/net/tcp.h | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>>>> index bf8a0dae977a..8f8aace28cf8 100644
>>>> --- a/include/net/tcp.h
>>>> +++ b/include/net/tcp.h
>>>> @@ -1457,6 +1457,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
>>>>
>>>> while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
>>>> sk_wmem_free_skb(sk, skb);
>>>> + sk->sk_send_head = NULL;
>>>> sk_mem_reclaim(sk);
>>>> tcp_clear_all_retrans_hints(tcp_sk(sk));
>>>> inet_csk(sk)->icsk_backoff = 0;
>>>
>>> Does this correspond with a specific commit that is already in Linus's
>>> tree? If not, why, did we change/mess something up when doing
>>> backports, or is the code just that different?
>>>
>>> Also, is this needed in 4.9.y, 4.14.y, 4.19.y, and/or 5.2.y? Why just
>>> 4.4.y?
>
> Greg,
>
> I have tested latest stable tree
> 4.4.186 oops
> 4.9.151 oops
> 4.14.106 NO oops
>
> This patch can simply fix them.
I have checked 4.14.y: it already contains the same fix as mine, which is why 4.14.106 does not oops.
commit dbbf2d1e4077bab0c65ece2765d3fc69cf7d610f
Author: Soheil Hassas Yeganeh <soheil@google.com>
Date: Thu Mar 15 12:09:13 2018 -0400
tcp: reset sk_send_head in tcp_write_queue_purge
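
To make the failure mode concrete, here is a toy user-space model of the
pattern, not the kernel code. All names (toy_sock, toy_enqueue, toy_purge,
toy_peek) are hypothetical; queue_head loosely models sk_write_queue and
send_head loosely models sk_send_head. The purge routine frees every
buffer and then clears the cached pointer, which is the step the one-line
patch adds.

```c
#include <stddef.h>
#include <stdlib.h>

struct toy_buf {
	struct toy_buf *next;
	int len;
};

struct toy_sock {
	struct toy_buf *queue_head; /* models the write queue */
	struct toy_buf *send_head;  /* cached pointer to the next buffer to send */
};

/* Append a buffer; the first unsent buffer becomes send_head. */
int toy_enqueue(struct toy_sock *sk, int len)
{
	struct toy_buf *b = malloc(sizeof(*b));
	struct toy_buf **pp = &sk->queue_head;

	if (!b)
		return -1;
	b->len = len;
	b->next = NULL;
	while (*pp)
		pp = &(*pp)->next;
	*pp = b;
	if (!sk->send_head)
		sk->send_head = b;
	return 0;
}

/* Free every queued buffer, then drop the cached pointer. */
void toy_purge(struct toy_sock *sk)
{
	struct toy_buf *b;

	while ((b = sk->queue_head) != NULL) {
		sk->queue_head = b->next;
		free(b);
	}
	/* Without this line send_head would still point at freed memory. */
	sk->send_head = NULL;
}

/* Alternative design: derive the head from the queue, so it cannot dangle. */
struct toy_buf *toy_peek(const struct toy_sock *sk)
{
	return sk->queue_head;
}
```

toy_peek() is only meant to illustrate why a helper that reads the head
from the queue itself has no stale pointer to reset.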
>
>>
>> Is it the commit 75c119afe14f? It does not use sk_send_head to indicate whether it has skb to be sent.
>>
>> commit 75c119afe14f74b4dd967d75ed9f57ab6c0ef045
>> Author: Eric Dumazet <edumazet@google.com>
>> Date: Thu Oct 5 22:21:27 2017 -0700
>>
>> tcp: implement rb-tree based retransmit queue
>>
>>
>> static inline struct sk_buff *tcp_send_head(const struct sock *sk)
>> {
>> - return sk->sk_send_head;
>> + return skb_peek(&sk->sk_write_queue);
>> }
>>
>>
>>
>>>
>>> thanks,
>>>
>>> greg k-h
>>>
>>> .
>>>
>>
>>
>> .
>>
>
>
> .
>
^ permalink raw reply
* Re: general protection fault in tls_trim_both_msgs
From: syzbot @ 2019-07-27 11:40 UTC (permalink / raw)
To: ast, aviadye, borisp, bpf, daniel, davejwatson, davem,
jakub.kicinski, john.fastabend, kafai, linux-kernel, netdev,
songliubraving, syzkaller-bugs, yhs
In-Reply-To: <0000000000002b4896058e7abf78@google.com>
syzbot has found a reproducer for the following crash on:
HEAD commit: fde50b96 Add linux-next specific files for 20190726
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=142826cc600000
kernel config: https://syzkaller.appspot.com/x/.config?x=4b58274564b354c1
dashboard link: https://syzkaller.appspot.com/bug?extid=0e0fedcad708d12d3032
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14779d64600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1587c842600000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0e0fedcad708d12d3032@syzkaller.appspotmail.com
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 10205 Comm: syz-executor265 Not tainted 5.3.0-rc1-next-20190726
#53
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:tls_trim_both_msgs+0x54/0x130 net/tls/tls_sw.c:268
Code: 48 c1 ea 03 80 3c 02 00 0f 85 e3 00 00 00 4d 8b b5 b0 06 00 00 48 b8
00 00 00 00 00 fc ff df 49 8d 7e 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
85 b3 00 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b
RSP: 0018:ffff88809037fac0 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff8880a8c0eec0 RCX: ffffffff862f4eef
RDX: 0000000000000005 RSI: ffffffff862e9016 RDI: 0000000000000028
RBP: ffff88809037fae0 R08: ffff8880944a8040 R09: ffffed10125e7d51
R10: ffffed10125e7d50 R11: ffff888092f3ea83 R12: 0000000000000000
R13: ffff8880a9560c80 R14: 0000000000000000 R15: 00000000ffffffe0
FS: 000055555717a880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc5f44109c0 CR3: 000000008b1cc000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tls_sw_sendmsg+0xe38/0x17b0 net/tls/tls_sw.c:1057
inet6_sendmsg+0x9e/0xe0 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:657
__sys_sendto+0x262/0x380 net/socket.c:1952
__do_sys_sendto net/socket.c:1964 [inline]
__se_sys_sendto net/socket.c:1960 [inline]
__x64_sys_sendto+0xe1/0x1a0 net/socket.c:1960
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441339
Code: e8 fc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 9b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffef90e4908 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441339
RDX: ffffffffffffffc1 RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 00000000006cb018 R08: 0000000000000000 R09: 1201000000003618
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402160
R13: 00000000004021f0 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace 94e33101f438b014 ]---
RIP: 0010:tls_trim_both_msgs+0x54/0x130 net/tls/tls_sw.c:268
Code: 48 c1 ea 03 80 3c 02 00 0f 85 e3 00 00 00 4d 8b b5 b0 06 00 00 48 b8
00 00 00 00 00 fc ff df 49 8d 7e 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
85 b3 00 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b
RSP: 0018:ffff88809037fac0 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff8880a8c0eec0 RCX: ffffffff862f4eef
RDX: 0000000000000005 RSI: ffffffff862e9016 RDI: 0000000000000028
RBP: ffff88809037fae0 R08: ffff8880944a8040 R09: ffffed10125e7d51
R10: ffffed10125e7d50 R11: ffff888092f3ea83 R12: 0000000000000000
R13: ffff8880a9560c80 R14: 0000000000000000 R15: 00000000ffffffe0
FS: 000055555717a880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc5f44109c0 CR3: 000000008b1cc000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
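
The register dump can be read as an inline KASAN shadow check on a near-NULL
address: RDI holds 0x28, RDX holds 0x28 >> 3 = 5, and RAX holds
0xdffffc0000000000. The sketch below reproduces that arithmetic in user
space. It assumes the usual x86-64 KASAN mapping (shadow = (addr >> 3) +
offset) and 48-bit virtual addresses; the helper names are made up for
illustration, and this is an interpretation of the dump, not syzbot output.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3
#define SHADOW_OFFSET      0xdffffc0000000000ULL /* RAX in the dump above */

/* Address of the shadow byte that describes 'addr'. */
uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47; /* the 17 upper bits */

	return top == 0 || top == 0x1ffff;
}
```

For the address 0x28 (a NULL pointer plus a 0x28 member offset) the shadow
address is 0xdffffc0000000005, which is not canonical, so the shadow load
itself raises the general protection fault. A valid kernel pointer such as
R13 maps to a canonical shadow address instead.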
^ permalink raw reply
* Re: [PATCH 4.4 stable net] net: tcp: Fix use-after-free in tcp_write_xmit
From: Greg KH @ 2019-07-27 11:40 UTC (permalink / raw)
To: maowenan; +Cc: stable, davem, netdev, linux-kernel
In-Reply-To: <02a3860d-4ad8-0ba9-e488-9a149de55b3b@huawei.com>
On Sat, Jul 27, 2019 at 07:22:30PM +0800, maowenan wrote:
>
>
> On 2019/7/27 18:44, maowenan wrote:
> >
> >
> > On 2019/7/24 20:13, maowenan wrote:
> >>
> >>
> >> On 2019/7/24 19:05, Greg KH wrote:
> >>> On Wed, Jul 24, 2019 at 05:17:15PM +0800, Mao Wenan wrote:
> >>>> There is one report about tcp_write_xmit use-after-free with version 4.4.136:
> >>>>
> >>>> BUG: KASAN: use-after-free in tcp_skb_pcount include/net/tcp.h:796 [inline]
> >>>> BUG: KASAN: use-after-free in tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
> >>>> BUG: KASAN: use-after-free in tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
> >>>> Read of size 2 at addr ffff8801d6fc87b0 by task syz-executor408/4195
> >>>>
> >>>> CPU: 0 PID: 4195 Comm: syz-executor408 Not tainted 4.4.136-gfb7e319 #59
> >>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> >>>> 0000000000000000 7d8f38ecc03be946 ffff8801d73b7710 ffffffff81e0edad
> >>>> ffffea00075bf200 ffff8801d6fc87b0 0000000000000000 ffff8801d6fc87b0
> >>>> dffffc0000000000 ffff8801d73b7748 ffffffff815159b6 ffff8801d6fc87b0
> >>>> Call Trace:
> >>>> [<ffffffff81e0edad>] __dump_stack lib/dump_stack.c:15 [inline]
> >>>> [<ffffffff81e0edad>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
> >>>> [<ffffffff815159b6>] print_address_description+0x6c/0x216 mm/kasan/report.c:252
> >>>> [<ffffffff81515cd5>] kasan_report_error mm/kasan/report.c:351 [inline]
> >>>> [<ffffffff81515cd5>] kasan_report.cold.7+0x175/0x2f7 mm/kasan/report.c:408
> >>>> [<ffffffff814f9784>] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:427
> >>>> [<ffffffff83286582>] tcp_skb_pcount include/net/tcp.h:796 [inline]
> >>>> [<ffffffff83286582>] tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
> >>>> [<ffffffff83286582>] tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
> >>>> [<ffffffff83287a40>] __tcp_push_pending_frames+0xa0/0x290 net/ipv4/tcp_output.c:2307
> >>>> [<ffffffff8328e966>] tcp_send_fin+0x176/0xab0 net/ipv4/tcp_output.c:2883
> >>>> [<ffffffff8324c0d0>] tcp_close+0xca0/0xf70 net/ipv4/tcp.c:2112
> >>>> [<ffffffff832f8d0f>] inet_release+0xff/0x1d0 net/ipv4/af_inet.c:435
> >>>> [<ffffffff82f1a156>] sock_release+0x96/0x1c0 net/socket.c:586
> >>>> [<ffffffff82f1a296>] sock_close+0x16/0x20 net/socket.c:1037
> >>>> [<ffffffff81522da5>] __fput+0x235/0x6f0 fs/file_table.c:208
> >>>> [<ffffffff815232e5>] ____fput+0x15/0x20 fs/file_table.c:244
> >>>> [<ffffffff8118bd7f>] task_work_run+0x10f/0x190 kernel/task_work.c:115
> >>>> [<ffffffff81135285>] exit_task_work include/linux/task_work.h:21 [inline]
> >>>> [<ffffffff81135285>] do_exit+0x9e5/0x26b0 kernel/exit.c:759
> >>>> [<ffffffff8113b1d1>] do_group_exit+0x111/0x330 kernel/exit.c:889
> >>>> [<ffffffff8115e5cc>] get_signal+0x4ec/0x14b0 kernel/signal.c:2321
> >>>> [<ffffffff8100e02b>] do_signal+0x8b/0x1d30 arch/x86/kernel/signal.c:712
> >>>> [<ffffffff8100360a>] exit_to_usermode_loop+0x11a/0x160 arch/x86/entry/common.c:248
> >>>> [<ffffffff81006535>] prepare_exit_to_usermode arch/x86/entry/common.c:283 [inline]
> >>>> [<ffffffff81006535>] syscall_return_slowpath+0x1b5/0x1f0 arch/x86/entry/common.c:348
> >>>> [<ffffffff838c29b5>] int_ret_from_sys_call+0x25/0xa3
> >>>>
> >>>> Allocated by task 4194:
> >>>> [<ffffffff810341d6>] save_stack_trace+0x26/0x50 arch/x86/kernel/stacktrace.c:63
> >>>> [<ffffffff814f8873>] save_stack+0x43/0xd0 mm/kasan/kasan.c:512
> >>>> [<ffffffff814f8b57>] set_track mm/kasan/kasan.c:524 [inline]
> >>>> [<ffffffff814f8b57>] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:616
> >>>> [<ffffffff814f9122>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:554
> >>>> [<ffffffff814f4c1e>] slab_post_alloc_hook mm/slub.c:1349 [inline]
> >>>> [<ffffffff814f4c1e>] slab_alloc_node mm/slub.c:2615 [inline]
> >>>> [<ffffffff814f4c1e>] slab_alloc mm/slub.c:2623 [inline]
> >>>> [<ffffffff814f4c1e>] kmem_cache_alloc+0xbe/0x2a0 mm/slub.c:2628
> >>>> [<ffffffff82f380a6>] kmem_cache_alloc_node include/linux/slab.h:350 [inline]
> >>>> [<ffffffff82f380a6>] __alloc_skb+0xe6/0x600 net/core/skbuff.c:218
> >>>> [<ffffffff832466c3>] alloc_skb_fclone include/linux/skbuff.h:856 [inline]
> >>>> [<ffffffff832466c3>] sk_stream_alloc_skb+0xa3/0x5d0 net/ipv4/tcp.c:833
> >>>> [<ffffffff83249164>] tcp_sendmsg+0xd34/0x2b00 net/ipv4/tcp.c:1178
> >>>> [<ffffffff83300ef3>] inet_sendmsg+0x203/0x4d0 net/ipv4/af_inet.c:755
> >>>> [<ffffffff82f1e1fc>] sock_sendmsg_nosec net/socket.c:625 [inline]
> >>>> [<ffffffff82f1e1fc>] sock_sendmsg+0xcc/0x110 net/socket.c:635
> >>>> [<ffffffff82f1eedc>] SYSC_sendto+0x21c/0x370 net/socket.c:1665
> >>>> [<ffffffff82f21560>] SyS_sendto+0x40/0x50 net/socket.c:1633
> >>>> [<ffffffff838c2825>] entry_SYSCALL_64_fastpath+0x22/0x9e
> >>>>
> >>>> Freed by task 4194:
> >>>> [<ffffffff810341d6>] save_stack_trace+0x26/0x50 arch/x86/kernel/stacktrace.c:63
> >>>> [<ffffffff814f8873>] save_stack+0x43/0xd0 mm/kasan/kasan.c:512
> >>>> [<ffffffff814f91a2>] set_track mm/kasan/kasan.c:524 [inline]
> >>>> [<ffffffff814f91a2>] kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:589
> >>>> [<ffffffff814f632e>] slab_free_hook mm/slub.c:1383 [inline]
> >>>> [<ffffffff814f632e>] slab_free_freelist_hook mm/slub.c:1405 [inline]
> >>>> [<ffffffff814f632e>] slab_free mm/slub.c:2859 [inline]
> >>>> [<ffffffff814f632e>] kmem_cache_free+0xbe/0x340 mm/slub.c:2881
> >>>> [<ffffffff82f3527f>] kfree_skbmem+0xcf/0x100 net/core/skbuff.c:635
> >>>> [<ffffffff82f372fd>] __kfree_skb+0x1d/0x20 net/core/skbuff.c:676
> >>>> [<ffffffff83288834>] sk_wmem_free_skb include/net/sock.h:1447 [inline]
> >>>> [<ffffffff83288834>] tcp_write_queue_purge include/net/tcp.h:1460 [inline]
> >>>> [<ffffffff83288834>] tcp_connect_init net/ipv4/tcp_output.c:3122 [inline]
> >>>> [<ffffffff83288834>] tcp_connect+0xb24/0x30c0 net/ipv4/tcp_output.c:3261
> >>>> [<ffffffff8329b991>] tcp_v4_connect+0xf31/0x1890 net/ipv4/tcp_ipv4.c:246
> >>>> [<ffffffff832f9ca9>] __inet_stream_connect+0x2a9/0xc30 net/ipv4/af_inet.c:615
> >>>> [<ffffffff832fa685>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:676
> >>>> [<ffffffff82f1eb78>] SYSC_connect+0x1b8/0x300 net/socket.c:1557
> >>>> [<ffffffff82f214b4>] SyS_connect+0x24/0x30 net/socket.c:1538
> >>>> [<ffffffff838c2825>] entry_SYSCALL_64_fastpath+0x22/0x9e
> >>>>
> >>>> Syzkaller reproducer():
> >>>> r0 = socket$packet(0x11, 0x3, 0x300)
> >>>> r1 = socket$inet_tcp(0x2, 0x1, 0x0)
> >>>> bind$inet(r1, &(0x7f0000000300)={0x2, 0x4e21, @multicast1}, 0x10)
> >>>> connect$inet(r1, &(0x7f0000000140)={0x2, 0x1000004e21, @loopback}, 0x10)
> >>>> recvmmsg(r1, &(0x7f0000001e40)=[{{0x0, 0x0, &(0x7f0000000100)=[{&(0x7f00000005c0)=""/88, 0x58}], 0x1}}], 0x1, 0x40000000, 0x0)
> >>>> sendto$inet(r1, &(0x7f0000000000)="e2f7ad5b661c761edf", 0x9, 0x8080, 0x0, 0x0)
> >>>> r2 = fcntl$dupfd(r1, 0x0, r0)
> >>>> connect$unix(r2, &(0x7f00000001c0)=@file={0x0, './file0\x00'}, 0x6e)
> >>>>
> >>>> C repro link: https://syzkaller.appspot.com/text?tag=ReproC&x=14db474f800000
> >>>>
> >>>> This happens because when tcp_connect_init calls tcp_write_queue_purge, it
> >>>> frees every skb in the write_queue but does not reset sk->sk_send_head to NULL.
> >>>> tcp_write_xmit then tries to send an skb that tcp_write_queue_purge has already freed, and the UAF happens.
> >>>>
> >>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> >>>> ---
> >>>> include/net/tcp.h | 1 +
> >>>> 1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/include/net/tcp.h b/include/net/tcp.h
> >>>> index bf8a0dae977a..8f8aace28cf8 100644
> >>>> --- a/include/net/tcp.h
> >>>> +++ b/include/net/tcp.h
> >>>> @@ -1457,6 +1457,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
> >>>>
> >>>> while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
> >>>> sk_wmem_free_skb(sk, skb);
> >>>> + sk->sk_send_head = NULL;
> >>>> sk_mem_reclaim(sk);
> >>>> tcp_clear_all_retrans_hints(tcp_sk(sk));
> >>>> inet_csk(sk)->icsk_backoff = 0;
> >>>
> >>> Does this correspond with a specific commit that is already in Linus's
> >>> tree? If not, why, did we change/mess something up when doing
> >>> backports, or is the code just that different?
> >>>
> >>> Also, is this needed in 4.9.y, 4.14.y, 4.19.y, and/or 5.2.y? Why just
> >>> 4.4.y?
> >
> > Greg,
> >
> > I have tested latest stable tree
> > 4.4.186 oops
> > 4.9.151 oops
> > 4.14.106 NO oops
> >
> > This patch can simply fix them.
>
> I have checked 4.14.y: it already contains the same fix as mine, which is why 4.14.106 does not oops.
> commit dbbf2d1e4077bab0c65ece2765d3fc69cf7d610f
> Author: Soheil Hassas Yeganeh <soheil@google.com>
> Date: Thu Mar 15 12:09:13 2018 -0400
>
> tcp: reset sk_send_head in tcp_write_queue_purge
>
So if this patch is backported to 4.4.y and 4.9.y, all will be fine?
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: Himadri Pandya @ 2019-07-27 11:50 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <201907271302.tDRkl9uU%lkp@intel.com>
On 7/27/2019 10:50 AM, kbuild test robot wrote:
> Hi Himadri,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc1 next-20190726]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
This patch should be applied to the linux-next git tree.
Thank you.
- Himadri
>
> url: https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
> config: x86_64-allyesconfig (attached as .config)
> compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All error/warnings (new ones prefixed by >>):
>
>>> net/vmw_vsock/hyperv_transport.c:58:28: error: 'HV_HYP_PAGE_SIZE' undeclared here (not in a function); did you mean 'HV_MESSAGE_SIZE'?
> #define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
> ^
>>> net/vmw_vsock/hyperv_transport.c:65:10: note: in expansion of macro 'HVS_SEND_BUF_SIZE'
> u8 data[HVS_SEND_BUF_SIZE];
> ^~~~~~~~~~~~~~~~~
> In file included from include/linux/list.h:9:0,
> from include/linux/module.h:9,
> from net/vmw_vsock/hyperv_transport.c:11:
> net/vmw_vsock/hyperv_transport.c: In function 'hvs_open_connection':
>>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
> __builtin_choose_expr(__safe_cmp(x, y), \
> ^
> include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
> #define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
> ^~~~~~~~~~~~~
>>> net/vmw_vsock/hyperv_transport.c:390:12: note: in expansion of macro 'max_t'
> sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
> ^~~~~
>>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
> __builtin_choose_expr(__safe_cmp(x, y), \
> ^
> include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
> #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
> ^~~~~~~~~~~~~
>>> net/vmw_vsock/hyperv_transport.c:391:12: note: in expansion of macro 'min_t'
> sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
> ^~~~~
>>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
> __builtin_choose_expr(__safe_cmp(x, y), \
> ^
> include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
> #define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
> ^~~~~~~~~~~~~
> net/vmw_vsock/hyperv_transport.c:393:12: note: in expansion of macro 'max_t'
> rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
> ^~~~~
>>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
> __builtin_choose_expr(__safe_cmp(x, y), \
> ^
> include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
> #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
> ^~~~~~~~~~~~~
> net/vmw_vsock/hyperv_transport.c:394:12: note: in expansion of macro 'min_t'
> rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
> ^~~~~
> net/vmw_vsock/hyperv_transport.c: In function 'hvs_stream_enqueue':
>>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
> __builtin_choose_expr(__safe_cmp(x, y), \
> ^
> include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
> #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
> ^~~~~~~~~~~~~
> net/vmw_vsock/hyperv_transport.c:681:14: note: in expansion of macro 'min_t'
> to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
> ^~~~~
>
> vim +58 net/vmw_vsock/hyperv_transport.c
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
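
The cascade of errors in the report follows from one missing definition:
once HV_HYP_PAGE_SIZE is undeclared, every expression built on
HVS_SEND_BUF_SIZE stops being a compile-time constant. The user-space
sketch below shows the intended arithmetic. It assumes a 4 KiB hypervisor
page and an 8-byte header made of two 32-bit fields; both assumptions, the
struct layouts and the clamp helper are illustrative and not taken from
the tree the patch targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values: a 4 KiB hypervisor page regardless of the guest page size. */
#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1UL << HV_HYP_PAGE_SHIFT)

/* Assumed header layout: two 32-bit fields, 8 bytes in total. */
struct vmpipe_proto_header {
	uint32_t pkt_type;
	uint32_t data_size;
};

/* Payload space left in one page after the header (macro as quoted in the report). */
#define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))

/* Header plus payload are meant to fill exactly one hypervisor page. */
struct hvs_send_buf {
	struct vmpipe_proto_header hdr;
	uint8_t data[HVS_SEND_BUF_SIZE];
};

/* Plain-C equivalent of clamping a write length to HVS_SEND_BUF_SIZE. */
size_t hvs_clamp_write(size_t to_write)
{
	return to_write < HVS_SEND_BUF_SIZE ? to_write : HVS_SEND_BUF_SIZE;
}
```

Under these assumptions the send buffer is exactly one 4096-byte page and a
single write is capped at 4088 payload bytes.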
^ permalink raw reply
* [PATCH] Fix typo reigster to register
From: Pei Hsuan Hung @ 2019-07-27 14:21 UTC (permalink / raw)
Cc: afcidk, trivial, Russell Currey, Sam Bobroff,
Oliver O'Halloran, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Jeremy Kerr, Arnd Bergmann, MyungJoo Ham,
Chanwoo Choi, Liviu Dudau, Brian Starkey, David Airlie,
Daniel Vetter, Ping-Ke Shih, Kalle Valo, David S. Miller,
James Smart, Dick Kennedy, James E.J. Bottomley,
Martin K. Petersen, Alexander Viro, Larry Finger, linuxppc-dev,
linux-kernel, dri-devel, linux-wireless, netdev, linux-scsi,
linux-fsdevel
Signed-off-by: Pei Hsuan Hung <afcidk@gmail.com>
Cc: trivial@kernel.org
---
arch/powerpc/kernel/eeh.c | 2 +-
arch/powerpc/platforms/cell/spufs/switch.c | 4 ++--
drivers/extcon/extcon-rt8973a.c | 2 +-
drivers/gpu/drm/arm/malidp_regs.h | 2 +-
drivers/net/wireless/realtek/rtlwifi/rtl8192se/fw.h | 2 +-
drivers/scsi/lpfc/lpfc_hbadisc.c | 4 ++--
fs/userfaultfd.c | 2 +-
7 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c0e4b73191f3..d75c9c24ec4d 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1030,7 +1030,7 @@ int __init eeh_ops_register(struct eeh_ops *ops)
}
/**
- * eeh_ops_unregister - Unreigster platform dependent EEH operations
+ * eeh_ops_unregister - Unregister platform dependent EEH operations
* @name: name of EEH platform operations
*
* Unregister the platform dependent EEH operation callback
diff --git a/arch/powerpc/platforms/cell/spufs/switch.c b/arch/powerpc/platforms/cell/spufs/switch.c
index 5c3f5d088c3b..9548a086937b 100644
--- a/arch/powerpc/platforms/cell/spufs/switch.c
+++ b/arch/powerpc/platforms/cell/spufs/switch.c
@@ -574,7 +574,7 @@ static inline void save_mfc_rag(struct spu_state *csa, struct spu *spu)
{
/* Save, Step 38:
* Save RA_GROUP_ID register and the
- * RA_ENABLE reigster in the CSA.
+ * RA_ENABLE register in the CSA.
*/
csa->priv1.resource_allocation_groupID_RW =
spu_resource_allocation_groupID_get(spu);
@@ -1227,7 +1227,7 @@ static inline void restore_mfc_rag(struct spu_state *csa, struct spu *spu)
{
/* Restore, Step 29:
* Restore RA_GROUP_ID register and the
- * RA_ENABLE reigster from the CSA.
+ * RA_ENABLE register from the CSA.
*/
spu_resource_allocation_groupID_set(spu,
csa->priv1.resource_allocation_groupID_RW);
diff --git a/drivers/extcon/extcon-rt8973a.c b/drivers/extcon/extcon-rt8973a.c
index 40c07f4d656e..e75c03792398 100644
--- a/drivers/extcon/extcon-rt8973a.c
+++ b/drivers/extcon/extcon-rt8973a.c
@@ -270,7 +270,7 @@ static int rt8973a_muic_get_cable_type(struct rt8973a_muic_info *info)
}
cable_type = adc & RT8973A_REG_ADC_MASK;
- /* Read Device 1 reigster to identify correct cable type */
+ /* Read Device 1 register to identify correct cable type */
ret = regmap_read(info->regmap, RT8973A_REG_DEV1, &dev1);
if (ret) {
dev_err(info->dev, "failed to read DEV1 register\n");
diff --git a/drivers/gpu/drm/arm/malidp_regs.h b/drivers/gpu/drm/arm/malidp_regs.h
index 993031542fa1..0d81b34a4212 100644
--- a/drivers/gpu/drm/arm/malidp_regs.h
+++ b/drivers/gpu/drm/arm/malidp_regs.h
@@ -145,7 +145,7 @@
#define MALIDP_SE_COEFFTAB_DATA_MASK 0x3fff
#define MALIDP_SE_SET_COEFFTAB_DATA(x) \
((x) & MALIDP_SE_COEFFTAB_DATA_MASK)
-/* Enhance coeffents reigster offset */
+/* Enhance coeffents register offset */
#define MALIDP_SE_IMAGE_ENH 0x3C
/* ENH_LIMITS offset 0x0 */
#define MALIDP_SE_ENH_LOW_LEVEL 24
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192se/fw.h b/drivers/net/wireless/realtek/rtlwifi/rtl8192se/fw.h
index 99c6f7eefd85..d03c8f12a15c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8192se/fw.h
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192se/fw.h
@@ -58,7 +58,7 @@ struct fw_priv {
/* 0x81: PCI-AP, 01:PCIe, 02: 92S-U,
* 0x82: USB-AP, 0x12: 72S-U, 03:SDIO */
u8 hci_sel;
- /* the same value as reigster value */
+ /* the same value as register value */
u8 chip_version;
/* customer ID low byte */
u8 customer_id_0;
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 28ecaa7fc715..9e116bd79836 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -6551,7 +6551,7 @@ lpfc_sli4_unregister_fcf(struct lpfc_hba *phba)
* lpfc_unregister_fcf_rescan - Unregister currently registered fcf and rescan
* @phba: Pointer to hba context object.
*
- * This function unregisters the currently reigstered FCF. This function
+ * This function unregisters the currently registered FCF. This function
* also tries to find another FCF for discovery by rescan the HBA FCF table.
*/
void
@@ -6609,7 +6609,7 @@ lpfc_unregister_fcf_rescan(struct lpfc_hba *phba)
* lpfc_unregister_fcf - Unregister the currently registered fcf record
* @phba: Pointer to hba context object.
*
- * This function just unregisters the currently reigstered FCF. It does not
+ * This function just unregisters the currently registered FCF. It does not
* try to find another FCF for discovery.
*/
void
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index ccbdbd62f0d8..612dc1240f90 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -267,7 +267,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
#endif /* CONFIG_HUGETLB_PAGE */
/*
- * Verify the pagetables are still not ok after having reigstered into
+ * Verify the pagetables are still not ok after having registered into
* the fault_pending_wqh to avoid userland having to UFFDIO_WAKE any
* userfault that has already been resolved, if userfaultfd_read and
* UFFDIO_COPY|ZEROPAGE are being run simultaneously on two different
--
2.17.1
* Re: [PATCH 2/2] staging/octeon: Allow test build on !MIPS
From: Matthew Wilcox @ 2019-07-27 14:28 UTC (permalink / raw)
To: Greg KH; +Cc: davem, netdev, aaro.koskinen, arnd
In-Reply-To: <20190727105706.GB458@kroah.com>
On Sat, Jul 27, 2019 at 12:57:06PM +0200, Greg KH wrote:
> No real objection from me, having this driver able to be built on
> non-mips systems would be great.
>
> But wow, that stubs.h file is huge, you really need all of that?
> There's no way to include the files from the mips "core" directly
> instead for some of it?
I don't know. I went the route of copying each structure/enum wholesale
as I came across it in the build log rather than taking only the pieces
of it that I needed. My time versus a few hundred lines of source?
I think that a more wholesale restructuring of this driver would be
helpful; there are a number of structures that are only used in the
driver and not in the arch code, and those could then be removed from
the stubs file. But I have no long term investment in this driver;
it just annoyed me to not be able to build it.
> If not, that's fine, and all of this can go through net-next:
>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Re: [PATCH 2/2] staging/octeon: Allow test build on !MIPS
From: Greg KH @ 2019-07-27 14:51 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: davem, netdev, aaro.koskinen, arnd
In-Reply-To: <20190727142826.GA12889@bombadil.infradead.org>
On Sat, Jul 27, 2019 at 07:28:26AM -0700, Matthew Wilcox wrote:
> On Sat, Jul 27, 2019 at 12:57:06PM +0200, Greg KH wrote:
> > No real objection from me, having this driver able to be built on
> > non-mips systems would be great.
> >
> > But wow, that stubs.h file is huge, you really need all of that?
> > There's no way to include the files from the mips "core" directly
> > instead for some of it?
>
> I don't know. I went the route of copying each structure/enum wholesale
> as I came across it in the build log rather than taking only the pieces
> of it that I needed. My time versus a few hundred lines of source?
>
> I think that a more wholesale restructuring of this driver would be
> helpful; there are a number of structures that are only used in the
> driver and not in the arch code, and those could then be removed from
> the stubs file. But I have no long term investment in this driver;
> it just annoyed me to not be able to build it.
Ok, again, no objection from me, thanks for doing this work.
greg k-h
* [PATCH net] net/mlx5e: Fix unnecessary flow_block_cb_is_busy call
From: wenxu @ 2019-07-27 14:59 UTC (permalink / raw)
To: netdev, saeedm
From: wenxu <wenxu@ucloud.cn>
When flow_block_cb_is_busy() is called, indr_priv is guaranteed to be a
NULL pointer, so there is no need to call flow_block_cb_is_busy().
Fixes: 0d4fd02e7199 ("net: flow_offload: add flow_block_cb_is_busy() and use it")
Signed-off-by: wenxu <wenxu@ucloud.cn>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb..496d303 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -722,10 +722,6 @@ static void mlx5e_rep_indr_tc_block_unbind(void *cb_priv)
if (indr_priv)
return -EEXIST;
- if (flow_block_cb_is_busy(mlx5e_rep_indr_setup_block_cb,
- indr_priv, &mlx5e_block_cb_list))
- return -EBUSY;
-
indr_priv = kmalloc(sizeof(*indr_priv), GFP_KERNEL);
if (!indr_priv)
return -ENOMEM;
--
1.8.3.1
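The reasoning in the commit message can be illustrated with a small user-space sketch of a "busy" lookup that matches on a (callback, identity) pair. This is not the kernel implementation: struct block_cb, block_cb_is_busy() and demo_cb() are hypothetical stand-ins, and a plain singly linked list replaces the kernel's list_head. It also assumes that registered entries always carry a non-NULL identity pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a registered block callback. */
typedef int (*setup_cb_t)(void *cb_priv);

struct block_cb {
	setup_cb_t cb;          /* callback the driver registered */
	void *cb_ident;         /* identity pointer registered with it */
	struct block_cb *next;  /* singly linked, instead of list_head */
};

/* Return true when an entry with the same (cb, cb_ident) pair exists. */
static bool block_cb_is_busy(setup_cb_t cb, void *cb_ident,
			     const struct block_cb *head)
{
	for (const struct block_cb *it = head; it; it = it->next)
		if (it->cb == cb && it->cb_ident == cb_ident)
			return true;
	return false;
}

/* Dummy callback used only to have something to register. */
static int demo_cb(void *cb_priv)
{
	(void)cb_priv;
	return 0;
}
```

Under that assumption, a lookup with a NULL identity never matches a registered entry, which is the situation the patch describes once the earlier "if (indr_priv) return -EEXIST;" check has passed.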
* Re: [PATCH net] net: phylink: Fix flow control for fixed-link
From: Russell King - ARM Linux admin @ 2019-07-27 15:51 UTC (permalink / raw)
To: René van Dorst
Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit, David S . Miller,
netdev
In-Reply-To: <20190727094011.14024-1-opensource@vdorst.com>
On Sat, Jul 27, 2019 at 11:40:11AM +0200, René van Dorst wrote:
> In phylink_parse_fixedlink() the pl->link_config.advertising bits are AND
> with pl->supported, pl->supported is zeroed and only the speed/duplex
> modes and MII bits are set.
> So pl->link_config.advertising always loses the flow control/pause bits.
>
> By setting Pause and Asym_Pause bits in pl->supported, the flow control
> work again when devicetree "pause" is set in fixes-link node and the MAC
> advertise that is supports pause.
>
> Results with this patch.
>
> Legend:
> - DT = 'Pause' is set in the fixed-link in devicetree.
> - validate() = ‘Yes’ means phylink_set(mask, Pause) is set in the
> validate().
> - flow = results reported my link is Up line.
>
> +-----+------------+-------+
> | DT | validate() | flow |
> +-----+------------+-------+
> | Yes | Yes | rx/tx |
> | No | Yes | off |
> | Yes | No | off |
> +-----+------------+-------+
>
> Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
> Signed-off-by: René van Dorst <opensource@vdorst.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Thanks.
> ---
> drivers/net/phy/phylink.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 5d0af041b8f9..a6aebaa14338 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -216,6 +216,8 @@ static int phylink_parse_fixedlink(struct phylink *pl,
> pl->supported, true);
> linkmode_zero(pl->supported);
> phylink_set(pl->supported, MII);
> + phylink_set(pl->supported, Pause);
> + phylink_set(pl->supported, Asym_Pause);
> if (s) {
> __set_bit(s->bit, pl->supported);
> } else {
> --
> 2.20.1
>
>
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
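The effect described in the quoted commit message can be modelled with plain bit masks. The following is only a sketch under stated assumptions: the MODE_* bit positions and the fixedlink_advertising() helper are hypothetical, and the kernel works on ETHTOOL_LINK_MODE_* bitmaps rather than a uint32_t. The with_fix parameter stands for the two added phylink_set() lines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
enum {
	MODE_MII        = 1 << 0,
	MODE_1000_FULL  = 1 << 1,
	MODE_PAUSE      = 1 << 2,
	MODE_ASYM_PAUSE = 1 << 3,
};

/*
 * Model of the end of fixed-link parsing: 'supported' is rebuilt from
 * scratch and 'advertising' is then ANDed with it.
 */
static uint32_t fixedlink_advertising(uint32_t advertising, bool with_fix)
{
	uint32_t supported = 0;          /* models linkmode_zero() */

	supported |= MODE_MII;
	if (with_fix)                    /* models the two added lines */
		supported |= MODE_PAUSE | MODE_ASYM_PAUSE;
	supported |= MODE_1000_FULL;     /* the selected speed/duplex bit */

	return advertising & supported;  /* pause bits survive only if supported */
}
```

With with_fix set to false, any pause bits present in advertising are masked out, which corresponds to flow control always being reported as off before the patch.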
* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-27 15:56 UTC (permalink / raw)
To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
linux-arm-kernel@lists.infradead.org
Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard, Chen-Yu Tsai,
Maxime Coquelin, linux-tegra, Giuseppe Cavallaro, Robin Murphy,
David S . Miller
In-Reply-To: <1e2ea942-28fe-15b9-f675-8d6585f9a33f@nvidia.com>
From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/26/2019, 15:11:00 (UTC+00:00)
>
> On 25/07/2019 16:12, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/25/2019, 15:25:59 (UTC+00:00)
> >
> >>
> >> On 25/07/2019 14:26, Jose Abreu wrote:
> >>
> >> ...
> >>
> >>> Well, I wasn't expecting that :/
> >>>
> >>> Per documentation of barriers I think we should set descriptor fields
> >>> and then barrier and finally ownership to HW so that remaining fields
> >>> are coherent before owner is set.
> >>>
> >>> Anyway, can you also add a dma_rmb() after the call to
> >>> stmmac_rx_status() ?
> >>
> >> Yes. I removed the debug print added the barrier, but that did not help.
> >
> > So, I was finally able to setup NFS using your replicated setup and I
> > can't see the issue :(
> >
> > The only difference I have from yours is that I'm using TCP in NFS
> > whilst you (I believe from the logs), use UDP.
>
> So I tried TCP by setting the kernel boot params to 'nfsvers=3' and
> 'proto=tcp' and this does appear to be more stable, but not 100% stable.
> It still appears to fail in the same place about 50% of the time.
>
> > You do have flow control active right ? And your HW FIFO size is >= 4k ?
>
> How can I verify if flow control is active?
You can check it by dumping register MTL_RxQ_Operation_Mode (0xd30).
Can you also add IOMMU debug in file "drivers/iommu/iommu.c" ?
---
Thanks,
Jose Miguel Abreu
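The ordering discussed in the quoted text (fill in the descriptor fields, then a barrier, then hand ownership to the hardware) can be sketched in user space with C11 atomics. This is an analogy only: release/acquire ordering here stands in for dma_wmb()/dma_rmb() and orders CPU threads, not device DMA. struct rx_desc, DESC_OWN and both helpers are hypothetical and do not reflect the stmmac descriptor layout.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_OWN (1u << 31)  /* hypothetical ownership bit */

struct rx_desc {
	uint32_t buf_addr;
	uint32_t buf_len;
	_Atomic uint32_t status;  /* carries the ownership bit */
};

/* Producer side: publish all fields before setting the ownership bit. */
static void desc_give_to_hw(struct rx_desc *d, uint32_t addr, uint32_t len)
{
	d->buf_addr = addr;
	d->buf_len = len;
	/* Release store: the writes above become visible before OWN does. */
	atomic_store_explicit(&d->status, DESC_OWN, memory_order_release);
}

/* Consumer side: read the other fields only after OWN is seen cleared. */
static int desc_read_if_done(struct rx_desc *d, uint32_t *len)
{
	uint32_t st = atomic_load_explicit(&d->status, memory_order_acquire);

	if (st & DESC_OWN)
		return 0;         /* still owned by the other side */
	*len = d->buf_len;        /* ordered after the status load */
	return 1;
}
```

The acquire load in desc_read_if_done() plays the role of the dma_rmb() suggested after the status read: the remaining fields are read only after the ownership check.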
* bpf boot error: WARNING: workqueue cpumask: online intersect > possible intersect (2)
From: syzbot @ 2019-07-27 15:58 UTC (permalink / raw)
To: linux-kernel, netdev, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: 5d01ab7b libbpf: fix erroneous multi-closing of BTF FD
git tree: bpf
console output: https://syzkaller.appspot.com/x/log.txt?x=110318b4600000
kernel config: https://syzkaller.appspot.com/x/.config?x=6efd5962fd8c1d39
dashboard link: https://syzkaller.appspot.com/bug?extid=40f581848b1c5452b5ed
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+40f581848b1c5452b5ed@syzkaller.appspotmail.com
smpboot: CPU0: Intel(R) Xeon(R) CPU @ 2.30GHz (family: 0x6, model: 0x3f,
stepping: 0x0)
Performance Events: unsupported p6 CPU model 63 no PMU driver, software
events only.
rcu: Hierarchical SRCU implementation.
NMI watchdog: Perf NMI watchdog permanently disabled
smp: Bringing up secondary CPUs ...
x86: Booting SMP configuration:
.... node #0, CPUs: #1
MDS CPU bug present and SMT on, data leak possible. See
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for
more details.
smp: Brought up 2 nodes, 2 CPUs
smpboot: Max logical packages: 1
smpboot: Total of 2 processors activated (9200.00 BogoMIPS)
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns:
19112604462750000 ns
futex hash table entries: 512 (order: 4, 65536 bytes, vmalloc)
xor: automatically using best checksumming function avx
PM: RTC time: 01:04:45, date: 2019-07-27
NET: Registered protocol family 16
audit: initializing netlink subsys (disabled)
cpuidle: using governor menu
ACPI: bus type PCI registered
dca service started, version 1.12.1
PCI: Using configuration type 1 for base access
WARNING: workqueue cpumask: online intersect > possible intersect
HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
cryptd: max_cpu_qlen set to 1000
raid6: avx2x4 gen() 11248 MB/s
raid6: avx2x4 xor() 6695 MB/s
raid6: avx2x2 gen() 6687 MB/s
raid6: avx2x2 xor() 3654 MB/s
raid6: avx2x1 gen() 3443 MB/s
raid6: avx2x1 xor() 2027 MB/s
raid6: sse2x4 gen() 5758 MB/s
raid6: sse2x4 xor() 3294 MB/s
raid6: sse2x2 gen() 3924 MB/s
raid6: sse2x2 xor() 1858 MB/s
raid6: sse2x1 gen() 1745 MB/s
raid6: sse2x1 xor() 1019 MB/s
raid6: using algorithm avx2x4 gen() 11248 MB/s
raid6: .... xor() 6695 MB/s, rmw enabled
raid6: using avx2x2 recovery algorithm
ACPI: Added _OSI(Module Device)
ACPI: Added _OSI(Processor Device)
ACPI: Added _OSI(3.0 _SCP Extensions)
ACPI: Added _OSI(Processor Aggregator Device)
ACPI: Added _OSI(Linux-Dell-Video)
ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
ACPI: 2 ACPI AML tables successfully acquired and loaded
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and
report a bug
ACPI: Enabled 16 GPEs in block 00 to 0F
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended
PCI configuration space under this bridge.
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
pci 0000:00:01.0: [8086:7110] type 00 class 0x060100
pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
pci 0000:00:03.0: [1af4:1004] type 00 class 0x000000
pci 0000:00:03.0: reg 0x10: [io 0xc000-0xc03f]
pci 0000:00:03.0: reg 0x14: [mem 0xfebfe000-0xfebfe07f]
pci 0000:00:04.0: [1af4:1000] type 00 class 0x020000
pci 0000:00:04.0: reg 0x10: [io 0xc040-0xc07f]
pci 0000:00:04.0: reg 0x14: [mem 0xfebff000-0xfebff07f]
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
vgaarb: loaded
SCSI subsystem initialized
ACPI: bus type USB registered
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
mc: Linux media interface: v0.10
videodev: Linux video capture interface: v2.00
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti
<giometti@linux.it>
PTP clock support registered
EDAC MC: Ver: 3.0.0
Advanced Linux Sound Architecture Driver Initialized.
PCI: Using ACPI for IRQ routing
Bluetooth: Core ver 2.22
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
Bluetooth: L2CAP socket layer initialized
Bluetooth: SCO socket layer initialized
NET: Registered protocol family 8
NET: Registered protocol family 20
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
NetLabel: unlabeled traffic allowed by default
nfc: nfc_init: NFC Core ver 0.1
NET: Registered protocol family 39
clocksource: Switched to clocksource kvm-clock
VFS: Disk quotas dquot_6.6.0
VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
FS-Cache: Loaded
*** VALIDATE hugetlbfs ***
CacheFiles: Loaded
TOMOYO: 2.6.0
Mandatory Access Control activated.
AppArmor: AppArmor Filesystem Enabled
pnp: PnP ACPI init
pnp: PnP ACPI: found 7 devices
thermal_sys: Registered thermal governor 'step_wise'
thermal_sys: Registered thermal governor 'user_space'
clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns:
2085701024 ns
pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: resource 7 [mem 0xc0000000-0xfebfffff window]
NET: Registered protocol family 2
tcp_listen_portaddr_hash hash table entries: 4096 (order: 6, 294912 bytes,
vmalloc)
TCP established hash table entries: 65536 (order: 7, 524288 bytes, vmalloc)
TCP bind hash table entries: 65536 (order: 10, 4194304 bytes, vmalloc)
TCP: Hash tables configured (established 65536 bind 65536)
UDP hash table entries: 4096 (order: 7, 655360 bytes, vmalloc)
UDP-Lite hash table entries: 4096 (order: 7, 655360 bytes, vmalloc)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
NET: Registered protocol family 44
pci 0000:00:00.0: Limiting direct PCI/PCI transfers
PCI: CLS 0 bytes, default 64
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
software IO TLB: mapped [mem 0xaa800000-0xae800000] (64MB)
RAPL PMU: API unit is 2^-32 Joules, 0 fixed counters, 10737418240 ms ovfl
timer
kvm: already loaded the other module
clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x212735223b2,
max_idle_ns: 440795277976 ns
clocksource: Switched to clocksource tsc
mce: Machine check injector initialized
check: Scanning for low memory corruption every 60 seconds
Initialise system trusted keyrings
workingset: timestamp_bits=40 max_order=21 bucket_order=0
zbud: loaded
DLM installed
squashfs: version 4.0 (2009/01/31) Phillip Lougher
FS-Cache: Netfs 'nfs' registered for caching
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
nfs4filelayout_init: NFSv4 File Layout Driver Registering...
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
ntfs: driver 2.1.32 [Flags: R/W].
fuse: init (API version 7.31)
JFS: nTxBlock = 8192, nTxLock = 65536
SGI XFS with ACLs, security attributes, realtime, no debug enabled
9p: Installing v9fs 9p2000 file system support
FS-Cache: Netfs '9p' registered for caching
gfs2: GFS2 installed
FS-Cache: Netfs 'ceph' registered for caching
ceph: loaded (mds proto 32)
NET: Registered protocol family 38
async_tx: api initialized (async)
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Asymmetric key parser 'pkcs8' registered
Key type pkcs7_test registered
Asymmetric key parser 'tpm_parser' registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 246)
io scheduler mq-deadline registered
io scheduler kyber registered
io scheduler bfq registered
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: Power Button [PWRF]
input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
ACPI: Sleep Button [SLPF]
ioatdma: Intel(R) QuickData Technology Driver 5.00
PCI Interrupt Link [LNKC] enabled at IRQ 11
virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
PCI Interrupt Link [LNKD] enabled at IRQ 10
virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
HDLC line discipline maxframe=4096
N_HDLC line discipline registered.
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A
00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
[drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1
usbcore: registered new interface driver udl
brd: module loaded
loop: module loaded
zram: Added device: zram0
null: module loaded
nfcsim 0.2 initialized
Loading iSCSI transport class v2.0-870.
scsi host0: Virtio SCSI HBA
st: Version 20160209, fixed bufsize 32768, s/g segs 256
kobject: 'sd' (0000000040ee4d1b): kobject_add_internal: parent: 'drivers',
set: 'drivers'
kobject: 'sd' (0000000040ee4d1b): kobject_uevent_env
kobject: 'sd' (0000000040ee4d1b): fill_kobj_path: path
= '/bus/scsi/drivers/sd'
kobject: 'sr' (00000000e7d12427): kobject_add_internal: parent: 'drivers',
set: 'drivers'
kobject: 'sr' (00000000e7d12427): kobject_uevent_env
kobject: 'sr' (00000000e7d12427): fill_kobj_path: path
= '/bus/scsi/drivers/sr'
kobject: 'scsi_generic' (000000000443366b): kobject_add_internal:
parent: 'class', set: 'class'
kobject: 'scsi_generic' (000000000443366b): kobject_uevent_env
kobject: 'scsi_generic' (000000000443366b): fill_kobj_path: path
= '/class/scsi_generic'
kobject: 'nvme-wq' (0000000010fea8b9): kobject_add_internal:
parent: 'workqueue', set: 'devices'
kobject: 'nvme-wq' (0000000010fea8b9): kobject_uevent_env
kobject: 'nvme-wq' (0000000010fea8b9): kobject_uevent_env: uevent_suppress
caused the event to drop!
kobject: 'nvme-wq' (0000000010fea8b9): kobject_uevent_env
kobject: 'nvme-wq' (0000000010fea8b9): fill_kobj_path: path
= '/devices/virtual/workqueue/nvme-wq'
kobject: 'nvme-reset-wq' (00000000a7f5b0c8): kobject_add_internal:
parent: 'workqueue', set: 'devices'
kobject: 'nvme-reset-wq' (00000000a7f5b0c8): kobject_uevent_env
kobject: 'nvme-reset-wq' (00000000a7f5b0c8): kobject_uevent_env:
uevent_suppress caused the event to drop!
kobject: 'nvme-reset-wq' (00000000a7f5b0c8): kobject_uevent_env
kobject: 'nvme-reset-wq' (00000000a7f5b0c8): fill_kobj_path: path
= '/devices/virtual/workqueue/nvme-reset-wq'
kobject: 'nvme-delete-wq' (00000000dc28d66f): kobject_add_internal:
parent: 'workqueue', set: 'devices'
kobject: 'nvme-delete-wq' (00000000dc28d66f): kobject_uevent_env
kobject: 'nvme-delete-wq' (00000000dc28d66f): kobject_uevent_env:
uevent_suppress caused the event to drop!
kobject: 'nvme-delete-wq' (00000000dc28d66f): kobject_uevent_env
kobject: 'nvme-delete-wq' (00000000dc28d66f): fill_kobj_path: path
= '/devices/virtual/workqueue/nvme-delete-wq'
kobject: 'nvme' (00000000d7d8a11c): kobject_add_internal: parent: 'class',
set: 'class'
kobject: 'nvme' (00000000d7d8a11c): kobject_uevent_env
kobject: 'nvme' (00000000d7d8a11c): fill_kobj_path: path = '/class/nvme'
kobject: 'nvme-subsystem' (00000000f9b59088): kobject_add_internal:
parent: 'class', set: 'class'
kobject: 'nvme-subsystem' (00000000f9b59088): kobject_uevent_env
kobject: 'nvme-subsystem' (00000000f9b59088): fill_kobj_path: path
= '/class/nvme-subsystem'
kobject: 'nvme' (000000005b8830cb): kobject_add_internal:
parent: 'drivers', set: 'drivers'
kobject: 'drivers' (00000000f79b6140): kobject_add_internal:
parent: 'nvme', set: '<NULL>'
kobject: 'nvme' (000000005b8830cb): kobject_uevent_env
kobject: 'nvme' (000000005b8830cb): fill_kobj_path: path
= '/bus/pci/drivers/nvme'
kobject: 'ahci' (000000005bbce619): kobject_add_internal:
parent: 'drivers', set: 'drivers'
kobject: 'drivers' (000000000440468e): kobject_add_internal:
parent: 'ahci', set: '<NULL>'
kobject: 'ahci' (000000005bbce619): kobject_uevent_env
kobject: 'ahci' (000000005bbce619): fill_kobj_path: path
= '/bus/pci/drivers/ahci'
kobject: 'ata_piix' (00000000b703d945): kobject_add_internal:
parent: 'drivers', set: 'drivers'
kobject: 'drivers' (000000008d5db480): kobject_add_internal:
parent: 'ata_piix', set: '<NULL>'
kobject: 'ata_piix' (00000000b703d945): kobject_uevent_env
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
* Re: [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Alexei Starovoitov @ 2019-07-27 17:00 UTC (permalink / raw)
To: Andrii Nakryiko, Alexei Starovoitov
Cc: Andrii Nakryiko, bpf, Networking, Daniel Borkmann, Yonghong Song,
Kernel Team
In-Reply-To: <CAEf4BzZxPgAh4PGSWyD0tPOd1wh=DGZuSe1fzxc-Sgyk4D5vDg@mail.gmail.com>
On 7/26/19 11:25 PM, Andrii Nakryiko wrote:
>>> + } else if (class == BPF_ST && BPF_MODE(insn->code) == BPF_MEM) {
>>> + if (insn->imm != orig_off)
>>> + return -EINVAL;
>>> + insn->imm = new_off;
>>> + pr_debug("prog '%s': patched insn #%d (ST | MEM) imm %d -> %d\n",
>>> + bpf_program__title(prog, false),
>>> + insn_idx, orig_off, new_off);
>> I'm pretty sure llvm was not capable of emitting BPF_ST insn.
>> When did that change?
> I just looked at possible instructions that could have 32-bit
> immediate value. This is `*(rX) = offsetof(struct s, field)`, which I
> though is conceivable. Do you think I should drop it?
Just trying to point out that since it's not emitted by llvm,
this code is likely untested?
Or have you created a bpf asm test for this?
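For reference, the check-and-patch step in the quoted hunk can be exercised in isolation, without llvm, roughly as follows. This is a sketch and not the libbpf code: struct insn only mirrors the field order of struct bpf_insn (the two register nibbles are folded into one byte), and patch_st_imm() is a hypothetical helper. The class and mode masks use the values from the BPF UAPI headers (BPF_ST is 0x02, BPF_MEM is 0x60).

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for struct bpf_insn. */
struct insn {
	uint8_t code;
	uint8_t regs;   /* dst_reg and src_reg nibbles, unused here */
	int16_t off;
	int32_t imm;
};

#define INSN_CLASS(code) ((code) & 0x07)
#define INSN_MODE(code)  ((code) & 0xe0)
#define CLASS_ST 0x02
#define MODE_MEM 0x60

/*
 * Patch the immediate of an ST | MEM instruction, but only if it still
 * holds the offset recorded at compile time.
 */
static int patch_st_imm(struct insn *insn, int32_t orig_off, int32_t new_off)
{
	if (INSN_CLASS(insn->code) != CLASS_ST ||
	    INSN_MODE(insn->code) != MODE_MEM)
		return -EINVAL;
	if (insn->imm != orig_off)
		return -EINVAL;   /* unexpected immediate, refuse to patch */
	insn->imm = new_off;
	return 0;
}
```

A hand-assembled instruction with code 0x62 (ST | MEM | W) would be one way to cover this path in a test if the compiler never emits it.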
* Re: [PATCH 1/2] ipmr: Make cache queue length configurable
From: Stephen Suryaputra @ 2019-07-27 17:03 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: Brodie Greenfield, David Miller, Stephen Hemminger, kuznet,
yoshfuji, netdev, linux-kernel, chris.packham, luuk.paulussen
In-Reply-To: <6e8c51a0-cd34-e14a-7661-6fa5945f278b@cumulusnetworks.com>
On Fri, Jul 26, 2019 at 7:18 AM Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> > You've said it yourself - it has linear traversal time, but doesn't this patch allow any netns on the
> > system to increase its limit to any value, thus possibly affecting others ?
> > Though the socket limit will kick in at some point. I think that's where David
> > was going with his suggestion back in 2018:
> > https://www.spinics.net/lists/netdev/msg514543.html
> >
> > If we add this sysctl now, we'll be stuck with it. I'd prefer David's suggestion
> > so we can rely only on the receive queue queue limit which is already configurable.
> > We still need to be careful with the defaults though, the NOCACHE entry is 128 bytes
> > and with the skb overhead currently on my setup we end up at about 277 entries default limit.
>
> I mean that people might be surprised if they increased that limit by default, that's the
> only problem I'm not sure how to handle. Maybe we need some hard limit anyway.
> Have you done any tests what value works for your setup ?
FYI: for ours, it is 2048.
* Re: next-20190723: bpf/seccomp - systemd/journald issue?
From: Yonghong Song @ 2019-07-27 17:08 UTC (permalink / raw)
To: sedat.dilek@gmail.com, Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Martin Lau, Song Liu,
netdev@vger.kernel.org, bpf@vger.kernel.org, Clang-Built-Linux ML,
Kees Cook, Nick Desaulniers, Nathan Chancellor
In-Reply-To: <CA+icZUXGPCgdJzxTO+8W0EzNLZEQ88J_wusp7fPfEkNE2RoXJA@mail.gmail.com>
On 7/27/19 12:36 AM, Sedat Dilek wrote:
> On Sat, Jul 27, 2019 at 4:24 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>>>
>>> On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/26/19 2:02 PM, Sedat Dilek wrote:
>>>>> On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>>>>>>
>>>>>> Hi Yonghong Song,
>>>>>>
>>>>>> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@fb.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
>>>>>>>
>>>>>>> Glad to know clang 9 has asm goto support and now It can compile
>>>>>>> kernel again.
>>>>>>>
>>>>>>
>>>>>> Yupp.
>>>>>>
>>>>>>>>
>>>>>>>> I am seeing a problem in the area bpf/seccomp causing
>>>>>>>> systemd/journald/udevd services to fail.
>>>>>>>>
>>>>>>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
>>>>>>>> to connect stdout to the journal socket, ignoring: Connection refused
>>>>>>>>
>>>>>>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
>>>>>>>> BFD linker ld.bfd on Debian/buster AMD64.
>>>>>>>> In both cases I use clang-9 (prerelease).
>>>>>>>
>>>>>>> Looks like it is a lld bug.
>>>>>>>
>>>>>>> I see the stack trace has __bpf_prog_run32() which is used by
>>>>>>> kernel bpf interpreter. Could you try to enable bpf jit
>>>>>>> sysctl net.core.bpf_jit_enable = 1
>>>>>>> If this passed, it will prove it is interpreter related.
>>>>>>>
>>>>>>
>>>>>> After...
>>>>>>
>>>>>> sysctl -w net.core.bpf_jit_enable=1
>>>>>>
>>>>>> I can start all failed systemd services.
>>>>>>
>>>>>> systemd-journald.service
>>>>>> systemd-udevd.service
>>>>>> haveged.service
>>>>>>
>>>>>> This is in maintenance mode.
>>>>>>
>>>>>> What is next: Do set a permanent sysctl setting for net.core.bpf_jit_enable?
>>>>>>
>>>>>
>>>>> This is what I did:
>>>>
>>>> I probably won't have cycles to debug this potential lld issue.
>>>> Maybe you already did, I suggest you put enough reproducible
>>>> details in the bug you filed against lld so they can take a look.
>>>>
>>>
>>> I understand and will put the journalctl-log into the CBL issue
>>> tracker and update informations.
>>>
>>> Thanks for your help understanding the BPF correlations.
>>>
>>> Is setting 'net.core.bpf_jit_enable = 2' helpful here?
>>
>> jit_enable=1 is enough.
>> Or use CONFIG_BPF_JIT_ALWAYS_ON to workaround.
>>
>> It sounds like clang miscompiles interpreter.
>> modprobe test_bpf
>> should be able to point out which part of interpreter is broken.
>
> Maybe we need something like...
>
> "bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()"
>
> ...for clang?
Not sure how you reached the conclusion that gcse is causing the problem.
But anyway, adding such a flag in the kernel is not a good idea.
clang/llvm should be fixed instead, especially since there is still time
to fix bugs before the 9.0.0 release.
>
> - Sedat -
>
> [1] https://git.kernel.org/linus/3193c0836f203a91bef96d88c64cccf0be090d9c
>
* Re: next-20190723: bpf/seccomp - systemd/journald issue?
From: Yonghong Song @ 2019-07-27 17:11 UTC (permalink / raw)
To: sedat.dilek@gmail.com, Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Martin Lau, Song Liu,
netdev@vger.kernel.org, bpf@vger.kernel.org, Clang-Built-Linux ML,
Kees Cook, Nick Desaulniers, Nathan Chancellor
In-Reply-To: <CA+icZUWVf6AK3bxfWBZ7iM1QTyk_G-4+1_LyK0jkoBDkDzvx4Q@mail.gmail.com>
On 7/27/19 1:16 AM, Sedat Dilek wrote:
> On Sat, Jul 27, 2019 at 9:36 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>>
>> On Sat, Jul 27, 2019 at 4:24 AM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>>
>>> On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>>>>
>>>> On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 7/26/19 2:02 PM, Sedat Dilek wrote:
>>>>>> On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Yonghong Song,
>>>>>>>
>>>>>>> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@fb.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
>>>>>>>>
>>>>>>>> Glad to know clang 9 has asm goto support and now It can compile
>>>>>>>> kernel again.
>>>>>>>>
>>>>>>>
>>>>>>> Yupp.
>>>>>>>
>>>>>>>>>
>>>>>>>>> I am seeing a problem in the area bpf/seccomp causing
>>>>>>>>> systemd/journald/udevd services to fail.
>>>>>>>>>
>>>>>>>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
>>>>>>>>> to connect stdout to the journal socket, ignoring: Connection refused
>>>>>>>>>
>>>>>>>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
>>>>>>>>> BFD linker ld.bfd on Debian/buster AMD64.
>>>>>>>>> In both cases I use clang-9 (prerelease).
>>>>>>>>
>>>>>>>> Looks like it is a lld bug.
>>>>>>>>
>>>>>>>> I see the stack trace has __bpf_prog_run32() which is used by
>>>>>>>> kernel bpf interpreter. Could you try to enable bpf jit
>>>>>>>> sysctl net.core.bpf_jit_enable = 1
>>>>>>>> If this passed, it will prove it is interpreter related.
>>>>>>>>
>>>>>>>
>>>>>>> After...
>>>>>>>
>>>>>>> sysctl -w net.core.bpf_jit_enable=1
>>>>>>>
>>>>>>> I can start all failed systemd services.
>>>>>>>
>>>>>>> systemd-journald.service
>>>>>>> systemd-udevd.service
>>>>>>> haveged.service
>>>>>>>
>>>>>>> This is in maintenance mode.
>>>>>>>
>>>>>>> What is next: Do set a permanent sysctl setting for net.core.bpf_jit_enable?
>>>>>>>
>>>>>>
>>>>>> This is what I did:
>>>>>
>>>>> I probably won't have cycles to debug this potential lld issue.
>>>>> Maybe you already did, I suggest you put enough reproducible
>>>>> details in the bug you filed against lld so they can take a look.
>>>>>
>>>>
>>>> I understand and will put the journalctl-log into the CBL issue
>>>> tracker and update informations.
>>>>
>>>> Thanks for your help understanding the BPF correlations.
>>>>
>>>> Is setting 'net.core.bpf_jit_enable = 2' helpful here?
>>>
>>> jit_enable=1 is enough.
>>> Or use CONFIG_BPF_JIT_ALWAYS_ON to workaround.
>>>
>>> It sounds like clang miscompiles interpreter.
>
> Just to clarify:
> This does not happen with clang-9 + ld.bfd (GNU/ld linker).
>
>>> modprobe test_bpf
>>> should be able to point out which part of interpreter is broken.
>>
>> Maybe we need something like...
>>
>> "bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()"
>>
>> ...for clang?
>>
>
> Not sure if something like GCC's...
>
> -fgcse
>
> Perform a global common subexpression elimination pass. This pass also
> performs global constant and copy propagation.
>
> Note: When compiling a program using computed gotos, a GCC extension,
> you may get better run-time performance if you disable the global
> common subexpression elimination pass by adding -fno-gcse to the
> command line.
>
> Enabled at levels -O2, -O3, -Os.
>
> ...is available for clang.
>
> I tried, hoping to turn off "global common subexpression elimination":
>
> diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
> index 383c87300b0d..92f934a1e9ff 100644
> --- a/arch/x86/net/Makefile
> +++ b/arch/x86/net/Makefile
> @@ -3,6 +3,8 @@
> # Arch-specific network modules
> #
>
> +KBUILD_CFLAGS += -O0
This won't work. First, you added the flag to the wrong Makefile; the
interpreter is in kernel/bpf/core.c.
Second, the kernel may fail to compile with -O0.
> +
> ifeq ($(CONFIG_X86_32),y)
> obj-$(CONFIG_BPF_JIT) += bpf_jit_comp32.o
> else
>
> Still see...
> BROKEN: test_bpf: #294 BPF_MAXINSNS: Jump, gap, jump, ... jited:0
>
> - Sedat -
>
* [PATCH net-next 1/3] mlxsw: spectrum_flower: Forbid to offload mirred redirect on egress
From: Ido Schimmel @ 2019-07-27 17:32 UTC (permalink / raw)
To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190727173257.6848-1-idosch@idosch.org>
From: Jiri Pirko <jiri@mellanox.com>
Spectrum ASIC does not support redirection on egress, so refuse to
insert such flows:
$ tc qdisc add dev ens16np1 clsact
$ tc filter add dev ens16np1 egress protocol all pref 1 handle 101 flower skip_sw action mirred egress redirect dev ens16np2
Error: mlxsw_spectrum: Redirect action is not supported on egress.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index 202e9a246019..1eeac8a36ead 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -78,6 +78,11 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fid *fid;
u16 fid_index;
+ if (mlxsw_sp_acl_block_is_egress_bound(block)) {
+ NL_SET_ERR_MSG_MOD(extack, "Redirect action is not supported on egress");
+ return -EOPNOTSUPP;
+ }
+
fid = mlxsw_sp_acl_dummy_fid(mlxsw_sp);
fid_index = mlxsw_sp_fid_index(fid);
err = mlxsw_sp_acl_rulei_act_fid_set(mlxsw_sp, rulei,
--
2.21.0
* [PATCH net-next 2/3] mlxsw: spectrum_acl: Track rules that forbid egress block bind
From: Ido Schimmel @ 2019-07-27 17:32 UTC (permalink / raw)
To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190727173257.6848-1-idosch@idosch.org>
From: Jiri Pirko <jiri@mellanox.com>
Some matches and actions are not supported on egress. Track such rules
and forbid binding a block that contains them to egress.
With this patch, the kernel tells the user that the bind is not possible:
$ tc qdisc add dev ens16np1 ingress_block 22 clsact
$ tc filter add block 22 protocol 802.1q pref 2 handle 101 flower vlan_id 100 skip_sw action pass
$ tc qdisc add dev ens16np2 egress_block 22 clsact
Error: mlxsw_spectrum: Block cannot be bound to egress because it contains unsupported rules.
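The counting scheme described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: struct rule, struct block and the three helpers are hypothetical stand-ins, not the driver's types, and only the counter logic mirrors the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct rule {
	bool egress_bind_blocker;	/* rule uses something unsupported on egress */
};

struct block {
	unsigned int rule_count;
	unsigned int egress_blocker_rule_count;
};

/* Adding a rule bumps the blocker count only if the rule is a blocker. */
static void block_rule_add(struct block *b, const struct rule *r)
{
	b->rule_count++;
	b->egress_blocker_rule_count += r->egress_bind_blocker;
}

/* Removing a rule undoes exactly what block_rule_add() did. */
static void block_rule_del(struct block *b, const struct rule *r)
{
	b->egress_blocker_rule_count -= r->egress_bind_blocker;
	b->rule_count--;
}

/* Returns 0 on success, -EOPNOTSUPP if an egress bind must be refused. */
static int block_bind(const struct block *b, bool ingress)
{
	if (!ingress && b->egress_blocker_rule_count)
		return -EOPNOTSUPP;
	return 0;
}
```

Keeping a count rather than a single flag means the block becomes bindable to egress again as soon as the last blocking rule is deleted.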
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 2 +-
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 7 +++++--
.../net/ethernet/mellanox/mlxsw/spectrum_acl.c | 17 +++++++++++++----
.../ethernet/mellanox/mlxsw/spectrum_flower.c | 11 +++++++++++
4 files changed, 30 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 7e8a54068d92..9277b3f125e8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1625,7 +1625,7 @@ mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
}
flow_block_cb_incref(block_cb);
err = mlxsw_sp_acl_block_bind(mlxsw_sp, acl_block,
- mlxsw_sp_port, ingress);
+ mlxsw_sp_port, ingress, f->extack);
if (err)
goto err_block_bind;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 131f62ce9297..c78d93afbb9d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -623,7 +623,8 @@ struct mlxsw_sp_acl_rule_info {
unsigned int priority;
struct mlxsw_afk_element_values values;
struct mlxsw_afa_block *act_block;
- u8 action_created:1;
+ u8 action_created:1,
+ egress_bind_blocker:1;
unsigned int counter_index;
};
@@ -642,6 +643,7 @@ struct mlxsw_sp_acl_block {
struct mlxsw_sp *mlxsw_sp;
unsigned int rule_count;
unsigned int disable_count;
+ unsigned int egress_blocker_rule_count;
struct net *net;
};
@@ -657,7 +659,8 @@ void mlxsw_sp_acl_block_destroy(struct mlxsw_sp_acl_block *block);
int mlxsw_sp_acl_block_bind(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_acl_block *block,
struct mlxsw_sp_port *mlxsw_sp_port,
- bool ingress);
+ bool ingress,
+ struct netlink_ext_ack *extack);
int mlxsw_sp_acl_block_unbind(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_acl_block *block,
struct mlxsw_sp_port *mlxsw_sp_port,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
index e8ac90564dbe..1aaab8446270 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
@@ -239,7 +239,8 @@ mlxsw_sp_acl_block_lookup(struct mlxsw_sp_acl_block *block,
int mlxsw_sp_acl_block_bind(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_acl_block *block,
struct mlxsw_sp_port *mlxsw_sp_port,
- bool ingress)
+ bool ingress,
+ struct netlink_ext_ack *extack)
{
struct mlxsw_sp_acl_block_binding *binding;
int err;
@@ -247,6 +248,11 @@ int mlxsw_sp_acl_block_bind(struct mlxsw_sp *mlxsw_sp,
if (WARN_ON(mlxsw_sp_acl_block_lookup(block, mlxsw_sp_port, ingress)))
return -EEXIST;
+ if (!ingress && block->egress_blocker_rule_count) {
+ NL_SET_ERR_MSG_MOD(extack, "Block cannot be bound to egress because it contains unsupported rules");
+ return -EOPNOTSUPP;
+ }
+
binding = kzalloc(sizeof(*binding), GFP_KERNEL);
if (!binding)
return -ENOMEM;
@@ -672,6 +678,7 @@ int mlxsw_sp_acl_rule_add(struct mlxsw_sp *mlxsw_sp,
{
struct mlxsw_sp_acl_ruleset *ruleset = rule->ruleset;
const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
+ struct mlxsw_sp_acl_block *block = ruleset->ht_key.block;
int err;
err = ops->rule_add(mlxsw_sp, ruleset->priv, rule->priv, rule->rulei);
@@ -689,14 +696,14 @@ int mlxsw_sp_acl_rule_add(struct mlxsw_sp *mlxsw_sp,
* one, to be directly bound to device. The rest of the
* rulesets are bound by "Goto action set".
*/
- err = mlxsw_sp_acl_ruleset_block_bind(mlxsw_sp, ruleset,
- ruleset->ht_key.block);
+ err = mlxsw_sp_acl_ruleset_block_bind(mlxsw_sp, ruleset, block);
if (err)
goto err_ruleset_block_bind;
}
list_add_tail(&rule->list, &mlxsw_sp->acl->rules);
- ruleset->ht_key.block->rule_count++;
+ block->rule_count++;
+ block->egress_blocker_rule_count += rule->rulei->egress_bind_blocker;
return 0;
err_ruleset_block_bind:
@@ -712,7 +719,9 @@ void mlxsw_sp_acl_rule_del(struct mlxsw_sp *mlxsw_sp,
{
struct mlxsw_sp_acl_ruleset *ruleset = rule->ruleset;
const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
+ struct mlxsw_sp_acl_block *block = ruleset->ht_key.block;
+ block->egress_blocker_rule_count -= rule->rulei->egress_bind_blocker;
ruleset->ht_key.block->rule_count--;
list_del(&rule->list);
if (!ruleset->ht_key.chain_index &&
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index 1eeac8a36ead..c86d582dafbe 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -83,6 +83,11 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp,
return -EOPNOTSUPP;
}
+ /* Forbid block with this rulei to be bound
+ * to egress in future.
+ */
+ rulei->egress_bind_blocker = 1;
+
fid = mlxsw_sp_acl_dummy_fid(mlxsw_sp);
fid_index = mlxsw_sp_fid_index(fid);
err = mlxsw_sp_acl_rulei_act_fid_set(mlxsw_sp, rulei,
@@ -395,6 +400,12 @@ static int mlxsw_sp_flower_parse(struct mlxsw_sp *mlxsw_sp,
NL_SET_ERR_MSG_MOD(f->common.extack, "vlan_id key is not supported on egress");
return -EOPNOTSUPP;
}
+
+ /* Forbid block with this rulei to be bound
+ * to egress in future.
+ */
+ rulei->egress_bind_blocker = 1;
+
if (match.mask->vlan_id != 0)
mlxsw_sp_acl_rulei_keymask_u32(rulei,
MLXSW_AFK_ELEMENT_VID,
--
2.21.0
* [PATCH net-next 0/3] mlxsw: spectrum_acl: Forbid unsupported filters
From: Ido Schimmel @ 2019-07-27 17:32 UTC (permalink / raw)
To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel
From: Ido Schimmel <idosch@mellanox.com>
Patches #1-#2 make mlxsw reject unsupported egress filters. These
include filters that match on VLAN and filters associated with a
redirect action. Patch #1 rejects such filters when they are configured
on egress and patch #2 rejects such filters when they are configured in
a shared block that user tries to bind to egress.
Patch #3 forbids matching on reserved TCP flags as this is not supported
by the current keys that mlxsw uses.
Jiri Pirko (3):
mlxsw: spectrum_flower: Forbid to offload mirred redirect on egress
mlxsw: spectrum_acl: Track rules that forbid egress block bind
mlxsw: spectrum_flower: Forbid to offload match on reserved TCP flags
bits
.../net/ethernet/mellanox/mlxsw/spectrum.c | 2 +-
.../net/ethernet/mellanox/mlxsw/spectrum.h | 7 ++++--
.../ethernet/mellanox/mlxsw/spectrum_acl.c | 17 ++++++++++----
.../ethernet/mellanox/mlxsw/spectrum_flower.c | 22 +++++++++++++++++++
4 files changed, 41 insertions(+), 7 deletions(-)
--
2.21.0
* [PATCH net-next 3/3] mlxsw: spectrum_flower: Forbid to offload match on reserved TCP flags bits
From: Ido Schimmel @ 2019-07-27 17:32 UTC (permalink / raw)
To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190727173257.6848-1-idosch@idosch.org>
From: Jiri Pirko <jiri@mellanox.com>
Matching on reserved TCP flags bits is only supported using a custom
parser. Since no use case for that is currently known, just forbid
offloading rules that match on these bits.
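As an illustration of the check, the sketch below tests a network-order flags mask against 0x0E00, the constant used in the diff. It assumes the 16-bit flags word is laid out as data offset (0xF000), then the three bits covered by 0x0E00, then the remaining flag bits (0x01FF). The macro and function names are hypothetical.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-order mask of the bits the patch treats as reserved (assumed layout). */
#define TCP_FLAGS_RESERVED_MASK 0x0E00

/* mask_be is the match mask in network byte order, as in the flow rule. */
static bool tcp_flags_mask_uses_reserved(uint16_t mask_be)
{
	return (mask_be & htons(TCP_FLAGS_RESERVED_MASK)) != 0;
}
```

A mask such as htons(0x0012) (SYN|ACK) passes the check, while any mask with a bit inside 0x0E00 set would be rejected.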
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index c86d582dafbe..0ad1a24abfc6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -267,6 +267,12 @@ static int mlxsw_sp_flower_parse_tcp(struct mlxsw_sp *mlxsw_sp,
flow_rule_match_tcp(rule, &match);
+ if (match.mask->flags & htons(0x0E00)) {
+ NL_SET_ERR_MSG_MOD(f->common.extack, "TCP flags match not supported on reserved bits");
+ dev_err(mlxsw_sp->bus_info->dev, "TCP flags match not supported on reserved bits\n");
+ return -EINVAL;
+ }
+
mlxsw_sp_acl_rulei_keymask_u32(rulei, MLXSW_AFK_ELEMENT_TCP_FLAGS,
ntohs(match.key->flags),
ntohs(match.mask->flags));
--
2.21.0
* [PATCH net] mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabled
From: Ido Schimmel @ 2019-07-27 17:35 UTC (permalink / raw)
To: netdev; +Cc: davem, jiri, petrm, mlxsw, Ido Schimmel
From: Petr Machata <petrm@mellanox.com>
Spectrum systems have a configurable limit on how far into the packet they
parse. By default, the limit is 96 bytes.
An IPv6 PTP packet is layered as Ethernet/IPv6/UDP (14+40+8 bytes), and
sequence ID of a PTP event is only available 32 bytes into payload, for a
total of 94 bytes. When an additional 802.1q header is present as
well (such as when ptp4l is running on a VLAN port), the parsing limit is
exceeded. Such packets are not recognized as PTP, and are not timestamped.
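The byte counts above can be checked with a small worked example. The header lengths and the 96-byte default come from the text; the enum and function names are made up for illustration.

```c
#include <stdbool.h>

enum {
	ETH_HDR_LEN		= 14,
	VLAN_TAG_LEN		= 4,	/* one 802.1q tag */
	IPV6_HDR_LEN		= 40,
	UDP_HDR_LEN		= 8,
	PTP_SEQID_END		= 32,	/* sequence ID ends 32 bytes into the payload */
	DEFAULT_PARSING_DEPTH	= 96,
};

/* Bytes the parser must cover to reach the PTP sequence ID over IPv6/UDP. */
static unsigned int ptp_ipv6_bytes_needed(unsigned int vlan_tags)
{
	return ETH_HDR_LEN + vlan_tags * VLAN_TAG_LEN + IPV6_HDR_LEN +
	       UDP_HDR_LEN + PTP_SEQID_END;
}

/* True if the sequence ID lies within the given parsing depth. */
static bool ptp_ipv6_fits(unsigned int vlan_tags, unsigned int depth)
{
	return ptp_ipv6_bytes_needed(vlan_tags) <= depth;
}
```

Without a VLAN tag the result is 94 bytes, which fits in 96; with one tag it is 98 bytes, which does not.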
Therefore generalize the current VXLAN-specific parsing depth setting to
allow reference-counted requests from other modules as well. Keep it in the
VXLAN module, because the MPRS register also configures UDP destination
port number used for VXLAN, and is thus closely tied to the VXLAN code
anyway.
Then invoke the new interfaces from both VXLAN (in obvious places), as well
as from PTP code, when the (global) timestamping configuration changes from
disabled to enabled or vice versa.
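The reference-counted get/put scheme can be sketched in user space as follows. This is a simplified model, not the driver code: struct parser and its helpers are hypothetical, the hardware write is simulated, and the increased depth of 128 is an assumed value chosen only for illustration.

```c
#include <errno.h>

#define DEFAULT_DEPTH	96	/* default limit stated above */
#define INCREASED_DEPTH	128	/* assumed value, for illustration only */

struct parser {
	unsigned int inc_depth_refs;	/* users that need the increased depth */
	unsigned int hw_depth;		/* models the last value written to hardware */
	int fail_next_write;		/* test hook: make the next write fail */
};

/* Simulated register write: depth depends only on whether any user remains. */
static int parser_write(struct parser *p)
{
	if (p->fail_next_write) {
		p->fail_next_write = 0;
		return -EIO;
	}
	p->hw_depth = p->inc_depth_refs ? INCREASED_DEPTH : DEFAULT_DEPTH;
	return 0;
}

/* Take a reference; roll it back if the write fails. */
static int parser_inc_depth_get(struct parser *p)
{
	int err;

	p->inc_depth_refs++;
	err = parser_write(p);
	if (err)
		p->inc_depth_refs--;
	return err;
}

/* Drop a reference; the depth returns to default once the last user is gone. */
static void parser_inc_depth_put(struct parser *p)
{
	p->inc_depth_refs--;
	parser_write(p);
}
```

With this shape, VXLAN and PTP can each take and release a reference independently, and the depth only drops back to the default when both have released theirs.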
Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
.../net/ethernet/mellanox/mlxsw/spectrum.h | 4 +
.../ethernet/mellanox/mlxsw/spectrum_nve.c | 1 +
.../ethernet/mellanox/mlxsw/spectrum_nve.h | 1 +
.../mellanox/mlxsw/spectrum_nve_vxlan.c | 76 ++++++++++++++-----
.../ethernet/mellanox/mlxsw/spectrum_ptp.c | 17 +++++
5 files changed, 82 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 131f62ce9297..6664119fb0c8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -951,4 +951,8 @@ void mlxsw_sp_port_nve_fini(struct mlxsw_sp_port *mlxsw_sp_port);
int mlxsw_sp_nve_init(struct mlxsw_sp *mlxsw_sp);
void mlxsw_sp_nve_fini(struct mlxsw_sp *mlxsw_sp);
+/* spectrum_nve_vxlan.c */
+int mlxsw_sp_nve_inc_parsing_depth_get(struct mlxsw_sp *mlxsw_sp);
+void mlxsw_sp_nve_inc_parsing_depth_put(struct mlxsw_sp *mlxsw_sp);
+
#endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index 1df164a4b06d..17f334b46c40 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -775,6 +775,7 @@ static void mlxsw_sp_nve_tunnel_fini(struct mlxsw_sp *mlxsw_sp)
ops->fini(nve);
mlxsw_sp_kvdl_free(mlxsw_sp, MLXSW_SP_KVDL_ENTRY_TYPE_ADJ, 1,
nve->tunnel_index);
+ memset(&nve->config, 0, sizeof(nve->config));
}
nve->num_nve_tunnels--;
}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
index 0035640156a1..12f664f42f21 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
@@ -29,6 +29,7 @@ struct mlxsw_sp_nve {
unsigned int num_max_mc_entries[MLXSW_SP_L3_PROTO_MAX];
u32 tunnel_index;
u16 ul_rif_index; /* Reserved for Spectrum */
+ unsigned int inc_parsing_depth_refs;
};
struct mlxsw_sp_nve_ops {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index 93ccd9fc2266..05517c7feaa5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -103,9 +103,9 @@ static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve,
config->udp_dport = cfg->dst_port;
}
-static int mlxsw_sp_nve_parsing_set(struct mlxsw_sp *mlxsw_sp,
- unsigned int parsing_depth,
- __be16 udp_dport)
+static int __mlxsw_sp_nve_parsing_set(struct mlxsw_sp *mlxsw_sp,
+ unsigned int parsing_depth,
+ __be16 udp_dport)
{
char mprs_pl[MLXSW_REG_MPRS_LEN];
@@ -113,6 +113,56 @@ static int mlxsw_sp_nve_parsing_set(struct mlxsw_sp *mlxsw_sp,
return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mprs), mprs_pl);
}
+static int mlxsw_sp_nve_parsing_set(struct mlxsw_sp *mlxsw_sp,
+ __be16 udp_dport)
+{
+ int parsing_depth = mlxsw_sp->nve->inc_parsing_depth_refs ?
+ MLXSW_SP_NVE_VXLAN_PARSING_DEPTH :
+ MLXSW_SP_NVE_DEFAULT_PARSING_DEPTH;
+
+ return __mlxsw_sp_nve_parsing_set(mlxsw_sp, parsing_depth, udp_dport);
+}
+
+static int
+__mlxsw_sp_nve_inc_parsing_depth_get(struct mlxsw_sp *mlxsw_sp,
+ __be16 udp_dport)
+{
+ int err;
+
+ mlxsw_sp->nve->inc_parsing_depth_refs++;
+
+ err = mlxsw_sp_nve_parsing_set(mlxsw_sp, udp_dport);
+ if (err)
+ goto err_nve_parsing_set;
+ return 0;
+
+err_nve_parsing_set:
+ mlxsw_sp->nve->inc_parsing_depth_refs--;
+ return err;
+}
+
+static void
+__mlxsw_sp_nve_inc_parsing_depth_put(struct mlxsw_sp *mlxsw_sp,
+ __be16 udp_dport)
+{
+ mlxsw_sp->nve->inc_parsing_depth_refs--;
+ mlxsw_sp_nve_parsing_set(mlxsw_sp, udp_dport);
+}
+
+int mlxsw_sp_nve_inc_parsing_depth_get(struct mlxsw_sp *mlxsw_sp)
+{
+ __be16 udp_dport = mlxsw_sp->nve->config.udp_dport;
+
+ return __mlxsw_sp_nve_inc_parsing_depth_get(mlxsw_sp, udp_dport);
+}
+
+void mlxsw_sp_nve_inc_parsing_depth_put(struct mlxsw_sp *mlxsw_sp)
+{
+ __be16 udp_dport = mlxsw_sp->nve->config.udp_dport;
+
+ __mlxsw_sp_nve_inc_parsing_depth_put(mlxsw_sp, udp_dport);
+}
+
static void
mlxsw_sp_nve_vxlan_config_prepare(char *tngcr_pl,
const struct mlxsw_sp_nve_config *config)
@@ -176,9 +226,7 @@ static int mlxsw_sp1_nve_vxlan_init(struct mlxsw_sp_nve *nve,
struct mlxsw_sp *mlxsw_sp = nve->mlxsw_sp;
int err;
- err = mlxsw_sp_nve_parsing_set(mlxsw_sp,
- MLXSW_SP_NVE_VXLAN_PARSING_DEPTH,
- config->udp_dport);
+ err = __mlxsw_sp_nve_inc_parsing_depth_get(mlxsw_sp, config->udp_dport);
if (err)
return err;
@@ -203,8 +251,7 @@ static int mlxsw_sp1_nve_vxlan_init(struct mlxsw_sp_nve *nve,
err_rtdp_set:
mlxsw_sp1_nve_vxlan_config_clear(mlxsw_sp);
err_config_set:
- mlxsw_sp_nve_parsing_set(mlxsw_sp, MLXSW_SP_NVE_DEFAULT_PARSING_DEPTH,
- config->udp_dport);
+ __mlxsw_sp_nve_inc_parsing_depth_put(mlxsw_sp, 0);
return err;
}
@@ -216,8 +263,7 @@ static void mlxsw_sp1_nve_vxlan_fini(struct mlxsw_sp_nve *nve)
mlxsw_sp_router_nve_demote_decap(mlxsw_sp, config->ul_tb_id,
config->ul_proto, &config->ul_sip);
mlxsw_sp1_nve_vxlan_config_clear(mlxsw_sp);
- mlxsw_sp_nve_parsing_set(mlxsw_sp, MLXSW_SP_NVE_DEFAULT_PARSING_DEPTH,
- config->udp_dport);
+ __mlxsw_sp_nve_inc_parsing_depth_put(mlxsw_sp, 0);
}
static int
@@ -320,9 +366,7 @@ static int mlxsw_sp2_nve_vxlan_init(struct mlxsw_sp_nve *nve,
struct mlxsw_sp *mlxsw_sp = nve->mlxsw_sp;
int err;
- err = mlxsw_sp_nve_parsing_set(mlxsw_sp,
- MLXSW_SP_NVE_VXLAN_PARSING_DEPTH,
- config->udp_dport);
+ err = __mlxsw_sp_nve_inc_parsing_depth_get(mlxsw_sp, config->udp_dport);
if (err)
return err;
@@ -348,8 +392,7 @@ static int mlxsw_sp2_nve_vxlan_init(struct mlxsw_sp_nve *nve,
err_rtdp_set:
mlxsw_sp2_nve_vxlan_config_clear(mlxsw_sp);
err_config_set:
- mlxsw_sp_nve_parsing_set(mlxsw_sp, MLXSW_SP_NVE_DEFAULT_PARSING_DEPTH,
- config->udp_dport);
+ __mlxsw_sp_nve_inc_parsing_depth_put(mlxsw_sp, 0);
return err;
}
@@ -361,8 +404,7 @@ static void mlxsw_sp2_nve_vxlan_fini(struct mlxsw_sp_nve *nve)
mlxsw_sp_router_nve_demote_decap(mlxsw_sp, config->ul_tb_id,
config->ul_proto, &config->ul_sip);
mlxsw_sp2_nve_vxlan_config_clear(mlxsw_sp);
- mlxsw_sp_nve_parsing_set(mlxsw_sp, MLXSW_SP_NVE_DEFAULT_PARSING_DEPTH,
- config->udp_dport);
+ __mlxsw_sp_nve_inc_parsing_depth_put(mlxsw_sp, 0);
}
const struct mlxsw_sp_nve_ops mlxsw_sp2_nve_vxlan_ops = {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c
index bd9c2bc2d5d6..4b352a71f76e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c
@@ -979,19 +979,36 @@ static int mlxsw_sp1_ptp_mtpppc_update(struct mlxsw_sp_port *mlxsw_sp_port,
{
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
struct mlxsw_sp_port *tmp;
+ u16 orig_ing_types = 0;
+ u16 orig_egr_types = 0;
int i;
+ int err;
/* MTPPPC configures timestamping globally, not per port. Find the
* configuration that contains all configured timestamping requests.
*/
for (i = 1; i < mlxsw_core_max_ports(mlxsw_sp->core); i++) {
tmp = mlxsw_sp->ports[i];
+ if (tmp) {
+ orig_ing_types |= tmp->ptp.ing_types;
+ orig_egr_types |= tmp->ptp.egr_types;
+ }
if (tmp && tmp != mlxsw_sp_port) {
ing_types |= tmp->ptp.ing_types;
egr_types |= tmp->ptp.egr_types;
}
}
+ if ((ing_types || egr_types) && !(orig_ing_types || orig_egr_types)) {
+ err = mlxsw_sp_nve_inc_parsing_depth_get(mlxsw_sp);
+ if (err) {
+ netdev_err(mlxsw_sp_port->dev, "Failed to increase parsing depth");
+ return err;
+ }
+ }
+ if (!(ing_types || egr_types) && (orig_ing_types || orig_egr_types))
+ mlxsw_sp_nve_inc_parsing_depth_put(mlxsw_sp);
+
return mlxsw_sp1_ptp_mtpppc_set(mlxsw_sp_port->mlxsw_sp,
ing_types, egr_types);
}
--
2.21.0
* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Yonghong Song @ 2019-07-27 17:54 UTC (permalink / raw)
To: Brian Vazquez
Cc: Alexei Starovoitov, Song Liu, Brian Vazquez, Daniel Borkmann,
David S . Miller, Stanislav Fomichev, Willem de Bruijn,
Petar Penkov, Networking, bpf
In-Reply-To: <CABCgpaVB+iDGO132d9CTtC_GYiKJuuL6pe5_Krm3-THgvfMO=A@mail.gmail.com>
On 7/26/19 4:36 PM, Brian Vazquez wrote:
> On Thu, Jul 25, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/25/19 6:47 PM, Alexei Starovoitov wrote:
>>> On Thu, Jul 25, 2019 at 6:24 PM Brian Vazquez <brianvv.kernel@gmail.com> wrote:
>>>>
>>>> On Thu, Jul 25, 2019 at 4:54 PM Alexei Starovoitov
>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>
>>>>> On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
>>>>>>>>> If prev_key is deleted before map_get_next_key(), we get the first key
>>>>>>>>> again. This is pretty weird.
>>>>>>>>
>>>>>>>> Yes, I know. But note that the current scenario happens even for the
>>>>>>>> old interface (imagine you are walking a map from userspace and you
>>>>>>>> tried get_next_key the prev_key was removed, you will start again from
>>>>>>>> the beginning without noticing it).
>>>>>>>> I tried to sent a patch in the past but I was missing some context:
>>>>>>>> before NULL was used to get the very first_key the interface relied in
>>>>>>>> a random (non existent) key to retrieve the first_key in the map, and
>>>>>>>> I was told what we still have to support that scenario.
>>>>>>>
>>>>>>> BPF_MAP_DUMP is slightly different, as you may return the first key
>>>>>>> multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
>>>>>>> don't have to support legacy scenarios.
>>>>>>>
>>>>>>> Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
>>>>>>> to look up previous keys. Would something down this direction work?
>>>>>>
>>>>>> I've been thinking about it and I think first we need a way to detect
>>>>>> that since key was not present we got the first_key instead:
>>>>>>
>>>>>> - One solution I had in mind was to explicitly asked for the first key
>>>>>> with map_get_next_key(map, NULL, first_key) and while walking the map
>>>>>> check that map_get_next_key(map, prev_key, key) doesn't return the
>>>>>> same key. This could be done using memcmp.
>>>>>> - Discussing with Stan, he mentioned that another option is to support
>>>>>> a flag in map_get_next_key to let it know that we want an error
>>>>>> instead of the first_key.
>>>>>>
>>>>>> After detecting the problem we also need to define what we want to do,
>>>>>> here some options:
>>>>>>
>>>>>> a) Return the error to the caller
>>>>>> b) Try with previous keys if any (which be limited to the keys that we
>>>>>> have traversed so far in this dump call)
>>>>>> c) continue with next entries in the map. array is easy just get the
>>>>>> next valid key (starting on i+1), but hmap might be difficult since
>>>>>> starting on the next bucket could potentially skip some keys that were
>>>>>> concurrently added to the same bucket where key used to be, and
>>>>>> starting on the same bucket could lead us to return repeated elements.
>>>>>>
>>>>>> Or maybe we could support those 3 cases via flags and let the caller
>>>>>> decide which one to use?
>>>>>
>>>>> this type of indecision is the reason why I wasn't excited about
>>>>> batch dumping in the first place and gave 'soft yes' when Stan
>>>>> mentioned it during lsf/mm/bpf uconf.
>>>>> We probably shouldn't do it.
>>>>> It feels this map_dump makes api more complex and doesn't really
>>>>> give much benefit to the user other than large map dump becomes faster.
>>>>> I think we gotta solve this problem differently.
>>>>
>>>> Some users are working around the dumping problems with the existing
>>>> api by creating a bpf_map_get_next_key_and_delete userspace function
>>>> (see https://www.bouncybouncy.net/blog/bpf_map_get_next_key-pitfalls/ )
>>>> which in my opinion is actually a good idea. The only problem with
>>>> that is that calling bpf_map_get_next_key(fd, key, next_key) and then
>>>> bpf_map_delete_elem(fd, key) from userspace is racing with kernel code
>>>> and it might lose some information when deleting.
>>>> We could then do map_dump_and_delete using that idea but in the kernel
>>>> where we could better handle the racing condition. In that scenario
>>>> even if we retrieve the same key it will contain different info ( the
>>>> delta between old and new value). Would that work?
>>>
>>> you mean get_next+lookup+delete at once?
>>> Sounds useful.
>>> Yonghong has been thinking about batching api as well.
>>
>> In bcc, we have many instances like this:
>> getting all (key value) pairs, do some analysis and output,
>> delete all keys
>>
>> The implementation typically like
>> /* to get all (key, value) pairs */
>> while(bpf_get_next_key() == 0)
>> bpf_map_lookup()
>> /* do analysis and output */
>> for (all keys)
>> bpf_map_delete()
>
> If you do that in a map that is being modified while you are doing the
> analysis and output, you will lose some new data by deleting the keys,
> right?
Agreed. If the same keys are reused to generate new data during the
analysis and output period, we will lose that data by deleting the keys.
From that perspective, your approach above
    while (bpf_get_next_key())
        bpf_map_delete(prev_key)
        bpf_map_lookup()
        reset prev_key
should provide a better alternative.
>
>> get_next+lookup+delete will be definitely useful.
>> batching will be even better to save the number of syscalls.
>>
>> An alternative is to do batch get_next+lookup and batch delete
>> to achieve similar goal as the above code.
>
> What I mentioned above is what it makes me think that with the
> deletion it'd be better if we perform these 3 operations at once:
> get_next+lookup+delete in a jumbo/atomic command and batch them later?
Agreed. This is indeed the most useful one for the bcc use case as well.
>
>>
>> There is a minor difference between this approach
>> and the above get_next+lookup+delete.
>> During scanning the hash map, get_next+lookup may get less number
>> of elements compared to get_next+lookup+delete as the latter
>> may have more later-inserted hash elements after the operation
>> start. But both are inaccurate, so probably the difference
>> is minor.
>>
>>>
>>> I think if we cannot figure out how to make a batch of two commands
>>> get_next + lookup to work correctly then we need to identify/invent one
>>> command and make batching more generic.
>>
>> not 100% sure. It will be hard to define what is "correctly".
>
> I agree, it'll be hard to define what is the right behavior.
>
>> For not changing map, looping of (get_next, lookup) and batch
>> get_next+lookup should have the same results.
>
> This is true for the api I'm presenting the only think that I was
> missing was what to do for changing maps to avoid the weird scenario
> (getting the first key due a concurrent deletion). And, in my opinion
> the way to go should be what also Willem supported: return the err to
> the caller and restart the dumping. I could do this with existing code
> just by detecting that we do provide a prev_key and got the first_key
> instead of the next_key or even implement a new function if you want
> to.
Always restarting from the first key has a drawback: we keep picking up
new elements if they are constantly being populated. This may skew
the results for a large hash table.
Maybe we can just do lookup+delete or batch lookup+delete?
If the user gives NULL, lookup/delete starts from the first key.
Every (batch) lookup+delete deletes one key or a set of keys.
The set of keys is retrieved using an internal get_next.
The (batch) lookup+delete returns the next available key, which
the user can use for the next (batch) lookup+delete.
If the user-provided key does not match, a flag can select whether
to go to the first key or return an error.
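
To make the proposed semantics concrete, here is a rough user-space sketch in C. It is not kernel code and not an existing bpf API: the toy map, toy_lookup_and_delete_batch(), and the restart_on_miss flag are all hypothetical names, and slot order merely stands in for whatever order the internal get_next would produce.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Toy stand-in for a hash map: slot order plays the role of get_next order. */
struct toy_entry { bool used; int key; int value; };
struct toy_map { struct toy_entry slot[TOY_SLOTS]; };

/* Index of the first used slot at or after 'from', or TOY_SLOTS if none. */
static size_t toy_next_used(const struct toy_map *m, size_t from)
{
	while (from < TOY_SLOTS && !m->slot[from].used)
		from++;
	return from;
}

/*
 * Hypothetical batch lookup+delete.
 * start == NULL means "begin at the first key".
 * Copies up to 'count' entries into out[], deletes them, and stores the
 * next available key in *next (*has_next says whether there is one).
 * If *start is not in the map: restart from the first key when
 * restart_on_miss is set, otherwise return -ENOENT.
 * Returns the number of entries copied and deleted.
 */
static int toy_lookup_and_delete_batch(struct toy_map *m, const int *start,
				       int count, struct toy_entry *out,
				       int *next, bool *has_next,
				       bool restart_on_miss)
{
	size_t i = toy_next_used(m, 0);
	int n = 0;

	if (start) {
		size_t j;

		for (j = 0; j < TOY_SLOTS; j++)
			if (m->slot[j].used && m->slot[j].key == *start)
				break;
		if (j < TOY_SLOTS)
			i = j;
		else if (!restart_on_miss)
			return -ENOENT;
	}

	while (i < TOY_SLOTS && n < count) {
		out[n++] = m->slot[i];
		m->slot[i].used = false;	/* delete right after the lookup */
		i = toy_next_used(m, i + 1);
	}

	*has_next = i < TOY_SLOTS;
	if (*has_next)
		*next = m->slot[i].key;
	return n;
}
```

A caller would pass NULL for the first batch and then feed the returned next key back in as the start key until has_next is false. Because each batch deletes what it returns, a dump never has to restart from the first key unless the start key itself was removed concurrently.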
>
>> For constant changing loops, not sure how to define which one
>> is correct. If users have concerns, they may need to just pick one
>> which gives them more comfort.
>>
>>> Like make one jumbo/compound/atomic command to be get_next+lookup+delete.
>>> Define the semantics of this single compound command.
>>> And then let batching be a multiplier of such a command.
>>> In the sense that multiplier 1 or N should behave the same way.
>>> No extra flags to alter the batching.
>>> The high level description of the batch would be:
>>> pls execute get_next,lookup,delete and repeat it N times.
>>> or
>>> pls execute get_next,lookup and repeat N times.
>
> But any attempt to do get_next+lookup will have the same problem with
> deletions, right?
>
> I don't see how we could do it more consistent than what I'm
> proposing. Let's just support one case: report an error if the
> prev_key was not found instead of retrieving the first_key. Would that
> work?
>
>>> where each command action is defined to be composable.
>>>
>>> Just a rough idea.
>>>
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Song Liu @ 2019-07-27 18:20 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Andy Lutomirski, Kees Cook, linux-security@vger.kernel.org,
Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
Lorenz Bauer, Jann Horn, Greg KH, Linux API
In-Reply-To: <7F51F8B8-CF4C-4D82-AAE1-F0F28951DB7F@fb.com>
Hi Andy,
>>>>
>>>
>>> Well, yes. sys_bpf() is pretty powerful.
>>>
>>> The goal of /dev/bpf is to enable special users to call sys_bpf().
>>> Meanwhile, such users should not be able to take down the whole system
>>> easily by accident, e.g., with rm -rf /.
>>
>> That’s easy, though — bpftool could learn to read /etc/bpfusers before allowing ruid != 0.
>
> This is a great idea! fscaps + /etc/bpfusers should do the trick.
After some discussions and more thinking on this, I have some concerns
with the user-space-only approach.
IIUC, your proposal for the user-space-only approach is like:
1. bpftool (and other tools) check /etc/bpfusers and only do
   setuid for allowed users:

	int main()
	{
		if (/* uid in /etc/bpfusers */)
			setuid(0);
		sys_bpf(...);
	}

2. bpftool (and other tools) is installed with CAP_SETUID:

	setcap cap_setuid=e+p /bin/bpftool

3. sys admin maintains proper /etc/bpfusers.
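
For illustration only, the check in step 1 could look roughly like the C sketch below. The file format (one numeric uid per line, with '#' comments) is an assumption, since nothing in this thread defines it, and uid_listed() and uid_in_bpfusers() are hypothetical helper names rather than anything bpftool provides.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Assumed file format (not specified in the thread): one numeric uid per
 * line; blank lines and lines starting with '#' are ignored. Lines are
 * assumed to fit in the buffer.
 */
static bool uid_listed(FILE *f, uid_t uid)
{
	char line[256];

	while (fgets(line, sizeof(line), f)) {
		char *end;
		unsigned long val;

		if (line[0] == '#' || line[0] == '\n')
			continue;
		val = strtoul(line, &end, 10);
		/* Accept only lines that are a complete number. */
		if (end != line && (*end == '\n' || *end == '\0') &&
		    val == (unsigned long)uid)
			return true;
	}
	return false;
}

/* Fail closed: a missing or unreadable list means nobody is allowed. */
static bool uid_in_bpfusers(const char *path, uid_t uid)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;
	ok = uid_listed(f, uid);
	fclose(f);
	return ok;
}
```

The tool would call setuid(0) only when uid_in_bpfusers("/etc/bpfusers", getuid()) returns true. Note that the whole policy lives inside the tool's own code, so it holds only as long as the binary itself can be trusted.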
This approach is not ideal, because we have to trust the tool in order
to give it CAP_SETUID. A hacked tool could easily bypass the /etc/bpfusers
check or use other root-only syscalls after setuid(0).
Does this make sense? (Or did I misunderstand anything?)
Thanks,
Song
^ permalink raw reply
* Re: [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Andrii Nakryiko @ 2019-07-27 18:24 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Networking,
Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <957fff81-d845-ebc9-0e80-dbb1f1736b40@fb.com>
On Sat, Jul 27, 2019 at 10:00 AM Alexei Starovoitov <ast@fb.com> wrote:
>
> On 7/26/19 11:25 PM, Andrii Nakryiko wrote:
> >>> + } else if (class == BPF_ST && BPF_MODE(insn->code) == BPF_MEM) {
> >>> + if (insn->imm != orig_off)
> >>> + return -EINVAL;
> >>> + insn->imm = new_off;
> >>> + pr_debug("prog '%s': patched insn #%d (ST | MEM) imm %d -> %d\n",
> >>> + bpf_program__title(prog, false),
> >>> + insn_idx, orig_off, new_off);
> >> I'm pretty sure llvm was not capable of emitting BPF_ST insn.
> >> When did that change?
> > I just looked at possible instructions that could have 32-bit
> > immediate value. This is `*(rX) = offsetof(struct s, field)`, which I
> > > thought is conceivable. Do you think I should drop it?
>
> Just trying to point out that since it's not emitted by llvm
> this code is likely untested ?
> Or you've created a bpf asm test for this?
Yeah, it's untested right now. Let me try to come up with LLVM
assembly + relocation (not yet sure how/whether the builtin works with
inline assembly). If that works out, I'll leave this; if not, I'll
drop the BPF_ST|BPF_MEM part.
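
In the meantime, the patching rule itself can be exercised in isolation. The following is a stand-alone sketch, not the libbpf code from the patch: struct toy_insn and the TOY_* macros are local copies I am assuming match the uapi instruction encoding, and toy_patch_st_imm() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdint.h>

/* Local copies of the uapi encodings so the sketch builds without kernel headers. */
#define TOY_BPF_CLASS(code)	((code) & 0x07)
#define TOY_BPF_MODE(code)	((code) & 0xe0)
#define TOY_BPF_ST		0x02
#define TOY_BPF_MEM		0x60

/* Same field layout as struct bpf_insn (assumed, for illustration). */
struct toy_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/*
 * Patch the immediate of a BPF_ST | BPF_MEM instruction from orig_off to
 * new_off. Returns -EINVAL if the instruction is of a different kind or
 * its current immediate is not orig_off; the instruction is left
 * untouched in that case.
 */
static int toy_patch_st_imm(struct toy_insn *insn, int32_t orig_off,
			    int32_t new_off)
{
	if (TOY_BPF_CLASS(insn->code) != TOY_BPF_ST ||
	    TOY_BPF_MODE(insn->code) != TOY_BPF_MEM)
		return -EINVAL;
	if (insn->imm != orig_off)
		return -EINVAL;
	insn->imm = new_off;
	return 0;
}
```

Hand-building an instruction with code 0x62 (ST | MEM with a 32-bit store) and feeding it to the helper covers the match and mismatch paths without needing LLVM to emit BPF_ST.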
>
>
^ permalink raw reply