* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Alex Williamson @ 2019-08-23 14:28 UTC (permalink / raw)
To: Parav Pandit
Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866DED407D6F1C653D5D560D1A40@AM0PR05MB4866.eurprd05.prod.outlook.com>
On Fri, 23 Aug 2019 08:14:39 +0000
Parav Pandit <parav@mellanox.com> wrote:
> Hi Alex,
>
>
> > -----Original Message-----
> > From: Jiri Pirko <jiri@resnulli.us>
> > Sent: Friday, August 23, 2019 1:42 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>; Kirti
> > Wankhede <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia <cjia@nvidia.com>;
> > netdev@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> >
> > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Jiri Pirko <jiri@resnulli.us>
> > >> Sent: Thursday, August 22, 2019 5:50 PM
> > >> To: Parav Pandit <parav@mellanox.com>
> > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>; Kirti
> > >> Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > <cohuck@redhat.com>;
> > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >>
> > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:
> > >> >
> > >> >
> > >> >> -----Original Message-----
> > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > >> >> Sent: Thursday, August 22, 2019 3:28 PM
> > >> >> To: Parav Pandit <parav@mellanox.com>
> > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > >> >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>; Kirti
> > >> >> Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > >> <cohuck@redhat.com>;
> > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >> >>
> > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com wrote:
> > >> >> >
> > >> >> >
> > >> >> >> -----Original Message-----
> > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
> > >> >> >> To: Parav Pandit <parav@mellanox.com>
> > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > >> >> >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > >> >> >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > >> >> <cohuck@redhat.com>;
> > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >> >> >>
> > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST, parav@mellanox.com wrote:
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >> -----Original Message-----
> > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
> > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
> > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
> > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > >> >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > >> >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > >> >> >> >> core
> > >> >> >> >>
> > >> >> >> >> > > > > Just an example of the alias, not proposing how it's set.
> > >> >> >> >> > > > > In fact, proposing that the user does not set it,
> > >> >> >> >> > > > > mdev-core provides one
> > >> >> >> >> > > automatically.
> > >> >> >> >> > > > >
> > >> >> >> >> > > > > > > Since there seems to be some prefix overhead, as
> > >> >> >> >> > > > > > > I ask about above in how many characters we
> > >> >> >> >> > > > > > > actually have to work with in IFNAMESZ, maybe we
> > >> >> >> >> > > > > > > start with
> > >> >> >> >> > > > > > > 8 characters (matching your "index" namespace)
> > >> >> >> >> > > > > > > and expand as necessary for
> > >> >> >> disambiguation.
> > >> >> >> >> > > > > > > If we can eliminate overhead in IFNAMESZ, let's
> > >> >> >> >> > > > > > > start with
> > >> 12.
> > >> >> >> >> > > > > > > Thanks,
> > >> >> >> >> > > > > > >
> > >> >> >> >> > > > > > If user is going to choose the alias, why does it
> > >> >> >> >> > > > > > have to be limited to
> > >> >> >> >> sha1?
> > >> >> >> >> > > > > > Or you just told it as an example?
> > >> >> >> >> > > > > >
> > >> >> >> >> > > > > > It can be an alpha-numeric string.
> > >> >> >> >> > > > >
> > >> >> >> >> > > > > No, I'm proposing a different solution where
> > >> >> >> >> > > > > mdev-core creates an alias based on an abbreviated
> > >> >> >> >> > > > > sha1. The user does not provide the
> > >> >> >> >> alias.
> > >> >> >> >> > > > >
> > >> >> >> >> > > > > > Instead of mdev imposing number of characters on
> > >> >> >> >> > > > > > the alias, it should be best
> > >> >> >> >> > > > > left to the user.
> > >> >> >> >> > > > > > Because in future if netdev improves on the naming
> > >> >> >> >> > > > > > scheme, mdev will be
> > >> >> >> >> > > > > limiting it, which is not right.
> > >> >> >> >> > > > > > So not restricting alias size seems right to me.
> > >> >> >> >> > > > > > User configuring mdev for networking devices in a
> > >> >> >> >> > > > > > given kernel knows what
> > >> >> >> >> > > > > user is doing.
> > >> >> >> >> > > > > > So user can choose alias name size as it finds suitable.
> > >> >> >> >> > > > >
> > >> >> >> >> > > > > That's not what I'm proposing, please read again.
> > >> >> >> >> > > > > Thanks,
> > >> >> >> >> > > >
> > >> >> >> >> > > > I understood your point. But mdev doesn't know how
> > >> >> >> >> > > > user is going to use
> > >> >> >> >> > > udev/systemd to name the netdev.
> > >> >> >> >> > > > So even if mdev chose to pick 12 characters, it could
> > >> >> >> >> > > > result in
> > >> >> collision.
> > >> >> >> >> > > > Hence the proposal to provide the alias by the user,
> > >> >> >> >> > > > as user know the best
> > >> >> >> >> > > policy for its use case in the environment its using.
> > >> >> >> >> > > > So 12 character sha1 method will still work by user.
> > >> >> >> >> > >
> > >> >> >> >> > > Haven't you already provided examples where certain
> > >> >> >> >> > > drivers or subsystems have unique netdev prefixes? If
> > >> >> >> >> > > mdev provides a unique alias within the subsystem,
> > >> >> >> >> > > couldn't we simply define a netdev prefix for the mdev
> > >> >> >> >> > > subsystem and avoid all other collisions? I'm not in
> > >> >> >> >> > > favor of the user providing both a uuid and an
> > >> >> >> >> > > alias/instance. Thanks,
> > >> >> >> >> > >
> > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1 first 9
> > >> >> >> >> > characters have
> > >> >> >> >> collision?
> > >> >> >> >>
> > >> >> >> >> I think it would be a mistake to waste so many chars on a
> > >> >> >> >> prefix, but
> > >> >> >> >> 9 characters of sha1 likely wouldn't have a collision before
> > >> >> >> >> we have 10s of thousands of devices. Thanks,
> > >> >> >> >>
> > >> >> >> >> Alex
> > >> >> >> >
> > >> >> >> >Jiri, Dave,
> > >> >> >> >Are you ok with it for devlink/netdev part?
> > >> >> >> >Mdev core will create an alias from a UUID.
> > >> >> >> >
> > >> >> >> >This will be supplied during devlink port attr set such as,
> > >> >> >> >
> > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port, const
> > >> >> >> >char *mdev_alias);
> > >> >> >> >
> > >> >> >> >This alias is used to generate representor netdev's phys_port_name.
> > >> >> >> >This alias from the mdev device's sysfs will be used by the
> > >> >> >> >udev/systemd to
> > >> >> >> generate predicable netdev's name.
> > >> >> >> >Example: enm<mdev_alias_first_12_chars>
> > >> >> >>
> > >> >> >> What happens in unlikely case of 2 UUIDs collide?
> > >> >> >>
> > >> >> >Since users sees two devices with same phys_port_name, user
> > >> >> >should destroy
> > >> >> recently created mdev and recreate mdev with different UUID?
> > >> >>
> > >> >> Driver should make sure phys port name wont collide,
> > >> >So when mdev creation is initiated, mdev core calculates the alias
> > >> >and if there
> > >> is any other mdev with same alias exist, it returns -EEXIST error
> > >> before progressing further.
> > >> >This way user will get to know upfront in event of collision before
> > >> >the mdev
> > >> device gets created.
> > >> >How about that?
> > >>
> > >> Sounds fine to me. Now the question is how many chars do we want to have.
> > >>
> > >12 characters from Alex's suggestion similar to git?
> >
> > Ok.
> >
>
> Can you please confirm this scheme looks good now? I like to get patches started.
My only concern is your comment that in the event of an abbreviated
sha1 collision (as exceptionally rare as that might be at 12-chars),
we'd fail the device create, while my original suggestion was that
vfio-core would add an extra character to the alias. For
non-networking devices, the sha1 is unnecessary, so the extension
behavior seems preferred. The user is only responsible to provide a
unique uuid. Perhaps the failure behavior could be applied based on
the mdev device_api. A module option on mdev to specify the default
number of alias chars would also be useful for testing so that we can
set it low enough to validate the collision behavior. Thanks,
Alex
> > >> >> in this case that it does
> > >> >> not provide 2 same attrs for 2 different ports.
> > >> >> Hmm, so the order of creation matters. That is not good.
> > >> >>
> > >> >> >>
> > >> >> >> >I took Ethernet mdev as an example.
> > >> >> >> >New prefix 'm' stands for mediated device.
> > >> >> >> >Remaining 12 characters are first 12 chars of the mdev alias.
> > >> >> >>
> > >> >> >> Does this resolve the identification of devlink port representor?
> > >> >> >Not sure if I understood your question correctly, attemping to
> > >> >> >answer
> > >> below.
> > >> >> >phys_port_name of devlink port is defined by the first 12
> > >> >> >characters of mdev
> > >> >> alias.
> > >> >> >> I assume you want to use the same 12(or so) chars, don't you?
> > >> >> >Mdev's netdev will also use the same mdev alias from the sysfs to
> > >> >> >rename
> > >> >> netdev name from ethX to enm<mdev_alias>, where en=Etherenet,
> > >> m=mdev.
> > >> >> >
> > >> >> >So yes, same 12 characters are use for mdev's netdev and mdev
> > >> >> >devlink port's
> > >> >> phys_port_name.
> > >> >> >
> > >> >> >Is that what are you asking?
> > >> >>
> > >> >> Yes. Then you have 3 chars to handle the rest of the name (pci, pf)...
^ permalink raw reply
* WARNING in sk_msg_check_to_free
From: syzbot @ 2019-08-23 14:33 UTC (permalink / raw)
To: ast, bpf, daniel, davem, john.fastabend, kafai, linux-kernel,
netdev, songliubraving, syzkaller-bugs, yhs
Hello,
syzbot found the following crash on:
HEAD commit: fed07ef3 Merge tag 'mlx5-updates-2019-08-21' of git://git...
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=150102bc600000
kernel config: https://syzkaller.appspot.com/x/.config?x=e34a4fe936eac597
dashboard link: https://syzkaller.appspot.com/bug?extid=ea3c54a7b2364123d818
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ea3c54a7b2364123d818@syzkaller.appspotmail.com
------------[ cut here ]------------
WARNING: CPU: 0 PID: 14478 at include/linux/skmsg.h:129
sk_msg_check_to_free.isra.0.part.0+0x15/0x19 include/linux/skmsg.h:129
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 14478 Comm: syz-executor.0 Not tainted 5.3.0-rc5+ #143
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2dc/0x755 kernel/panic.c:219
__warn.cold+0x20/0x4c kernel/panic.c:576
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:179 [inline]
fixup_bug arch/x86/kernel/traps.c:174 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1028
RIP: 0010:sk_msg_check_to_free.isra.0.part.0+0x15/0x19
include/linux/skmsg.h:129
Code: 77 ff ff ff e8 4e de 03 fc eb 96 4c 89 f7 e8 e4 dd 03 fc eb c3 55 48
89 e5 e8 f9 b4 c9 fb 48 c7 c7 80 3a 48 88 e8 c1 55 b3 fb <0f> 0b 5d c3 e8
e4 b4 c9 fb e8 dd ff ff ff 48 8b 45 d0 0f b6 00 41
RSP: 0018:ffff88806381fb98 EFLAGS: 00010286
RAX: 0000000000000024 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815c2456 RDI: ffffed100c703f65
RBP: ffff88806381fb98 R08: 0000000000000024 R09: ffffed1015d060d1
R10: ffffed1015d060d0 R11: ffff8880ae830687 R12: 000000000000000d
R13: ffff8880986c1550 R14: 0000000000000001 R15: 0000000000000007
sk_msg_check_to_free include/linux/skmsg.h:129 [inline]
__sk_msg_free.cold+0xa/0x2e net/core/skmsg.c:190
sk_msg_free+0x44/0x60 net/core/skmsg.c:207
tls_sw_release_resources_tx+0x268/0x6b0 net/tls/tls_sw.c:2092
tls_sk_proto_cleanup net/tls/tls_main.c:275 [inline]
tls_sk_proto_close+0x6a7/0x990 net/tls/tls_main.c:305
inet_release+0xed/0x200 net/ipv4/af_inet.c:427
inet6_release+0x53/0x80 net/ipv6/af_inet6.c:470
__sock_release+0xce/0x280 net/socket.c:590
sock_close+0x1e/0x30 net/socket.c:1268
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413511
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48
83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48
89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffdca8135c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000413511
RDX: 0000001b31920000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000025038832 R09: 0000000025038836
R10: 00007ffdca8136a0 R11: 0000000000000293 R12: 000000000075bf20
R13: 000000000009f412 R14: 0000000000760b20 R15: ffffffffffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply
* Re: [PATCH net-next v4 0/2] r8152: save EEE
From: Andrew Lunn @ 2019-08-23 14:37 UTC (permalink / raw)
To: Hayes Wang; +Cc: netdev, nic_swsd, linux-kernel
In-Reply-To: <1394712342-15778-311-Taiwan-albertk@realtek.com>
On Fri, Aug 23, 2019 at 03:33:39PM +0800, Hayes Wang wrote:
> v4:
> For patch #2, remove redundant calling of "ocp_reg_write(tp, OCP_EEE_ADV, 0)".
>
> v3:
> For patch #2, fix the mistake caused by copying and pasting.
>
> v2:
> Adjust patch #1. The EEE has been disabled in the beginning of
> r8153_hw_phy_cfg() and r8153b_hw_phy_cfg(), so only check if
> it is necessary to enable EEE.
Hi Hayes
That was 3 revisions of the patches in less than 30 minutes. Slow
down, take your time, review your work yourself before posting it,
etc.
Aim for no more than one revision, posted to the list, per day.
Andrew
^ permalink raw reply
* KASAN: use-after-free Read in rxrpc_release_call
From: syzbot @ 2019-08-23 14:42 UTC (permalink / raw)
To: davem, dhowells, linux-afs, linux-kernel, netdev, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: fed07ef3 Merge tag 'mlx5-updates-2019-08-21' of git://git...
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1256e22e600000
kernel config: https://syzkaller.appspot.com/x/.config?x=e34a4fe936eac597
dashboard link: https://syzkaller.appspot.com/bug?extid=eed305768ece6682bb7f
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
==================================================================
BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60
net/rxrpc/call_object.c:481
Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764
CPU: 1 PID: 4764 Comm: syz-executor.5 Not tainted 5.3.0-rc5+ #143
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
__kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
kasan_report+0x12/0x17 mm/kasan/common.c:612
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
rxrpc_release_call+0xb2d/0xb60 net/rxrpc/call_object.c:481
rxrpc_release_calls_on_socket+0x6e7/0x1320 net/rxrpc/call_object.c:517
rxrpc_release_sock net/rxrpc/af_rxrpc.c:898 [inline]
rxrpc_release+0x40c/0x840 net/rxrpc/af_rxrpc.c:930
__sock_release+0xce/0x280 net/socket.c:590
sock_close+0x1e/0x30 net/socket.c:1268
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe5ddebec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000459829
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe5ddebf6d4
R13: 00000000004f8f72 R14: 00000000004d1a70 R15: 00000000ffffffff
Allocated by task 4766:
save_stack+0x23/0x90 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc mm/kasan/common.c:487 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:460
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:501
kmem_cache_alloc_trace+0x158/0x790 mm/slab.c:3550
kmalloc include/linux/slab.h:552 [inline]
kzalloc include/linux/slab.h:748 [inline]
rxrpc_alloc_connection+0x86/0x5f0 net/rxrpc/conn_object.c:41
rxrpc_alloc_client_connection net/rxrpc/conn_client.c:176 [inline]
rxrpc_get_client_conn net/rxrpc/conn_client.c:339 [inline]
rxrpc_connect_call+0x648/0x4c00 net/rxrpc/conn_client.c:697
rxrpc_new_client_call+0x978/0x19d0 net/rxrpc/call_object.c:289
rxrpc_new_client_call_for_sendmsg net/rxrpc/sendmsg.c:594 [inline]
rxrpc_do_sendmsg+0xff5/0x1d53 net/rxrpc/sendmsg.c:651
rxrpc_sendmsg+0x4d6/0x5f0 net/rxrpc/af_rxrpc.c:585
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:657
___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
__sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 16:
save_stack+0x23/0x90 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:449
kasan_slab_free+0xe/0x10 mm/kasan/common.c:457
__cache_free mm/slab.c:3425 [inline]
kfree+0x10a/0x2c0 mm/slab.c:3756
rxrpc_destroy_connection+0x1f2/0x2d0 net/rxrpc/conn_object.c:372
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch kernel/rcu/tree.c:2114 [inline]
rcu_core+0x67f/0x1580 kernel/rcu/tree.c:2314
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2323
__do_softirq+0x262/0x98c kernel/softirq.c:292
The buggy address belongs to the object at ffff888062ffe900
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 592 bytes inside of
1024-byte region [ffff888062ffe900, ffff888062ffed00)
The buggy address belongs to the page:
page:ffffea00018bff80 refcount:1 mapcount:0 mapping:ffff8880aa400c40
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000181be88 ffffea0002324108 ffff8880aa400c40
raw: 0000000000000000 ffff888062ffe000 0000000100000007 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888062ffea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888062ffea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff888062ffeb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888062ffeb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888062ffec00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply
* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-23 14:53 UTC (permalink / raw)
To: Alex Williamson
Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <20190823082820.605deb07@x1.home>
> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, August 23, 2019 7:58 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
>
> On Fri, 23 Aug 2019 08:14:39 +0000
> Parav Pandit <parav@mellanox.com> wrote:
>
> > Hi Alex,
> >
> >
> > > -----Original Message-----
> > > From: Jiri Pirko <jiri@resnulli.us>
> > > Sent: Friday, August 23, 2019 1:42 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>; Kirti
> > > Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> <cohuck@redhat.com>;
> > > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Jiri Pirko <jiri@resnulli.us>
> > > >> Sent: Thursday, August 22, 2019 5:50 PM
> > > >> To: Parav Pandit <parav@mellanox.com>
> > > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > <cohuck@redhat.com>;
> > > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >>
> > > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:
> > > >> >
> > > >> >
> > > >> >> -----Original Message-----
> > > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > >> >> Sent: Thursday, August 22, 2019 3:28 PM
> > > >> >> To: Parav Pandit <parav@mellanox.com>
> > > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > >> >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > >> >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > >> <cohuck@redhat.com>;
> > > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >> >>
> > > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com wrote:
> > > >> >> >
> > > >> >> >
> > > >> >> >> -----Original Message-----
> > > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
> > > >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > >> >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > >> >> <cohuck@redhat.com>;
> > > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > >> >> >> core
> > > >> >> >>
> > > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST, parav@mellanox.com
> wrote:
> > > >> >> >> >
> > > >> >> >> >
> > > >> >> >> >> -----Original Message-----
> > > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
> > > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
> > > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > >> >> >> >> <cohuck@redhat.com>; kvm@vger.kernel.org;
> > > >> >> >> >> linux-kernel@vger.kernel.org; cjia <cjia@nvidia.com>;
> > > >> >> >> >> netdev@vger.kernel.org
> > > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > >> >> >> >> mdev core
> > > >> >> >> >>
> > > >> >> >> >> > > > > Just an example of the alias, not proposing how it's set.
> > > >> >> >> >> > > > > In fact, proposing that the user does not set
> > > >> >> >> >> > > > > it, mdev-core provides one
> > > >> >> >> >> > > automatically.
> > > >> >> >> >> > > > >
> > > >> >> >> >> > > > > > > Since there seems to be some prefix
> > > >> >> >> >> > > > > > > overhead, as I ask about above in how many
> > > >> >> >> >> > > > > > > characters we actually have to work with in
> > > >> >> >> >> > > > > > > IFNAMESZ, maybe we start with
> > > >> >> >> >> > > > > > > 8 characters (matching your "index"
> > > >> >> >> >> > > > > > > namespace) and expand as necessary for
> > > >> >> >> disambiguation.
> > > >> >> >> >> > > > > > > If we can eliminate overhead in IFNAMESZ,
> > > >> >> >> >> > > > > > > let's start with
> > > >> 12.
> > > >> >> >> >> > > > > > > Thanks,
> > > >> >> >> >> > > > > > >
> > > >> >> >> >> > > > > > If user is going to choose the alias, why does
> > > >> >> >> >> > > > > > it have to be limited to
> > > >> >> >> >> sha1?
> > > >> >> >> >> > > > > > Or you just told it as an example?
> > > >> >> >> >> > > > > >
> > > >> >> >> >> > > > > > It can be an alpha-numeric string.
> > > >> >> >> >> > > > >
> > > >> >> >> >> > > > > No, I'm proposing a different solution where
> > > >> >> >> >> > > > > mdev-core creates an alias based on an
> > > >> >> >> >> > > > > abbreviated sha1. The user does not provide the
> > > >> >> >> >> alias.
> > > >> >> >> >> > > > >
> > > >> >> >> >> > > > > > Instead of mdev imposing number of characters
> > > >> >> >> >> > > > > > on the alias, it should be best
> > > >> >> >> >> > > > > left to the user.
> > > >> >> >> >> > > > > > Because in future if netdev improves on the
> > > >> >> >> >> > > > > > naming scheme, mdev will be
> > > >> >> >> >> > > > > limiting it, which is not right.
> > > >> >> >> >> > > > > > So not restricting alias size seems right to me.
> > > >> >> >> >> > > > > > User configuring mdev for networking devices
> > > >> >> >> >> > > > > > in a given kernel knows what
> > > >> >> >> >> > > > > user is doing.
> > > >> >> >> >> > > > > > So user can choose alias name size as it finds suitable.
> > > >> >> >> >> > > > >
> > > >> >> >> >> > > > > That's not what I'm proposing, please read again.
> > > >> >> >> >> > > > > Thanks,
> > > >> >> >> >> > > >
> > > >> >> >> >> > > > I understood your point. But mdev doesn't know how
> > > >> >> >> >> > > > user is going to use
> > > >> >> >> >> > > udev/systemd to name the netdev.
> > > >> >> >> >> > > > So even if mdev chose to pick 12 characters, it
> > > >> >> >> >> > > > could result in
> > > >> >> collision.
> > > >> >> >> >> > > > Hence the proposal to provide the alias by the
> > > >> >> >> >> > > > user, as user know the best
> > > >> >> >> >> > > policy for its use case in the environment its using.
> > > >> >> >> >> > > > So 12 character sha1 method will still work by user.
> > > >> >> >> >> > >
> > > >> >> >> >> > > Haven't you already provided examples where certain
> > > >> >> >> >> > > drivers or subsystems have unique netdev prefixes?
> > > >> >> >> >> > > If mdev provides a unique alias within the
> > > >> >> >> >> > > subsystem, couldn't we simply define a netdev prefix
> > > >> >> >> >> > > for the mdev subsystem and avoid all other
> > > >> >> >> >> > > collisions? I'm not in favor of the user providing
> > > >> >> >> >> > > both a uuid and an alias/instance. Thanks,
> > > >> >> >> >> > >
> > > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1
> > > >> >> >> >> > first 9 characters have
> > > >> >> >> >> collision?
> > > >> >> >> >>
> > > >> >> >> >> I think it would be a mistake to waste so many chars on
> > > >> >> >> >> a prefix, but
> > > >> >> >> >> 9 characters of sha1 likely wouldn't have a collision
> > > >> >> >> >> before we have 10s of thousands of devices. Thanks,
> > > >> >> >> >>
> > > >> >> >> >> Alex
> > > >> >> >> >
> > > >> >> >> >Jiri, Dave,
> > > >> >> >> >Are you ok with it for devlink/netdev part?
> > > >> >> >> >Mdev core will create an alias from a UUID.
> > > >> >> >> >
> > > >> >> >> >This will be supplied during devlink port attr set such
> > > >> >> >> >as,
> > > >> >> >> >
> > > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port,
> > > >> >> >> >const char *mdev_alias);
> > > >> >> >> >
> > > >> >> >> >This alias is used to generate representor netdev's
> phys_port_name.
> > > >> >> >> >This alias from the mdev device's sysfs will be used by
> > > >> >> >> >the udev/systemd to
> > > >> >> >> generate predicable netdev's name.
> > > >> >> >> >Example: enm<mdev_alias_first_12_chars>
> > > >> >> >>
> > > >> >> >> What happens in unlikely case of 2 UUIDs collide?
> > > >> >> >>
> > > >> >> >Since users sees two devices with same phys_port_name, user
> > > >> >> >should destroy
> > > >> >> recently created mdev and recreate mdev with different UUID?
> > > >> >>
> > > >> >> Driver should make sure phys port name wont collide,
> > > >> >So when mdev creation is initiated, mdev core calculates the
> > > >> >alias and if there
> > > >> is any other mdev with same alias exist, it returns -EEXIST error
> > > >> before progressing further.
> > > >> >This way user will get to know upfront in event of collision
> > > >> >before the mdev
> > > >> device gets created.
> > > >> >How about that?
> > > >>
> > > >> Sounds fine to me. Now the question is how many chars do we want to
> have.
> > > >>
> > > >12 characters from Alex's suggestion similar to git?
> > >
> > > Ok.
> > >
> >
> > Can you please confirm this scheme looks good now? I like to get patches
> started.
>
> My only concern is your comment that in the event of an abbreviated
> sha1 collision (as exceptionally rare as that might be at 12-chars), we'd fail the
> device create, while my original suggestion was that vfio-core would add an
> extra character to the alias. For non-networking devices, the sha1 is
> unnecessary, so the extension behavior seems preferred. The user is only
> responsible to provide a unique uuid. Perhaps the failure behavior could be
> applied based on the mdev device_api. A module option on mdev to specify the
> default number of alias chars would also be useful for testing so that we can set
> it low enough to validate the collision behavior. Thanks,
>
Idea is to have mdev alias as optional.
Each mdev_parent says whether it wants mdev_core to generate an alias or not.
So only networking device drivers would set it to true.
For rest, alias won't be generated, and won't be compared either during creation time.
User continue to provide only uuid.
I am tempted to have alias collision detection only within children mdevs of the same parent, but doing so will always mandate to prefix in netdev name.
And currently we are left with only 3 characters to prefix it, so that may not be good either.
Hence, I think mdev core wide alias is better with 12 characters.
I do not understand how an extra character reduces collision, if that's what you meant.
Module options are almost not encouraged anymore with other subsystems/drivers.
For testing collision rate, a sample user space script and sample mtty is easy and get us collision count too.
We shouldn't put that using module option in production kernel.
I practically have the code ready to play with; Changing 12 to smaller value is easy with module reload.
#define MDEV_ALIAS_LEN 12
> Alex
>
> > > >> >> in this case that it does
> > > >> >> not provide 2 same attrs for 2 different ports.
> > > >> >> Hmm, so the order of creation matters. That is not good.
> > > >> >>
> > > >> >> >>
> > > >> >> >> >I took Ethernet mdev as an example.
> > > >> >> >> >New prefix 'm' stands for mediated device.
> > > >> >> >> >Remaining 12 characters are first 12 chars of the mdev alias.
> > > >> >> >>
> > > >> >> >> Does this resolve the identification of devlink port representor?
> > > >> >> >Not sure if I understood your question correctly, attemping
> > > >> >> >to answer
> > > >> below.
> > > >> >> >phys_port_name of devlink port is defined by the first 12
> > > >> >> >characters of mdev
> > > >> >> alias.
> > > >> >> >> I assume you want to use the same 12(or so) chars, don't you?
> > > >> >> >Mdev's netdev will also use the same mdev alias from the
> > > >> >> >sysfs to rename
> > > >> >> netdev name from ethX to enm<mdev_alias>, where en=Etherenet,
> > > >> m=mdev.
> > > >> >> >
> > > >> >> >So yes, same 12 characters are use for mdev's netdev and mdev
> > > >> >> >devlink port's
> > > >> >> phys_port_name.
> > > >> >> >
> > > >> >> >Is that what are you asking?
> > > >> >>
> > > >> >> Yes. Then you have 3 chars to handle the rest of the name (pci, pf)...
^ permalink raw reply
* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Jiri Pirko @ 2019-08-23 15:04 UTC (permalink / raw)
To: Parav Pandit
Cc: Alex Williamson, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866867150DAABA422F25FF8D1A40@AM0PR05MB4866.eurprd05.prod.outlook.com>
Fri, Aug 23, 2019 at 04:53:06PM CEST, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Alex Williamson <alex.williamson@redhat.com>
>> Sent: Friday, August 23, 2019 7:58 PM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
>> <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
>> Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
>> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
>> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
>>
>> On Fri, 23 Aug 2019 08:14:39 +0000
>> Parav Pandit <parav@mellanox.com> wrote:
>>
>> > Hi Alex,
>> >
>> >
>> > > -----Original Message-----
>> > > From: Jiri Pirko <jiri@resnulli.us>
>> > > Sent: Friday, August 23, 2019 1:42 PM
>> > > To: Parav Pandit <parav@mellanox.com>
>> > > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
>> > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>; Kirti
>> > > Wankhede <kwankhede@nvidia.com>; Cornelia Huck
>> <cohuck@redhat.com>;
>> > > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
>> > > <cjia@nvidia.com>; netdev@vger.kernel.org
>> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
>> > >
>> > > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:
>> > > >
>> > > >
>> > > >> -----Original Message-----
>> > > >> From: Jiri Pirko <jiri@resnulli.us>
>> > > >> Sent: Thursday, August 22, 2019 5:50 PM
>> > > >> To: Parav Pandit <parav@mellanox.com>
>> > > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
>> > > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
>> > > >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
>> > > <cohuck@redhat.com>;
>> > > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
>> > > >> <cjia@nvidia.com>; netdev@vger.kernel.org
>> > > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
>> > > >>
>> > > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:
>> > > >> >
>> > > >> >
>> > > >> >> -----Original Message-----
>> > > >> >> From: Jiri Pirko <jiri@resnulli.us>
>> > > >> >> Sent: Thursday, August 22, 2019 3:28 PM
>> > > >> >> To: Parav Pandit <parav@mellanox.com>
>> > > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
>> > > >> >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
>> > > >> >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
>> > > >> <cohuck@redhat.com>;
>> > > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
>> > > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
>> > > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
>> > > >> >>
>> > > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com wrote:
>> > > >> >> >
>> > > >> >> >
>> > > >> >> >> -----Original Message-----
>> > > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
>> > > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
>> > > >> >> >> To: Parav Pandit <parav@mellanox.com>
>> > > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
>> > > >> >> >> Pirko <jiri@mellanox.com>; David S . Miller
>> > > >> >> >> <davem@davemloft.net>; Kirti Wankhede
>> > > >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
>> > > >> >> <cohuck@redhat.com>;
>> > > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
>> > > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
>> > > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
>> > > >> >> >> core
>> > > >> >> >>
>> > > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST, parav@mellanox.com
>> wrote:
>> > > >> >> >> >
>> > > >> >> >> >
>> > > >> >> >> >> -----Original Message-----
>> > > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
>> > > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
>> > > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
>> > > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
>> > > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
>> > > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
>> > > >> >> >> >> <cohuck@redhat.com>; kvm@vger.kernel.org;
>> > > >> >> >> >> linux-kernel@vger.kernel.org; cjia <cjia@nvidia.com>;
>> > > >> >> >> >> netdev@vger.kernel.org
>> > > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
>> > > >> >> >> >> mdev core
>> > > >> >> >> >>
>> > > >> >> >> >> > > > > Just an example of the alias, not proposing how it's set.
>> > > >> >> >> >> > > > > In fact, proposing that the user does not set
>> > > >> >> >> >> > > > > it, mdev-core provides one
>> > > >> >> >> >> > > automatically.
>> > > >> >> >> >> > > > >
>> > > >> >> >> >> > > > > > > Since there seems to be some prefix
>> > > >> >> >> >> > > > > > > overhead, as I ask about above in how many
>> > > >> >> >> >> > > > > > > characters we actually have to work with in
>> > > >> >> >> >> > > > > > > IFNAMESZ, maybe we start with
>> > > >> >> >> >> > > > > > > 8 characters (matching your "index"
>> > > >> >> >> >> > > > > > > namespace) and expand as necessary for
>> > > >> >> >> disambiguation.
>> > > >> >> >> >> > > > > > > If we can eliminate overhead in IFNAMESZ,
>> > > >> >> >> >> > > > > > > let's start with
>> > > >> 12.
>> > > >> >> >> >> > > > > > > Thanks,
>> > > >> >> >> >> > > > > > >
>> > > >> >> >> >> > > > > > If user is going to choose the alias, why does
>> > > >> >> >> >> > > > > > it have to be limited to
>> > > >> >> >> >> sha1?
>> > > >> >> >> >> > > > > > Or you just told it as an example?
>> > > >> >> >> >> > > > > >
>> > > >> >> >> >> > > > > > It can be an alpha-numeric string.
>> > > >> >> >> >> > > > >
>> > > >> >> >> >> > > > > No, I'm proposing a different solution where
>> > > >> >> >> >> > > > > mdev-core creates an alias based on an
>> > > >> >> >> >> > > > > abbreviated sha1. The user does not provide the
>> > > >> >> >> >> alias.
>> > > >> >> >> >> > > > >
>> > > >> >> >> >> > > > > > Instead of mdev imposing number of characters
>> > > >> >> >> >> > > > > > on the alias, it should be best
>> > > >> >> >> >> > > > > left to the user.
>> > > >> >> >> >> > > > > > Because in future if netdev improves on the
>> > > >> >> >> >> > > > > > naming scheme, mdev will be
>> > > >> >> >> >> > > > > limiting it, which is not right.
>> > > >> >> >> >> > > > > > So not restricting alias size seems right to me.
>> > > >> >> >> >> > > > > > User configuring mdev for networking devices
>> > > >> >> >> >> > > > > > in a given kernel knows what
>> > > >> >> >> >> > > > > user is doing.
>> > > >> >> >> >> > > > > > So user can choose alias name size as it finds suitable.
>> > > >> >> >> >> > > > >
>> > > >> >> >> >> > > > > That's not what I'm proposing, please read again.
>> > > >> >> >> >> > > > > Thanks,
>> > > >> >> >> >> > > >
>> > > >> >> >> >> > > > I understood your point. But mdev doesn't know how
>> > > >> >> >> >> > > > user is going to use
>> > > >> >> >> >> > > udev/systemd to name the netdev.
>> > > >> >> >> >> > > > So even if mdev chose to pick 12 characters, it
>> > > >> >> >> >> > > > could result in
>> > > >> >> collision.
>> > > >> >> >> >> > > > Hence the proposal to provide the alias by the
>> > > >> >> >> >> > > > user, as user know the best
>> > > >> >> >> >> > > policy for its use case in the environment its using.
>> > > >> >> >> >> > > > So 12 character sha1 method will still work by user.
>> > > >> >> >> >> > >
>> > > >> >> >> >> > > Haven't you already provided examples where certain
>> > > >> >> >> >> > > drivers or subsystems have unique netdev prefixes?
>> > > >> >> >> >> > > If mdev provides a unique alias within the
>> > > >> >> >> >> > > subsystem, couldn't we simply define a netdev prefix
>> > > >> >> >> >> > > for the mdev subsystem and avoid all other
>> > > >> >> >> >> > > collisions? I'm not in favor of the user providing
>> > > >> >> >> >> > > both a uuid and an alias/instance. Thanks,
>> > > >> >> >> >> > >
>> > > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1
>> > > >> >> >> >> > first 9 characters have
>> > > >> >> >> >> collision?
>> > > >> >> >> >>
>> > > >> >> >> >> I think it would be a mistake to waste so many chars on
>> > > >> >> >> >> a prefix, but
>> > > >> >> >> >> 9 characters of sha1 likely wouldn't have a collision
>> > > >> >> >> >> before we have 10s of thousands of devices. Thanks,
>> > > >> >> >> >>
>> > > >> >> >> >> Alex
>> > > >> >> >> >
>> > > >> >> >> >Jiri, Dave,
>> > > >> >> >> >Are you ok with it for devlink/netdev part?
>> > > >> >> >> >Mdev core will create an alias from a UUID.
>> > > >> >> >> >
>> > > >> >> >> >This will be supplied during devlink port attr set such
>> > > >> >> >> >as,
>> > > >> >> >> >
>> > > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port,
>> > > >> >> >> >const char *mdev_alias);
>> > > >> >> >> >
>> > > >> >> >> >This alias is used to generate representor netdev's
>> phys_port_name.
>> > > >> >> >> >This alias from the mdev device's sysfs will be used by
>> > > >> >> >> >the udev/systemd to
>> > > >> >> >> generate predicable netdev's name.
>> > > >> >> >> >Example: enm<mdev_alias_first_12_chars>
>> > > >> >> >>
>> > > >> >> >> What happens in unlikely case of 2 UUIDs collide?
>> > > >> >> >>
>> > > >> >> >Since users sees two devices with same phys_port_name, user
>> > > >> >> >should destroy
>> > > >> >> recently created mdev and recreate mdev with different UUID?
>> > > >> >>
>> > > >> >> Driver should make sure phys port name wont collide,
>> > > >> >So when mdev creation is initiated, mdev core calculates the
>> > > >> >alias and if there
>> > > >> is any other mdev with same alias exist, it returns -EEXIST error
>> > > >> before progressing further.
>> > > >> >This way user will get to know upfront in event of collision
>> > > >> >before the mdev
>> > > >> device gets created.
>> > > >> >How about that?
>> > > >>
>> > > >> Sounds fine to me. Now the question is how many chars do we want to
>> have.
>> > > >>
>> > > >12 characters from Alex's suggestion similar to git?
>> > >
>> > > Ok.
>> > >
>> >
>> > Can you please confirm this scheme looks good now? I like to get patches
>> started.
>>
>> My only concern is your comment that in the event of an abbreviated
>> sha1 collision (as exceptionally rare as that might be at 12-chars), we'd fail the
>> device create, while my original suggestion was that vfio-core would add an
>> extra character to the alias. For non-networking devices, the sha1 is
>> unnecessary, so the extension behavior seems preferred. The user is only
>> responsible to provide a unique uuid. Perhaps the failure behavior could be
>> applied based on the mdev device_api. A module option on mdev to specify the
>> default number of alias chars would also be useful for testing so that we can set
>> it low enough to validate the collision behavior. Thanks,
>>
>
>Idea is to have mdev alias as optional.
>Each mdev_parent says whether it wants mdev_core to generate an alias or not.
>So only networking device drivers would set it to true.
>For rest, alias won't be generated, and won't be compared either during creation time.
>User continue to provide only uuid.
>I am tempted to have alias collision detection only within children mdevs of the same parent, but doing so will always mandate to prefix in netdev name.
>And currently we are left with only 3 characters to prefix it, so that may not be good either.
>Hence, I think mdev core wide alias is better with 12 characters.
>
>I do not understand how an extra character reduces collision, if that's what you meant.
Also, that breaks the naming consistency for different creation order.
>Module options are almost not encouraged anymore with other subsystems/drivers.
>
>For testing collision rate, a sample user space script and sample mtty is easy and get us collision count too.
>We shouldn't put that using module option in production kernel.
>I practically have the code ready to play with; Changing 12 to smaller value is easy with module reload.
>
>#define MDEV_ALIAS_LEN 12
>
>> Alex
>>
>> > > >> >> in this case that it does
>> > > >> >> not provide 2 same attrs for 2 different ports.
>> > > >> >> Hmm, so the order of creation matters. That is not good.
>> > > >> >>
>> > > >> >> >>
>> > > >> >> >> >I took Ethernet mdev as an example.
>> > > >> >> >> >New prefix 'm' stands for mediated device.
>> > > >> >> >> >Remaining 12 characters are first 12 chars of the mdev alias.
>> > > >> >> >>
>> > > >> >> >> Does this resolve the identification of devlink port representor?
>> > > >> >> >Not sure if I understood your question correctly, attemping
>> > > >> >> >to answer
>> > > >> below.
>> > > >> >> >phys_port_name of devlink port is defined by the first 12
>> > > >> >> >characters of mdev
>> > > >> >> alias.
>> > > >> >> >> I assume you want to use the same 12(or so) chars, don't you?
>> > > >> >> >Mdev's netdev will also use the same mdev alias from the
>> > > >> >> >sysfs to rename
>> > > >> >> netdev name from ethX to enm<mdev_alias>, where en=Etherenet,
>> > > >> m=mdev.
>> > > >> >> >
>> > > >> >> >So yes, same 12 characters are use for mdev's netdev and mdev
>> > > >> >> >devlink port's
>> > > >> >> phys_port_name.
>> > > >> >> >
>> > > >> >> >Is that what are you asking?
>> > > >> >>
>> > > >> >> Yes. Then you have 3 chars to handle the rest of the name (pci, pf)...
>
^ permalink raw reply
* [PATCH net-next] dpaa2-eth: Add pause frame support
From: Ioana Radulescu @ 2019-08-23 15:19 UTC (permalink / raw)
To: netdev, davem; +Cc: ioana.ciornei
Starting with firmware version MC10.18.0, we have support for
L2 flow control. Asymmetrical configuration (Rx or Tx only) is
supported, but not pause frame autonegotioation.
The hardware can automatically send pause frames when the number
of buffers in the pool goes below a predefined threshold. Due to
this, flow control is incompatible with Rx frame queue taildrop
(both mechanisms target the case when processing of ingress
frames can't keep up with the Rx rate; for large frames, the number
of buffers in the pool may never get low enough to trigger pause
frames as long as taildrop is enabled). So we set pause frame
generation and Rx FQ taildrop as mutually exclusive.
Pause frame configuration is done via ethtool. By default, we start
with flow control enabled on both Rx and Tx. Changes are propagated
to hardware through firmware commands; current FC state is stored
in the driver and we only interrogate the firmware when we receive
a notification that something (flow control or other link options)
has changed.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
---
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 85 +++++++++++++++++++---
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h | 7 ++
.../net/ethernet/freescale/dpaa2/dpaa2-ethtool.c | 74 +++++++++++++++----
drivers/net/ethernet/freescale/dpaa2/dpni-cmd.h | 3 +-
drivers/net/ethernet/freescale/dpaa2/dpni.c | 40 +++++++++-
drivers/net/ethernet/freescale/dpaa2/dpni.h | 5 ++
6 files changed, 186 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 0acb115..e0816d6 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -1208,9 +1208,37 @@ static void disable_ch_napi(struct dpaa2_eth_priv *priv)
}
}
+static void dpaa2_eth_set_rx_taildrop(struct dpaa2_eth_priv *priv, bool enable)
+{
+ struct dpni_taildrop td = {0};
+ int i, err;
+
+ if (priv->rx_td_enabled == enable)
+ return;
+
+ td.enable = enable;
+ td.threshold = DPAA2_ETH_TAILDROP_THRESH;
+
+ for (i = 0; i < priv->num_fqs; i++) {
+ if (priv->fq[i].type != DPAA2_RX_FQ)
+ continue;
+ err = dpni_set_taildrop(priv->mc_io, 0, priv->mc_token,
+ DPNI_CP_QUEUE, DPNI_QUEUE_RX, 0,
+ priv->fq[i].flowid, &td);
+ if (err) {
+ netdev_err(priv->net_dev,
+ "dpni_set_taildrop() failed\n");
+ break;
+ }
+ }
+
+ priv->rx_td_enabled = enable;
+}
+
static int link_state_update(struct dpaa2_eth_priv *priv)
{
struct dpni_link_state state = {0};
+ bool tx_pause;
int err;
err = dpni_get_link_state(priv->mc_io, 0, priv->mc_token, &state);
@@ -1220,11 +1248,18 @@ static int link_state_update(struct dpaa2_eth_priv *priv)
return err;
}
+ /* If Tx pause frame settings have changed, we need to update
+ * Rx FQ taildrop configuration as well. We configure taildrop
+ * only when pause frame generation is disabled.
+ */
+ tx_pause = !!(state.options & DPNI_LINK_OPT_PAUSE) ^
+ !!(state.options & DPNI_LINK_OPT_ASYM_PAUSE);
+ dpaa2_eth_set_rx_taildrop(priv, !tx_pause);
+
/* Chech link state; speed / duplex changes are not treated yet */
if (priv->link_state.up == state.up)
- return 0;
+ goto out;
- priv->link_state = state;
if (state.up) {
netif_carrier_on(priv->net_dev);
netif_tx_start_all_queues(priv->net_dev);
@@ -1236,6 +1271,9 @@ static int link_state_update(struct dpaa2_eth_priv *priv)
netdev_info(priv->net_dev, "Link Event: state %s\n",
state.up ? "up" : "down");
+out:
+ priv->link_state = state;
+
return 0;
}
@@ -2443,6 +2481,32 @@ static void set_enqueue_mode(struct dpaa2_eth_priv *priv)
priv->enqueue = dpaa2_eth_enqueue_fq;
}
+static int set_pause(struct dpaa2_eth_priv *priv)
+{
+ struct device *dev = priv->net_dev->dev.parent;
+ struct dpni_link_cfg link_cfg = {0};
+ int err;
+
+ /* Get the default link options so we don't override other flags */
+ err = dpni_get_link_cfg(priv->mc_io, 0, priv->mc_token, &link_cfg);
+ if (err) {
+ dev_err(dev, "dpni_get_link_cfg() failed\n");
+ return err;
+ }
+
+ link_cfg.options |= DPNI_LINK_OPT_PAUSE;
+ link_cfg.options &= ~DPNI_LINK_OPT_ASYM_PAUSE;
+ err = dpni_set_link_cfg(priv->mc_io, 0, priv->mc_token, &link_cfg);
+ if (err) {
+ dev_err(dev, "dpni_set_link_cfg() failed\n");
+ return err;
+ }
+
+ priv->link_state.options = link_cfg.options;
+
+ return 0;
+}
+
/* Configure the DPNI object this interface is associated with */
static int setup_dpni(struct fsl_mc_device *ls_dev)
{
@@ -2498,6 +2562,13 @@ static int setup_dpni(struct fsl_mc_device *ls_dev)
set_enqueue_mode(priv);
+ /* Enable pause frame support */
+ if (dpaa2_eth_has_pause_support(priv)) {
+ err = set_pause(priv);
+ if (err)
+ goto close;
+ }
+
priv->cls_rules = devm_kzalloc(dev, sizeof(struct dpaa2_eth_cls_rule) *
dpaa2_eth_fs_count(priv), GFP_KERNEL);
if (!priv->cls_rules)
@@ -2529,7 +2600,6 @@ static int setup_rx_flow(struct dpaa2_eth_priv *priv,
struct device *dev = priv->net_dev->dev.parent;
struct dpni_queue queue;
struct dpni_queue_id qid;
- struct dpni_taildrop td;
int err;
err = dpni_get_queue(priv->mc_io, 0, priv->mc_token,
@@ -2554,15 +2624,6 @@ static int setup_rx_flow(struct dpaa2_eth_priv *priv,
return err;
}
- td.enable = 1;
- td.threshold = DPAA2_ETH_TAILDROP_THRESH;
- err = dpni_set_taildrop(priv->mc_io, 0, priv->mc_token, DPNI_CP_QUEUE,
- DPNI_QUEUE_RX, 0, fq->flowid, &td);
- if (err) {
- dev_err(dev, "dpni_set_threshold() failed\n");
- return err;
- }
-
/* xdp_rxq setup */
err = xdp_rxq_info_reg(&fq->channel->xdp_rxq, priv->net_dev,
fq->flowid);
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
index 9af18c2..8a0e65b 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
@@ -392,6 +392,7 @@ struct dpaa2_eth_priv {
struct dpaa2_eth_drv_stats __percpu *percpu_extras;
u16 mc_token;
+ u8 rx_td_enabled;
struct dpni_link_state link_state;
bool do_link_poll;
@@ -476,6 +477,12 @@ enum dpaa2_eth_rx_dist {
#define DPAA2_ETH_DIST_L4DST BIT(8)
#define DPAA2_ETH_DIST_ALL (~0ULL)
+#define DPNI_PAUSE_VER_MAJOR 7
+#define DPNI_PAUSE_VER_MINOR 13
+#define dpaa2_eth_has_pause_support(priv) \
+ (dpaa2_eth_cmp_dpni_ver((priv), DPNI_PAUSE_VER_MAJOR, \
+ DPNI_PAUSE_VER_MINOR) >= 0)
+
static inline
unsigned int dpaa2_eth_needed_headroom(struct dpaa2_eth_priv *priv,
struct sk_buff *skb)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
index 7b182f4..7000638 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
@@ -78,29 +78,20 @@ static int
dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
struct ethtool_link_ksettings *link_settings)
{
- struct dpni_link_state state = {0};
- int err = 0;
struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
- err = dpni_get_link_state(priv->mc_io, 0, priv->mc_token, &state);
- if (err) {
- netdev_err(net_dev, "ERROR %d getting link state\n", err);
- goto out;
- }
-
/* At the moment, we have no way of interrogating the DPMAC
* from the DPNI side - and for that matter there may exist
* no DPMAC at all. So for now we just don't report anything
* beyond the DPNI attributes.
*/
- if (state.options & DPNI_LINK_OPT_AUTONEG)
+ if (priv->link_state.options & DPNI_LINK_OPT_AUTONEG)
link_settings->base.autoneg = AUTONEG_ENABLE;
- if (!(state.options & DPNI_LINK_OPT_HALF_DUPLEX))
+ if (!(priv->link_state.options & DPNI_LINK_OPT_HALF_DUPLEX))
link_settings->base.duplex = DUPLEX_FULL;
- link_settings->base.speed = state.rate;
+ link_settings->base.speed = priv->link_state.rate;
-out:
- return err;
+ return 0;
}
#define DPNI_DYNAMIC_LINK_SET_VER_MAJOR 7
@@ -125,6 +116,9 @@ dpaa2_eth_set_link_ksettings(struct net_device *net_dev,
}
}
+ /* Make sure current options are not overwritten */
+ cfg.options = priv->link_state.options;
+
cfg.rate = link_settings->base.speed;
if (link_settings->base.autoneg == AUTONEG_ENABLE)
cfg.options |= DPNI_LINK_OPT_AUTONEG;
@@ -145,6 +139,58 @@ dpaa2_eth_set_link_ksettings(struct net_device *net_dev,
return err;
}
+static void dpaa2_eth_get_pauseparam(struct net_device *net_dev,
+ struct ethtool_pauseparam *pause)
+{
+ struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
+ u64 link_options = priv->link_state.options;
+
+ pause->rx_pause = !!(link_options & DPNI_LINK_OPT_PAUSE);
+ pause->tx_pause = pause->rx_pause ^
+ !!(link_options & DPNI_LINK_OPT_ASYM_PAUSE);
+}
+
+static int dpaa2_eth_set_pauseparam(struct net_device *net_dev,
+ struct ethtool_pauseparam *pause)
+{
+ struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
+ struct dpni_link_cfg cfg = {0};
+ int err;
+
+ if (!dpaa2_eth_has_pause_support(priv)) {
+ netdev_info(net_dev, "No pause frame support for DPNI version < %d.%d\n",
+ DPNI_PAUSE_VER_MAJOR, DPNI_PAUSE_VER_MINOR);
+ return -EOPNOTSUPP;
+ }
+
+ if (pause->autoneg)
+ netdev_err(net_dev, "Pause frame autoneg not supported\n");
+
+ cfg.rate = priv->link_state.rate;
+ cfg.options = priv->link_state.options;
+ if (pause->rx_pause)
+ cfg.options |= DPNI_LINK_OPT_PAUSE;
+ else
+ cfg.options &= ~DPNI_LINK_OPT_PAUSE;
+ if (!!pause->rx_pause ^ !!pause->tx_pause)
+ cfg.options |= DPNI_LINK_OPT_ASYM_PAUSE;
+ else
+ cfg.options &= ~DPNI_LINK_OPT_ASYM_PAUSE;
+
+ if (cfg.options == priv->link_state.options)
+ return 0;
+
+ err = dpni_set_link_cfg(priv->mc_io, 0, priv->mc_token, &cfg);
+ if (err) {
+ netdev_err(net_dev, "dpni_set_link_state failed\n");
+ return err;
+ }
+
+ priv->link_state.options = cfg.options;
+
+ return 0;
+}
+
static void dpaa2_eth_get_strings(struct net_device *netdev, u32 stringset,
u8 *data)
{
@@ -722,6 +768,8 @@ const struct ethtool_ops dpaa2_ethtool_ops = {
.get_link = ethtool_op_get_link,
.get_link_ksettings = dpaa2_eth_get_link_ksettings,
.set_link_ksettings = dpaa2_eth_set_link_ksettings,
+ .get_pauseparam = dpaa2_eth_get_pauseparam,
+ .set_pauseparam = dpaa2_eth_set_pauseparam,
.get_sset_count = dpaa2_eth_get_sset_count,
.get_ethtool_stats = dpaa2_eth_get_ethtool_stats,
.get_strings = dpaa2_eth_get_strings,
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpni-cmd.h b/drivers/net/ethernet/freescale/dpaa2/dpni-cmd.h
index 7b44d7d..d9b6918 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpni-cmd.h
+++ b/drivers/net/ethernet/freescale/dpaa2/dpni-cmd.h
@@ -84,6 +84,7 @@
#define DPNI_CMDID_SET_RX_FS_DIST DPNI_CMD(0x273)
#define DPNI_CMDID_SET_RX_HASH_DIST DPNI_CMD(0x274)
+#define DPNI_CMDID_GET_LINK_CFG DPNI_CMD(0x278)
/* Macros for accessing command fields smaller than 1byte */
#define DPNI_MASK(field) \
@@ -284,7 +285,7 @@ struct dpni_rsp_get_statistics {
__le64 counter[DPNI_STATISTICS_CNT];
};
-struct dpni_cmd_set_link_cfg {
+struct dpni_cmd_link_cfg {
/* cmd word 0 */
__le64 pad0;
/* cmd word 1 */
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpni.c b/drivers/net/ethernet/freescale/dpaa2/dpni.c
index 220dfc8..05e3089 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpni.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpni.c
@@ -838,13 +838,13 @@ int dpni_set_link_cfg(struct fsl_mc_io *mc_io,
const struct dpni_link_cfg *cfg)
{
struct fsl_mc_command cmd = { 0 };
- struct dpni_cmd_set_link_cfg *cmd_params;
+ struct dpni_cmd_link_cfg *cmd_params;
/* prepare command */
cmd.header = mc_encode_cmd_header(DPNI_CMDID_SET_LINK_CFG,
cmd_flags,
token);
- cmd_params = (struct dpni_cmd_set_link_cfg *)cmd.params;
+ cmd_params = (struct dpni_cmd_link_cfg *)cmd.params;
cmd_params->rate = cpu_to_le32(cfg->rate);
cmd_params->options = cpu_to_le64(cfg->options);
@@ -853,6 +853,42 @@ int dpni_set_link_cfg(struct fsl_mc_io *mc_io,
}
/**
+ * dpni_get_link_cfg() - return the link configuration
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPNI object
+ * @cfg: Link configuration from dpni object
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpni_get_link_cfg(struct fsl_mc_io *mc_io,
+ u32 cmd_flags,
+ u16 token,
+ struct dpni_link_cfg *cfg)
+{
+ struct fsl_mc_command cmd = { 0 };
+ struct dpni_cmd_link_cfg *rsp_params;
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPNI_CMDID_GET_LINK_CFG,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpni_cmd_link_cfg *)cmd.params;
+ cfg->rate = le32_to_cpu(rsp_params->rate);
+ cfg->options = le64_to_cpu(rsp_params->options);
+
+ return err;
+}
+
+/**
* dpni_get_link_state() - Return the link state (either up or down)
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpni.h b/drivers/net/ethernet/freescale/dpaa2/dpni.h
index a521242..3e8fc6c 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpni.h
+++ b/drivers/net/ethernet/freescale/dpaa2/dpni.h
@@ -485,6 +485,11 @@ int dpni_set_link_cfg(struct fsl_mc_io *mc_io,
u16 token,
const struct dpni_link_cfg *cfg);
+int dpni_get_link_cfg(struct fsl_mc_io *mc_io,
+ u32 cmd_flags,
+ u16 token,
+ struct dpni_link_cfg *cfg);
+
/**
* struct dpni_link_state - Structure representing DPNI link state
* @rate: Rate
--
2.7.4
^ permalink raw reply related
* Re: [PATCH net-next 5/6] net: dsa: program VLAN on CPU port from slave
From: Vladimir Oltean @ 2019-08-23 15:44 UTC (permalink / raw)
To: Vivien Didelot; +Cc: netdev, David S. Miller, Florian Fainelli, Andrew Lunn
In-Reply-To: <20190822201323.1292-6-vivien.didelot@gmail.com>
On Thu, 22 Aug 2019 at 23:13, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>
> DSA currently programs a VLAN on the CPU port implicitly after the
> related notifier is received by a switch.
>
> While we still need to do this transparent programmation of the DSA
> links in the fabric, programming the CPU port this way may cause
> problems in some corners such as the tag_8021q driver.
>
> Because the dedicated CPU port is specific to a slave, make their
> programmation explicit a few layers up, in the slave code.
>
> Note that technically, DSA links have a dedicated CPU port as well,
> but since they are only used as conduit between interconnected switches
> of a fabric, programming them transparently this way is fine.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
> ---
> net/dsa/slave.c | 14 ++++++++++++++
> net/dsa/switch.c | 5 ++++-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 82e48d247b81..8267c156a51a 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -332,6 +332,10 @@ static int dsa_slave_vlan_add(struct net_device *dev,
> if (err)
> return err;
>
> + err = dsa_port_vlan_add(dp->cpu_dp, &vlan, trans);
> + if (err)
> + return err;
> +
> return 0;
> }
>
> @@ -383,6 +387,9 @@ static int dsa_slave_vlan_del(struct net_device *dev,
> if (dp->bridge_dev && !br_vlan_enabled(dp->bridge_dev))
> return 0;
>
> + /* Do not deprogram the CPU port as it may be shared with other user
> + * ports which can be members of this VLAN as well.
> + */
+1 for the comments, the deletion of dp->cpu_dp is less likely to get
patched into the code in the future now.
> return dsa_port_vlan_del(dp, SWITCHDEV_OBJ_PORT_VLAN(obj));
> }
>
> @@ -1121,6 +1128,10 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
> if (ret && ret != -EOPNOTSUPP)
> return ret;
>
> + ret = dsa_port_vid_add(dp->cpu_dp, vid, 0);
> + if (ret && ret != -EOPNOTSUPP)
> + return ret;
> +
I think it's worth understanding what the EOPNOTSUPP -> 0 is avoiding.
> return 0;
> }
>
> @@ -1151,6 +1162,9 @@ static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
> if (ret == -EOPNOTSUPP)
> ret = 0;
>
> + /* Do not deprogram the CPU port as it may be shared with other user
> + * ports which can be members of this VLAN as well.
> + */
> return ret;
> }
>
> diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> index 489eb7b430a4..6a9607518823 100644
> --- a/net/dsa/switch.c
> +++ b/net/dsa/switch.c
> @@ -232,7 +232,7 @@ static bool dsa_switch_vlan_match(struct dsa_switch *ds, int port,
> if (ds->index == info->sw_index && port == info->port)
> return true;
>
> - if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> + if (dsa_is_dsa_port(ds, port))
Much better, thank you.
> return true;
>
> return false;
> @@ -288,6 +288,9 @@ static int dsa_switch_vlan_del(struct dsa_switch *ds,
> if (ds->index == info->sw_index)
> return ds->ops->port_vlan_del(ds, info->port, info->vlan);
>
> + /* Do not deprogram the DSA links as they may be used as conduit
> + * for other VLAN members in the fabric.
> + */
> return 0;
> }
>
> --
> 2.23.0
>
^ permalink raw reply
* [PATCH net-next] drop_monitor: Make timestamps y2038 safe
From: Ido Schimmel @ 2019-08-23 15:47 UTC (permalink / raw)
To: netdev; +Cc: davem, nhorman, arnd, andrew, ayal, mlxsw, Ido Schimmel
From: Ido Schimmel <idosch@mellanox.com>
Timestamps are currently communicated to user space as 'struct
timespec', which is not considered y2038 safe since it uses a 32-bit
signed value for seconds.
Fix this while the API is still not part of any official kernel release
by using 64-bit nanoseconds timestamps instead.
Fixes: ca30707dee2b ("drop_monitor: Add packet alert mode")
Fixes: 5e58109b1ea4 ("drop_monitor: Add support for packet alert mode for hardware drops")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
Arnd, I have followed your recommendation to use 64-bit nanoseconds
timestamps. I would appreciate it if you could review this change.
Thanks!
---
include/uapi/linux/net_dropmon.h | 2 +-
net/core/drop_monitor.c | 14 ++++++--------
2 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h
index 75a35dccb675..8bf79a9eb234 100644
--- a/include/uapi/linux/net_dropmon.h
+++ b/include/uapi/linux/net_dropmon.h
@@ -75,7 +75,7 @@ enum net_dm_attr {
NET_DM_ATTR_PC, /* u64 */
NET_DM_ATTR_SYMBOL, /* string */
NET_DM_ATTR_IN_PORT, /* nested */
- NET_DM_ATTR_TIMESTAMP, /* struct timespec */
+ NET_DM_ATTR_TIMESTAMP, /* u64 */
NET_DM_ATTR_PROTO, /* u16 */
NET_DM_ATTR_PAYLOAD, /* binary */
NET_DM_ATTR_PAD,
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index bfc024024aa3..cc60cc22e2db 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -552,7 +552,7 @@ static size_t net_dm_packet_report_size(size_t payload_len)
/* NET_DM_ATTR_IN_PORT */
net_dm_in_port_size() +
/* NET_DM_ATTR_TIMESTAMP */
- nla_total_size(sizeof(struct timespec)) +
+ nla_total_size(sizeof(u64)) +
/* NET_DM_ATTR_ORIG_LEN */
nla_total_size(sizeof(u32)) +
/* NET_DM_ATTR_PROTO */
@@ -592,7 +592,6 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
u64 pc = (u64)(uintptr_t) NET_DM_SKB_CB(skb)->pc;
char buf[NET_DM_MAX_SYMBOL_LEN];
struct nlattr *attr;
- struct timespec ts;
void *hdr;
int rc;
@@ -615,8 +614,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
if (rc)
goto nla_put_failure;
- if (ktime_to_timespec_cond(skb->tstamp, &ts) &&
- nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
+ if (nla_put_u64_64bit(msg, NET_DM_ATTR_TIMESTAMP,
+ ktime_to_ns(skb->tstamp), NET_DM_ATTR_PAD))
goto nla_put_failure;
if (nla_put_u32(msg, NET_DM_ATTR_ORIG_LEN, skb->len))
@@ -716,7 +715,7 @@ net_dm_hw_packet_report_size(size_t payload_len,
/* NET_DM_ATTR_IN_PORT */
net_dm_in_port_size() +
/* NET_DM_ATTR_TIMESTAMP */
- nla_total_size(sizeof(struct timespec)) +
+ nla_total_size(sizeof(u64)) +
/* NET_DM_ATTR_ORIG_LEN */
nla_total_size(sizeof(u32)) +
/* NET_DM_ATTR_PROTO */
@@ -730,7 +729,6 @@ static int net_dm_hw_packet_report_fill(struct sk_buff *msg,
{
struct net_dm_hw_metadata *hw_metadata;
struct nlattr *attr;
- struct timespec ts;
void *hdr;
hw_metadata = NET_DM_SKB_CB(skb)->hw_metadata;
@@ -761,8 +759,8 @@ static int net_dm_hw_packet_report_fill(struct sk_buff *msg,
goto nla_put_failure;
}
- if (ktime_to_timespec_cond(skb->tstamp, &ts) &&
- nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
+ if (nla_put_u64_64bit(msg, NET_DM_ATTR_TIMESTAMP,
+ ktime_to_ns(skb->tstamp), NET_DM_ATTR_PAD))
goto nla_put_failure;
if (nla_put_u32(msg, NET_DM_ATTR_ORIG_LEN, skb->len))
--
2.21.0
^ permalink raw reply related
* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Alex Williamson @ 2019-08-23 15:52 UTC (permalink / raw)
To: Parav Pandit
Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866867150DAABA422F25FF8D1A40@AM0PR05MB4866.eurprd05.prod.outlook.com>
On Fri, 23 Aug 2019 14:53:06 +0000
Parav Pandit <parav@mellanox.com> wrote:
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, August 23, 2019 7:58 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> > Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> >
> > On Fri, 23 Aug 2019 08:14:39 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >
> > > Hi Alex,
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jiri Pirko <jiri@resnulli.us>
> > > > Sent: Friday, August 23, 2019 1:42 PM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>; Kirti
> > > > Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > <cohuck@redhat.com>;
> > > > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:
> > > > >
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Jiri Pirko <jiri@resnulli.us>
> > > > >> Sent: Thursday, August 22, 2019 5:50 PM
> > > > >> To: Parav Pandit <parav@mellanox.com>
> > > > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > <cohuck@redhat.com>;
> > > > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >>
> > > > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:
> > > > >> >
> > > > >> >
> > > > >> >> -----Original Message-----
> > > > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > >> >> Sent: Thursday, August 22, 2019 3:28 PM
> > > > >> >> To: Parav Pandit <parav@mellanox.com>
> > > > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > >> >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > >> >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > >> <cohuck@redhat.com>;
> > > > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >> >>
> > > > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com wrote:
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >> -----Original Message-----
> > > > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
> > > > >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > >> >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > >> >> <cohuck@redhat.com>;
> > > > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > >> >> >> core
> > > > >> >> >>
> > > > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST, parav@mellanox.com
> > wrote:
> > > > >> >> >> >
> > > > >> >> >> >
> > > > >> >> >> >> -----Original Message-----
> > > > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
> > > > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
> > > > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > > > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > >> >> >> >> <cohuck@redhat.com>; kvm@vger.kernel.org;
> > > > >> >> >> >> linux-kernel@vger.kernel.org; cjia <cjia@nvidia.com>;
> > > > >> >> >> >> netdev@vger.kernel.org
> > > > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > >> >> >> >> mdev core
> > > > >> >> >> >>
> > > > >> >> >> >> > > > > Just an example of the alias, not proposing how it's set.
> > > > >> >> >> >> > > > > In fact, proposing that the user does not set
> > > > >> >> >> >> > > > > it, mdev-core provides one
> > > > >> >> >> >> > > automatically.
> > > > >> >> >> >> > > > >
> > > > >> >> >> >> > > > > > > Since there seems to be some prefix
> > > > >> >> >> >> > > > > > > overhead, as I ask about above in how many
> > > > >> >> >> >> > > > > > > characters we actually have to work with in
> > > > >> >> >> >> > > > > > > IFNAMESZ, maybe we start with
> > > > >> >> >> >> > > > > > > 8 characters (matching your "index"
> > > > >> >> >> >> > > > > > > namespace) and expand as necessary for
> > > > >> >> >> disambiguation.
> > > > >> >> >> >> > > > > > > If we can eliminate overhead in IFNAMESZ,
> > > > >> >> >> >> > > > > > > let's start with
> > > > >> 12.
> > > > >> >> >> >> > > > > > > Thanks,
> > > > >> >> >> >> > > > > > >
> > > > >> >> >> >> > > > > > If user is going to choose the alias, why does
> > > > >> >> >> >> > > > > > it have to be limited to
> > > > >> >> >> >> sha1?
> > > > >> >> >> >> > > > > > Or you just told it as an example?
> > > > >> >> >> >> > > > > >
> > > > >> >> >> >> > > > > > It can be an alpha-numeric string.
> > > > >> >> >> >> > > > >
> > > > >> >> >> >> > > > > No, I'm proposing a different solution where
> > > > >> >> >> >> > > > > mdev-core creates an alias based on an
> > > > >> >> >> >> > > > > abbreviated sha1. The user does not provide the
> > > > >> >> >> >> alias.
> > > > >> >> >> >> > > > >
> > > > >> >> >> >> > > > > > Instead of mdev imposing number of characters
> > > > >> >> >> >> > > > > > on the alias, it should be best
> > > > >> >> >> >> > > > > left to the user.
> > > > >> >> >> >> > > > > > Because in future if netdev improves on the
> > > > >> >> >> >> > > > > > naming scheme, mdev will be
> > > > >> >> >> >> > > > > limiting it, which is not right.
> > > > >> >> >> >> > > > > > So not restricting alias size seems right to me.
> > > > >> >> >> >> > > > > > User configuring mdev for networking devices
> > > > >> >> >> >> > > > > > in a given kernel knows what
> > > > >> >> >> >> > > > > user is doing.
> > > > >> >> >> >> > > > > > So user can choose alias name size as it finds suitable.
> > > > >> >> >> >> > > > >
> > > > >> >> >> >> > > > > That's not what I'm proposing, please read again.
> > > > >> >> >> >> > > > > Thanks,
> > > > >> >> >> >> > > >
> > > > >> >> >> >> > > > I understood your point. But mdev doesn't know how
> > > > >> >> >> >> > > > user is going to use
> > > > >> >> >> >> > > udev/systemd to name the netdev.
> > > > >> >> >> >> > > > So even if mdev chose to pick 12 characters, it
> > > > >> >> >> >> > > > could result in
> > > > >> >> collision.
> > > > >> >> >> >> > > > Hence the proposal to provide the alias by the
> > > > >> >> >> >> > > > user, as user know the best
> > > > >> >> >> >> > > policy for its use case in the environment its using.
> > > > >> >> >> >> > > > So 12 character sha1 method will still work by user.
> > > > >> >> >> >> > >
> > > > >> >> >> >> > > Haven't you already provided examples where certain
> > > > >> >> >> >> > > drivers or subsystems have unique netdev prefixes?
> > > > >> >> >> >> > > If mdev provides a unique alias within the
> > > > >> >> >> >> > > subsystem, couldn't we simply define a netdev prefix
> > > > >> >> >> >> > > for the mdev subsystem and avoid all other
> > > > >> >> >> >> > > collisions? I'm not in favor of the user providing
> > > > >> >> >> >> > > both a uuid and an alias/instance. Thanks,
> > > > >> >> >> >> > >
> > > > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1
> > > > >> >> >> >> > first 9 characters have
> > > > >> >> >> >> collision?
> > > > >> >> >> >>
> > > > >> >> >> >> I think it would be a mistake to waste so many chars on
> > > > >> >> >> >> a prefix, but
> > > > >> >> >> >> 9 characters of sha1 likely wouldn't have a collision
> > > > >> >> >> >> before we have 10s of thousands of devices. Thanks,
> > > > >> >> >> >>
> > > > >> >> >> >> Alex
> > > > >> >> >> >
> > > > >> >> >> >Jiri, Dave,
> > > > >> >> >> >Are you ok with it for devlink/netdev part?
> > > > >> >> >> >Mdev core will create an alias from a UUID.
> > > > >> >> >> >
> > > > >> >> >> >This will be supplied during devlink port attr set such
> > > > >> >> >> >as,
> > > > >> >> >> >
> > > > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port,
> > > > >> >> >> >const char *mdev_alias);
> > > > >> >> >> >
> > > > >> >> >> >This alias is used to generate representor netdev's
> > phys_port_name.
> > > > >> >> >> >This alias from the mdev device's sysfs will be used by
> > > > >> >> >> >the udev/systemd to
> > > > >> >> >> generate predicable netdev's name.
> > > > >> >> >> >Example: enm<mdev_alias_first_12_chars>
> > > > >> >> >>
> > > > >> >> >> What happens in unlikely case of 2 UUIDs collide?
> > > > >> >> >>
> > > > >> >> >Since users sees two devices with same phys_port_name, user
> > > > >> >> >should destroy
> > > > >> >> recently created mdev and recreate mdev with different UUID?
> > > > >> >>
> > > > >> >> Driver should make sure phys port name wont collide,
> > > > >> >So when mdev creation is initiated, mdev core calculates the
> > > > >> >alias and if there
> > > > >> is any other mdev with same alias exist, it returns -EEXIST error
> > > > >> before progressing further.
> > > > >> >This way user will get to know upfront in event of collision
> > > > >> >before the mdev
> > > > >> device gets created.
> > > > >> >How about that?
> > > > >>
> > > > >> Sounds fine to me. Now the question is how many chars do we want to
> > have.
> > > > >>
> > > > >12 characters from Alex's suggestion similar to git?
> > > >
> > > > Ok.
> > > >
> > >
> > > Can you please confirm this scheme looks good now? I like to get patches
> > started.
> >
> > My only concern is your comment that in the event of an abbreviated
> > sha1 collision (as exceptionally rare as that might be at 12-chars), we'd fail the
> > device create, while my original suggestion was that vfio-core would add an
> > extra character to the alias. For non-networking devices, the sha1 is
> > unnecessary, so the extension behavior seems preferred. The user is only
> > responsible to provide a unique uuid. Perhaps the failure behavior could be
> > applied based on the mdev device_api. A module option on mdev to specify the
> > default number of alias chars would also be useful for testing so that we can set
> > it low enough to validate the collision behavior. Thanks,
> >
>
> Idea is to have mdev alias as optional.
> Each mdev_parent says whether it wants mdev_core to generate an alias
> or not. So only networking device drivers would set it to true.
> For rest, alias won't be generated, and won't be compared either
> during creation time. User continue to provide only uuid.
Ok
> I am tempted to have alias collision detection only within children
> mdevs of the same parent, but doing so will always mandate to prefix
> in netdev name. And currently we are left with only 3 characters to
> prefix it, so that may not be good either. Hence, I think mdev core
> wide alias is better with 12 characters.
I suppose it depends on the API, if the vendor driver can ask the mdev
core for an alias as part of the device creation process, then it could
manage the netdev namespace for all its devices, choosing how many
characters to use, and fail the creation if it can't meet a uniqueness
requirement. IOW, mdev-core would always provide a full sha1 and
therefore gets itself out of the uniqueness/collision aspects.
> I do not understand how an extra character reduces collision, if
> that's what you meant.
If the default were for example 3-chars, we might already have device
'abc'. A collision would expose one more char of the new device, so we
might add device with alias 'abcd'. I mentioned previously that this
leaves an issue for userspace that we can't change the alias of device
abc, so without additional information, userspace can only determine via
elimination the mapping of alias to device, but userspace has more
information available to it in the form of sysfs links.
> Module options are almost not encouraged
> anymore with other subsystems/drivers.
We don't live in a world of absolutes. I agree that the defaults
should work in the vast majority of cases. Requiring a user to twiddle
module options to make things work is undesirable, verging on a bug. A
module option to enable some specific feature, unsafe condition, or test
that is outside of the typical use case is reasonable, imo.
> For testing collision rate, a sample user space script and sample
> mtty is easy and get us collision count too. We shouldn't put that
> using module option in production kernel. I practically have the code
> ready to play with; Changing 12 to smaller value is easy with module
> reload.
>
> #define MDEV_ALIAS_LEN 12
If it can't be tested with a shipping binary, it probably won't be
tested. Thanks,
Alex
^ permalink raw reply
* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Horatiu Vultur @ 2019-08-23 15:57 UTC (permalink / raw)
To: Andrew Lunn
Cc: Nikolay Aleksandrov, roopa, davem, UNGLinuxDriver,
alexandre.belloni, allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <20190823132538.GO13020@lunn.ch>
Hi Andrew
The 08/23/2019 15:25, Andrew Lunn wrote:
> External E-Mail
>
>
> > > > Why do the devices have to be from the same driver ?
> > After seeing yours and Andrews comments I realize that I try to do two
> > things, but I have only explained one of them.
> >
> > Here is what I was trying to do:
> > A. Prevent ports from going into promisc mode, if it is not needed.
>
> The switch definition is promisc is a bit odd. You really need to
> split it into two use cases.
>
> The Linux interface is not a member of a bridge. In this case, promisc
> mode would mean all frames ingressing the port should be forwarded to
> the CPU. Without promisc, you can program the hardware to just accept
> frames with the interfaces MAC address. So this is just the usual
> behaviour of an interface.
Yes, this is well understood.
>
> When the interface is part of the bridge, then you can turn on all the
> learning and not forward frames to the CPU, unless the CPU asks for
> them. But you need to watch out for various flags. By default, you
> should flood to the CPU, unknown destinations to the CPU etc. But some
> of these can be turned off by flags.
>
> > B. Prevent adding the CPU to the flood-mask (in Ocelot we have a
> > flood-mask controlling who should be included when flooding due to
> > destination unknown).
>
> So destination unknown should be flooded to the CPU. The CPU might
> know where to send the frame.
Exactly the CPU should be in the flood mask by default. But if the
network driver knows that CPU will not forward it anywhere else, then it
is safe to remove the CPU from the flood mask. The network driver will
know this by monitoring which interfaces are added to the bridge.
>
> > To solve item "B", the network driver needs to detect if there is a
> > foreign interfaces added to the bridge. If that is the case then to add
> > the CPU port to the flooding mask otherwise no.
>
> It is not just a foreign interface. What about the MAC address on the
> bridge interface?
I think the network driver will get this event and it can install a
entry in the MAC table to copy the frames to the CPU.
>
> > > > This is too specific targeting some devices.
> > Maybe I was wrong to mention specific HW in the commit message. The
> > purpose of the patch was to add an optimization (not to copy all the
> > frames to the CPU) for HW that is capable of learning and flooding the
> > frames.
>
> To some extent, this is also tied to your hardware not learning MAC
> addresses from frames passed from the CPU. You should also consider
> fixing this. The SW bridge does send out notifications when it
> adds/removes MAC addresses to its tables. You probably want to receive
> this modifications, and use them to program your hardware to forward
> frames to the CPU when needed.
Yes we will fix this. We will listen to the notifications and then update
the HW so it would send those frames to CPU.
Maybe intially I should just resend this patch, with the changes that I
mention previously. And after that to send a new patch series with all
these remarks that you mention here Andrew.
>
> Andrew
>
--
/Horatiu
^ permalink raw reply
* Re: libbpf distro packaging
From: Alexei Starovoitov @ 2019-08-23 16:00 UTC (permalink / raw)
To: Jiri Olsa, Julia Kartseva
Cc: Andrii Nakryiko, labbott@redhat.com, acme@kernel.org,
debian-kernel@lists.debian.org, netdev@vger.kernel.org,
Andrey Ignatov, Yonghong Song, jolsa@kernel.org, Daniel Borkmann
In-Reply-To: <20190823092253.GA20775@krava>
On 8/23/19 2:22 AM, Jiri Olsa wrote:
> btw, the libbpf GH repo tag v0.0.4 has 0.0.3 version set in Makefile:
>
> VERSION = 0
> PATCHLEVEL = 0
> EXTRAVERSION = 3
>
> current code takes version from libbpf.map so it's fine,
> but would be great to start from 0.0.5 so we don't need to
> bother with rpm patches.. is 0.0.5 planned soon?
Technically we can bump it at any time.
The goal was to bump it only when new kernel is released
to capture a collection of new APIs in a given 0.0.X release.
So that libbpf versions are synchronized with kernel versions
in some what loose way.
In this case we can make an exception and bump it now.
^ permalink raw reply
* Re: [PATCH] ethernet: Delete unnecessary checks before the macro call “dev_kfree_skb”
From: Scott Branden @ 2019-08-23 16:10 UTC (permalink / raw)
To: Christophe JAILLET, Markus Elfring, netdev, linux-arm-kernel,
linux-stm32, intel-wired-lan, bcm-kernel-feedback-list,
UNGLinuxDriver, Alexandre Torgue, Alexios Zavras, Allison Randal,
Bryan Whitehead, Claudiu Manoil, David S. Miller, Doug Berger,
Douglas Miller, Florian Fainelli, Giuseppe Cavallaro,
Greg Kroah-Hartman, Jeff Kirsher, Jilayne Lovejoy, Jonathan Lemon,
Jose Abreu, Kate Stewart
Cc: kernel-janitors, LKML
In-Reply-To: <4ab7f2a5-f472-f462-9d4c-7c8d5237c44e@wanadoo.fr>
On 2019-08-23 7:08 a.m., Christophe JAILLET wrote:
> Hi,
>
> in this patch, there is one piece that looked better before. (see below)
>
> Removing the 'if (skb)' is fine, but concatening everything in one
> statement just to save 2 variables and a few LOC is of no use, IMHO,
> and the code is less readable.
Agreed.
>
> just my 2c.
>
>
> CJ
>
>
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index d3a0b614dbfa..8b19ddcdafaa 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -2515,19 +2515,14 @@ static int bcmgenet_dma_teardown(struct
> bcmgenet_priv *priv)
> static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
> {
> struct netdev_queue *txq;
> - struct sk_buff *skb;
> - struct enet_cb *cb;
> int i;
>
> bcmgenet_fini_rx_napi(priv);
> bcmgenet_fini_tx_napi(priv);
>
> - for (i = 0; i < priv->num_tx_bds; i++) {
> - cb = priv->tx_cbs + i;
> - skb = bcmgenet_free_tx_cb(&priv->pdev->dev, cb);
> - if (skb)
> - dev_kfree_skb(skb);
> - }
> + for (i = 0; i < priv->num_tx_bds; i++)
> + dev_kfree_skb(bcmgenet_free_tx_cb(&priv->pdev->dev,
> + priv->tx_cbs + i));
>
> for (i = 0; i < priv->hw_params->tx_queues; i++) {
> txq = netdev_get_tx_queue(priv->dev, priv->tx_rings[i].queue);
^ permalink raw reply
* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-23 16:14 UTC (permalink / raw)
To: Alex Williamson
Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <20190823095229.210e1e84@x1.home>
> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, August 23, 2019 9:22 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
>
> On Fri, 23 Aug 2019 14:53:06 +0000
> Parav Pandit <parav@mellanox.com> wrote:
>
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, August 23, 2019 7:58 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Fri, 23 Aug 2019 08:14:39 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jiri Pirko <jiri@resnulli.us>
> > > > > Sent: Friday, August 23, 2019 1:42 PM
> > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > <cohuck@redhat.com>;
> > > > > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >
> > > > > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:
> > > > > >
> > > > > >
> > > > > >> -----Original Message-----
> > > > > >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > >> Sent: Thursday, August 22, 2019 5:50 PM
> > > > > >> To: Parav Pandit <parav@mellanox.com>
> > > > > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > > <cohuck@redhat.com>;
> > > > > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > > >> core
> > > > > >>
> > > > > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:
> > > > > >> >
> > > > > >> >
> > > > > >> >> -----Original Message-----
> > > > > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > >> >> Sent: Thursday, August 22, 2019 3:28 PM
> > > > > >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > > >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > > >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > >> <cohuck@redhat.com>;
> > > > > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > > >> >> core
> > > > > >> >>
> > > > > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com
> wrote:
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >> -----Original Message-----
> > > > > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
> > > > > >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > > >> >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > > >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > >> >> <cohuck@redhat.com>;
> > > > > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > > >> >> >> mdev core
> > > > > >> >> >>
> > > > > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST,
> > > > > >> >> >> parav@mellanox.com
> > > wrote:
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> >> -----Original Message-----
> > > > > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
> > > > > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
> > > > > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > > > > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > >> >> >> >> <cohuck@redhat.com>; kvm@vger.kernel.org;
> > > > > >> >> >> >> linux-kernel@vger.kernel.org; cjia
> > > > > >> >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > > >> >> >> >> mdev core
> > > > > >> >> >> >>
> > > > > >> >> >> >> > > > > Just an example of the alias, not proposing how it's
> set.
> > > > > >> >> >> >> > > > > In fact, proposing that the user does not
> > > > > >> >> >> >> > > > > set it, mdev-core provides one
> > > > > >> >> >> >> > > automatically.
> > > > > >> >> >> >> > > > >
> > > > > >> >> >> >> > > > > > > Since there seems to be some prefix
> > > > > >> >> >> >> > > > > > > overhead, as I ask about above in how
> > > > > >> >> >> >> > > > > > > many characters we actually have to work
> > > > > >> >> >> >> > > > > > > with in IFNAMESZ, maybe we start with
> > > > > >> >> >> >> > > > > > > 8 characters (matching your "index"
> > > > > >> >> >> >> > > > > > > namespace) and expand as necessary for
> > > > > >> >> >> disambiguation.
> > > > > >> >> >> >> > > > > > > If we can eliminate overhead in
> > > > > >> >> >> >> > > > > > > IFNAMESZ, let's start with
> > > > > >> 12.
> > > > > >> >> >> >> > > > > > > Thanks,
> > > > > >> >> >> >> > > > > > >
> > > > > >> >> >> >> > > > > > If user is going to choose the alias, why
> > > > > >> >> >> >> > > > > > does it have to be limited to
> > > > > >> >> >> >> sha1?
> > > > > >> >> >> >> > > > > > Or you just told it as an example?
> > > > > >> >> >> >> > > > > >
> > > > > >> >> >> >> > > > > > It can be an alpha-numeric string.
> > > > > >> >> >> >> > > > >
> > > > > >> >> >> >> > > > > No, I'm proposing a different solution where
> > > > > >> >> >> >> > > > > mdev-core creates an alias based on an
> > > > > >> >> >> >> > > > > abbreviated sha1. The user does not provide
> > > > > >> >> >> >> > > > > the
> > > > > >> >> >> >> alias.
> > > > > >> >> >> >> > > > >
> > > > > >> >> >> >> > > > > > Instead of mdev imposing number of
> > > > > >> >> >> >> > > > > > characters on the alias, it should be best
> > > > > >> >> >> >> > > > > left to the user.
> > > > > >> >> >> >> > > > > > Because in future if netdev improves on
> > > > > >> >> >> >> > > > > > the naming scheme, mdev will be
> > > > > >> >> >> >> > > > > limiting it, which is not right.
> > > > > >> >> >> >> > > > > > So not restricting alias size seems right to me.
> > > > > >> >> >> >> > > > > > User configuring mdev for networking
> > > > > >> >> >> >> > > > > > devices in a given kernel knows what
> > > > > >> >> >> >> > > > > user is doing.
> > > > > >> >> >> >> > > > > > So user can choose alias name size as it finds
> suitable.
> > > > > >> >> >> >> > > > >
> > > > > >> >> >> >> > > > > That's not what I'm proposing, please read again.
> > > > > >> >> >> >> > > > > Thanks,
> > > > > >> >> >> >> > > >
> > > > > >> >> >> >> > > > I understood your point. But mdev doesn't know
> > > > > >> >> >> >> > > > how user is going to use
> > > > > >> >> >> >> > > udev/systemd to name the netdev.
> > > > > >> >> >> >> > > > So even if mdev chose to pick 12 characters,
> > > > > >> >> >> >> > > > it could result in
> > > > > >> >> collision.
> > > > > >> >> >> >> > > > Hence the proposal to provide the alias by the
> > > > > >> >> >> >> > > > user, as user know the best
> > > > > >> >> >> >> > > policy for its use case in the environment its using.
> > > > > >> >> >> >> > > > So 12 character sha1 method will still work by user.
> > > > > >> >> >> >> > >
> > > > > >> >> >> >> > > Haven't you already provided examples where
> > > > > >> >> >> >> > > certain drivers or subsystems have unique netdev
> prefixes?
> > > > > >> >> >> >> > > If mdev provides a unique alias within the
> > > > > >> >> >> >> > > subsystem, couldn't we simply define a netdev
> > > > > >> >> >> >> > > prefix for the mdev subsystem and avoid all
> > > > > >> >> >> >> > > other collisions? I'm not in favor of the user
> > > > > >> >> >> >> > > providing both a uuid and an alias/instance.
> > > > > >> >> >> >> > > Thanks,
> > > > > >> >> >> >> > >
> > > > > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1
> > > > > >> >> >> >> > first 9 characters have
> > > > > >> >> >> >> collision?
> > > > > >> >> >> >>
> > > > > >> >> >> >> I think it would be a mistake to waste so many chars
> > > > > >> >> >> >> on a prefix, but
> > > > > >> >> >> >> 9 characters of sha1 likely wouldn't have a
> > > > > >> >> >> >> collision before we have 10s of thousands of
> > > > > >> >> >> >> devices. Thanks,
> > > > > >> >> >> >>
> > > > > >> >> >> >> Alex
> > > > > >> >> >> >
> > > > > >> >> >> >Jiri, Dave,
> > > > > >> >> >> >Are you ok with it for devlink/netdev part?
> > > > > >> >> >> >Mdev core will create an alias from a UUID.
> > > > > >> >> >> >
> > > > > >> >> >> >This will be supplied during devlink port attr set
> > > > > >> >> >> >such as,
> > > > > >> >> >> >
> > > > > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port,
> > > > > >> >> >> >const char *mdev_alias);
> > > > > >> >> >> >
> > > > > >> >> >> >This alias is used to generate representor netdev's
> > > phys_port_name.
> > > > > >> >> >> >This alias from the mdev device's sysfs will be used
> > > > > >> >> >> >by the udev/systemd to
> > > > > >> >> >> generate predicable netdev's name.
> > > > > >> >> >> >Example: enm<mdev_alias_first_12_chars>
> > > > > >> >> >>
> > > > > >> >> >> What happens in unlikely case of 2 UUIDs collide?
> > > > > >> >> >>
> > > > > >> >> >Since users sees two devices with same phys_port_name,
> > > > > >> >> >user should destroy
> > > > > >> >> recently created mdev and recreate mdev with different UUID?
> > > > > >> >>
> > > > > >> >> Driver should make sure phys port name wont collide,
> > > > > >> >So when mdev creation is initiated, mdev core calculates the
> > > > > >> >alias and if there
> > > > > >> is any other mdev with same alias exist, it returns -EEXIST
> > > > > >> error before progressing further.
> > > > > >> >This way user will get to know upfront in event of collision
> > > > > >> >before the mdev
> > > > > >> device gets created.
> > > > > >> >How about that?
> > > > > >>
> > > > > >> Sounds fine to me. Now the question is how many chars do we
> > > > > >> want to
> > > have.
> > > > > >>
> > > > > >12 characters from Alex's suggestion similar to git?
> > > > >
> > > > > Ok.
> > > > >
> > > >
> > > > Can you please confirm this scheme looks good now? I like to get
> > > > patches
> > > started.
> > >
> > > My only concern is your comment that in the event of an abbreviated
> > > sha1 collision (as exceptionally rare as that might be at 12-chars),
> > > we'd fail the device create, while my original suggestion was that
> > > vfio-core would add an extra character to the alias. For
> > > non-networking devices, the sha1 is unnecessary, so the extension
> > > behavior seems preferred. The user is only responsible to provide a
> > > unique uuid. Perhaps the failure behavior could be applied based on
> > > the mdev device_api. A module option on mdev to specify the default
> > > number of alias chars would also be useful for testing so that we
> > > can set it low enough to validate the collision behavior. Thanks,
> > >
> >
> > Idea is to have mdev alias as optional.
> > Each mdev_parent says whether it wants mdev_core to generate an alias
> > or not. So only networking device drivers would set it to true.
> > For rest, alias won't be generated, and won't be compared either
> > during creation time. User continue to provide only uuid.
>
> Ok
>
> > I am tempted to have alias collision detection only within children
> > mdevs of the same parent, but doing so will always mandate to prefix
> > in netdev name. And currently we are left with only 3 characters to
> > prefix it, so that may not be good either. Hence, I think mdev core
> > wide alias is better with 12 characters.
>
> I suppose it depends on the API, if the vendor driver can ask the mdev core for
> an alias as part of the device creation process, then it could manage the netdev
> namespace for all its devices, choosing how many characters to use, and fail
> the creation if it can't meet a uniqueness requirement. IOW, mdev-core would
> always provide a full sha1 and therefore gets itself out of the
> uniqueness/collision aspects.
>
This doesn't work. At mdev core level 20 bytes sha1 are unique, so mdev core allowed to create a mdev.
And then devlink core chooses only 6 bytes (12 characters) and there is collision. Things fall apart.
Since mdev provides unique uuid based scheme, it's the mdev core's ownership to provide unique aliases.
> > I do not understand how an extra character reduces collision, if
> > that's what you meant.
>
> If the default were for example 3-chars, we might already have device 'abc'. A
> collision would expose one more char of the new device, so we might add
> device with alias 'abcd'. I mentioned previously that this leaves an issue for
> userspace that we can't change the alias of device abc, so without additional
> information, userspace can only determine via elimination the mapping of alias
> to device, but userspace has more information available to it in the form of
> sysfs links.
>
> > Module options are almost not encouraged anymore with other
> > subsystems/drivers.
>
> We don't live in a world of absolutes. I agree that the defaults should work in
> the vast majority of cases. Requiring a user to twiddle module options to make
> things work is undesirable, verging on a bug. A module option to enable some
> specific feature, unsafe condition, or test that is outside of the typical use case
> is reasonable, imo.
>
> > For testing collision rate, a sample user space script and sample mtty
> > is easy and get us collision count too. We shouldn't put that using
> > module option in production kernel. I practically have the code ready
> > to play with; Changing 12 to smaller value is easy with module reload.
> >
> > #define MDEV_ALIAS_LEN 12
>
> If it can't be tested with a shipping binary, it probably won't be tested. Thanks,
>
It is not the role of mdev core to expose collision efficiency/deficiency of the sha1.
It can be tested outside before mdev choose to use it.
I am saying we should test with 12 characters with 10,000 or more devices and see how collision occurs.
Even if collision occurs, mdev returns EEXIST status indicating user to pick a different UUID for those rare conditions.
^ permalink raw reply
* Re: [PATCH net-next] drop_monitor: Make timestamps y2038 safe
From: Neil Horman @ 2019-08-23 16:17 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, davem, arnd, andrew, ayal, mlxsw, Ido Schimmel
In-Reply-To: <20190823154721.9927-1-idosch@idosch.org>
On Fri, Aug 23, 2019 at 06:47:21PM +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
>
> Timestamps are currently communicated to user space as 'struct
> timespec', which is not considered y2038 safe since it uses a 32-bit
> signed value for seconds.
>
> Fix this while the API is still not part of any official kernel release
> by using 64-bit nanoseconds timestamps instead.
>
> Fixes: ca30707dee2b ("drop_monitor: Add packet alert mode")
> Fixes: 5e58109b1ea4 ("drop_monitor: Add support for packet alert mode for hardware drops")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
> Arnd, I have followed your recommendation to use 64-bit nanoseconds
> timestamps. I would appreciate it if you could review this change.
>
> Thanks!
> ---
> include/uapi/linux/net_dropmon.h | 2 +-
> net/core/drop_monitor.c | 14 ++++++--------
> 2 files changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h
> index 75a35dccb675..8bf79a9eb234 100644
> --- a/include/uapi/linux/net_dropmon.h
> +++ b/include/uapi/linux/net_dropmon.h
> @@ -75,7 +75,7 @@ enum net_dm_attr {
> NET_DM_ATTR_PC, /* u64 */
> NET_DM_ATTR_SYMBOL, /* string */
> NET_DM_ATTR_IN_PORT, /* nested */
> - NET_DM_ATTR_TIMESTAMP, /* struct timespec */
> + NET_DM_ATTR_TIMESTAMP, /* u64 */
> NET_DM_ATTR_PROTO, /* u16 */
> NET_DM_ATTR_PAYLOAD, /* binary */
> NET_DM_ATTR_PAD,
> diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
> index bfc024024aa3..cc60cc22e2db 100644
> --- a/net/core/drop_monitor.c
> +++ b/net/core/drop_monitor.c
> @@ -552,7 +552,7 @@ static size_t net_dm_packet_report_size(size_t payload_len)
> /* NET_DM_ATTR_IN_PORT */
> net_dm_in_port_size() +
> /* NET_DM_ATTR_TIMESTAMP */
> - nla_total_size(sizeof(struct timespec)) +
> + nla_total_size(sizeof(u64)) +
> /* NET_DM_ATTR_ORIG_LEN */
> nla_total_size(sizeof(u32)) +
> /* NET_DM_ATTR_PROTO */
> @@ -592,7 +592,6 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
> u64 pc = (u64)(uintptr_t) NET_DM_SKB_CB(skb)->pc;
> char buf[NET_DM_MAX_SYMBOL_LEN];
> struct nlattr *attr;
> - struct timespec ts;
> void *hdr;
> int rc;
>
> @@ -615,8 +614,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
> if (rc)
> goto nla_put_failure;
>
> - if (ktime_to_timespec_cond(skb->tstamp, &ts) &&
> - nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
> + if (nla_put_u64_64bit(msg, NET_DM_ATTR_TIMESTAMP,
> + ktime_to_ns(skb->tstamp), NET_DM_ATTR_PAD))
> goto nla_put_failure;
>
> if (nla_put_u32(msg, NET_DM_ATTR_ORIG_LEN, skb->len))
> @@ -716,7 +715,7 @@ net_dm_hw_packet_report_size(size_t payload_len,
> /* NET_DM_ATTR_IN_PORT */
> net_dm_in_port_size() +
> /* NET_DM_ATTR_TIMESTAMP */
> - nla_total_size(sizeof(struct timespec)) +
> + nla_total_size(sizeof(u64)) +
> /* NET_DM_ATTR_ORIG_LEN */
> nla_total_size(sizeof(u32)) +
> /* NET_DM_ATTR_PROTO */
> @@ -730,7 +729,6 @@ static int net_dm_hw_packet_report_fill(struct sk_buff *msg,
> {
> struct net_dm_hw_metadata *hw_metadata;
> struct nlattr *attr;
> - struct timespec ts;
> void *hdr;
>
> hw_metadata = NET_DM_SKB_CB(skb)->hw_metadata;
> @@ -761,8 +759,8 @@ static int net_dm_hw_packet_report_fill(struct sk_buff *msg,
> goto nla_put_failure;
> }
>
> - if (ktime_to_timespec_cond(skb->tstamp, &ts) &&
> - nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
> + if (nla_put_u64_64bit(msg, NET_DM_ATTR_TIMESTAMP,
> + ktime_to_ns(skb->tstamp), NET_DM_ATTR_PAD))
> goto nla_put_failure;
>
> if (nla_put_u32(msg, NET_DM_ATTR_ORIG_LEN, skb->len))
> --
> 2.21.0
>
>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
^ permalink raw reply
* Re: [PATCH net-next 2/6] net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
From: Florian Fainelli @ 2019-08-23 16:23 UTC (permalink / raw)
To: Vladimir Oltean, Vivien Didelot; +Cc: netdev, David S. Miller, Andrew Lunn
In-Reply-To: <CA+h21hpgCJ9oKwQxzu62hmvyCOyDZ52R5fYnejprGHWeZR7L6Q@mail.gmail.com>
On 8/22/19 4:44 PM, Vladimir Oltean wrote:
> On Fri, 23 Aug 2019 at 02:43, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>>
>> Hi Vladimir,
>>
>> On Fri, 23 Aug 2019 01:06:58 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
>>> Hi Vivien,
>>>
>>> On 8/22/19 11:13 PM, Vivien Didelot wrote:
>>>> Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.
>>>>
>>>> This function is used in the tag_8021q.c code to offload the PVID of
>>>> ports, which would simply not work if .port_vlan_add is not supported
>>>> by the underlying switch.
>>>>
>>>> Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
>>>> that is to say in dsa_slave_vlan_rx_add_vid.
>>>>
>>>
>>> Do you know why Florian suppressed -EOPNOTSUPP in 061f6a505ac3 ("net:
>>> dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")?
>>> I forced a return value of -EOPNOTSUPP here and when I create a VLAN
>>> sub-interface nothing breaks, it just says:
>>> RTNETLINK answers: Operation not supported
>>> which IMO is expected.
>>
>> I do not know what you mean. This patch does not change the behavior of
>> dsa_slave_vlan_rx_add_vid, which returns 0 if -EOPNOTSUPP is caught.
>>
>
> Yes, but what's wrong with just forwarding -EOPNOTSUPP?
It makes us fail adding the VLAN to the slave network device, which
sounds silly, if we can't offload it in HW, that's fine, we can still do
a SW VLAN instead, see net/8021q/vlan_core.c::vlan_add_rx_filter_info().
Maybe a more correct solution is to set the NETIF_F_HW_VLAN_CTAG_FILTER
feature bit only if we have the ability to offload, now that I think
about it. Would you want me to cook a patch doing that?
--
Florian
^ permalink raw reply
* Re: [PATCH net-next] dpaa2-eth: Add pause frame support
From: Andrew Lunn @ 2019-08-23 16:30 UTC (permalink / raw)
To: Ioana Radulescu; +Cc: netdev, davem, ioana.ciornei
In-Reply-To: <1566573579-9940-1-git-send-email-ruxandra.radulescu@nxp.com>
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
> @@ -78,29 +78,20 @@ static int
> dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
> struct ethtool_link_ksettings *link_settings)
> {
> - struct dpni_link_state state = {0};
> - int err = 0;
> struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
>
> - err = dpni_get_link_state(priv->mc_io, 0, priv->mc_token, &state);
> - if (err) {
> - netdev_err(net_dev, "ERROR %d getting link state\n", err);
> - goto out;
> - }
> -
> /* At the moment, we have no way of interrogating the DPMAC
> * from the DPNI side - and for that matter there may exist
> * no DPMAC at all. So for now we just don't report anything
> * beyond the DPNI attributes.
> */
> - if (state.options & DPNI_LINK_OPT_AUTONEG)
> + if (priv->link_state.options & DPNI_LINK_OPT_AUTONEG)
> link_settings->base.autoneg = AUTONEG_ENABLE;
> - if (!(state.options & DPNI_LINK_OPT_HALF_DUPLEX))
> + if (!(priv->link_state.options & DPNI_LINK_OPT_HALF_DUPLEX))
> link_settings->base.duplex = DUPLEX_FULL;
> - link_settings->base.speed = state.rate;
> + link_settings->base.speed = priv->link_state.rate;
>
> -out:
> - return err;
> + return 0;
Hi Ioana
I think this patch can be broken up a bit, to help review.
It looks like this change to report state via priv->link_state should
be a separate patch. I think this change can be made without the pause
changes. That then makes the pause changes themselves simpler.
> +static void dpaa2_eth_get_pauseparam(struct net_device *net_dev,
> + struct ethtool_pauseparam *pause)
> +{
> + struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
> + u64 link_options = priv->link_state.options;
> +
> + pause->rx_pause = !!(link_options & DPNI_LINK_OPT_PAUSE);
> + pause->tx_pause = pause->rx_pause ^
> + !!(link_options & DPNI_LINK_OPT_ASYM_PAUSE);
Since you don't support auto-neg, you should set pause->autoneg to
false. It probably already is set to false, by a memset, but setting
it explicitly is a form of documentation, this hardware only supports
forced pause configuration.
> +}
> +
> +static int dpaa2_eth_set_pauseparam(struct net_device *net_dev,
> + struct ethtool_pauseparam *pause)
> +{
> + struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
> + struct dpni_link_cfg cfg = {0};
> + int err;
> +
> + if (!dpaa2_eth_has_pause_support(priv)) {
> + netdev_info(net_dev, "No pause frame support for DPNI version < %d.%d\n",
> + DPNI_PAUSE_VER_MAJOR, DPNI_PAUSE_VER_MINOR);
> + return -EOPNOTSUPP;
> + }
> +
> + if (pause->autoneg)
> + netdev_err(net_dev, "Pause frame autoneg not supported\n");
And here you should return -EOPNOTSUPP. No need for an netdev_err. It
is not an error, you simply don't support it.
There is also the issue of what is the PHY doing? It would not be good
if the MAC is configured one way, but the PHY is advertising something
else. So it appears you have no control over the PHY. But i assume you
know what the PHY is actually doing? Does it advertise pause/asym
pause?
It might be, to keep things consistent, we only accept pause
configuration when auto-neg in general is disabled.
Andrew
^ permalink raw reply
* Re: [PATCH net-next 2/6] net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
From: Vladimir Oltean @ 2019-08-23 16:32 UTC (permalink / raw)
To: Florian Fainelli; +Cc: Vivien Didelot, netdev, David S. Miller, Andrew Lunn
In-Reply-To: <2a43ee4c-0e20-1037-d856-3945d516ea7b@gmail.com>
Hi Florian,
On Fri, 23 Aug 2019 at 19:23, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> On 8/22/19 4:44 PM, Vladimir Oltean wrote:
> > On Fri, 23 Aug 2019 at 02:43, Vivien Didelot <vivien.didelot@gmail.com> wrote:
> >>
> >> Hi Vladimir,
> >>
> >> On Fri, 23 Aug 2019 01:06:58 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> >>> Hi Vivien,
> >>>
> >>> On 8/22/19 11:13 PM, Vivien Didelot wrote:
> >>>> Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.
> >>>>
> >>>> This function is used in the tag_8021q.c code to offload the PVID of
> >>>> ports, which would simply not work if .port_vlan_add is not supported
> >>>> by the underlying switch.
> >>>>
> >>>> Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
> >>>> that is to say in dsa_slave_vlan_rx_add_vid.
> >>>>
> >>>
> >>> Do you know why Florian suppressed -EOPNOTSUPP in 061f6a505ac3 ("net:
> >>> dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")?
> >>> I forced a return value of -EOPNOTSUPP here and when I create a VLAN
> >>> sub-interface nothing breaks, it just says:
> >>> RTNETLINK answers: Operation not supported
> >>> which IMO is expected.
> >>
> >> I do not know what you mean. This patch does not change the behavior of
> >> dsa_slave_vlan_rx_add_vid, which returns 0 if -EOPNOTSUPP is caught.
> >>
> >
> > Yes, but what's wrong with just forwarding -EOPNOTSUPP?
>
> It makes us fail adding the VLAN to the slave network device, which
> sounds silly, if we can't offload it in HW, that's fine, we can still do
> a SW VLAN instead, see net/8021q/vlan_core.c::vlan_add_rx_filter_info().
>
> Maybe a more correct solution is to set the NETIF_F_HW_VLAN_CTAG_FILTER
> feature bit only if we have the ability to offload, now that I think
> about it. Would you want me to cook a patch doing that?
sja1105 doesn't support offloading NETIF_F_HW_VLAN_CTAG_FILTER even
though it does support programming VLANs.
Adding an offloaded VLAN sub-interface on a standalone switch port
(vlan_filtering=0, uses dsa_8021q) would make the driver insert a VLAN
entry whilst the TPID is ETH_P_DSA_8021Q.
Maybe just let the driver set the netdev features, similar to how it
does for the number of TX queues?
> --
> Florian
Regards,
-Vladimir
^ permalink raw reply
* [PATCH] net: fix skb use after free in netpoll_send_skb_on_dev
From: Feng Sun @ 2019-08-23 16:32 UTC (permalink / raw)
To: davem
Cc: edumazet, dsterba, dbanerje, fw, davej, tglx, matwey,
sakari.ailus, netdev, linux-kernel, Feng Sun, Xiaojun Zhao
After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
("tun: return NET_XMIT_DROP for dropped packets"),
when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP,
netpoll_send_skb_on_dev will run into two use after free cases:
1. retry netpoll_start_xmit with freed skb;
2. queue freed skb in npinfo->txq.
hit the first case with following kernel log:
[ 117.864773] kernel BUG at mm/slub.c:306!
[ 117.864773] invalid opcode: 0000 [#1] SMP PTI
[ 117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P OE 5.3.0-050300rc5-generic #201908182231
[ 117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
[ 117.864781] Call Trace:
[ 117.864781] ? tun_net_xmit+0x21c/0x460
[ 117.864781] kfree_skbmem+0x4e/0x60
[ 117.864782] kfree_skb+0x3a/0xa0
[ 117.864782] tun_net_xmit+0x21c/0x460
[ 117.864782] netpoll_start_xmit+0x11d/0x1b0
[ 117.864788] netpoll_send_skb_on_dev+0x1b8/0x200
[ 117.864789] __br_forward+0x1b9/0x1e0 [bridge]
[ 117.864789] ? skb_clone+0x53/0xd0
[ 117.864790] ? __skb_clone+0x2e/0x120
[ 117.864790] deliver_clone+0x37/0x50 [bridge]
[ 117.864790] maybe_deliver+0x89/0xc0 [bridge]
[ 117.864791] br_flood+0x6c/0x130 [bridge]
[ 117.864791] br_dev_xmit+0x315/0x3c0 [bridge]
[ 117.864792] netpoll_start_xmit+0x11d/0x1b0
[ 117.864792] netpoll_send_skb_on_dev+0x1b8/0x200
[ 117.864792] netpoll_send_udp+0x2c6/0x3e8
[ 117.864793] write_msg+0xd9/0xf0 [netconsole]
[ 117.864793] console_unlock+0x386/0x4e0
[ 117.864793] vprintk_emit+0x17e/0x280
[ 117.864794] vprintk_default+0x29/0x50
[ 117.864794] vprintk_func+0x4c/0xbc
[ 117.864794] printk+0x58/0x6f
[ 117.864795] loop_fun+0x24/0x41 [printmsg_loop]
[ 117.864795] kthread+0x104/0x140
[ 117.864795] ? 0xffffffffc05b1000
[ 117.864796] ? kthread_park+0x80/0x80
[ 117.864796] ret_from_fork+0x35/0x40
Signed-off-by: Feng Sun <loyou85@gmail.com>
Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com>
---
net/core/netpoll.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 2cf27da..b4bffe6 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -335,7 +335,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
HARD_TX_UNLOCK(dev, txq);
- if (status == NETDEV_TX_OK)
+ if (status == NETDEV_TX_OK || status == NET_XMIT_DROP)
break;
}
@@ -352,7 +352,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
}
- if (status != NETDEV_TX_OK) {
+ if (status != NETDEV_TX_OK && status != NET_XMIT_DROP) {
skb_queue_tail(&npinfo->txq, skb);
schedule_delayed_work(&npinfo->tx_work,0);
}
--
2.7.4
^ permalink raw reply related
* Re: [PATCH net v2] openvswitch: Fix conntrack cache with timeout
From: Yi-Hung Wei @ 2019-08-23 16:40 UTC (permalink / raw)
To: Pravin Shelar; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAOrHB_A6Hn9o=8uzHQTp=cttMQsf=dYpobvq7C7_W398sw8UJA@mail.gmail.com>
On Thu, Aug 22, 2019 at 11:51 PM Pravin Shelar <pshelar@ovn.org> wrote:
>
> On Thu, Aug 22, 2019 at 1:28 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> >
> > This patch addresses a conntrack cache issue with timeout policy.
> > Currently, we do not check if the timeout extension is set properly in the
> > cached conntrack entry. Thus, after packet recirculate from conntrack
> > action, the timeout policy is not applied properly. This patch fixes the
> > aforementioned issue.
> >
> > Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
> > Reported-by: kbuild test robot <lkp@intel.com>
> > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > ---
> > v1->v2: Fix rcu dereference issue reported by kbuild test robot.
> > ---
> > net/openvswitch/conntrack.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > index 848c6eb55064..4d7896135e73 100644
> > --- a/net/openvswitch/conntrack.c
> > +++ b/net/openvswitch/conntrack.c
> > @@ -1657,6 +1666,10 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
> > ct_info.timeout))
> > pr_info_ratelimited("Failed to associated timeout "
> > "policy `%s'\n", ct_info.timeout);
> > + else
> > + ct_info.nf_ct_timeout = rcu_dereference(
> > + nf_ct_timeout_find(ct_info.ct)->timeout);
> Is this dereference safe from NULL pointer?
Hi Pravin,
Thanks for your review. I am not sure if
nf_ct_timeout_find(ct_info.ct) will return NULL in this case.
We only run into this statement when ct_info.timeout[0] is set, and it
is only set in parse_ct() when CONFIG_NF_CONNTRACK_TIMEOUT is
configured. Also, in this else condition the timeout extension is
supposed to be set properly by nf_ct_set_timeout().
Am I missing something?
Thanks,
-Yi-Hung
^ permalink raw reply
* Re: [PATCH bpf-next 1/4] xsk: avoid store-tearing when assigning queues
From: Jonathan Lemon @ 2019-08-23 16:43 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
In-Reply-To: <20190822091306.20581-2-bjorn.topel@gmail.com>
On 22 Aug 2019, at 2:13, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid
> potential store-tearing. These members are read outside of the control
> mutex in the mmap implementation.
>
> Fixes: 37b076933a8e ("xsk: add missing write- and data-dependency barrier")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
^ permalink raw reply
* Re: [PATCH bpf-next 3/4] xsk: avoid store-tearing when assigning umem
From: Jonathan Lemon @ 2019-08-23 16:44 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
In-Reply-To: <20190822091306.20581-4-bjorn.topel@gmail.com>
On 22 Aug 2019, at 2:13, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The umem member of struct xdp_sock is read outside of the control
> mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
> potentional store-tearing.
>
> Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
^ permalink raw reply
* Re: [PATCH bpf-next 4/4] xsk: lock the control mutex in sock_diag interface
From: Jonathan Lemon @ 2019-08-23 16:44 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
In-Reply-To: <20190822091306.20581-5-bjorn.topel@gmail.com>
On 22 Aug 2019, at 2:13, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> When accessing the members of an XDP socket, the control mutex should
> be held. This commit fixes that.
>
> Fixes: a36b38aa2af6 ("xsk: add sock_diag interface for AF_XDP")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
^ permalink raw reply
* Re: [PATCH bpf-next 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
From: Jonathan Lemon @ 2019-08-23 16:46 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
In-Reply-To: <20190822091306.20581-3-bjorn.topel@gmail.com>
On 22 Aug 2019, at 2:13, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The state variable was read, and written outside the control mutex
> (struct xdp_sock, mutex), without proper barriers and {READ,
> WRITE}_ONCE correctness.
>
> In this commit this issue is addressed, and the state member is now
> used a point of synchronization whether the socket is setup correctly
> or not.
>
> This also fixes a race, found by syzcaller, in xsk_poll() where umem
> could be accessed when stale.
>
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP
> rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> net/xdp/xsk.c | 57
> ++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f3351013c2a5..31236e61069b 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs,
> struct xdp_buff *xdp, u32 len)
> return err;
> }
>
> +static bool xsk_is_bound(struct xdp_sock *xs)
> +{
> + if (READ_ONCE(xs->state) == XSK_BOUND) {
> + /* Matches smp_wmb() in bind(). */
> + smp_rmb();
> + return true;
> + }
> + return false;
> +}
> +
> int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> {
> u32 len;
>
> + if (!xsk_is_bound(xs))
> + return -EINVAL;
> +
> if (xs->dev != xdp->rxq->dev || xs->queue_id !=
> xdp->rxq->queue_index)
> return -EINVAL;
>
> @@ -362,6 +375,8 @@ static int xsk_sendmsg(struct socket *sock, struct
> msghdr *m, size_t total_len)
> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
>
> + if (unlikely(!xsk_is_bound(xs)))
> + return -ENXIO;
> if (unlikely(!xs->dev))
> return -ENXIO;
Can probably remove the xs->dev check now, replaced by checking
xs->state, right?
> if (unlikely(!(xs->dev->flags & IFF_UP)))
> @@ -378,10 +393,15 @@ static unsigned int xsk_poll(struct file *file,
> struct socket *sock,
> struct poll_table_struct *wait)
> {
> unsigned int mask = datagram_poll(file, sock, wait);
> - struct sock *sk = sock->sk;
> - struct xdp_sock *xs = xdp_sk(sk);
> - struct net_device *dev = xs->dev;
> - struct xdp_umem *umem = xs->umem;
> + struct xdp_sock *xs = xdp_sk(sock->sk);
> + struct net_device *dev;
> + struct xdp_umem *umem;
> +
> + if (unlikely(!xsk_is_bound(xs)))
> + return mask;
> +
> + dev = xs->dev;
> + umem = xs->umem;
>
> if (umem->need_wakeup)
> dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> @@ -417,10 +437,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> {
> struct net_device *dev = xs->dev;
>
> - if (!dev || xs->state != XSK_BOUND)
> + if (xs->state != XSK_BOUND)
> return;
> -
> - xs->state = XSK_UNBOUND;
> + WRITE_ONCE(xs->state, XSK_UNBOUND);
>
> /* Wait for driver to stop using the xdp socket. */
> xdp_del_sk_umem(xs->umem, xs);
> @@ -495,7 +514,9 @@ static int xsk_release(struct socket *sock)
> local_bh_enable();
>
> xsk_delete_from_maps(xs);
> + mutex_lock(&xs->mutex);
> xsk_unbind_dev(xs);
> + mutex_unlock(&xs->mutex);
>
> xskq_destroy(xs->rx);
> xskq_destroy(xs->tx);
> @@ -589,19 +610,18 @@ static int xsk_bind(struct socket *sock, struct
> sockaddr *addr, int addr_len)
> }
>
> umem_xs = xdp_sk(sock->sk);
> - if (!umem_xs->umem) {
> - /* No umem to inherit. */
> + if (!xsk_is_bound(umem_xs)) {
> err = -EBADF;
> sockfd_put(sock);
> goto out_unlock;
> - } else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> + }
> + if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> err = -EINVAL;
> sockfd_put(sock);
> goto out_unlock;
> }
> -
> xdp_get_umem(umem_xs->umem);
> - xs->umem = umem_xs->umem;
> + WRITE_ONCE(xs->umem, umem_xs->umem);
> sockfd_put(sock);
> } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
> err = -EINVAL;
> @@ -626,10 +646,15 @@ static int xsk_bind(struct socket *sock, struct
> sockaddr *addr, int addr_len)
> xdp_add_sk_umem(xs->umem, xs);
>
> out_unlock:
> - if (err)
> + if (err) {
> dev_put(dev);
> - else
> - xs->state = XSK_BOUND;
> + } else {
> + /* Matches smp_rmb() in bind() for shared umem
> + * sockets, and xsk_is_bound().
> + */
> + smp_wmb();
> + WRITE_ONCE(xs->state, XSK_BOUND);
> + }
> out_release:
> mutex_unlock(&xs->mutex);
> rtnl_unlock();
> @@ -869,7 +894,7 @@ static int xsk_mmap(struct file *file, struct
> socket *sock,
> unsigned long pfn;
> struct page *qpg;
>
> - if (xs->state != XSK_READY)
> + if (READ_ONCE(xs->state) != XSK_READY)
> return -EBUSY;
>
> if (offset == XDP_PGOFF_RX_RING) {
> --
> 2.20.1
^ permalink raw reply
* Re: [PATCH net-next 2/6] net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
From: Florian Fainelli @ 2019-08-23 16:49 UTC (permalink / raw)
To: Vladimir Oltean; +Cc: Vivien Didelot, netdev, David S. Miller, Andrew Lunn
In-Reply-To: <CA+h21hpzSNZTK6-wQJkJCC9vs0hao12_tpQRLM5JLXXD_26c_w@mail.gmail.com>
On 8/23/19 9:32 AM, Vladimir Oltean wrote:
> Hi Florian,
>
> On Fri, 23 Aug 2019 at 19:23, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> On 8/22/19 4:44 PM, Vladimir Oltean wrote:
>>> On Fri, 23 Aug 2019 at 02:43, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>>>>
>>>> Hi Vladimir,
>>>>
>>>> On Fri, 23 Aug 2019 01:06:58 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
>>>>> Hi Vivien,
>>>>>
>>>>> On 8/22/19 11:13 PM, Vivien Didelot wrote:
>>>>>> Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.
>>>>>>
>>>>>> This function is used in the tag_8021q.c code to offload the PVID of
>>>>>> ports, which would simply not work if .port_vlan_add is not supported
>>>>>> by the underlying switch.
>>>>>>
>>>>>> Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
>>>>>> that is to say in dsa_slave_vlan_rx_add_vid.
>>>>>>
>>>>>
>>>>> Do you know why Florian suppressed -EOPNOTSUPP in 061f6a505ac3 ("net:
>>>>> dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")?
>>>>> I forced a return value of -EOPNOTSUPP here and when I create a VLAN
>>>>> sub-interface nothing breaks, it just says:
>>>>> RTNETLINK answers: Operation not supported
>>>>> which IMO is expected.
>>>>
>>>> I do not know what you mean. This patch does not change the behavior of
>>>> dsa_slave_vlan_rx_add_vid, which returns 0 if -EOPNOTSUPP is caught.
>>>>
>>>
>>> Yes, but what's wrong with just forwarding -EOPNOTSUPP?
>>
>> It makes us fail adding the VLAN to the slave network device, which
>> sounds silly, if we can't offload it in HW, that's fine, we can still do
>> a SW VLAN instead, see net/8021q/vlan_core.c::vlan_add_rx_filter_info().
>>
>> Maybe a more correct solution is to set the NETIF_F_HW_VLAN_CTAG_FILTER
>> feature bit only if we have the ability to offload, now that I think
>> about it. Would you want me to cook a patch doing that?
>
> sja1105 doesn't support offloading NETIF_F_HW_VLAN_CTAG_FILTER even
> though it does support programming VLANs.
The additional of the ndo_vlan_rx_{add,kill}_vid() is such that
standalone DSA ports continue to work while there is a bridge with
vlan_filtering=1 spanning other ports. In order for that ndo operation
to be called, we need to advertise the NETIF_F_HW_VLAN_CTAG_FILTER feature.
> Adding an offloaded VLAN sub-interface on a standalone switch port
> (vlan_filtering=0, uses dsa_8021q) would make the driver insert a VLAN
> entry whilst the TPID is ETH_P_DSA_8021Q.
> Maybe just let the driver set the netdev features, similar to how it
> does for the number of TX queues?
Why should we bend the framework because sja1105 and dsa_8021q are
special? Let me counter the argument: if the tagging is DSA_8021Q, why
not clear the feature then?
--
Florian
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox