From: Jiri Pirko <jiri@resnulli.us>
To: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com,
davem@davemloft.net, edumazet@google.com,
jacob.e.keller@intel.com, saeedm@nvidia.com, moshe@nvidia.com
Subject: Re: [patch net] devlink: change per-devlink netdev notifier to static one
Date: Mon, 15 May 2023 13:35:18 +0200 [thread overview]
Message-ID: <ZGIY9jOHkHxbnTjq@nanopsycho> (raw)
In-Reply-To: <600ddf9e-589a-2aa0-7b69-a438f833ca10@samsung.com>
Mon, May 15, 2023 at 11:09:10AM CEST, m.szyprowski@samsung.com wrote:
>On 10.05.2023 16:46, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>>
>> The commit 565b4824c39f ("devlink: change port event netdev notifier
>> from per-net to global") changed original per-net notifier to be
>> per-devlink instance. That fixed the issue of non-receiving events
>> of netdev uninit if that moved to a different namespace.
>> That worked fine in -net tree.
>>
>> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
>> separate devlink instance for ethernet auxiliary device") and
>> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
>> case of PCI device suspend") were merged, a deadlock was introduced
>> when removing a namespace with devlink instance with another nested
>> instance.
>>
>> Here there is the bad flow example resulting in deadlock with mlx5:
>> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
>> devlink_pernet_pre_exit() -> devlink_reload() ->
>> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
>> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
>> mlx5e_destroy_devlink() -> devlink_free() ->
>> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
>>
>> Steps to reproduce:
>> $ modprobe mlx5_core
>> $ ip netns add ns1
>> $ devlink dev reload pci/0000:08:00.0 netns ns1
>> $ ip netns del ns1
>>
>> Resolve this by converting the notifier from per-devlink instance to
>> a static one registered during init phase and leaving it registered
>> forever. Use this notifier for all devlink port instances created
>> later on.
>>
>> Note what a tree needs this fix only in case all of the cited fixes
>> commits are present.
>>
>> Reported-by: Moshe Shemesh <moshe@nvidia.com>
>> Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
>> Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
>> Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>
>This patch landed recently in linux-next as commit e93c9378e33f
>("devlink: change per-devlink netdev notifier to static one").
>Unfortunately it causes serious regression with kernel compiled from
>multi_v7_defconfig (ARM 32bit) on all my test boards. Here is a example
>of the boot failure observed on QEMU's virt ARM 32bit machine:
>
>8<--- cut here ---
>Unable to handle kernel execution of memory at virtual address e5150010
>when execute
>[e5150010] *pgd=4267a811, *pte=04750653, *ppte=04750453
>Internal error: Oops: 8000000f [#1] SMP ARM
>Modules linked in:
>CPU: 0 PID: 779 Comm: ip Not tainted 6.4.0-rc2-next-20230515 #6688
>Hardware name: Generic DT based system
>PC is at 0xe5150010
>LR is at notifier_call_chain+0x60/0x11c
>pc : [<e5150010>] lr : [<c0365f50>] psr: 60000013
>...
>Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
>Control: 10c5387d Table: 4397806a DAC: 00000051
>...
>Process ip (pid: 779, stack limit = 0x82b757b5)
>Stack: (0xe9201a00 to 0xe9202000)
>...
> notifier_call_chain from raw_notifier_call_chain+0x18/0x20
> raw_notifier_call_chain from __dev_open+0x74/0x190
> __dev_open from __dev_change_flags+0x17c/0x1f4
> __dev_change_flags from dev_change_flags+0x18/0x54
> dev_change_flags from do_setlink+0x24c/0xefc
> do_setlink from rtnl_newlink+0x534/0x818
> rtnl_newlink from rtnetlink_rcv_msg+0x250/0x300
> rtnetlink_rcv_msg from netlink_rcv_skb+0xb8/0x110
> netlink_rcv_skb from netlink_unicast+0x1f8/0x2bc
> netlink_unicast from netlink_sendmsg+0x1cc/0x44c
> netlink_sendmsg from ____sys_sendmsg+0x9c/0x260
> ____sys_sendmsg from ___sys_sendmsg+0x68/0x94
> ___sys_sendmsg from sys_sendmsg+0x4c/0x88
> sys_sendmsg from ret_fast_syscall+0x0/0x54
>Exception stack(0xe9201fa8 to 0xe9201ff0)
Thanks for the report. From the first sight, don't have a clue what may
be wrong. Debugging.
>...
>---[ end trace 0000000000000000 ]---
>[....] Configuring network interfaces...Segmentation fault
>ifup: failed to bring up lo
>
>
>Reverting $subject patch on top of linux next-20230515 fixes this issue.
>
>
>> ---
>> net/devlink/core.c | 16 +++++++---------
>> net/devlink/devl_internal.h | 1 -
>> net/devlink/leftover.c | 5 ++---
>> 3 files changed, 9 insertions(+), 13 deletions(-)
>>
>> diff --git a/net/devlink/core.c b/net/devlink/core.c
>> index 777b091ef74d..0e58eee44bdb 100644
>> --- a/net/devlink/core.c
>> +++ b/net/devlink/core.c
>> @@ -204,11 +204,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
>> if (ret < 0)
>> goto err_xa_alloc;
>>
>> - devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event;
>> - ret = register_netdevice_notifier(&devlink->netdevice_nb);
>> - if (ret)
>> - goto err_register_netdevice_notifier;
>> -
>> devlink->dev = dev;
>> devlink->ops = ops;
>> xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
>> @@ -233,8 +228,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
>>
>> return devlink;
>>
>> -err_register_netdevice_notifier:
>> - xa_erase(&devlinks, devlink->index);
>> err_xa_alloc:
>> kfree(devlink);
>> return NULL;
>> @@ -266,8 +259,6 @@ void devlink_free(struct devlink *devlink)
>> xa_destroy(&devlink->params);
>> xa_destroy(&devlink->ports);
>>
>> - WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb));
>> -
>> xa_erase(&devlinks, devlink->index);
>>
>> devlink_put(devlink);
>> @@ -303,6 +294,10 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
>> .pre_exit = devlink_pernet_pre_exit,
>> };
>>
>> +static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
>> + .notifier_call = devlink_port_netdevice_event,
>> +};
>> +
>> static int __init devlink_init(void)
>> {
>> int err;
>> @@ -311,6 +306,9 @@ static int __init devlink_init(void)
>> if (err)
>> goto out;
>> err = register_pernet_subsys(&devlink_pernet_ops);
>> + if (err)
>> + goto out;
>> + err = register_netdevice_notifier(&devlink_port_netdevice_nb);
>>
>> out:
>> WARN_ON(err);
>> diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
>> index e133f423294a..62921b2eb0d3 100644
>> --- a/net/devlink/devl_internal.h
>> +++ b/net/devlink/devl_internal.h
>> @@ -50,7 +50,6 @@ struct devlink {
>> u8 reload_failed:1;
>> refcount_t refcount;
>> struct rcu_work rwork;
>> - struct notifier_block netdevice_nb;
>> char priv[] __aligned(NETDEV_ALIGN);
>> };
>>
>> diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
>> index dffca2f9bfa7..cd0254968076 100644
>> --- a/net/devlink/leftover.c
>> +++ b/net/devlink/leftover.c
>> @@ -7073,10 +7073,9 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
>> struct devlink_port *devlink_port = netdev->devlink_port;
>> struct devlink *devlink;
>>
>> - devlink = container_of(nb, struct devlink, netdevice_nb);
>> -
>> - if (!devlink_port || devlink_port->devlink != devlink)
>> + if (!devlink_port)
>> return NOTIFY_OK;
>> + devlink = devlink_port->devlink;
>>
>> switch (event) {
>> case NETDEV_POST_INIT:
>
>Best regards
>--
>Marek Szyprowski, PhD
>Samsung R&D Institute Poland
>
next prev parent reply other threads:[~2023-05-15 11:35 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20230515090912eucas1p2489efdc97f9cf1fddf2aad0449e8a2c7@eucas1p2.samsung.com>
2023-05-10 14:46 ` [patch net] devlink: change per-devlink netdev notifier to static one Jiri Pirko
2023-05-11 1:01 ` Jakub Kicinski
2023-05-11 12:18 ` Simon Horman
2023-05-12 1:10 ` patchwork-bot+netdevbpf
2023-05-15 9:09 ` Marek Szyprowski
2023-05-15 11:35 ` Jiri Pirko [this message]
2023-05-15 12:05 ` Ido Schimmel
2023-05-15 12:28 ` Marek Szyprowski
2023-05-15 12:37 ` Jiri Pirko
2023-05-15 13:40 ` Ido Schimmel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZGIY9jOHkHxbnTjq@nanopsycho \
--to=jiri@resnulli.us \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jacob.e.keller@intel.com \
--cc=kuba@kernel.org \
--cc=m.szyprowski@samsung.com \
--cc=moshe@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox