netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Antoine Tenart <atenart@kernel.org>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	edumazet@google.com, netdev@vger.kernel.org, mhocko@suse.com,
	stephen@networkplumber.org
Subject: Re: [RFC PATCH net-next 1/4] net-sysfs: remove rtnl_trylock from device attributes
Date: Wed, 18 Oct 2023 18:49:18 +0200	[thread overview]
Message-ID: <2023101840-scabbed-visitor-3fdd@gregkh> (raw)
In-Reply-To: <20231018154804.420823-2-atenart@kernel.org>

On Wed, Oct 18, 2023 at 05:47:43PM +0200, Antoine Tenart wrote:
> We have an ABBA deadlock between net device unregistration and sysfs
> files being accessed[1][2]. To prevent this from happening all paths
> taking the rtnl lock after the sysfs one (actually kn->active refcount)
> use rtnl_trylock and return early (using restart_syscall)[3] which can
> make syscalls to spin for a long time when there is contention on the
> rtnl lock[4].
> 
> There are not many possibilities to improve the above:
> - Rework the entire net/ locking logic.
> - Invert two locks in one of the paths — not possible.
> 
> But here it's actually possible to drop one of the locks safely: the
> kernfs_node refcount. More details in the code itself, which comes with
> lots of comments.
> 
> [1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
> [2] https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
> [3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
> [4] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/
> 
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> ---
>  net/core/net-sysfs.c | 142 ++++++++++++++++++++++++++++++++++---------
>  1 file changed, 112 insertions(+), 30 deletions(-)
> 
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index fccaa5bac0ed..76d8729340b7 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -40,6 +40,88 @@ static inline int dev_isalive(const struct net_device *dev)
>  	return dev->reg_state <= NETREG_REGISTERED;
>  }
>  
> +/* We have a possible ABBA deadlock between rtnl_lock and kernfs_node->active,
> + * when unregistering a net device and accessing associated sysfs files. Tx/Rx
> + * queues do embed their own kobject for their sysfs files but the same issue
> + * applies as a net device being unresgistered will remove those sysfs files as
> + * well. The potential deadlock is as follow:
> + *
> + *         CPU 0                                         CPU 1
> + *
> + *    rtnl_lock                                   vfs_read
> + *    unregister_netdevice_many                   kernfs_seq_start
> + *    device_del / kobject_put                      kernfs_get_active (kn->active++)
> + *    kernfs_drain                                sysfs_kf_seq_show
> + *    wait_event(                                 rtnl_lock
> + *       kn->active == KN_DEACTIVATED_BIAS)       -> waits on CPU 0 to release
> + *    -> waits on CPU 1 to decrease kn->active       the rtnl lock.
> + *
> + * The historical fix was to use rtnl_trylock with restart_syscall to bail out
> + * of sysfs accesses when the lock wasn't taken. This fixed the above issue as
> + * it allowed CPU 1 to bail out of the ABBA situation.
> + *
> + * But this comes with performances issues, as syscalls are being restarted in
> + * loops when there is contention, with huge slow downs in specific scenarios
> + * (e.g. lots of virtual interfaces created at startup and userspace daemons
> + * querying their attributes).
> + *
> + * The idea below is to bail out of the active kernfs_node protection
> + * (kn->active) while still being in a safe position to continue the execution
> + * of our sysfs helpers.
> + */
> +static inline struct kernfs_node *sysfs_rtnl_lock(struct kobject *kobj,
> +						  struct attribute *attr,
> +						  struct net_device *ndev)
> +{
> +	struct kernfs_node *kn;
> +
> +	/* First, we hold a reference to the net device we might use in the
> +	 * locking section as the unregistration path might run in parallel.
> +	 * This will ensure the net device won't be freed before we return.
> +	 */
> +	dev_hold(ndev);
> +	/* sysfs_break_active_protection was introduced to allow self-removal of
> +	 * devices and their associated sysfs files by bailing out of the
> +	 * sysfs/kernfs protection. We do this here to allow the unregistration
> +	 * path to complete in parallel. The following takes a reference on the
> +	 * kobject and the kernfs_node being accessed.
> +	 *
> +	 * This works because we hold a reference onto the net device and the
> +	 * unregistration path will wait for us eventually in netdev_run_todo
> +	 * (outside an rtnl lock section).
> +	 */
> +	kn = sysfs_break_active_protection(kobj, attr);
> +	WARN_ON_ONCE(!kn);

If this triggers, you will end up rebooting the machines that set
panic-on-warn, do you mean to do that?  And note, the huge majority of
Linux systems in the world have that enabled, so be careful.

thanks,

greg k-h

  reply	other threads:[~2023-10-18 16:49 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-18 15:47 [RFC PATCH net-next 0/4] net-sysfs: remove rtnl_trylock/restart_syscall use Antoine Tenart
2023-10-18 15:47 ` [RFC PATCH net-next 1/4] net-sysfs: remove rtnl_trylock from device attributes Antoine Tenart
2023-10-18 16:49   ` Greg KH [this message]
2023-10-19  8:13     ` Antoine Tenart
2023-10-19 15:37       ` Greg KH
2023-10-18 18:13   ` Stephen Hemminger
2023-10-19  7:48     ` Antoine Tenart
2025-01-02 22:36   ` Jakub Kicinski
2025-01-03  8:41     ` Eric Dumazet
2025-01-07 16:30     ` Antoine Tenart
2025-01-07 17:06       ` Jakub Kicinski
2025-01-16 13:36         ` Antoine Tenart
2025-01-16 23:42           ` Jakub Kicinski
2025-01-16 23:43             ` Jakub Kicinski
2025-01-17  8:26               ` Antoine Tenart
2023-10-18 15:47 ` [RFC PATCH net-next 2/4] net-sysfs: move queue attribute groups outside the default groups Antoine Tenart
2023-10-18 15:47 ` [RFC PATCH net-next 3/4] net-sysfs: prevent uncleared queues from being re-added Antoine Tenart
2023-10-18 15:47 ` [RFC PATCH net-next 4/4] net-sysfs: remove rtnl_trylock from queue attributes Antoine Tenart
2023-10-18 15:57 ` [RFC PATCH net-next 0/4] net-sysfs: remove rtnl_trylock/restart_syscall use Stephen Hemminger
2023-10-18 16:34   ` Antoine Tenart
2023-10-18 18:15 ` Stephen Hemminger
2023-10-19  7:47   ` Antoine Tenart
2023-10-19 14:54     ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2023101840-scabbed-visitor-3fdd@gregkh \
    --to=gregkh@linuxfoundation.org \
    --cc=atenart@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=mhocko@suse.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).