Date: Wed, 1 Apr 2026 20:08:42 -0700
From: Jakub Kicinski
To: Tariq Toukan
Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, "David S. Miller", Saeed Mahameed, "Mark Bloch", Leon Romanovsky, Shay Drory, Simon Horman, Kees Cook, Parav Pandit, Patrisious Haddad, Gal Pressman
Subject: Re: [PATCH net 1/3] net/mlx5e: SD, Fix race condition in secondary device probe/remove
Message-ID: <20260401200842.79322a24@kernel.org>
In-Reply-To: <20260330193412.53408-2-tariqt@nvidia.com>
References: <20260330193412.53408-1-tariqt@nvidia.com> <20260330193412.53408-2-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 30 Mar 2026 22:34:10 +0300 Tariq Toukan wrote:
> From: Shay Drory
>
> When utilizing Socket-Direct single netdev functionality the driver
> resolves the actual auxiliary device using mlx5_sd_get_adev(). However,
> the current implementation returns the primary ETH auxiliary device
> without holding the device lock, leading to a potential race condition
> where the ETH device could be unbound or removed concurrently during
> probe, suspend, resume, or remove operations.[1]
>
> Fix this by introducing mlx5_sd_put_adev() and updating
> mlx5_sd_get_adev() so that secondaries devices would acquire the device
> lock of the returned auxiliary device. After the lock is acquired, a
> second devcom check is needed[2].
> In addition, update The callers to pair the get operation with the new
> put operation, ensuring the lock is held while the auxiliary device is
> being operated on and released afterwards.

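For readers following along, the get/put pairing described in the quoted
commit message can be modelled in user space roughly as below. This is
only a sketch under stated assumptions: toy_dev, toy_sd_get() and
toy_sd_put() are hypothetical names, the lock is a plain flag rather than
the kernel's device lock, and the body of mlx5_sd_get_adev() is not shown
in this message, so the conditional behaviour here is inferred from the
commit text and from the two-argument put call in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a device; lock_held models a device lock as a flag. */
struct toy_dev {
	int lock_held;           /* 1 while the (toy) device lock is held */
	int bound;               /* 1 while a driver is bound */
	struct toy_dev *primary; /* NULL if this device is the primary */
};

static void toy_lock(struct toy_dev *d)
{
	assert(!d->lock_held);   /* a real lock would block here instead */
	d->lock_held = 1;
}

static void toy_unlock(struct toy_dev *d)
{
	assert(d->lock_held);
	d->lock_held = 0;
}

/*
 * Hypothetical get helper. Assumed to be called with self's own lock
 * already held by the caller. Returns the device to operate on, or NULL.
 */
static struct toy_dev *toy_sd_get(struct toy_dev *self)
{
	struct toy_dev *primary = self->primary;

	if (!primary)
		return self;         /* primary caller: no extra lock taken */
	if (!primary->bound)
		return NULL;
	toy_lock(primary);           /* secondary caller: lock the primary */
	if (!primary->bound) {       /* second check, now under the lock */
		toy_unlock(primary);
		return NULL;
	}
	return primary;
}

/* Hypothetical put helper: unlocks only if get took the extra lock. */
static void toy_sd_put(struct toy_dev *actual, struct toy_dev *self)
{
	if (actual != self)
		toy_unlock(actual);
}
```

In this model a call from the secondary returns the primary with its
lock held, while a call from the primary returns the caller itself and
takes no additional lock, so the put helper needs both pointers to know
whether there is anything to release.
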
Please explain why the "primary" designation is reliable, and therefore
we can be sure there will be no ABBA deadlock here

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index b6c12460b54a..5761f655f488 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -6657,8 +6657,11 @@ static int mlx5e_resume(struct auxiliary_device *adev)
>  		return err;
>  
>  	actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
> -	if (actual_adev)
> -		return _mlx5e_resume(actual_adev);
> +	if (actual_adev) {
> +		err = _mlx5e_resume(actual_adev);
> +		mlx5_sd_put_adev(actual_adev, adev);
> +		return err;
> +	}
>  	return 0;

Feels like I recently complained about similar code y'all were trying
to add. Magically and conditionally locking something in a get helper
makes for extremely confusing code.
-- 
pw-bot: cr