From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-dy1-f182.google.com (mail-dy1-f182.google.com [74.125.82.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 46C311367 for ; Tue, 24 Feb 2026 00:37:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=74.125.82.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771893465; cv=none; b=gDe+0ZjtToK8VIMj7Uujaw2QTk5hmQGIljBxVKXm04aJ2ZiFQGhNKossZb3+MXj0mbZ1vKFEd6G8qmnvBKvBEK2c+qBl8W85up+nuxou+Iy3GQ9DfWDJVfjFRCWx4v6n0I108wT26016JQh0Iro7jMIQUL1xPBzfhW8GxWHpTtA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771893465; c=relaxed/simple; bh=K+qmuutGKcclAqnV1IVhGMHR3EzArTunLkoJZYosVKQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=BCRU8/mSrgylPg2vdWLAGZlFz1CL6y1G/qnsOcwYLGjL15g9kK0hXtPjO9nv+VDlkQFkFyrvLna0JPDwQWh+ziOkK3J5sXL3qLTTQw8W494mK5APGxjNlhHgE2+6r4dQG6aAe40mM27mJlsKMfuRuHadN+e1QZYi9LK4RGUroWA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=EQeAnzIy; arc=none smtp.client-ip=74.125.82.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="EQeAnzIy" Received: by mail-dy1-f182.google.com with SMTP id 5a478bee46e88-2b82c605dbdso4965948eec.0 for ; Mon, 23 Feb 2026 16:37:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1771893463; x=1772498263; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=HG1TExzINsV+xuMh5Kuyzh1hEQWYIpn7r5uQFpXRoDg=; b=EQeAnzIysEFmp0/o9H6Pe92p7Z5mVcGb4LizNqy/A5XZSFjflI+3Ss9gdgbMs5THdi CZjdxM4i+FKHx0r25QKt5P+TeFRXkU2C6ubhiDZPR48XfDEfqCHs95smkV4x71bnzmai RAvinl5oYfRe9Hcs+af7vNoUmvHwFEqZX3fooDg6HkQRWUT5aFVMOyEivkfWfrP1uxIu zNX+kmprGQ5SJzUgsTZFAjUbLOUu/W5qiY1jubXmaqWgzDdBVPZnz2WSPDfhpH2SWJC2 dyLWD4BqmVco+nsvCp1t2s0l9M+5Q/AbBFLns6yz6q0Zv6BCrWU4fU2ZbAWn/wJ+GNyX ZKxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1771893463; x=1772498263; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HG1TExzINsV+xuMh5Kuyzh1hEQWYIpn7r5uQFpXRoDg=; b=Mgd6cbFgZHxZyQBDZIQEujO0Orkjyv0cSpeudcIgOLOOx/Cx6bBznL3xFze4EfFhSv c7qX+yHnXxbfR9WNnRtLS9zUQm86kkZ7z3PVPS/eEC+jTJVU5jmwyG7gTkzmb/sbNJDY me5q+sy1Qd+7KynRzt4ePPMm8VpBuJw4IszDk0SXvAsPG8raEhl7NPmmt5U4ugsT6tZx YDQAK3fNJsHlvrXhb+ZHRs0BDnuvvhisOAEngNmCcDEPbnrwO+pWT4untuLKr5Y5Frof ZyClYX/6tUroEi+Ssxa8Kkmdt2WTPYsjL0EjyWZIxVGMSx/qXtRZelSfuHYhg9X04c+R lAIw== X-Forwarded-Encrypted: i=1; AJvYcCUG6yU5Lin/8fYmAyTU0iaLvwX8AJKHiA0Y/VVnHHWWm+BxnnZxBdyDcya+oZZZjjqt/yY3q302LuLCA+U=@vger.kernel.org X-Gm-Message-State: AOJu0Yy7GRwG//vfuk+F7cdmOZ98hXvphiFdpc706gMotgU7dp8GKioX ybPirl6FoleXWgWBF8H8zcbYlDsemyL2VCIcz4W5ANDR0cNp9iMgsSD2 X-Gm-Gg: AZuq6aIn/v9jR0EKxhixoDOVkHzmhexSbdwOWz0j3IoR/WpRHC7Enf1ziuKj9udpyeY LeakjyaemoZsfHPb8RwLiEeVVnir2ZzDKaqYbBOxK0sX0iVYIcAR8+Ubn5wCYO0mUq6lbsByBjO AUU29jhSm6db22BAyxXU4EnhI48xbQQ5Qx5cTzWzXqZEa1XiM3gwHEHLQ+rQaJTbSpzW4hjuYlN BhjUcc82SLTwd91Kg4zjiZE7QUETtqQ10WvjT7ePeROUe6oW1Iu1PdgGdoWttEYi0g24uXP8SKI JTeYSHgPVWLroTgFl1KaKP1xmhQgz/ynuaeUf6ucWxuFufvQOappatc8rWqldfzI7iswhb2DXs8 KfKWqBjsHDLNjD26PbduNLdfR9mi+cufDhgmoNaAEw8cIdVDBsvsHAV5tvBRN+6z8STAo8hl0l9 PsqYDws5nPGffb4MHtch3gmpKdBg5i4cuRkVJ43sBAAaY5dRq6YD2XPvgHRdMkscEb X-Received: by 2002:a05:7022:2383:b0:11b:9386:7ecd with SMTP id a92af1059eb24-1276ad67e87mr5146475c88.42.1771893463136; Mon, 23 Feb 2026 16:37:43 -0800 (PST) Received: from google.com ([2a00:79e0:2ebe:8:1b48:5d6e:ab6e:5287]) by smtp.gmail.com with ESMTPSA id a92af1059eb24-1276af8ac3fsm9170196c88.13.2026.02.23.16.37.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Feb 2026 16:37:42 -0800 (PST) Date: Mon, 23 Feb 2026 16:37:39 -0800 From: Dmitry Torokhov To: Vladimir Oltean Cc: Vinod Koul , Kishon Vijay Abraham I , Neil Armstrong , "Rafael J. Wysocki" , Geert Uytterhoeven , Johan Hovold , Claudiu Beznea , "Dr. David Alan Gilbert" , Peter Griffin , Dmitry Baryshkov , Krzysztof Kozlowski , Zijun Hu , linux-phy@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] phy: core: fix potential UAF in of_phy_simple_xlate() Message-ID: References: <20260223231500.zeffezslctqamhp7@skbuf> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20260223231500.zeffezslctqamhp7@skbuf> Hi Vladimir, On Tue, Feb 24, 2026 at 01:15:00AM +0200, Vladimir Oltean wrote: > Hi Dmitry, > > On Fri, Feb 20, 2026 at 05:01:50PM -0800, Dmitry Torokhov wrote: > > On Thu, Feb 19, 2026 at 04:11:37PM -0800, Dmitry Torokhov wrote: > > > On Thu, Feb 19, 2026 at 03:57:11PM -0800, Dmitry Torokhov wrote: > > > > The implementation put_device()s located device and then uses > > > > container_of() on the pointer. The device may disappear by that time, > > > > resulting in UAF. > > > > > > > > Fix the problem by keeping the reference to the framer device, > > > > avoiding getting an extra reference to it in framer_get(), and making > > > > sure to drop the reference in error path when we fail to get the module. > > > > > > Hmm, I was too rash. There are bunch of other xlate functions that need > > > to be updated to take the reference. > > > > So I am convinced that xlate functions need to bump up the reference to > > phy devices they return. The question is how to deal with the ones that > > do not. I can either convert them in the same patch (the changes are > > quite mechanical) or we can do the whole song and dance, introduce a > > flag, set it up in converted xlate functions, have the core respect it, > > and then remove it from xlates and from the core when it is all done. > > > > Please let me know. > > You have hit on a weak spot of the Generic PHY framework and I would > like to encourage you to follow through with patches for this finding > (although it won't be exactly trivial, I think it's doable with some > determination). Thank you for the detailed response. I am not sure how much time I can spend on phy rework as this change was an essentially a drive-by. I only happen to look there because I want to remove class_find_device_by_of_node() in favor of class_find_device_by_fwnode(). > > On your direct question: > It will impact downstream in subtle and unpleasant ways if you change > the of_xlate() semantics w.r.t. reference counting, but the code will > still compile just as before, i.e. when looking just at your downstream > driver (what 99% of developers do) there will be no obvious way of > knowing that something in the API has changed. I would avoid doing that. Hmm, I was not aware that we needed to take particular measures for benefit of downstream trees... > > But first the problem. Too little has been said about the problem, and > too much about the solution. We can't find a good solution if we don't > call out the problem properly first. > > The phy_get() function follows a "lookup-then-get" approach for PHY > providers, rather than "atomic-lookup-and-get". > > Namely, the phy_provider->of_xlate() caller, which is _of_phy_get(), > returns a struct phy * with its underlying device reference count > nominally at 1. All callers of _of_phy_get() do later call get_device() > to bump this refcount for each PHY consumer, but it is "too late" in the > sense that a window has been opened where the PHY provider driver can > unbind, and the reference count of &phy->dev can drop to 0 and it can be > freed. Yes. > > Immediately after being created by phy_create(), the &phy->dev has a > refcount of 1, and only the action of its provider driver (calling > phy_destroy() directly or through devres, on driver unbind) can drop > that refcount to 0. As long as the driver doesn't do that, the &phy->dev > refcount is not in danger. Right. > > Then the PHY is exposed to the outside world using of_phy_provider_register(), > where the fact that the provider driver can concurrently unregister > starts being a problem. > > Therefore, I would say that the problem is that consumers, aka phy_get() > callers, can get a phy with a &phy->dev refcount that is not already bumped > by the time they are handed over that PHY. > > Mechanically, PHY lookup happens under &phy_provider_mutex, and I > believe it would be sufficient to call get_device() under this lock, in > order for consumers to get PHYs with their reference bumped. Yes, given that there is 1:1 mapping between device and provider (not phy instance but its parent) it looks like we can drop the reference and then bump it up again as long as we hold that mutex. > > The patch above leaves a few loose ends. > > The most obvious is of_phy_simple_xlate(), which has that put_device() > to balance out class_find_device_by_of_node() - which bumps the device > refcount to 2. It "looks" wrong but it is consistent with vendor > implementations of of_xlate(), which also provide a phy->dev refcount of 1. > And as explained, the refcount never _actually_ drops to 0 with the > above patch. > > I would actually address _only_ of_phy_simple_xlate(), for cosmetics sake. > Instead of devm_of_phy_provider_register(..., of_phy_simple_xlate), I > would offer a new helper, devm_of_phy_simple_provider_register(), which > would internally use a callback that doesn't drop the refcount (say > __of_phy_simple_xlate() for lack of imagination). I would convert vendor > PHY drivers one by one to use this (it would be valid at any time to use > either the old or the new method). > > You'd notice that most of the of_phy_simple_xlate() occurrences go away, > except for freescale/phy-fsl-lynx-28g.c which calls this function > directly from its own lynx_28g_xlate(). It will still have to keep > calling it, and for refcount equalization purposes that weird > put_device() will have to continue to exist. Just leave a comment as to > why that is. I this case I would simply add a comment telling why the reference should (and can) be dropped where it is being dropped in of_phy_simple_xlate() and call it a day. There is not much benefit in adding another helper. > > > But there are more important loose ends still. > > I mentioned that "zombie" device. We've solved the memory safety issue, > but it's possible for consumers to hold onto a phy whose provider has > disappeared. The refcount of &phy->dev hasn't dropped to 0, so > technically it's a valid object, but from PHY API perspective, it's > still possible to call phy_init(), phy_power_on() and friends on this > PHY, and the Generic PHY core will be happy to further call into the > phy->ops->init(), phy->ops->power_on() etc. But the driver has unbound, > so it should really be left alone. > > If we fix the UAF but leave the zombie PHY problem, we've effectively > done nothing but silence static analysis checkers, while the code path > where the PHY provider unbinds effectively remains treated as poorly as > before, just moving the crashes to a different place. > > I suspect what needs to be done here is to introduce a "bool dead" or > similar, which is to be set from phy_destroy() and checked from every > API call. The idea is that API functions on zombie PHYs should fail > without calling into their driver. I **suppose** that > try_module_get(phy->ops->owner) "tried" to avoid this situation, but > it just protects against module removal, not against "echo device > > /sys/bus/.../unbind". So it's absolutely incomplete and easily bypassable. I think this is a problem common to many kernel subsystems, where there are devices that are vital for other devices to function, but are not parents of said devices. Think about regulators or clocks and such. In many such cases even validating at API level is not sufficient, because if you enable a clock you typically keep it running at least for a while, if not for entire duration of device being bound to a driver. If that clock goes away the device will break even though you are not calling any clock APIs at that particular time. We need some kind of revocation mechanism to signal consumers that providers they rely upon are going away. > > > Finally, I have identified one more loose end still. > > /** > * of_phy_put() - release the PHY > * @phy: the phy returned by of_phy_get() > * > * Releases a refcount the caller received from of_phy_get(). > */ > void of_phy_put(struct phy *phy) > { > if (!phy || IS_ERR(phy)) > return; > > mutex_lock(&phy->mutex); > if (phy->ops->release) > phy->ops->release(phy); > mutex_unlock(&phy->mutex); > > module_put(phy->ops->owner); > put_device(&phy->dev); > } > EXPORT_SYMBOL_GPL(of_phy_put); > > This function is called by PHY consumers. A PHY provider can have > multiple consumers (example in the case of Ethernet: a QSGMII SerDes > lane has 4 MAC ports multiplexed onto it). > > If a single consumer calls of_phy_put() to release its reference to the > SerDes lane, the phy->ops->release() method makes absolutely no sense. > There are 3 remaining consumers with handles to the lane! But we aren't > even telling the PHY which consumer has disappeared! It has nothing > useful to do with this information. > > Looking at actual phy_ops implementations, what they want to know is > when _all_ consumers went away, not when individual consumers did. > So the phy->ops->release() call needs to be put somewhere which is > executed when all consumers disappear. If I were to guess, that would be > the phy_release() class callback, but this is completely untested. Shouldn't phy->ops->release() keep track of users and decide when to destroy the resources? The rest seem fine as they simply drop references (I assume each user of shared phy will bump up references as needed?) > > Sorry that this email is a bit long, it got a bit late writing it and I > don't have a lot of energy left to trim it down. > In summary I found 4 highly related problems: > - use after free > - misleading of_phy_simple_xlate() API pattern (not a functional issue) > - zombie PHYs > - premature release call Thanks. -- Dmitry