public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
To: Mike Isely at pobox <isely@pobox.com>
Cc: mike.isely@cobaltdigital.com, Daniel Scally <djrscally@gmail.com>,
	Heikki Krogerus <heikki.krogerus@linux.intel.com>,
	Sakari Ailus <sakari.ailus@linux.intel.com>,
	linux-acpi@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/1] sofware node: Only the managing device can unreference managed software node
Date: Sat, 28 Feb 2026 13:02:31 +0200	[thread overview]
Message-ID: <aaLLRyeK7GwR978-@ashevche-desk.local> (raw)
In-Reply-To: <64161dbd-975f-a12d-8e9c-ba2e791e48c2@isely.net>

On Fri, Feb 27, 2026 at 11:55:51AM -0600, Mike Isely wrote:
> On Thu, 26 Feb 2026, Andy Shevchenko wrote:
> > On Thu, Feb 26, 2026 at 01:06:42PM -0600, Mike Isely wrote:
> > > On Thu, 26 Feb 2026, Andy Shevchenko wrote:
> > > > On Wed, Feb 25, 2026 at 02:16:39PM -0600, Mike Isely wrote:
> > > > > On Wed, 25 Feb 2026, Andy Shevchenko wrote:
> > > > > > On Wed, Feb 25, 2026 at 01:42:30PM -0600, Mike Isely wrote:
> > > > > > > On Wed, 25 Feb 2026, Andy Shevchenko wrote:
> > > > > > > > On Tue, Feb 24, 2026 at 01:19:22PM -0600, mike.isely@cobaltdigital.com wrote:

...

> > > > > > > > > A scenario exists where device_create_managed_software_node() is used
> > > > > > > > > to create an swnode instance that will be implicitly shared to a child
> > > > > > > > > device despite best intentions not to permit such sharing (per the
> > > > > > > > > comment in device_create_managed_software_node()).  I encountered this
> > > > > > > > > with the sfp kernel module when it was instantiated with properties
> > > > > > > > 
> > > > > > > > SFP? Or is it the name of the actual module in the kernel?
> > > > > > > 
> > > > > > > Actual kernel module name, sfp.ko, CONFIG_SFP in .config, named after 
> > > > > > > the piece of hardware it works with, an SFP cage.  This is logic which 
> > > > > > > monitors SFP cages for hotplug appearance / removal of SFP transceivers.  
> > > > > > > When a transceiver appears, the sfp kernel module will create a child 
> > > > > > > hwmon device instance to monitor various bits of metadata from the 
> > > > > > > transceiver.  When that transceiver goes away, the sfp kernel module 
> > > > > > > will tear down that child hwmon device instance.
> > > > > > > 
> > > > > > > The sfp kernel module needs resources configured to know where to 
> > > > > > > monitor; in our case that is set up dynamically by another locally 
> > > > > > > written kernel module (which iteracts with an FPGA we have where the SFP 
> > > > > > > hardware elements reside), and that kernel module will combine 
> > > > > > 
> > > > > > > devicetree information with some run-time information to generate the 
> > > > > > > properties handed off to the sfp kernel module instantiation.
> > > > > > 
> > > > > > What runtime information? Why this can't be done via DT overlay as others do?
> > > > > 
> > > > > I don't recall the specifics.  It might be calculation of a unit name.  
> > > > > The connectivity in this case is I2C so that should be a constant.  We 
> > > > > have some variants where the FPGA is PCIE-connected to the host and so 
> > > > > the memory map is a run-time calculation.  We have other drivers that 
> > > > > have to be instantiated with run-time computed properties.  So we handle 
> > > > > this as a general case.
> > > > 
> > > > But the configurations are semi-static, right? For the contents of FPGA we have
> > > > a specific manager that reloads the FPGA configuration.
> > > > 
> > > > Using swnode for dynamically calculated data seems weird. The data in swnodes
> > > > is usually static (const), I can't remember the case where we need to supply
> > > > run-time calculated values.
> > > 
> > > Semi-static means partially dynamic...
> > > 
> > > We use the kernel FPGA manager in one platform (zynqmp) and a userspace 
> > > loader program in another (PCIE connected to bcm2711 soc).  However it 
> > > is not reasonable to reload on-the-fly as that would disrupt every piece 
> > > of logic (kernel modules and userspace code) currently pointing inside 
> > > its memory space.  It would require tearing down the application space 
> > > and unloading every kernel module using the FPGA space (some of which is 
> > > ours and others which are stock kernel code).  Might as well reboot - so 
> > > we load once at boot time.  If the image needs to be swapped out (and 
> > > there are defined cases for this), a symbolic link is updated to point 
> > > to the new bitfile and the processor is rebooted.
> > 
> > To me the (full) reversing of the kernel state seems the correct approach.
> > The modules not really needs to be unloaded. The driver should be unbound
> > from the device.
> > 
> > > But this is getting off-topic.
> > 
> > Not really, it gets more understanding on why this way was chosen and not
> > another.
> 
> I think there's a miscommunication here.  The nature / mechanism by how 
> / when we load the FPGA has exactly nothing to do with this problem.  
> The issue has to do with sfp kernel module behavior when SFP 
> transceivers are hotplugged, which is completely unrelated to FPGA 
> loading.  The problem would be the same no matter how we manage the 
> FPGA.
> 
> > > We actually don't directly invoke the use of swnode.  It's being done 
> > > indirectly inside of platform_device_register_full(), when its 
> > > pdevinfo->properties argument is non-null, which is a normal thing that 
> > > happens in many other invocations of that function.  See 
> > > drivers/base/platform.c at the call to 
> > > device_create_managed_software_node().
> > 
> > I'm aware of this. I reviewed a lot of swnode code.
> > 
> > So, the properties supplied are not const, that's what you are trying to say?
> > And this I believe a problem. I don't remember if we have such a case in kernel
> > (when we use dynamic properties with platform_device_register_full().
> 
> In our case it's not const, but I think you're getting hung up on that.  
> The const-ness of the data is not germane to the problem.
> 
> It isn't that the data is dynamic, it's that swnode is being used at all 
> in a scenario where a parent device is managing it and there are child 
> devices that come & go over time (due in our case to SFP transceiver 
> hotplug).  That scenario happens if platform_device_register_full() is 
> called, with the pdevinfo->properties argument set, to create a device 
> that itself may later generate child devices that come and go over time.  
> The platform_device_register_full() implementation notices that it's 
> getting a properties structure via its pdevinfo argument, triggering a 
> usage of device_create_managed_software_node().  It doesn't matter if 
> the incoming properties to platform_device_register_full() are static or 
> not; the behavior will be the same.
> 
> We are using platform_device_register_full() to instantiate the sfp 
> kernel module.  A grep through the kernel sources shows many other cases 
> of this sort of pattern (call to platform_device_register_full() with 
> pdevinfo->properties set), so unless every single one of those cases 
> happens to not create / remove child devices, this problem is not 
> specific to our usage of the sfp kernel module.  And there's certainly 
> no rule written anywhere that though shalt not use 
> platform_device_register_full() in cases where child devices may be 
> created / removed...
> 
> > > I had no idea of even the existence of swnode until I started tracking down
> > > the kernel misbehavior.
> > 
> > > Whether statically provided or calculated, this code path is happening.  
> > > Our case may be dynamic, but certainly any other managed swnode usage 
> > > which involves child devices is going to be a reference-counting problem 
> > > before this patch.
> > 
> > I have a hardware that uses drivers/mfd/intel_quark_i2c_gpio.c. Do you think
> > I can reproduce the issue with it?
> > 
> > The hierarchy is that
> > 
> > PCI device (associated ACPI node)
> >   I2C host controller (associated node in ACPI and managed swnode from the driver)
> >     I2C-client (it has it's own ACPI node)
> >   GPIO controller (associated node in ACPI and managed swnode from the driver)
> >     GPIO controller port (child swnode from the driver)
> >       ...users of GPIO pins of the given port...
> 
> I am unfamiliar with that driver or its surroundings, so I don't know if 
> it is a viable means to reproduce the problem.  I can spend time 
> analyzing that code further if you wish.  However it might be more 
> productive to try to succinctly, generically, spell out the steps that 
> lead to trouble:
> 
> 1. Somebody creates a struct device instance.
> 
> 2. Somebody calls device_create_managed_software_node() associating it 
> with that struct device instance.  This causes two kobject references to 
> be counted against the swnode instance (one for the struct device and 
> one because the managed flag was set).  refcount=2
> 
> 3. Some time later, a child struct device is created and the properties 
> of the parent are shared with the child.  This causes another kobject 
> reference to be counted against the swnode instance.  refcount=3
> 
> 4. Some time later, that child struct device instance goes away (perhaps 
> due to a hotplug removal).  During the tear-down, 
> software_node_notify_remove() notices that the managed field of the 
> swnode instance is set, so TWO kobject references are removed instead of 
> the single reference created by the previous step.  refcount=1
> 
> 5. Repeat step 3.  refcount=2
> 
> 6. Repeat step 4.  refcount=0
> 
> 7. Now the kobject reference count inside the swnode instance has 
> dropped to zero and so the swnode instance is released.  Notice that the 
> parent device is still holding a physical reference, now pointing into 
> freed memory.  Chaos likely follows.  If lucky, a NULL pointer kernel 
> oops is the result.
> 
> I confirmed this exact sequence of behavior in the context of the sfp 
> kernel module with appropriate printk debug trace added.  While that's 
> going to be hard for you to reproduce without sfp hardware, anything 
> else which performs those steps should produce similar results.
> 
> The patch fixes this by causing step 4 to only do the second reference 
> decrement if the device being torn down is the same one that established 
> the managing relationship in the first place.  Then everything stays 
> balanced.
> 
> Notice that this has nothing to do with whether or not the properties 
> are const, nor does it have anything to do with FPGA management.  I 
> apologize if I caused confusion leading to that.

No problem, now with this very deep and comprehensive analysis everything
is clear. Thanks for providing this all information!

> It's purely the
> combination of a managed swnode instance used for a device that will 
> create / remove child devices over time (due perhaps to hotplug 
> activity).

> And because platform_device_register_full() will implicitly employ a 
> managed swnode instance any time it is passed properties in its pdevinfo 
> argument, then by implication anyone calling 
> platform_device_register_full() in that manner - which happens in 
> multiple other places in the kernel - is at risk for the same issue.

Yeah, I will think about this more, but if somebody beats me up to it
and confirms that this is the best what we can do, I will have no
objections. I dunno, let's say in a couple of weeks?

-- 
With Best Regards,
Andy Shevchenko



  reply	other threads:[~2026-02-28 11:02 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24 19:19 [PATCH 0/1] software node: Use-after-free fix in drivers/base/swnode.c mike.isely
2026-02-24 19:19 ` [PATCH 1/1] sofware node: Only the managing device can unreference managed software node mike.isely
2026-02-25 11:22   ` Andy Shevchenko
2026-02-25 19:42     ` Mike Isely
2026-02-25 20:01       ` Andy Shevchenko
2026-02-25 20:16         ` Mike Isely
2026-02-26  7:16           ` Andy Shevchenko
2026-02-26 19:06             ` Mike Isely
2026-02-26 20:42               ` Andy Shevchenko
2026-02-27 17:55                 ` Mike Isely
2026-02-28 11:02                   ` Andy Shevchenko [this message]
2026-02-28 16:34                     ` Mike Isely
2026-02-25  9:46 ` [PATCH 0/1] software node: Use-after-free fix in drivers/base/swnode.c Andy Shevchenko
2026-02-25 18:59   ` Mike Isely
2026-02-25 19:17     ` Andy Shevchenko
2026-02-25 19:48       ` Mike Isely
2026-02-25 20:05         ` Andy Shevchenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaLLRyeK7GwR978-@ashevche-desk.local \
    --to=andriy.shevchenko@linux.intel.com \
    --cc=djrscally@gmail.com \
    --cc=heikki.krogerus@linux.intel.com \
    --cc=isely@pobox.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mike.isely@cobaltdigital.com \
    --cc=sakari.ailus@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox